You've seen how regularization can help prevent overfitting, but how does it affect the bias and variance of a learning algorithm? In this video, I'd like to go deeper into the issue of bias and variance, and talk about how it interacts with, and is affected by, the regularization of your learning algorithm.
Suppose we fit a high-order polynomial, but to prevent overfitting we are going to use regularization, as shown here. So we have this regularization term to try to keep the values of the parameters small. And as usual, the regularization sums from j equals 1 rather than from j equals 0, so theta 0 is not penalized. Let's consider three cases.
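The regularized objective described above can be sketched as follows. This is a minimal illustration, assuming a linear hypothesis and squared-error cost; the function and variable names are my own, not from the lecture.

```python
# Regularized squared-error cost, a minimal sketch of the J(theta) described
# above. Each row of X is one example with a leading 1 for the intercept term.
def regularized_cost(theta, X, y, lam):
    """(1/2m) * sum((h(x) - y)^2) + (lam/2m) * sum(theta_j^2 for j >= 1)."""
    m = len(y)
    # Hypothesis h(x) = theta^T x for each example.
    predictions = [sum(t * xj for t, xj in zip(theta, x)) for x in X]
    squared_error = sum((p - yi) ** 2 for p, yi in zip(predictions, y))
    # Note: theta[0] is NOT regularized (the sum runs from j = 1).
    penalty = lam * sum(t ** 2 for t in theta[1:])
    return (squared_error + penalty) / (2 * m)
```

Keeping theta 0 out of the penalty is exactly the "sums from j equals 1" convention mentioned above.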
The first is the case of a very large value of the regularization parameter lambda, such as if lambda were equal to 10,000, some huge value. In this case, all of these parameters, theta 1, theta 2, theta 3 and so on, will be heavily penalized, and so we end up with most of these parameter values being close to 0. The hypothesis h of x will be roughly equal, or approximately equal, to theta 0, and so we end up with a hypothesis that more or less looks like a flat, constant straight line. This hypothesis has high bias, and it badly underfits this data set; the horizontal straight line is just not a very good model for this data set.
At the other extreme is if we have a very small value of lambda, such as if lambda were equal to 0. In that case, given that we're fitting a high-order polynomial basically without regularization, or with very minimal regularization, we end up with our usual high-variance, overfitting setting: if lambda is equal to zero, we are just fitting without regularization, and the hypothesis overfits. It is only if we have some intermediate value of lambda, neither too large nor too small, that we end up with parameters theta that give us a reasonable fit to this data.
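The three regimes above can be seen numerically. Here is an illustrative sketch, assuming a ridge-style polynomial fit solved by the regularized normal equation; the data set and every name in it are my own assumptions, not the lecture's.

```python
import numpy as np

# Ridge-style polynomial fit: theta = (X^T X + lam * L)^-1 X^T y,
# where L is the identity with L[0,0] = 0 so theta_0 is not penalized.
def fit_ridge_poly(x, y, degree, lam):
    X = np.vander(x, degree + 1, increasing=True)  # columns 1, x, x^2, ...
    L = np.eye(degree + 1)
    L[0, 0] = 0.0  # theta_0 is not regularized
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(20)

for lam in (0.0, 0.01, 10000.0):
    theta = fit_ridge_poly(x, y, 6, lam)
    # Huge lambda drives theta_1..theta_6 toward 0, leaving a near-constant,
    # high-bias hypothesis; lambda = 0 gives the unregularized high-variance fit.
    print(lam, np.abs(theta[1:]).max())
```

With lambda = 10,000 the higher-order coefficients all but vanish, matching the "flat, constant line" picture; with lambda = 0 they stay large.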
So how can we automatically choose a good value for the regularization parameter lambda?
Just to reiterate, here is our model and here is our learning algorithm's objective. For the setting where we're using regularization, let me define J train of theta to be something different: the optimization objective, but without the regularization term. Previously, in an earlier video when we were not using regularization, I defined J train of theta to be the same as J of theta, the cost function. But when we are using regularization, with this extra lambda term, we're going to define J train, my training set error, to be just my sum of squared errors on the training set, or my average squared error on the training set, without taking into account the regularization term.
And similarly, I'm then also going to define the cross-validation set error and the test set error, as before, to be the average sum of squared errors on the cross-validation and the test sets. So just to summarize, my definitions of J train, J cv, and J test are just the average squared error, or one half of the average squared error, on my training, validation, and test sets, without the extra regularization term. So, this is how we can automatically choose the regularization parameter lambda.
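The three definitions above all share the same form, which can be sketched as a single helper; this is an illustrative sketch assuming a linear hypothesis, and the names are mine, not the lecture's.

```python
# Average squared error on a data split, with NO regularization term.
# The same function serves as J_train, J_cv, or J_test, depending on
# which split (X, y) it is handed.
def avg_squared_error(theta, X, y):
    """(1/2m) * sum((h(x^(i)) - y^(i))^2), with h(x) = theta^T x."""
    m = len(y)
    total = 0.0
    for x, target in zip(X, y):
        h = sum(t * xj for t, xj in zip(theta, x))
        total += (h - target) ** 2
    return total / (2 * m)
```

The regularization term is deliberately absent here: lambda shapes which theta we learn, but the reported errors measure pure fit.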
What I usually do is maybe have some range of values of lambda I want to try. So I might be considering not using regularization at all, and then here are a few values I might try: lambda values of 0.01, 0.02, 0.04, and so on. I usually step these up in multiples of two, until some maybe larger value; stepping in multiples of two, I actually end up with 10.24. It's not 10 exactly, but this is close enough, and the extra 0.24 won't affect your result that much. So this gives me maybe twelve different models that I'm trying to select amongst, corresponding to 12 different values of the regularization parameter lambda. And of course, you can also go to values less than 0.01 or values larger than 10, but I've just truncated it here for convenience.
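The candidate grid described above can be generated in one line; a small sketch, with the variable name being my own choice.

```python
# Candidate lambda values: no regularization, then 0.01 doubled repeatedly.
# Gives 12 values: 0.0, 0.01, 0.02, 0.04, ..., 5.12, 10.24 (up to float rounding).
lambdas = [0.0] + [0.01 * 2 ** k for k in range(11)]
print(lambdas)
```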
Given each of these 12 models, what we can do is then the following: we take this first model, with lambda equals 0, and minimize my cost function J of theta, and this would give me some parameter vector theta; similar to the earlier video, let me just denote this as theta superscript 1. And then I can take my second model, with lambda set to 0.01, and minimize my cost function, now using lambda equals 0.01, to get some different parameter vector theta, which I denote theta superscript 2. Similarly, I end up with theta superscript 3 for my third model, and so on, until for my final model, with lambda set to 10, or 10.24 rather, I end up with theta superscript 12.
Next I can take all of these hypotheses, all of these parameters, and use my cross-validation set to evaluate them. So I can look at my first model, my second model, and so on, fit with these different values of the regularization parameter, and evaluate them on my cross-validation set, basically measuring the average squared error of each of these parameter vectors theta on my cross-validation set. I would then pick whichever one of these 12 models gives me the lowest error on the cross-validation set. And let's say, for the sake of this example, that I end up picking theta superscript 5, the fifth model, because that has the lowest cross-validation error.
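The whole selection loop above can be sketched end to end. To keep it self-contained I use a toy one-parameter model h(x) = theta * x, for which minimizing the regularized cost has a closed form; the data and all names here are illustrative assumptions, not from the lecture.

```python
# Minimizer of sum((theta*x - y)^2) + lam * theta^2, in closed form:
# theta = sum(x*y) / (sum(x^2) + lam).
def fit_one_param(xs, ys, lam):
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def cv_error(theta, xs, ys):
    # Average squared error WITHOUT the regularization term.
    return sum((theta * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(ys))

train_x, train_y = [1.0, 2.0, 3.0], [1.1, 1.9, 3.2]
cv_x, cv_y = [1.5, 2.5], [1.4, 2.6]

lambdas = [0.0] + [0.01 * 2 ** k for k in range(11)]      # 0, 0.01, ..., 10.24
thetas = [fit_one_param(train_x, train_y, lam) for lam in lambdas]
errors = [cv_error(theta, cv_x, cv_y) for theta in thetas]
best = min(range(len(lambdas)), key=lambda i: errors[i])  # lowest J_cv wins
print("picked lambda =", lambdas[best], "theta =", thetas[best])
```

Each candidate lambda is used only during fitting; the comparison between models happens on the cross-validation set with the unregularized error, exactly as described above.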
Having done that, finally, what I would do if I want to report a test set error is to take the parameter theta superscript 5 that I've selected and look at how well it does on my test set. And once again, it is as if we fit this parameter theta to my cross-validation set, which is why I am saving aside a separate test set that I am going to use to get a better estimate of how well my parameter vector theta will generalize to previously unseen examples.
So that's model selection applied to selecting the regularization parameter lambda. The last thing I'd like to do in this video is get a better understanding of how the cross-validation error and the training error vary as we vary the regularization parameter lambda.
And so just a reminder, that was our original cost function J of theta; but for this purpose we're going to define the training error without using the regularization parameter, and the cross-validation error without using the regularization parameter. What I'd like to do is plot this J train and plot this J cv, meaning just how well does my hypothesis do on the training set, and how well does my hypothesis do on the cross-validation set, as I vary my regularization parameter lambda.
So, as we saw earlier, if lambda is small, then we're not using much regularization and we run a larger risk of overfitting. Whereas if lambda is large, that is, if we were on the right part of this horizontal axis, then with a large value of lambda we run a high risk of having a bias problem.
So if you plot J train and J cv, what you find is that for small values of lambda you can fit the training set relatively well, because you're not regularizing. For small values of lambda, the regularization term basically goes away, and you're just minimizing pretty much your squared error. So when lambda is small, you end up with a small value for J train; whereas if lambda is large, then you have a high bias problem and you might not fit your training set so well, so you end up with a value up there.
So J train of theta will tend to increase as lambda increases, because a large value of lambda corresponds to high bias, where you might not even fit your training set well; whereas a small value of lambda corresponds to freely fitting very high degree polynomials to your data, let's say.
- As for the cross-validation error, we end up with a figure like this.
- 252
- 00:08:51,380 --> 00:08:52,900
- Where, over here on
- 253
- 00:08:53,230 --> 00:08:54,760
- the right, if we
- 254
- 00:08:54,830 --> 00:08:55,770
- have a large value of lambda,
- 255
- 00:08:56,740 --> 00:08:57,900
- we may end up underfitting.
- 256
- 00:08:59,200 --> 00:09:00,580
- And so, this is the bias regime
- 257
- 00:09:02,250 --> 00:09:05,050
- whereas and cross
- 258
- 00:09:05,330 --> 00:09:06,980
- validation error will be
- 259
- 00:09:07,220 --> 00:09:08,360
- high and let me just leave
- 260
- 00:09:08,550 --> 00:09:11,060
- all that. So, that's Jcv of theta because with
- 261
- 00:09:11,570 --> 00:09:12,740
- high bias we won't be fitting.
- 262
- 00:09:13,730 --> 00:09:15,880
- We won't be doing well on the cross-validation set.
- 263
- 00:09:17,350 --> 00:09:20,300
- Whereas here on the left, this is the high-variance regime.
- 264
- 00:09:21,420 --> 00:09:22,920
- Where if we have two smaller
- 265
- 00:09:23,320 --> 00:09:25,210
- value of lambda then we
- 266
- 00:09:25,370 --> 00:09:26,490
- may be overfitting the data
- 267
- 00:09:27,170 --> 00:09:28,440
- and so by over fitting the
- 268
- 00:09:28,530 --> 00:09:30,620
- data then it a cross validation error
- 269
- 00:09:31,010 --> 00:09:31,910
- will also be high.
And so this is what the cross-validation error and the training error may look like on a training set as we vary the regularization parameter lambda. Once again, it will often be some intermediate value of lambda that is just right, or that works best, in terms of having a small cross-validation error or a small test set error.
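The shapes just described can be reproduced numerically. Below is a sketch assuming a ridge-style polynomial fit via the normal equation; the synthetic data and the names are my own illustration, and both J train and J cv are measured without the regularization term, as defined earlier.

```python
import numpy as np

def fit_poly(x, y, lam, degree=6):
    X = np.vander(x, degree + 1, increasing=True)
    L = np.eye(degree + 1)
    L[0, 0] = 0.0  # leave theta_0 unpenalized
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

def split_error(theta, x, y, degree=6):
    X = np.vander(x, degree + 1, increasing=True)
    return np.mean((X @ theta - y) ** 2) / 2  # average squared error, no lambda term

rng = np.random.default_rng(1)
x_tr = np.sort(rng.uniform(0, 1, 15))
y_tr = np.sin(2 * np.pi * x_tr) + 0.2 * rng.standard_normal(15)
x_cv = np.sort(rng.uniform(0, 1, 15))
y_cv = np.sin(2 * np.pi * x_cv) + 0.2 * rng.standard_normal(15)

for lam in (1e-8, 0.01, 1.0, 100.0):
    theta = fit_poly(x_tr, y_tr, lam)
    # J_train grows as lambda grows; J_cv is typically high at both extremes
    # (overfitting on the left, underfitting on the right).
    print(lam, split_error(theta, x_tr, y_tr), split_error(theta, x_cv, y_cv))
```

Sweeping lambda and printing both errors is exactly the plot described above, just in tabular form.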
Whereas the curves I've drawn here are somewhat cartoonish and somewhat idealized, on a real data set the curves you get may end up looking a little bit messier and a little bit noisier than this. For some data sets you will really see these sorts of trends, and by looking at the plot of the hold-out cross-validation error, you can either manually or automatically try to select a point that minimizes the cross-validation error, and select the value of lambda corresponding to low cross-validation error.
When I'm trying to pick the regularization parameter lambda for a learning algorithm, often I find that plotting a figure like the one shown here helps me understand better what's going on, and helps me verify that I am indeed picking a good value for the regularization parameter lambda. So hopefully that gives you more insight into regularization and its effects on the bias and variance of a learning algorithm.
By now you've seen bias and variance from a lot of different perspectives. And what I'd like to do in the next video is take a lot of the insights that we've gone through and build on them to put together a diagnostic called learning curves, which is a tool that I often use to try to diagnose whether a learning algorithm may be suffering from a bias problem, a variance problem, or a little bit of both.