The learning rates were chosen after experimenting with values in the range 0.001 to 0.01 for $f_2$ and 0.1 to 0.5 for $f_3$.
\item $f_2(\vec x)$ has one local minimum, while $f_3(\vec x)$ has one local minimum and one local maximum. When experimenting with learning rates ($\eta$) between 0.1 and 1, both gradient descents eventually fail to find the local minimum once $\eta$ exceeds a certain limit: approximately $\eta = 0.2$ for $f_2$ and $\eta = 0.9$ for $f_3$. Figures 4 and 5 show the moment when the learning rate becomes too large and the algorithm starts zig-zagging around the minimum instead of smoothly approaching it.
As the learning rate for $f_3$ approaches its limit, the descent reaches the local minimum faster and faster, with even fewer points visible in the contour plot. The descent on $f_2$, on the other hand, spirals completely out of control at these learning rates, with some iterates reaching very large values: for $\eta = 0.3$, for example, two iterates approach $(-100000, -100000)$ and $(200000, -200000)$, and these coordinates grow further as $\eta$ increases. $f_3$ exhibits the same behavior for $\eta > 1$, although far less intensely due to its non-monotonic nature. In conclusion, the usable learning rate depends on the function and its gradient; in these experiments, gradient descent on the monotonic function converged only for smaller learning rates than on the non-monotonic one.
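The following Python sketch illustrates the learning-rate limit described above. Since the actual definitions of $f_2$ and $f_3$ are not included in this excerpt, it uses a stand-in convex quadratic whose divergence limit also sits near $\eta = 0.2$; the function, gradient, starting point, step count, and tolerance are all assumptions for demonstration only, not the report's setup.

import numpy as np

def f(x):
    # Stand-in convex quadratic with a single minimum at the origin.
    # (Assumption: the real f2 from the report is not given in this paste.)
    return 2.0 * x[0]**2 + 5.0 * x[1]**2

def grad_f(x):
    # Analytic gradient of the stand-in quadratic.
    return np.array([4.0 * x[0], 10.0 * x[1]])

def gradient_descent(eta, x0, steps=200):
    # Plain gradient descent: x_{k+1} = x_k - eta * grad f(x_k).
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x - eta * grad_f(x)
        if not np.all(np.isfinite(x)) or np.linalg.norm(x) > 1e6:
            return x, "diverged"          # iterates blew up
    if np.linalg.norm(x) < 1e-3:
        return x, "converged"             # settled near the minimum
    return x, "did not settle"            # bounded but still zig-zagging

if __name__ == "__main__":
    x0 = (2.0, -1.5)                      # arbitrary starting point (assumption)
    for eta in (0.01, 0.1, 0.2, 0.3, 0.5):
        x, status = gradient_descent(eta, x0)
        print(f"eta={eta:<5} {status:>14}  last iterate = {np.round(x, 4)}")

The qualitative pattern mirrors the one reported for $f_2$: below roughly $\eta = 0.2$ the iterates settle at the minimum, at the limit they bounce back and forth without making progress, and beyond it they blow up to very large coordinates within a few steps.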