function STOCHASTIC GRADIENT DESCENT(L(), f(), x, y) returns θ
    # where: L is the loss function
    #        f is a function parameterized by θ
    #        x is the set of training inputs x(1), x(2), ..., x(n)
    #        y is the set of training outputs (labels) y(1), y(2), ..., y(n)

θ ← 0
repeat T times
    For each training tuple (x(i), y(i)) (in random order)
        Compute ŷ(i) = f(x(i); θ)          # What is our estimated output ŷ(i)?
        Compute the loss L(ŷ(i), y(i))     # How far off is ŷ(i) from the true output y(i)?
        g ← ∇θ L(f(x(i); θ), y(i))         # How should we move θ to maximize loss?
        θ ← θ − η g                        # Go the other way instead
return θ
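
The pseudocode above can be sketched in runnable form. The following is a minimal illustration in plain Python, not a reference implementation: the names `sgd`, `grad_loss`, and `squared_error_grad`, the linear model, the squared-error loss, the toy data, and the values chosen for the learning rate and the number of passes are all assumptions introduced for the example. The pseudocode itself does not fix a particular model or loss.

```python
import random


def sgd(grad_loss, theta, xs, ys, eta=0.1, epochs=100, seed=0):
    """Stochastic gradient descent over (x, y) training pairs.

    grad_loss(theta, x, y) is a hypothetical callback that returns the
    gradient of the per-example loss with respect to theta, as a list
    of the same length as theta.
    """
    rng = random.Random(seed)
    theta = list(theta)
    indices = list(range(len(xs)))
    for _ in range(epochs):          # "repeat T times"
        rng.shuffle(indices)         # visit the examples in random order
        for i in indices:
            g = grad_loss(theta, xs[i], ys[i])
            # Step against the gradient, scaled by the learning rate eta.
            theta = [t - eta * gj for t, gj in zip(theta, g)]
    return theta


def squared_error_grad(theta, x, y):
    """Gradient of 0.5 * (w*x + b - y)**2 with respect to (w, b).

    The linear model and squared-error loss are assumptions made for
    this example only.
    """
    w, b = theta
    err = (w * x + b) - y
    return [err * x, err]


# Toy, noise-free data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

w, b = sgd(squared_error_grad, [0.0, 0.0], xs, ys, eta=0.05, epochs=500)
print(w, b)
```

On this noise-free toy data the learned values should land close to w = 2 and b = 1. The estimate and the loss value are not computed separately here because, as in the pseudocode, only the gradient is needed for the update; the loss is useful mainly for reporting progress.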