- Gradient Descent Learning Algorithm for Sigmoidal Perceptrons
- ----------------------------------------------------------------
- Algorithm
- Initialization: training examples {( x_e, y_e )}, e = 1..N; initial weights wi set to small random values; learning rate parameter η = 0.1
- Repeat
- for each training example ( xe, ye )
- calculate the output: o_e = s( net ) = 1 / ( 1 + e^-net ), where: net = Σ_{i=0..d} wi xi_e
- if the Perceptron does not respond correctly, compute the weight corrections:
- Δwi = Δwi + η ( y_e - o_e ) o_e ( 1 - o_e ) xi_e
- update the weights with the accumulated error from all examples
- wi = wi + Δwi // Gradient Descent Rule
- until termination condition is satisfied.
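The batch algorithm above can be sketched in Python as follows. This is a minimal illustration, not code from the source: the function names (`sigmoid`, `train_batch`) and the list-based weight representation are my own choices, and `epochs` stands in for the unspecified termination condition.

```python
import math

def sigmoid(net):
    """Logistic activation: s(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-net))

def train_batch(examples, weights, lr=0.1, epochs=1):
    """Batch gradient descent for a sigmoidal perceptron.

    examples : list of (inputs, target) pairs; inputs exclude the bias.
    weights  : [w0, w1, ..., wd], with w0 the bias weight (input x0 = 1).
    lr       : learning rate η.
    epochs   : number of passes, standing in for the termination condition.
    """
    for _ in range(epochs):
        deltas = [0.0] * len(weights)          # accumulated corrections Δwi
        for inputs, target in examples:
            x = [1.0] + list(inputs)           # prepend bias input x0 = 1
            net = sum(w * xi for w, xi in zip(weights, x))
            o = sigmoid(net)
            # Δwi = Δwi + η (y - o) o (1 - o) xi
            for i, xi in enumerate(x):
                deltas[i] += lr * (target - o) * o * (1.0 - o) * xi
        # update with the accumulated error from all examples
        weights = [w + d for w, d in zip(weights, deltas)]
    return weights
```

With `lr=1.0` and the two examples from the worked calculation below, one pass reproduces the final weights of the example.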
- Example: Consider a Perceptron that accepts two inputs x1 and x2, with weights w1 = 0.5 and w2 = 0.3, and bias weight w0 = -1.
- Let the following example be given: x1 = 2, x2 = 1, y = 0. The output of the Perceptron is:
- o = s( -1 + 2 * 0.5 + 1 * 0.3 ) = s( 0.3 ) = 0.5744
- The weight updates according to the gradient descent algorithm (with the η factor omitted, i.e. taking η = 1 in this example) will be:
- Δw0 = ( 0 - 0.5744 ) * 0.5744 * ( 1 - 0.5744 ) * 1 = -0.1404
- Δw1 = ( 0 - 0.5744 ) * 0.5744 * ( 1 - 0.5744 ) * 2 = -0.2808
- Δw2 = ( 0 - 0.5744 ) * 0.5744 * ( 1 - 0.5744 ) * 1 = -0.1404
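The corrections for this first example can be checked with a short snippet. This is an illustrative sketch assuming η = 1, as in the worked numbers; the variable names are my own.

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

w0, w1, w2 = -1.0, 0.5, 0.3     # current weights (w0 is the bias weight)
x1, x2, y = 2.0, 1.0, 0.0       # first training example

net = w0 * 1 + w1 * x1 + w2 * x2     # -1 + 1.0 + 0.3 = 0.3
o = sigmoid(net)                     # ≈ 0.5744
grad = (y - o) * o * (1.0 - o)       # common factor (y - o) s(net)(1 - s(net))
dw0, dw1, dw2 = grad * 1, grad * x1, grad * x2
```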
- Let another example be given: x1 = 1, x2 = 2, y = 1
- The output of the Perceptron is :
- o = s( -1 + 1 * 0.5 + 2 * 0.3 ) = s( 0.1 ) = 0.525
- The weight updates according to the gradient descent algorithm will be:
- Δw0 = - 0.1404 + ( 1 - 0.525 ) * 0.525 * ( 1 - 0.525 ) * 1 = -0.0219
- Δw1 = - 0.2808 + ( 1 - 0.525 ) * 0.525 * ( 1 - 0.525 ) * 1 = -0.1623
- Δw2 = - 0.1404 + ( 1 - 0.525 ) * 0.525 * ( 1 - 0.525 ) * 2 = 0.0966
- If there are no more examples in the batch, the weights will be modified as follows:
- w0 = - 1 + ( -0.0219 ) = -1.0219
- w1 = 0.5 + ( -0.1623 ) = 0.3377
- w2 = 0.3 + 0.0966 = 0.3966
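The accumulation over the second example and the end-of-batch update can be verified with the sketch below, again assuming η = 1 and starting from the rounded corrections of the first example; the names are illustrative only.

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

w = [-1.0, 0.5, 0.3]             # current weights [w0, w1, w2]
dw = [-0.1404, -0.2808, -0.1404] # corrections accumulated from the first example

# second example: x1 = 1, x2 = 2, y = 1 (x0 = 1 is the bias input)
x = [1.0, 1.0, 2.0]
y = 1.0

o = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))   # sigmoid(0.1) ≈ 0.525
grad = (y - o) * o * (1.0 - o)
dw = [d + grad * xi for d, xi in zip(dw, x)]        # accumulate Δwi

# no more examples in the batch: apply the accumulated corrections
w = [wi + di for wi, di in zip(w, dw)]              # ≈ [-1.0219, 0.3377, 0.3966]
```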