
CS5652: Artificial Neural Networks
Implementation of tanmlp.m


  
\begin{figure}
\centerline{\psfig{figure=tanmlp.eps,width=.7\hsize}}
\caption{Figure 1: A 1-3-2 MLP.}
\end{figure}

For simplicity, suppose that we are dealing with a 1-3-2 multilayer perceptron with hyperbolic tangent activation functions, and we have 100 input-output data pairs as the training data set. The training data set can be represented by a $100 \times 3$ matrix:

\begin{displaymath}
\left[
\begin{array}{ccc}
x_{1,1} & t_{5,1} & t_{6,1} \\
\vdots & \vdots & \vdots \\
x_{1,100} & t_{5,100} & t_{6,100} \\
\end{array}
\right]
\end{displaymath}

where the input part is denoted by ${\bf X}_0$:

\begin{displaymath}
{\bf X}_0 =
\left[
\begin{array}{c}
x_{1,1} \\
\vdots \\
x_{1,p} \\
\vdots \\
x_{1,100} \\
\end{array}
\right]
\end{displaymath}

and the output (target) part is denoted by ${\bf T}$:

\begin{displaymath}
{\bf T} =
\left[
\begin{array}{cc}
t_{5,1} & t_{6,1} \\
\vdots & \vdots \\
t_{5,100} & t_{6,100} \\
\end{array}
\right]
\end{displaymath}
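In MATLAB, these two matrices can be built directly. The following sketch uses hypothetical target functions (a sine and a cosine of the input) purely for illustration; tanmlp.m may construct its training data differently:

% A minimal sketch of assembling the training data (assumed targets).
data_n = 100;                      % number of input-output pairs
X0 = linspace(-1, 1, data_n)';     % 100-by-1 input part
T  = [sin(pi*X0) cos(pi*X0)];      % 100-by-2 target part
data = [X0 T];                     % the 100-by-3 training data matrix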

For convenience of discussion, we shall also define ${\bf X}_1$ and ${\bf X}_2$ as the node outputs of layers 1 and 2, respectively:

\begin{displaymath}
{\bf X}_1 =
\left[
\begin{array}{ccc}
x_{2,1} & x_{3,1} & x_{4,1} \\
\vdots & \vdots & \vdots \\
x_{2,100} & x_{3,100} & x_{4,100} \\
\end{array}
\right]
\end{displaymath}

\begin{displaymath}
{\bf X}_2 =
\left[
\begin{array}{cc}
x_{5,1} & x_{6,1} \\
\vdots & \vdots \\
x_{5,100} & x_{6,100} \\
\end{array}
\right]
\end{displaymath}

Similarly, the parameters ${\bf W}_1$ and ${\bf W}_2$ for the first and second layers can be defined as follows:

\begin{displaymath}
{\bf W}_1 =
\left[
\begin{array}{ccc}
w_{12} & w_{13} & w_{14} \\
w_{2} & w_{3} & w_{4} \\
\end{array}
\right]
\end{displaymath}

\begin{displaymath}
{\bf W}_2 =
\left[
\begin{array}{cc}
w_{25} & w_{26} \\
w_{35} & w_{36} \\
w_{45} & w_{46} \\
w_{5} & w_{6} \\
\end{array}
\right]
\end{displaymath}
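In MATLAB, the two parameter matrices can be initialized with small random values whose dimensions follow directly from the 1-3-2 topology. The variable names in_n and out_n below are assumptions (hidden_n does appear in tanmlp.m), and the initialization scheme is only a sketch:

in_n = 1; hidden_n = 3; out_n = 2;    % the 1-3-2 network topology
W1 = rand(in_n+1, hidden_n) - 0.5;    % 2-by-3: input weights plus bias row
W2 = rand(hidden_n+1, out_n) - 0.5;   % 4-by-2: hidden weights plus bias row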

The equations for computing the output of the first layer are

\begin{displaymath}
\begin{array}{rcl}
x_2 & = & tanh(x_1 w_{12} + w_2) \\
x_3 & = & tanh(x_1 w_{13} + w_3) \\
x_4 & = & tanh(x_1 w_{14} + w_4) \\
\end{array}
\end{displaymath}

or equivalently,

\begin{displaymath}
\left[
\begin{array}{ccc}
x_2 & x_3 & x_4 \\
\end{array}
\right]
= tanh\left(
\left[
\begin{array}{cc}
x_1 & 1 \\
\end{array}
\right]
\left[
\begin{array}{ccc}
w_{12} & w_{13} & w_{14} \\
w_{2} & w_{3} & w_{4} \\
\end{array}
\right]
\right).
\end{displaymath}

After plugging 100 inputs into the preceding equation, we have

\begin{displaymath}
\left[
\begin{array}{ccc}
x_{2,1} & x_{3,1} & x_{4,1} \\
\vdots & \vdots & \vdots \\
x_{2,100} & x_{3,100} & x_{4,100} \\
\end{array}
\right]
= tanh\left(
\left[
\begin{array}{cc}
x_{1,1} & 1 \\
\vdots & \vdots \\
x_{1,100} & 1 \\
\end{array}
\right]
\left[
\begin{array}{ccc}
w_{12} & w_{13} & w_{14} \\
w_{2} & w_{3} & w_{4} \\
\end{array}
\right]
\right),
\end{displaymath}

or equivalently,

\begin{displaymath}
{\bf X}_1 = tanh([{\bf X}_0, one]*{\bf W}_1).\end{displaymath}

The preceding equation corresponds to line 47 of tanmlp.m:
X1 = tanh([X0 one]*W1);
The output of layer 2 can be computed in the same way, which gives line 48 of tanmlp.m:
X2 = tanh([X1 one]*W2);
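Together with the bias column, the whole forward pass takes only a few statements. In the sketch below, one denotes a 100-by-1 column of ones; its exact construction in tanmlp.m is assumed:

one = ones(size(X0,1), 1);     % bias column, one entry per data pair
X1  = tanh([X0 one]*W1);       % 100-by-3 hidden-layer outputs (line 47)
X2  = tanh([X1 one]*W2);       % 100-by-2 network outputs (line 48)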

The instantaneous error measure for the $p$th data pair is defined by

\begin{displaymath}
E_p = (t_{5,p} - x_{5,p})^2 + (t_{6,p} - x_{6,p})^2,
\end{displaymath}

where $t_{5,p}$ and $t_{6,p}$ are the $p$th target outputs; $x_{5,p}$ and $x_{6,p}$ are the $p$th network outputs. The derivative of the above instantaneous error measure with respect to the network outputs is written as

\begin{displaymath}
\frac{\textstyle \partial E_p}{\textstyle \partial {\bf X}_2(p,:)}
=
\left[
\begin{array}{cc}
-2(t_{5,p} - x_{5,p}) & -2(t_{6,p} - x_{6,p}) \\
\end{array}
\right]
\end{displaymath}

We can stack the above equation for each p to obtain the following matrix expression:

\begin{displaymath}
\begin{array}{rcl}
\frac{\textstyle \partial E}{\textstyle \partial {\bf X}_2}
& = &
-2 \left(
\left[
\begin{array}{cc}
t_{5,1} & t_{6,1} \\
\vdots & \vdots \\
t_{5,100} & t_{6,100} \\
\end{array}
\right]
-
\left[
\begin{array}{cc}
x_{5,1} & x_{6,1} \\
\vdots & \vdots \\
x_{5,100} & x_{6,100} \\
\end{array}
\right]
\right)
=
-2 ({\bf T} - {\bf X}_2),
\end{array}
\end{displaymath}

where ${\bf X}_2$ is the actual output of the MLP. The preceding equation corresponds to line 59 of tanmlp.m:
dE_dX2 = -2*(T - X2);
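As a quick consistency check in MATLAB, the total error over all 100 pairs and the above derivative can be computed in two statements; the variable name E for the total error is an assumption, not taken from tanmlp.m:

E      = sum(sum((T - X2).^2));   % total squared error, i.e. the sum of Ep over p
dE_dX2 = -2*(T - X2);             % 100-by-2 derivative matrix (line 59)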

Now we can compute the derivatives of $E_p$ with respect to the second layer's weights and biases. The derivatives of $E_p$ with respect to the parameters (weights and bias) of node 5 are

\begin{displaymath}
\begin{array}{rllll}
\frac{\textstyle \partial E_p}{\textstyle \partial w_{25}}
& = &
\frac{\textstyle \partial E_p}{\textstyle \partial x_5}
\frac{\textstyle \partial x_5}{\textstyle \partial w_{25}}
& = &
\frac{\textstyle \partial E_p}{\textstyle \partial x_5}
(1+x_5)(1-x_5)\, x_2 \\
\frac{\textstyle \partial E_p}{\textstyle \partial w_{35}}
& = &
\frac{\textstyle \partial E_p}{\textstyle \partial x_5}
\frac{\textstyle \partial x_5}{\textstyle \partial w_{35}}
& = &
\frac{\textstyle \partial E_p}{\textstyle \partial x_5}
(1+x_5)(1-x_5)\, x_3 \\
\frac{\textstyle \partial E_p}{\textstyle \partial w_{45}}
& = &
\frac{\textstyle \partial E_p}{\textstyle \partial x_5}
\frac{\textstyle \partial x_5}{\textstyle \partial w_{45}}
& = &
\frac{\textstyle \partial E_p}{\textstyle \partial x_5}
(1+x_5)(1-x_5)\, x_4 \\
\frac{\textstyle \partial E_p}{\textstyle \partial w_{5}}
& = &
\frac{\textstyle \partial E_p}{\textstyle \partial x_5}
\frac{\textstyle \partial x_5}{\textstyle \partial w_{5}}
& = &
\frac{\textstyle \partial E_p}{\textstyle \partial x_5}
(1+x_5)(1-x_5) \\
\end{array}
\end{displaymath}

The derivatives of $E_p$ with respect to the parameters (weights and bias) of node 6 are

\begin{displaymath}
\begin{array}{rllll}
\frac{\textstyle \partial E_p}{\textstyle \partial w_{26}}
& = &
\frac{\textstyle \partial E_p}{\textstyle \partial x_6}
\frac{\textstyle \partial x_6}{\textstyle \partial w_{26}}
& = &
\frac{\textstyle \partial E_p}{\textstyle \partial x_6}
(1+x_6)(1-x_6)\, x_2 \\
\frac{\textstyle \partial E_p}{\textstyle \partial w_{36}}
& = &
\frac{\textstyle \partial E_p}{\textstyle \partial x_6}
\frac{\textstyle \partial x_6}{\textstyle \partial w_{36}}
& = &
\frac{\textstyle \partial E_p}{\textstyle \partial x_6}
(1+x_6)(1-x_6)\, x_3 \\
\frac{\textstyle \partial E_p}{\textstyle \partial w_{46}}
& = &
\frac{\textstyle \partial E_p}{\textstyle \partial x_6}
\frac{\textstyle \partial x_6}{\textstyle \partial w_{46}}
& = &
\frac{\textstyle \partial E_p}{\textstyle \partial x_6}
(1+x_6)(1-x_6)\, x_4 \\
\frac{\textstyle \partial E_p}{\textstyle \partial w_{6}}
& = &
\frac{\textstyle \partial E_p}{\textstyle \partial x_6}
\frac{\textstyle \partial x_6}{\textstyle \partial w_{6}}
& = &
\frac{\textstyle \partial E_p}{\textstyle \partial x_6}
(1+x_6)(1-x_6) \\
\end{array}
\end{displaymath}

We can combine the above eight equations into the following concise expression:

\begin{displaymath}
\frac{\textstyle \partial E_p}{\textstyle \partial {\bf W}_2}
=
\left[
\begin{array}{c}
x_2 \\
x_3 \\
x_4 \\
1 \\
\end{array}
\right]
\left[
\begin{array}{cc}
\frac{\textstyle \partial E_p}{\textstyle \partial x_5}(1-x_5)(1+x_5) &
\frac{\textstyle \partial E_p}{\textstyle \partial x_6}(1-x_6)(1+x_6)
\end{array}
\right].
\end{displaymath}
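This outer-product structure is easy to see in MATLAB for a single data pair; the names d5 and dEp_dW2 below are purely illustrative and reuse the variables introduced earlier:

p  = 1;                                          % any data pair
d5 = dE_dX2(p,:).*(1+X2(p,:)).*(1-X2(p,:));      % 1-by-2 row for output nodes 5 and 6
dEp_dW2 = [X1(p,:) 1]' * d5;                     % 4-by-1 times 1-by-2 gives the 4-by-2 per-pattern gradient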

Therefore, the accumulated gradient is
\begin{displaymath}
\frac{\textstyle \partial E}{\textstyle \partial {\bf W}_2}
= \sum_{p=1}^{100}
\frac{\textstyle \partial E_p}{\textstyle \partial {\bf W}_2}
= [{\bf X}_1, one]^T *
\left[
\frac{\textstyle \partial E}{\textstyle \partial {\bf X}_2}.*(1+{\bf X}_2).*(1-{\bf X}_2)
\right]
\end{displaymath} (1)
The preceding equation corresponds to line 60 of tanmlp.m:

dE_dW2 = [X1 one]'*(dE_dX2.*(1+X2).*(1-X2));
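One way to gain confidence in Equation (1) is a finite-difference check: perturb a single entry of W2, redo the forward pass, and compare the numerical slope of the total error with the corresponding entry of dE_dW2. The sketch below is purely illustrative and is not part of tanmlp.m; it reuses the variables defined earlier:

% Hypothetical finite-difference check of one entry of dE_dW2.
delta = 1e-6;
i = 2; j = 1;                              % entry to check, here w_{35}
W2p = W2;  W2p(i,j) = W2p(i,j) + delta;    % perturbed copy of W2
X2p = tanh([X1 one]*W2p);                  % forward pass with the perturbed weight
E_perturbed = sum(sum((T - X2p).^2));      % total error after the perturbation
numerical   = (E_perturbed - E)/delta;     % one-sided difference quotient
fprintf('analytic %g vs. numerical %g\n', dE_dW2(i,j), numerical);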

For the derivative of $E_p$ with respect to $x_2$, we have

\begin{displaymath}
\begin{array}{rcl}
\frac{\textstyle \partial E_p}{\textstyle \partial x_2}
& = &
\frac{\textstyle \partial E_p}{\textstyle \partial x_5}
(1-x_5)(1+x_5) w_{25}
+
\frac{\textstyle \partial E_p}{\textstyle \partial x_6}
(1-x_6)(1+x_6) w_{26}.
\end{array}
\end{displaymath}

Similarly, we have

\begin{displaymath}
\begin{array}{rcl}
\frac{\textstyle \partial E_p}{\textstyle \partial x_3}
& = &
\frac{\textstyle \partial E_p}{\textstyle \partial x_5}
(1-x_5)(1+x_5) w_{35}
+
\frac{\textstyle \partial E_p}{\textstyle \partial x_6}
(1-x_6)(1+x_6) w_{36}, \\
\frac{\textstyle \partial E_p}{\textstyle \partial x_4}
& = &
\frac{\textstyle \partial E_p}{\textstyle \partial x_5}
(1-x_5)(1+x_5) w_{45}
+
\frac{\textstyle \partial E_p}{\textstyle \partial x_6}
(1-x_6)(1+x_6) w_{46}.
\end{array}
\end{displaymath}

The preceding three equations can be put into matrix form:

\begin{displaymath}
\begin{array}{rcl}
\left[
\begin{array}{ccc}
\frac{\textstyle \partial E_p}{\textstyle \partial x_2} &
\frac{\textstyle \partial E_p}{\textstyle \partial x_3} &
\frac{\textstyle \partial E_p}{\textstyle \partial x_4}
\end{array}
\right]
& = &
\left[
\begin{array}{cc}
\frac{\textstyle \partial E_p}{\textstyle \partial x_5}(1-x_5)(1+x_5) &
\frac{\textstyle \partial E_p}{\textstyle \partial x_6}(1-x_6)(1+x_6)
\end{array}
\right]
\left[
\begin{array}{cc}
w_{25} & w_{26} \\
w_{35} & w_{36} \\
w_{45} & w_{46} \\
\end{array}
\right]^T \\
\end{array}
\end{displaymath}

Hence the accumulated derivatives of $E$ with respect to ${\bf X}_1$ are

\begin{displaymath}
\begin{array}{rcl}
\frac{\textstyle \partial E}{\textstyle \partial {\bf X}_1}
& = &
\left(
\frac{\textstyle \partial E}{\textstyle \partial {\bf X}_2}.*(1-{\bf X}_2).*(1+{\bf X}_2)
\right) * {\bf W}_2 (1:\mbox{hidden}, :)'. \\
\end{array}
\end{displaymath}

The preceding equation corresponds to line 62 of tanmlp.m; note that ${\bf W}_2(1:\mbox{hidden}, :)$ excludes the last row of ${\bf W}_2$, since the biases $w_5$ and $w_6$ do not multiply any hidden-node output:

dE_dX1 = dE_dX2.*(1-X2).*(1+X2)*W2(1:hidden_n,:)';

By proceeding as in Equation (1), we have

\begin{displaymath}
\begin{array}{rcl}
\frac{\textstyle \partial E}{\textstyle \partial {\bf W}_1}
& = &
[{\bf X}_0, one]^T *
\left[
\frac{\textstyle \partial E}{\textstyle \partial {\bf X}_1}.*(1+{\bf X}_1).*(1-{\bf X}_1)
\right] \\
\end{array}
\end{displaymath}

The preceding equation corresponds to line 63 of tanmlp.m:
dE_dW1 = [X0 one]'*(dE_dX1.*(1+X1).*(1-X1));
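With both gradient matrices available, a plain batch gradient-descent update closes the loop; the step size and the update rule below are assumptions for illustration (tanmlp.m may use a different learning rate or a more elaborate update):

eta = 0.01;                 % assumed learning rate
W1  = W1 - eta*dE_dW1;      % update first-layer weights and biases
W2  = W2 - eta*dE_dW2;      % update second-layer weights and biases

Repeating the forward pass and the gradient computation with the updated W1 and W2 should decrease E for a sufficiently small eta.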


 
J.-S. Roger Jang
11/26/1997