
CS5652: Artificial Neural Networks
Implementation of rbfn.m


  
Figure 1: A 2-3-2 RBFN.
\begin{figure}
\centerline{\psfig{figure=rbfn.eps,width=.8\hsize}}
\end{figure}

For simplicity, suppose that we are dealing with a 2-3-2 radial basis function network, and we have 100 input-output data pairs as the training data set. The training data set can be represented by a $100 \times 4$ matrix:

\begin{displaymath}
\left[
\begin{array}{cccc}
x_{1,1} & x_{2,1} & t_{6,1} & t_{7,1} \\
\vdots & \vdots & \vdots & \vdots \\
x_{1,100} & x_{2,100} & t_{6,100} & t_{7,100} \\
\end{array}
\right]
\end{displaymath}

where the input part is denoted by ${\bf X}_0$:

\begin{displaymath}
{\bf X}_0 =
\left[
\begin{array}{cc}
x_{1,1} & x_{2,1} \\
\vdots & \vdots \\
x_{1,100} & x_{2,100} \\
\end{array}
\right]
\end{displaymath}

and the output (target) part is denoted by ${\bf T}$:

\begin{displaymath}
{\bf T}=
\left[
\begin{array}{cc}
t_{6,1} & t_{7,1} \\
\vdots & \vdots \\
t_{6,100} & t_{7,100} \\
\end{array}
\right]
\end{displaymath}
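
In MATLAB this split is a simple indexing operation. The following is a minimal sketch, assuming the 100-by-4 training set is stored in a matrix called data (rbfn.m itself may organize the data differently):
data = rand(100, 4);   % hypothetical training set; columns are x1, x2, t6, t7
X0 = data(:, 1:2);     % input part, 100-by-2
T  = data(:, 3:4);     % target part, 100-by-2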

For convenience of discussion, we shall also define ${\bf X}_1$ and ${\bf X}_2$ as the node outputs of layers 1 and 2, respectively:

\begin{displaymath}
{\bf X}_1 = 
\left[
\begin{array}{ccc}
x_{3,1} & x_{4,1} & x_{5,1} \\
\vdots & \vdots & \vdots \\
x_{3,100} & x_{4,100} & x_{5,100} \\
\end{array}
\right]
\end{displaymath}

\begin{displaymath}
{\bf X}_2 = 
\left[
\begin{array}{cc}
x_{6,1} & x_{7,1} \\
\vdots & \vdots \\
x_{6,100} & x_{7,100} \\
\end{array}
\right]
\end{displaymath}

The centers of the Gaussian functions in the first layer can be expressed as

\begin{displaymath}
{\bf C}= 
\left[
\begin{array}
{cc}
c_{11} & c_{12} \\  
c_{21} & c_{22} \\  
c_{31} & c_{32} \\  \end{array}\right],\end{displaymath}

where each row represents a center point in a two-dimensional space. The standard deviations of the Gaussian functions in the first layer can be expressed as

\begin{displaymath}
\mbox{\boldmath$\sigma$} = 
\left[
\begin{array}
{c}
\sigma_1 \\  
\sigma_2 \\  
\sigma_3 \\  \end{array}\right],\end{displaymath}

where each element is the standard deviation of a Gaussian function in the first layer.

The parameters ${\bf W}$ for the second layer can be defined as follows:

\begin{displaymath}
{\bf W}= 
\left[
\begin{array}
{cc}
w_{36} & w_{37} \\  
w_{46} & w_{47} \\  
w_{56} & w_{57} \\  \end{array}\right]\end{displaymath}
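
For reference, one possible way to initialize these parameters in MATLAB is sketched below; the variable names CENTER, SIGMA, and W match the code lines quoted later, but the initialization scheme itself is an assumption and is not necessarily what rbfn.m does:
idx = randperm(100);
CENTER = X0(idx(1:3), :);   % 3-by-2: each row is a Gaussian center
SIGMA  = ones(3, 1);        % 3-by-1: one standard deviation per Gaussian
W      = rand(3, 2) - 0.5;  % 3-by-2: second-layer weights w_{ij}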

The equations for computing the output of the first layer are

\begin{displaymath}
\begin{array}{rcl}
x_3 & = & \exp\left(-\frac{\Vert{\bf x}-{\bf c}_1\Vert^2}{2\sigma_1^2}\right)\\
x_4 & = & \exp\left(-\frac{\Vert{\bf x}-{\bf c}_2\Vert^2}{2\sigma_2^2}\right)\\
x_5 & = & \exp\left(-\frac{\Vert{\bf x}-{\bf c}_3\Vert^2}{2\sigma_3^2}\right)\\
\end{array},
\end{displaymath}

where ${\bf x}=[x_1, x_2]^T$ and ${\bf c}_i = [c_{i1}, c_{i2}]^T$, i=1,2,3. After plugging 100 inputs into the preceding equation, we have

\begin{displaymath}
{\bf X}_1 = 
\left[
\begin{array}{ccc}
x_{3,1} & x_{4,1} & x_{5,1} \\
\vdots & \vdots & \vdots \\
x_{3,100} & x_{4,100} & x_{5,100} \\
\end{array}
\right]
=
\exp\left(
-\left[
\begin{array}{ccc}
\frac{\Vert{\bf x}_1-{\bf c}_1\Vert^2}{2\sigma_1^2} &
\frac{\Vert{\bf x}_1-{\bf c}_2\Vert^2}{2\sigma_2^2} &
\frac{\Vert{\bf x}_1-{\bf c}_3\Vert^2}{2\sigma_3^2} \\
\vdots & \vdots & \vdots \\
\frac{\Vert{\bf x}_{100}-{\bf c}_1\Vert^2}{2\sigma_1^2} &
\frac{\Vert{\bf x}_{100}-{\bf c}_2\Vert^2}{2\sigma_2^2} &
\frac{\Vert{\bf x}_{100}-{\bf c}_3\Vert^2}{2\sigma_3^2} \\
\end{array}
\right]
\right),
\end{displaymath}

where ${\bf x}_p = [x_{1,p}, x_{2,p}]^T$ is the pth input vector. The preceding expression can be further simplified into the following expression:

\begin{displaymath}
{\bf X}_1 = 
\exp \left(
-\left[
\begin{array}{ccc}
\Vert{\bf x}_1-{\bf c}_1\Vert^2 & \Vert{\bf x}_1-{\bf c}_2\Vert^2 & \Vert{\bf x}_1-{\bf c}_3\Vert^2 \\
\vdots & \vdots & \vdots \\
\Vert{\bf x}_{100}-{\bf c}_1\Vert^2 & \Vert{\bf x}_{100}-{\bf c}_2\Vert^2 & \Vert{\bf x}_{100}-{\bf c}_3\Vert^2 \\
\end{array}
\right]
*
\left[
\begin{array}{ccc}
\frac{1}{2\sigma_1^2} & 0 & 0 \\
0 & \frac{1}{2\sigma_2^2} & 0 \\
0 & 0 & \frac{1}{2\sigma_3^2} \\
\end{array}
\right]
\right)
\end{displaymath}

This corresponds to line 63 (or so) in rbfn.m:
X1 = exp(-(dist.^2)*diag(1./(2*SIGMA.^2)));
where dist is the $100 \times 3$ distance matrix between the 100 input vectors and the 3 Gaussian centers.
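
As a minimal sketch (assuming X0, CENTER, and SIGMA hold the inputs, centers, and spreads defined above), dist and X1 could be computed as follows; the explicit loop is for clarity only and is not necessarily how rbfn.m builds the distance matrix:
dist = zeros(100, 3);
for k = 1:3
  offset = X0 - ones(100,1)*CENTER(k,:);   % offsets of all inputs from center k
  dist(:,k) = sqrt(sum(offset.^2, 2));     % Euclidean distances, 100-by-1
end
X1 = exp(-(dist.^2)*diag(1./(2*SIGMA.^2)));   % layer-1 outputs, 100-by-3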

The equations for computing the output of the second layer are

\begin{displaymath}
\begin{array}{rcl}
x_6 & = & x_3 w_{36} + x_4 w_{46} + x_5 w_{56} \\
x_7 & = & x_3 w_{37} + x_4 w_{47} + x_5 w_{57} \\
\end{array}
\end{displaymath}

or equivalently,

\begin{displaymath}[x_6 \; x_7]
=
[x_3 \; x_4 \; x_5]
*
\left[
\begin{array}{cc}
w_{36} & w_{37} \\
w_{46} & w_{47} \\
w_{56} & w_{57} \\
\end{array}
\right]
\end{displaymath}

After plugging 100 data entries into the preceding equation, we have

\begin{displaymath}
\left[
\begin{array}{cc}
x_{6,1} & x_{7,1} \\
\vdots & \vdots \\
x_{6,100} & x_{7,100} \\
\end{array}
\right]
=
\left[
\begin{array}{ccc}
x_{3,1} & x_{4,1} & x_{5,1} \\
\vdots & \vdots & \vdots \\
x_{3,100} & x_{4,100} & x_{5,100} \\
\end{array}
\right]
*
\left[
\begin{array}{cc}
w_{36} & w_{37} \\
w_{46} & w_{47} \\
w_{56} & w_{57} \\
\end{array}
\right],
\end{displaymath}

or equivalently,

\begin{displaymath}
{\bf X}_2 = {\bf X}_1*{\bf W}.\end{displaymath}

The preceding equation corresponds to line 64 (or so) of rbfn.m:
X2 = X1*W;

The instantaneous error measure for the pth data pair is defined by

\begin{displaymath}
E_p = (t_{6,p}-x_{6,p})^2 + (t_{7,p}-x_{7,p})^2,
\end{displaymath}

where $t_{6,p}$ and $t_{7,p}$ are the pth target outputs; $x_{6,p}$ and $x_{7,p}$ are the pth network outputs. The derivative of the above instantaneous error measure with respect to the network outputs is written as

\begin{displaymath}
\frac{\partial E_p}{\partial {\bf X}_2}
=
\left[
\begin{array}{cc}
\frac{\partial E_p}{\partial x_6} & \frac{\partial E_p}{\partial x_7}
\end{array}
\right]
=
\left[
\begin{array}{cc}
-2(t_{6,p} - x_{6,p}) & -2(t_{7,p} - x_{7,p})\\
\end{array}
\right]
\end{displaymath}

We can stack the above equation for each p to obtain the following matrix expression:

\begin{displaymath}
\begin{array}{rcl}
\frac{\partial E}{\partial {\bf X}_2}
& = &
-2\left(
\left[
\begin{array}{cc}
t_{6,1} & t_{7,1} \\
\vdots & \vdots \\
t_{6,100} & t_{7,100} \\
\end{array}
\right]
-
\left[
\begin{array}{cc}
x_{6,1} & x_{7,1} \\
\vdots & \vdots \\
x_{6,100} & x_{7,100} \\
\end{array}
\right]
\right)
=
-2 ({\bf T}- {\bf X}_2),
\end{array}
\end{displaymath}

where ${\bf X}_2$ is the actual output of the RBFN. The preceding equation corresponds to line 56 of rbfn.m:
dE_dX2 = -2*(T - X2);
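
For reference, the accumulated error $E = \sum_{p=1}^{100} E_p$ can be computed from the same quantities; this is only a sketch, and the corresponding line in rbfn.m (if any) may differ:
total_err = sum(sum((T - X2).^2));   % E, accumulated over all 100 pairs and both outputs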

Now we can compute the derivatives of $E_p$ with respect to the second layer's weights. The derivatives of $E_p$ with respect to the parameters of node 6 are

\begin{displaymath}
\begin{array}{rllll}
\frac{\partial E_p}{\partial w_{36}} & = &
\frac{\partial E_p}{\partial x_6} \frac{\partial x_6}{\partial w_{36}} & = &
\frac{\partial E_p}{\partial x_6} x_3 \\
\frac{\partial E_p}{\partial w_{46}} & = &
\frac{\partial E_p}{\partial x_6} \frac{\partial x_6}{\partial w_{46}} & = &
\frac{\partial E_p}{\partial x_6} x_4 \\
\frac{\partial E_p}{\partial w_{56}} & = &
\frac{\partial E_p}{\partial x_6} \frac{\partial x_6}{\partial w_{56}} & = &
\frac{\partial E_p}{\partial x_6} x_5 \\
\end{array}
\end{displaymath}

The derivatives of $E_p$ with respect to the parameters of node 7 are

\begin{displaymath}
\begin{array}{rllll}
\frac{\partial E_p}{\partial w_{37}} & = &
\frac{\partial E_p}{\partial x_7} \frac{\partial x_7}{\partial w_{37}} & = &
\frac{\partial E_p}{\partial x_7} x_3 \\
\frac{\partial E_p}{\partial w_{47}} & = &
\frac{\partial E_p}{\partial x_7} \frac{\partial x_7}{\partial w_{47}} & = &
\frac{\partial E_p}{\partial x_7} x_4 \\
\frac{\partial E_p}{\partial w_{57}} & = &
\frac{\partial E_p}{\partial x_7} \frac{\partial x_7}{\partial w_{57}} & = &
\frac{\partial E_p}{\partial x_7} x_5 \\
\end{array}
\end{displaymath}

We can combine the above six equations into the following concise expression:

\begin{displaymath}
\frac{\partial E_p}{\partial {\bf W}}
=
\left[
\begin{array}{cc}
\frac{\partial E_p}{\partial w_{36}} & \frac{\partial E_p}{\partial w_{37}} \\
\frac{\partial E_p}{\partial w_{46}} & \frac{\partial E_p}{\partial w_{47}} \\
\frac{\partial E_p}{\partial w_{56}} & \frac{\partial E_p}{\partial w_{57}} \\
\end{array}
\right]
=
\left[
\begin{array}{c}
x_3 \\ x_4 \\ x_5 \\
\end{array}
\right]
\left[
\begin{array}{cc}
\frac{\partial E_p}{\partial x_6} & \frac{\partial E_p}{\partial x_7}
\end{array}
\right].
\end{displaymath}

Therefore the accumulated gradient vector is

\begin{displaymath}
\begin{array}{rcl}
\frac{\partial E}{\partial {\bf W}}
& = &
\sum_{p=1}^{100}
\left[
\begin{array}{c}
x_{3,p} \\ x_{4,p} \\ x_{5,p} \\
\end{array}
\right]
\left[
\begin{array}{cc}
\frac{\partial E_p}{\partial x_{6,p}} & \frac{\partial E_p}{\partial x_{7,p}}
\end{array}
\right]
=
{\bf X}_1^T * \frac{\partial E}{\partial {\bf X}_2}.\\
\end{array}
\end{displaymath}

The preceding equation corresponds to line 74 (or so) of rbfn.m:

dE_dW = X1'*dE_dX2;
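
To gain confidence in this formula, one entry of dE_dW can be compared against a finite-difference approximation. This is only a sanity-check sketch and is not part of rbfn.m:
delta = 1e-6;
Wp = W; Wp(1,1) = Wp(1,1) + delta;     % perturb one weight
E_plus = sum(sum((T - X1*Wp).^2));
E_base = sum(sum((T - X1*W).^2));
approx = (E_plus - E_base)/delta;      % should be close to dE_dW(1,1)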

For the derivative of $E_p$ with respect to $x_3$, we have

\begin{displaymath}
\begin{array}{rcl}
\frac{\partial E_p}{\partial x_3}
& = &
\frac{\partial E_p}{\partial x_6} \frac{\partial x_6}{\partial x_3}
+
\frac{\partial E_p}{\partial x_7} \frac{\partial x_7}{\partial x_3}
=
\frac{\partial E_p}{\partial x_6} w_{36}
+
\frac{\partial E_p}{\partial x_7} w_{37}
\end{array}
\end{displaymath}

Similarly, we have

\begin{displaymath}
\begin{array}{rcl}
\frac{\partial E_p}{\partial x_4}
& = &
\frac{\partial E_p}{\partial x_6} w_{46}
+
\frac{\partial E_p}{\partial x_7} w_{47} \\
\frac{\partial E_p}{\partial x_5}
& = &
\frac{\partial E_p}{\partial x_6} w_{56}
+
\frac{\partial E_p}{\partial x_7} w_{57}
\end{array}
\end{displaymath}

The preceding three equations can be put into matrix form:

\begin{displaymath}
\begin{array}{rcl}
\left[
\begin{array}{ccc}
\frac{\partial E_p}{\partial x_3} & \frac{\partial E_p}{\partial x_4} & \frac{\partial E_p}{\partial x_5}
\end{array}
\right]
& = &
\left[
\begin{array}{cc}
\frac{\partial E_p}{\partial x_6} & \frac{\partial E_p}{\partial x_7}
\end{array}
\right]
*
\left[
\begin{array}{cc}
w_{36} & w_{37} \\
w_{46} & w_{47} \\
w_{56} & w_{57} \\
\end{array}
\right]^T\\
\end{array}
\end{displaymath}

Hence the accumulated derivatives of E with respect to ${\bf X}_1$ are

\begin{displaymath}
\begin{array}{rcl}
\frac{\partial E}{\partial {\bf X}_1}
& = &
\frac{\partial E}{\partial {\bf X}_2}
*{\bf W}^T.\\
\end{array}
\end{displaymath}

The preceding equation corresponds to line 77 (or so) of rbfn.m:
dE_dX1 = dE_dX2*W';

The derivatives of layer 1's outputs with respect to the standard deviations are

\begin{displaymath}
\frac{dx_3}{d \sigma_1} =
\exp
\left(-\frac{\Vert {\bf x}-{\bf c}_1\Vert^2}{2\sigma_1^2}\right)
\frac{\Vert {\bf x}-{\bf c}_1\Vert^2}{\sigma_1^3}
=
x_3 \frac{\Vert {\bf x}-{\bf c}_1\Vert^2}{\sigma_1^3}.
\end{displaymath}

Similarly,

\begin{displaymath}
\frac{dx_4}{d \sigma_2} =
x_4 \frac{\Vert {\bf x}-{\bf c}_2\Vert^2}{\sigma_2^3},
\end{displaymath}

\begin{displaymath}
\frac{dx_5}{d \sigma_3} =
x_5 \frac{\Vert {\bf x}-{\bf c}_3\Vert^2}{\sigma_3^3}.
\end{displaymath}

The preceding three equations can be put into a matrix format:

\begin{displaymath}
\left[
\begin{array}{ccc}
\frac{\partial x_3}{\partial \sigma_1} & \frac{\partial x_4}{\partial \sigma_2} & \frac{\partial x_5}{\partial \sigma_3}
\end{array}
\right]
=
\left[
\begin{array}{ccc}
x_3 & x_4 & x_5
\end{array}
\right]
.*
\left[
\begin{array}{ccc}
\frac{\Vert{\bf x}-{\bf c}_1\Vert^2}{\sigma_1^3} &
\frac{\Vert{\bf x}-{\bf c}_2\Vert^2}{\sigma_2^3} &
\frac{\Vert{\bf x}-{\bf c}_3\Vert^2}{\sigma_3^3} \\
\end{array}
\right]
\end{displaymath}

Therefore we have

\begin{displaymath}
\begin{array}{rcl}
\frac{d {\bf X}_1}{d \mbox{\boldmath$\sigma$}}
& = &
{\bf X}_1
.*
\left(
\left[
\begin{array}{ccc}
\Vert{\bf x}_1-{\bf c}_1\Vert^2 & \Vert{\bf x}_1-{\bf c}_2\Vert^2 & \Vert{\bf x}_1-{\bf c}_3\Vert^2 \\
\vdots & \vdots & \vdots \\
\Vert{\bf x}_{100}-{\bf c}_1\Vert^2 & \Vert{\bf x}_{100}-{\bf c}_2\Vert^2 & \Vert{\bf x}_{100}-{\bf c}_3\Vert^2 \\
\end{array}
\right]
*
\left[
\begin{array}{ccc}
\frac{1}{\sigma_1^3} & 0 & 0 \\
0 & \frac{1}{\sigma_2^3} & 0 \\
0 & 0 & \frac{1}{\sigma_3^3} \\
\end{array}
\right]
\right)\\
\end{array}
\end{displaymath}

This corresponds to line 78 of rbfn.m:
dX1_dSigma = X1.*(dist.^2*diag(SIGMA.^(-3)));

The derivatives of $E_p$ with respect to the standard deviations are

\begin{displaymath}
\frac{\partial E_p}{\partial \mbox{\boldmath$\sigma$}}
=
\left[
\begin{array}{ccc}
\frac{\partial E_p}{\partial x_3} \frac{\partial x_3}{\partial \sigma_1} &
\frac{\partial E_p}{\partial x_4} \frac{\partial x_4}{\partial \sigma_2} &
\frac{\partial E_p}{\partial x_5} \frac{\partial x_5}{\partial \sigma_3}\\
\end{array}
\right]
\end{displaymath}

Hence
\begin{displaymath}
\begin{array}{rcl}
\frac{\partial E}{\partial \mbox{\boldmath$\sigma$}}
& = &
\left(
\mbox{sum}
\left(
\frac{\partial E}{\partial {\bf X}_1}
.*
\frac{d {\bf X}_1}{d \mbox{\boldmath$\sigma$}}
\right)
\right)^T,\\
\end{array}
\end{displaymath} (1)
where sum denotes the operation of summing each column to produce a row vector. The preceding equation corresponds to line 80 of rbfn.m:
dE_dSigma = sum(dE_dX1.*dX1_dSigma)';
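
As with the weights, this gradient can be checked numerically by perturbing one standard deviation and recomputing the forward pass; again, this is a sanity-check sketch only and not part of rbfn.m:
delta = 1e-6;
Sp = SIGMA; Sp(1) = Sp(1) + delta;         % perturb sigma_1
X1p = exp(-(dist.^2)*diag(1./(2*Sp.^2)));
E_plus = sum(sum((T - X1p*W).^2));
E_base = sum(sum((T - X1*W).^2));
approx = (E_plus - E_base)/delta;          % should be close to dE_dSigma(1)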

Now we are moving toward the final step: calculating the derivative of $E$ with respect to the centers of the Gaussians. Since $x_3 = \exp \left(-\frac{\Vert{\bf x}-{\bf c}_1\Vert^2}{2\sigma_1^2} \right)$, the derivatives of $x_3$ with respect to ${\bf c}_1= [c_{11} \; c_{12}]$ are

\begin{displaymath}
\begin{array}{l}
\frac{\partial x_3}{\partial c_{11}} = x_3 \frac{x_1-c_{11}}{\sigma_1^2}\\
\frac{\partial x_3}{\partial c_{12}} = x_3 \frac{x_2-c_{12}}{\sigma_1^2}\\
\end{array}
\end{displaymath}

Hence

\begin{displaymath}
\left[
\begin{array}{cc}
\frac{\partial E_p}{\partial c_{11}} & \frac{\partial E_p}{\partial c_{12}}
\end{array}
\right]
=
\left[
\begin{array}{cc}
\frac{\partial E_p}{\partial x_3} \frac{x_3}{\sigma_1^2} (x_1-c_{11}) &
\frac{\partial E_p}{\partial x_3} \frac{x_3}{\sigma_1^2} (x_2-c_{12})
\end{array}
\right].
\end{displaymath}

Similarly, we have

\begin{displaymath}
\begin{array}{rcl}
\left[
\begin{array}{cc}
\frac{\partial E_p}{\partial c_{11}} & \frac{\partial E_p}{\partial c_{12}} \\
\frac{\partial E_p}{\partial c_{21}} & \frac{\partial E_p}{\partial c_{22}} \\
\frac{\partial E_p}{\partial c_{31}} & \frac{\partial E_p}{\partial c_{32}} \\
\end{array}
\right]
& = &
\left[
\begin{array}{ccc}
\frac{1}{\sigma_1^2} & 0 & 0 \\
0 & \frac{1}{\sigma_2^2} & 0 \\
0 & 0 & \frac{1}{\sigma_3^2} \\
\end{array}
\right]
*
\left\{
\left[
\begin{array}{c}
\frac{\partial E_p}{\partial x_3} x_3 \\
\frac{\partial E_p}{\partial x_4} x_4 \\
\frac{\partial E_p}{\partial x_5} x_5 \\
\end{array}
\right]
\left[
\begin{array}{cc}
x_1 & x_2
\end{array}
\right]
-
\mbox{diag}
\left(
\left[
\begin{array}{c}
\frac{\partial E_p}{\partial x_3} x_3 \\
\frac{\partial E_p}{\partial x_4} x_4 \\
\frac{\partial E_p}{\partial x_5} x_5 \\
\end{array}
\right]
\right)
*
{\bf C}
\right\}
\end{array}
\end{displaymath}

Therefore
\begin{displaymath}
\begin{array}{rcl}
\frac{\partial E}{\partial {\bf C}}
& = &
\left[
\begin{array}{ccc}
\frac{1}{\sigma_1^2} & 0 & 0 \\
0 & \frac{1}{\sigma_2^2} & 0 \\
0 & 0 & \frac{1}{\sigma_3^2} \\
\end{array}
\right]
*
\left\{
\sum_{p=1}^{100}
\left[
\begin{array}{c}
\frac{\partial E_p}{\partial x_{3,p}} x_{3,p} \\
\frac{\partial E_p}{\partial x_{4,p}} x_{4,p} \\
\frac{\partial E_p}{\partial x_{5,p}} x_{5,p} \\
\end{array}
\right]
\left[
\begin{array}{cc}
x_{1,p} & x_{2,p}
\end{array}
\right]
-
\mbox{diag}
\left(
\sum_{p=1}^{100}
\left[
\begin{array}{c}
\frac{\partial E_p}{\partial x_{3,p}} x_{3,p} \\
\frac{\partial E_p}{\partial x_{4,p}} x_{4,p} \\
\frac{\partial E_p}{\partial x_{5,p}} x_{5,p} \\
\end{array}
\right]
\right)
*
{\bf C}
\right\}
\end{array}
\end{displaymath} (2)

The first term in the curly braces can be further simplified:

\begin{displaymath}
\begin{array}{rcl}
\sum_{p=1}^{100}
\left[
\begin{array}{c}
\frac{\partial E_p}{\partial x_{3,p}} x_{3,p} \\
\frac{\partial E_p}{\partial x_{4,p}} x_{4,p} \\
\frac{\partial E_p}{\partial x_{5,p}} x_{5,p} \\
\end{array}
\right]
\left[
\begin{array}{cc}
x_{1,p} & x_{2,p}
\end{array}
\right]
& = &
\left(
\frac{\partial E}{\partial {\bf X}_1} .* {\bf X}_1
\right)^T
*{\bf X}_0 \\
\end{array}
\end{displaymath}

The second term in the curly braces of Equation (2) can be simplified in the same way as Equation (1), which leads to

\begin{displaymath}
\mbox{diag}
\left(
\sum_{p=1}^{100}
\left[
\begin{array}{ccc}
\frac{\partial E_p}{\partial x_{3,p}} x_{3,p} &
\frac{\partial E_p}{\partial x_{4,p}} x_{4,p} &
\frac{\partial E_p}{\partial x_{5,p}} x_{5,p}
\end{array}
\right]
\right)
*{\bf C}
=
\mbox{diag}
\left(
\mbox{sum}
\left(
\frac{\partial E}{\partial {\bf X}_1}
.*
{\bf X}_1
\right)
\right)
*{\bf C}.
\end{displaymath}

Consequently, Equation (2) can be simplified as follows:

\begin{displaymath}
\frac{\partial E}{\partial {\bf C}} 
=
\left[
\begin{array}{ccc}
\frac{1}{\sigma_1^2} & 0 & 0 \\
0 & \frac{1}{\sigma_2^2} & 0 \\
0 & 0 & \frac{1}{\sigma_3^2} \\
\end{array}
\right]
*
\left\{
\left(
\frac{\partial E}{\partial {\bf X}_1} 
.*
{\bf X}_1
\right)^T
*{\bf X}_0
-
\mbox{diag}
\left(
\mbox{sum}
\left(
\frac{\partial E}{\partial {\bf X}_1} 
.*
{\bf X}_1
\right)
\right)
*{\bf C}
\right\}
\end{displaymath}

The preceding equation corresponds to line 81 (or so) of rbfn.m:

dE_dCenter=diag(SIGMA.^(-2))*((dE_dX1.*X1)'*X0-diag(sum(dE_dX1.*X1))*CENTER);
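
Once all three gradient matrices are available, a plain gradient-descent update takes the following form. This is only a sketch: the step size eta is a hypothetical value, and rbfn.m may use a different training scheme (for example, adaptive step sizes or least-squares estimation of W):
eta = 0.01;                          % hypothetical learning rate
W      = W      - eta*dE_dW;
SIGMA  = SIGMA  - eta*dE_dSigma;
CENTER = CENTER - eta*dE_dCenter;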


 
J.-S. Roger Jang
11/26/1997