rbfn.m
For simplicity, suppose that we are dealing with a 2-3-2
radial basis function network and that we have
100 input-output data pairs as the training set.
The whole training set can then be represented by the
matrix:
\begin{displaymath}
\left[
\begin{array}{cccc}
x_{1,1} & x_{2,1} & t_{6,1} & t_{7,1} \\
\vdots & \vdots & \vdots & \vdots \\
x_{1,100} & x_{2,100} & t_{6,100} & t_{7,100} \\
\end{array}
\right]
\end{displaymath}
The input part of the training data is collected in the matrix
\begin{displaymath}
{\bf X}_0 =
\left[
\begin{array}{cc}
x_{1,1} & x_{2,1} \\
\vdots & \vdots \\
x_{1,100} & x_{2,100} \\
\end{array}
\right]
\end{displaymath}
and the target part in
\begin{displaymath}
{\bf T} =
\left[
\begin{array}{cc}
t_{6,1} & t_{7,1} \\
\vdots & \vdots \\
t_{6,100} & t_{7,100} \\
\end{array}
\right]
\end{displaymath}
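If this 100-by-4 matrix is stored in a single MATLAB variable, the input and target parts can be split off by column indexing. A minimal sketch (the variable name data is an assumption; rbfn.m may obtain X0 and T differently):
% split a hypothetical 100-by-4 training matrix 'data' into inputs and targets
X0 = data(:, 1:2);   % input part:  [x1 x2], one data pair per row
T  = data(:, 3:4);   % target part: [t6 t7], one data pair per row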
For convenience, we shall also define
${\bf X}_1$ and ${\bf X}_2$
as the node outputs of layers 1 and 2, respectively:
\begin{displaymath}
{\bf X}_1 =
\left[
\begin{array}{ccc}
x_{3,1} & x_{4,1} & x_{5,1} \\
\vdots & \vdots & \vdots \\
x_{3,100} & x_{4,100} & x_{5,100} \\
\end{array}
\right]
\end{displaymath}
\begin{displaymath}
{\bf X}_2 =
\left[
\begin{array}{cc}
x_{6,1} & x_{7,1} \\
\vdots & \vdots \\
x_{6,100} & x_{7,100} \\
\end{array}
\right]
\end{displaymath}
The centers of the Gaussian functions in the first layer can be expressed as
\begin{displaymath}
{\bf C} =
\left[
\begin{array}{cc}
c_{11} & c_{12} \\
c_{21} & c_{22} \\
c_{31} & c_{32} \\
\end{array}
\right],
\end{displaymath}
and their standard deviations as
\begin{displaymath}
\mbox{\boldmath$\sigma$} =
\left[
\begin{array}{c}
\sigma_1 \\
\sigma_2 \\
\sigma_3 \\
\end{array}
\right].
\end{displaymath}
The weight parameters ${\bf W}$ for the second layer can be defined as follows:
\begin{displaymath}
{\bf W} =
\left[
\begin{array}{cc}
w_{36} & w_{37} \\
w_{46} & w_{47} \\
w_{56} & w_{57} \\
\end{array}
\right]
\end{displaymath}
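Before any gradients can be computed, these parameters need initial values. A minimal sketch of one possible initialization (the variable names CENTER, SIGMA, and W match the rbfn.m excerpts below, but the random initialization itself is an assumption):
% initialize the parameters of a 2-3-2 RBFN
CENTER = randn(3, 2);   % Gaussian centers, one per row (matrix C above)
SIGMA  = ones(3, 1);    % Gaussian standard deviations (vector sigma above)
W      = randn(3, 2);   % second-layer weights (matrix W above)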
The equations for computing the output of the first layer are
\begin{displaymath}
{\bf X}_1 =
\left[
\begin{array}{ccc}
x_{3,1} & x_{4,1} & x_{5,1} \\
\vdots & \vdots & \vdots \\
x_{3,100} & x_{4,100} & x_{5,100} \\
\end{array}
\right]
=
\exp \left( -
\left[
\begin{array}{ccc}
\frac{\Vert{\bf x}_1-{\bf c}_1\Vert^2}{2\sigma_1^2} &
\frac{\Vert{\bf x}_1-{\bf c}_2\Vert^2}{2\sigma_2^2} &
\frac{\Vert{\bf x}_1-{\bf c}_3\Vert^2}{2\sigma_3^2} \\
\vdots & \vdots & \vdots \\
\frac{\Vert{\bf x}_{100}-{\bf c}_1\Vert^2}{2\sigma_1^2} &
\frac{\Vert{\bf x}_{100}-{\bf c}_2\Vert^2}{2\sigma_2^2} &
\frac{\Vert{\bf x}_{100}-{\bf c}_3\Vert^2}{2\sigma_3^2} \\
\end{array}
\right]
\right),
\end{displaymath}
where ${\bf x}_p = [x_{1,p} \; x_{2,p}]$ is the $p$th input vector, ${\bf c}_j$ is the $j$th row of ${\bf C}$, and the exponential is applied element by element.
Equivalently, in the matrix notation used by the code,
\begin{displaymath}
{\bf X}_1 =
\exp \left( -
\left[
\begin{array}{ccc}
\Vert{\bf x}_1-{\bf c}_1\Vert^2 & \Vert{\bf x}_1-{\bf c}_2\Vert^2 & \Vert{\bf x}_1-{\bf c}_3\Vert^2 \\
\vdots & \vdots & \vdots \\
\Vert{\bf x}_{100}-{\bf c}_1\Vert^2 & \Vert{\bf x}_{100}-{\bf c}_2\Vert^2 & \Vert{\bf x}_{100}-{\bf c}_3\Vert^2 \\
\end{array}
\right]
\left[
\begin{array}{ccc}
\frac{1}{2\sigma_1^2} & 0 & 0 \\
0 & \frac{1}{2\sigma_2^2} & 0 \\
0 & 0 & \frac{1}{2\sigma_3^2} \\
\end{array}
\right]
\right)
\end{displaymath}
rbfn.m:
X1 = exp(-(dist.^2)*diag(1./(2*SIGMA.^2)));
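This line assumes a 100-by-3 matrix dist of input-to-center distances, explained below. A minimal sketch of one way to compute it (this loop is an assumption, not necessarily rbfn.m's own code; X0 and CENTER are as above):
% dist(p, j) = Euclidean distance between the p-th input and the j-th center
dist = zeros(size(X0, 1), size(CENTER, 1));
for j = 1:size(CENTER, 1)
    diff = X0 - repmat(CENTER(j, :), size(X0, 1), 1);   % inputs minus center j
    dist(:, j) = sqrt(sum(diff.^2, 2));                 % row-wise norms
end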
Here dist is the 100-by-3 matrix whose $(p, j)$ entry is $\Vert{\bf x}_p-{\bf c}_j\Vert$, the distance between the $p$th input vector and the $j$th Gaussian center. The equations for computing the output of the second layer are
\begin{displaymath}
[x_6 \; x_7]
=
[x_3 \; x_4 \; x_5]
\left[
\begin{array}{cc}
w_{36} & w_{37} \\
w_{46} & w_{47} \\
w_{56} & w_{57} \\
\end{array}
\right]
\end{displaymath}
or, for the whole data set,
\begin{displaymath}
\left[
\begin{array}{cc}
x_{6,1} & x_{7,1} \\
\vdots & \vdots \\
x_{6,100} & x_{7,100} \\
\end{array}
\right]
=
\left[
\begin{array}{ccc}
x_{3,1} & x_{4,1} & x_{5,1} \\
\vdots & \vdots & \vdots \\
x_{3,100} & x_{4,100} & x_{5,100} \\
\end{array}
\right]
\left[
\begin{array}{cc}
w_{36} & w_{37} \\
w_{46} & w_{47} \\
w_{56} & w_{57} \\
\end{array}
\right],
\end{displaymath}
that is, ${\bf X}_2 = {\bf X}_1 {\bf W}$.
rbfn.m:
X2 = X1*W;
The instantaneous error measure for the $p$th data pair is defined by
\begin{displaymath}
E_p = (t_{6,p}-x_{6,p})^2 + (t_{7,p}-x_{7,p})^2,
\end{displaymath}
where $t_{6,p}$ and $t_{7,p}$ are the $p$th target outputs, and $x_{6,p}$ and $x_{7,p}$ are the $p$th network outputs. The derivatives of the above instantaneous error measure with respect to the network outputs are
\begin{displaymath}
\frac{\partial E_p}{\partial x_{6,p}} = -2\,(t_{6,p}-x_{6,p}),
\qquad
\frac{\partial E_p}{\partial x_{7,p}} = -2\,(t_{7,p}-x_{7,p}),
\end{displaymath}
or, in matrix form,
\begin{displaymath}
\frac{\partial E}{\partial {\bf X}_2} =
\left[
\begin{array}{cc}
\frac{\partial E_1}{\partial x_{6,1}} & \frac{\partial E_1}{\partial x_{7,1}} \\
\vdots & \vdots \\
\frac{\partial E_{100}}{\partial x_{6,100}} & \frac{\partial E_{100}}{\partial x_{7,100}} \\
\end{array}
\right]
= -2 \left(
\left[
\begin{array}{cc}
t_{6,1} & t_{7,1} \\
\vdots & \vdots \\
t_{6,100} & t_{7,100} \\
\end{array}
\right]
-
\left[
\begin{array}{cc}
x_{6,1} & x_{7,1} \\
\vdots & \vdots \\
x_{6,100} & x_{7,100} \\
\end{array}
\right]
\right)
= -2 ({\bf T} - {\bf X}_2),
\end{displaymath}
rbfn.m:
dE_dX2 = -2*(T - X2);
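For reference, the total error $E$ accumulated over all 100 data pairs is just the sum of the $E_p$; in MATLAB it could be computed in one line (the variable name E is an assumption):
% total squared error over all data pairs
E = sum(sum((T - X2).^2));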
Now we can compute the derivatives of $E_p$ with respect to the second layer's weights. The derivatives of $E_p$ with respect to the parameters of node 6 are
\begin{displaymath}
\frac{\partial E_p}{\partial w_{36}} = \frac{\partial E_p}{\partial x_6}\,x_3,
\qquad
\frac{\partial E_p}{\partial w_{46}} = \frac{\partial E_p}{\partial x_6}\,x_4,
\qquad
\frac{\partial E_p}{\partial w_{56}} = \frac{\partial E_p}{\partial x_6}\,x_5,
\end{displaymath}
and the derivatives with respect to the parameters of node 7 are
\begin{displaymath}
\frac{\partial E_p}{\partial w_{37}} = \frac{\partial E_p}{\partial x_7}\,x_3,
\qquad
\frac{\partial E_p}{\partial w_{47}} = \frac{\partial E_p}{\partial x_7}\,x_4,
\qquad
\frac{\partial E_p}{\partial w_{57}} = \frac{\partial E_p}{\partial x_7}\,x_5.
\end{displaymath}
We can combine the above equations into the following concise expression:
\begin{displaymath}
\frac{\partial E_p}{\partial {\bf W}} =
\left[
\begin{array}{cc}
\frac{\partial E_p}{\partial w_{36}} & \frac{\partial E_p}{\partial w_{37}} \\
\frac{\partial E_p}{\partial w_{46}} & \frac{\partial E_p}{\partial w_{47}} \\
\frac{\partial E_p}{\partial w_{56}} & \frac{\partial E_p}{\partial w_{57}} \\
\end{array}
\right]
=
\left[
\begin{array}{c}
x_3 \\
x_4 \\
x_5 \\
\end{array}
\right]
\left[
\begin{array}{cc}
\frac{\partial E_p}{\partial x_6} & \frac{\partial E_p}{\partial x_7} \\
\end{array}
\right].
\end{displaymath}
Therefore the accumulated gradient is
\begin{displaymath}
\frac{\partial E}{\partial {\bf W}} =
\sum_{p=1}^{100} \frac{\partial E_p}{\partial {\bf W}} =
{\bf X}_1^T \, \frac{\partial E}{\partial {\bf X}_2}.
\end{displaymath}
The preceding equation corresponds to line 74 (or so) of rbfn.m:
dE_dW = X1'*dE_dX2;
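A hand-derived gradient such as dE_dW is easy to get wrong, so it can be worth checking against a finite-difference approximation. The following sketch is not part of rbfn.m; it perturbs one weight at a time using central differences:
% finite-difference check of dE_dW
h = 1e-6;
dE_dW_fd = zeros(size(W));
for i = 1:numel(W)
    Wp = W;  Wp(i) = Wp(i) + h;        % perturb one weight upward
    Ep_plus  = sum(sum((T - X1*Wp).^2));
    Wm = W;  Wm(i) = Wm(i) - h;        % perturb it downward
    Ep_minus = sum(sum((T - X1*Wm).^2));
    dE_dW_fd(i) = (Ep_plus - Ep_minus)/(2*h);
end
max(abs(dE_dW_fd(:) - dE_dW(:)))       % should be close to zero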
For the derivative of $E_p$ with respect to $x_3$, we have
\begin{displaymath}
\frac{\partial E_p}{\partial x_3} =
\frac{\partial E_p}{\partial x_6}\,w_{36} +
\frac{\partial E_p}{\partial x_7}\,w_{37}.
\end{displaymath}
Similarly, we have
\begin{displaymath}
\frac{\partial E_p}{\partial x_4} =
\frac{\partial E_p}{\partial x_6}\,w_{46} +
\frac{\partial E_p}{\partial x_7}\,w_{47},
\qquad
\frac{\partial E_p}{\partial x_5} =
\frac{\partial E_p}{\partial x_6}\,w_{56} +
\frac{\partial E_p}{\partial x_7}\,w_{57}.
\end{displaymath}
The preceding three equations can be put into matrix form:
\begin{displaymath}
\left[
\begin{array}{ccc}
\frac{\partial E_p}{\partial x_3} &
\frac{\partial E_p}{\partial x_4} &
\frac{\partial E_p}{\partial x_5} \\
\end{array}
\right]
=
\left[
\begin{array}{cc}
\frac{\partial E_p}{\partial x_6} &
\frac{\partial E_p}{\partial x_7} \\
\end{array}
\right]
\left[
\begin{array}{cc}
w_{36} & w_{37} \\
w_{46} & w_{47} \\
w_{56} & w_{57} \\
\end{array}
\right]^T.
\end{displaymath}
Hence the accumulated derivative of $E$ with respect to ${\bf X}_1$ is
\begin{displaymath}
\frac{\partial E}{\partial {\bf X}_1} =
\frac{\partial E}{\partial {\bf X}_2}\,{\bf W}^T.
\end{displaymath}
rbfn.m:
dE_dX1 = dE_dX2*W';
The derivatives of layer 1's outputs with respect to the standard deviations are
\begin{displaymath}
\frac{\partial x_{3,p}}{\partial \sigma_1} =
x_{3,p}\,\frac{\Vert{\bf x}_p-{\bf c}_1\Vert^2}{\sigma_1^3},
\qquad
\frac{\partial x_{4,p}}{\partial \sigma_2} =
x_{4,p}\,\frac{\Vert{\bf x}_p-{\bf c}_2\Vert^2}{\sigma_2^3},
\qquad
\frac{\partial x_{5,p}}{\partial \sigma_3} =
x_{5,p}\,\frac{\Vert{\bf x}_p-{\bf c}_3\Vert^2}{\sigma_3^3}.
\end{displaymath}
Collecting these for all 100 data pairs,
\begin{displaymath}
\frac{d {\bf X}_1}{d \mbox{\boldmath$\sigma$}} =
{\bf X}_1 \circ \left(
\left[
\begin{array}{ccc}
\Vert{\bf x}_1-{\bf c}_1\Vert^2 & \Vert{\bf x}_1-{\bf c}_2\Vert^2 & \Vert{\bf x}_1-{\bf c}_3\Vert^2 \\
\vdots & \vdots & \vdots \\
\Vert{\bf x}_{100}-{\bf c}_1\Vert^2 & \Vert{\bf x}_{100}-{\bf c}_2\Vert^2 & \Vert{\bf x}_{100}-{\bf c}_3\Vert^2 \\
\end{array}
\right]
\left[
\begin{array}{ccc}
\frac{1}{\sigma_1^3} & 0 & 0 \\
0 & \frac{1}{\sigma_2^3} & 0 \\
0 & 0 & \frac{1}{\sigma_3^3} \\
\end{array}
\right]
\right),
\end{displaymath}
where $\circ$ denotes the element-wise product.
rbfn.m:
dX1_dSigma = X1.*(dist.^2*diag(SIGMA.^(-3)));
The derivatives of $E_p$ with respect to the standard deviations are
\begin{displaymath}
\frac{\partial E_p}{\partial \mbox{\boldmath$\sigma$}} =
\left[
\begin{array}{c}
\frac{\partial E_p}{\partial \sigma_1} \\
\frac{\partial E_p}{\partial \sigma_2} \\
\frac{\partial E_p}{\partial \sigma_3} \\
\end{array}
\right]
=
\left[
\begin{array}{c}
\frac{\partial E_p}{\partial x_3}\,\frac{\partial x_3}{\partial \sigma_1} \\
\frac{\partial E_p}{\partial x_4}\,\frac{\partial x_4}{\partial \sigma_2} \\
\frac{\partial E_p}{\partial x_5}\,\frac{\partial x_5}{\partial \sigma_3} \\
\end{array}
\right],
\end{displaymath}
and the accumulated derivative is
\begin{displaymath}
\frac{\partial E}{\partial \mbox{\boldmath$\sigma$}} =
\sum_{p=1}^{100} \frac{\partial E_p}{\partial \mbox{\boldmath$\sigma$}}.
\qquad (1)
\end{displaymath}
rbfn.m:
dE_dSigma = sum(dE_dX1.*dX1_dSigma)';
Now we are moving toward the final step: to calculate the derivative
of $E$ with respect to the centers of the Gaussians.
Since
\begin{displaymath}
x_3 = \exp \left( -\,\frac{(x_1-c_{11})^2 + (x_2-c_{12})^2}{2\sigma_1^2} \right),
\end{displaymath}
the derivatives of $x_3$ with respect to $c_{11}$ and $c_{12}$ are
\begin{displaymath}
\frac{\partial x_3}{\partial c_{11}} = x_3\,\frac{x_1-c_{11}}{\sigma_1^2},
\qquad
\frac{\partial x_3}{\partial c_{12}} = x_3\,\frac{x_2-c_{12}}{\sigma_1^2},
\end{displaymath}
and similarly for $x_4$ and $x_5$.
Combining these via the chain rule and accumulating over all data pairs gives
\begin{displaymath}
\frac{\partial E}{\partial {\bf C}} =
\left[
\begin{array}{cc}
\frac{\partial E}{\partial c_{11}} & \frac{\partial E}{\partial c_{12}} \\
\frac{\partial E}{\partial c_{21}} & \frac{\partial E}{\partial c_{22}} \\
\frac{\partial E}{\partial c_{31}} & \frac{\partial E}{\partial c_{32}} \\
\end{array}
\right]
=
\left[
\begin{array}{ccc}
\frac{1}{\sigma_1^2} & 0 & 0 \\
0 & \frac{1}{\sigma_2^2} & 0 \\
0 & 0 & \frac{1}{\sigma_3^2} \\
\end{array}
\right]
\left\{
\left( \frac{\partial E}{\partial {\bf X}_1} \circ {\bf X}_1 \right)^T {\bf X}_0
-
\mbox{diag}\!\left(
\left[
\begin{array}{c}
\sum_p \frac{\partial E_p}{\partial x_3}\,x_{3,p} \\
\sum_p \frac{\partial E_p}{\partial x_4}\,x_{4,p} \\
\sum_p \frac{\partial E_p}{\partial x_5}\,x_{5,p} \\
\end{array}
\right]
\right)
{\bf C}
\right\}.
\qquad (2)
\end{displaymath}
The first term in the curly braces can be written out as
\begin{displaymath}
\left( \frac{\partial E}{\partial {\bf X}_1} \circ {\bf X}_1 \right)^T {\bf X}_0 =
\left[
\begin{array}{cc}
\sum_p \frac{\partial E_p}{\partial x_3}\,x_{3,p}\,x_{1,p} &
\sum_p \frac{\partial E_p}{\partial x_3}\,x_{3,p}\,x_{2,p} \\
\sum_p \frac{\partial E_p}{\partial x_4}\,x_{4,p}\,x_{1,p} &
\sum_p \frac{\partial E_p}{\partial x_4}\,x_{4,p}\,x_{2,p} \\
\sum_p \frac{\partial E_p}{\partial x_5}\,x_{5,p}\,x_{1,p} &
\sum_p \frac{\partial E_p}{\partial x_5}\,x_{5,p}\,x_{2,p} \\
\end{array}
\right],
\end{displaymath}
which is exactly (dE_dX1.*X1)'*X0 in the code.
The preceding equation corresponds to line 81 (or so) of rbfn.m:
dE_dCenter = diag(SIGMA.^(-2))*((dE_dX1.*X1)'*X0 - diag(sum(dE_dX1.*X1))*CENTER);
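With all three gradients in hand, a plain gradient-descent update of the RBFN parameters could look like the following sketch (the learning rate eta and this particular update rule are assumptions; rbfn.m may use a different training scheme):
% one gradient-descent step on all RBFN parameters
eta    = 0.01;                      % hypothetical learning rate
W      = W      - eta*dE_dW;        % update second-layer weights
SIGMA  = SIGMA  - eta*dE_dSigma;    % update Gaussian widths
CENTER = CENTER - eta*dE_dCenter;   % update Gaussian centers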