Neural Networks

Model Representation I

How do we represent our hypothesis, or our model, when using neural networks?

  • Neurons are cells in the brain.
  • A neuron has a cell body.
  • A neuron has a number of input wires (dendrites, which receive inputs from other locations).
  • A neuron has an output wire (the axon).

Neurons communicate with little pulses of electricity (also called spikes).

They do computations and pass messages to other neurons based on the inputs they receive.

Our model neuron does a computation:

  • it gets a number of inputs $x_1, x_2, x_3$ and outputs some computed value
  • it outputs that value on its output wire (in the biological neuron, this is the axon)
  • $x_0$ is the bias unit, or the bias neuron

Sigmoid (logistic) activation function: $g(z)=\frac{1}{1+e^{-z}}$
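A minimal numpy sketch of this activation function (the function name is my own):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid (logistic) activation g(z) = 1 / (1 + e^(-z)).
    Works element-wise on scalars and numpy arrays."""
    return 1.0 / (1.0 + np.exp(-z))
```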

A simplistic representation looks like:

$[x_0\; x_1\; x_2] \rightarrow [\;] \rightarrow h_\theta(x)$

where $h_\theta(x)=\frac{1}{1+e^{-\theta^T x}}$

"input layer" (layer 1) $\rightarrow$ layer 2 $\rightarrow$ "output layer" (layer 3)

Intermediate layers of nodes between the input and output layers are called "hidden layers".

"weights" = "parameters"

Notation:

  • $a_i^{(j)}$: the "activation" of unit $i$ in layer $j$
  • $\Theta^{(j)}$: the matrix of weights controlling the function mapping from layer $j$ to layer $j+1$

If we have one hidden layer, it would look like:

$[x_0\; x_1\; x_2\; x_3] \rightarrow [a_1^{(2)}\; a_2^{(2)}\; a_3^{(2)}] \rightarrow h_\theta(x)$

The value of each "activation" node is obtained as follows:

$a_1^{(2)}=g(\Theta_{10}^{(1)}x_0+\Theta_{11}^{(1)}x_1+\Theta_{12}^{(1)}x_2+\Theta_{13}^{(1)}x_3)$

$a_2^{(2)}=g(\Theta_{20}^{(1)}x_0+\Theta_{21}^{(1)}x_1+\Theta_{22}^{(1)}x_2+\Theta_{23}^{(1)}x_3)$

$a_3^{(2)}=g(\Theta_{30}^{(1)}x_0+\Theta_{31}^{(1)}x_1+\Theta_{32}^{(1)}x_2+\Theta_{33}^{(1)}x_3)$

$h_\Theta(x)=a_1^{(3)}=g(\Theta_{10}^{(2)}a_0^{(2)}+\Theta_{11}^{(2)}a_1^{(2)}+\Theta_{12}^{(2)}a_2^{(2)}+\Theta_{13}^{(2)}a_3^{(2)})$

If a network has $s_j$ units in layer $j$ and $s_{j+1}$ units in layer $j+1$, then $\Theta^{(j)}$, the matrix controlling the mapping from layer $j$ to layer $j+1$, has dimensions **$s_{j+1} \times (s_j + 1)$**.
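As a check on this dimension rule, here is a hedged numpy sketch of the one-hidden-layer network above; the weight values are random placeholders, and only the shapes follow the rule:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Example network: s_1 = 3 inputs, s_2 = 3 hidden units, s_3 = 1 output.
rng = np.random.default_rng(0)
Theta1 = rng.standard_normal((3, 4))  # s_2 x (s_1 + 1): maps layer 1 -> layer 2
Theta2 = rng.standard_normal((1, 4))  # s_3 x (s_2 + 1): maps layer 2 -> layer 3

x = np.array([0.5, -1.2, 2.0])        # input features x_1..x_3
a1 = np.concatenate(([1.0], x))       # prepend the bias unit x_0 = 1

a2 = sigmoid(Theta1 @ a1)             # activations a_1^(2)..a_3^(2)
a2 = np.concatenate(([1.0], a2))      # prepend the bias unit a_0^(2) = 1

h = sigmoid(Theta2 @ a2)              # h_Theta(x) = a_1^(3), a single number
```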

Model Representation II

Vectorize

$a_1^{(2)}=g(z_1^{(2)})$

$a_2^{(2)}=g(z_2^{(2)})$

$a_3^{(2)}=g(z_3^{(2)})$

For layer $j=2$ and node $k$, the variable $z$ is:

$z_k^{(2)}=\Theta_{k,0}^{(1)}x_0+\Theta_{k,1}^{(1)}x_1+\cdots+\Theta_{k,n}^{(1)}x_n$

The vector representation of $x$ and $z^{(j)}$:

$x=\begin{bmatrix}x_0\\x_1\\\vdots\\x_n\end{bmatrix}\qquad z^{(j)}=\begin{bmatrix}z_1^{(j)}\\z_2^{(j)}\\\vdots\\z_n^{(j)}\end{bmatrix}$

Setting $x=a^{(1)}$, we can write the equation as:

$z^{(j)}=\Theta^{(j-1)}a^{(j-1)}$

Here we multiply $\Theta^{(j-1)}$, a matrix with dimensions $s_j \times (n+1)$ (where $s_j$ is the number of activation nodes in layer $j$), by our vector $a^{(j-1)}$ with height $n+1$. This gives us our vector $z^{(j)}$ with height $s_j$.

For layer $j$:

$a^{(j)}=g(z^{(j)})$, where $g$ is applied element-wise to the vector $z^{(j)}$.

After we have computed $a^{(j)}$, we add a bias unit (equal to 1) to layer $j$: set $a_0^{(j)}=1$.

We then compute the next $z$ vector: $z^{(j+1)}=\Theta^{(j)}a^{(j)}$

We get the final $z$ vector by multiplying the next theta matrix after $\Theta^{(j-1)}$ with the values of all the activation nodes we just got. This last theta matrix $\Theta^{(j)}$ has only one row, which is multiplied by the one column $a^{(j)}$, so that our result is a single number. We then get our final result with:

$h_\Theta(x)=a^{(j+1)}=g(z^{(j+1)})$

Notice that in this last step, between layer j and layer j+1, we are doing exactly the same thing as we did in logistic regression.

This procedure is called **forward propagation**.
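A hedged numpy sketch of forward propagation as just described, generalized to any list of weight matrices (the names are my own):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagate(x, thetas):
    """x: input vector without the bias unit.
    thetas: [Theta^(1), Theta^(2), ...], each of shape s_{j+1} x (s_j + 1).
    Returns h_Theta(x)."""
    a = np.asarray(x, dtype=float)
    for Theta in thetas:
        a = np.concatenate(([1.0], a))  # add the bias unit a_0^(j) = 1
        z = Theta @ a                   # z^(j+1) = Theta^(j) a^(j)
        a = sigmoid(z)                  # a^(j+1) = g(z^(j+1)), element-wise
    return a
```

For the example network above, `forward_propagate(x, [Theta1, Theta2])` reproduces $h_\Theta(x)$.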

Applications

Non-linear classification example: XOR/XNOR

$x_1, x_2$ are binary (0 or 1).

$y = x_1\ \text{XOR}\ x_2$: $y=1$ when exactly one of $x_1$ and $x_2$ equals 1.

$x_1\ \text{XNOR}\ x_2$: $y=1$ when $x_1$ and $x_2$ are both true or both false.

(XNOR is $\text{NOT}(x_1\ \text{XOR}\ x_2)$.)
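The truth tables for both targets:

| $x_1$ | $x_2$ | $x_1\ \text{XOR}\ x_2$ | $x_1\ \text{XNOR}\ x_2$ |
|---|---|---|---|
| 0 | 0 | 0 | 1 |
| 0 | 1 | 1 | 0 |
| 1 | 0 | 1 | 0 |
| 1 | 1 | 0 | 1 |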

Simple example: AND

$x_1, x_2 \in \{0, 1\}$

$y = x_1\ \text{AND}\ x_2$

We add a bias unit (the +1 unit) and compute this AND function with a neural network that has a single neuron.

Then we assign values to the weights (parameters), as in the sketch below.
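A hedged sketch in numpy, using the weight values from the lecture: $\Theta = [-30,\ 20,\ 20]$ gives a single neuron computing AND, and the same pattern yields OR and $(\text{NOT}\ x_1)\ \text{AND}\ (\text{NOT}\ x_2)$, which combine into XNOR:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(theta, x1, x2):
    """A single sigmoid neuron with a bias unit: g(theta_0 + theta_1*x1 + theta_2*x2)."""
    return sigmoid(theta[0] + theta[1] * x1 + theta[2] * x2)

AND_W = np.array([-30.0,  20.0,  20.0])  # fires only when x1 = x2 = 1
OR_W  = np.array([-10.0,  20.0,  20.0])  # fires when either input is 1
NOR_W = np.array([ 10.0, -20.0, -20.0])  # (NOT x1) AND (NOT x2)

for x1 in (0, 1):
    for x2 in (0, 1):
        a1 = neuron(AND_W, x1, x2)   # hidden unit: x1 AND x2
        a2 = neuron(NOR_W, x1, x2)   # hidden unit: (NOT x1) AND (NOT x2)
        xnor = neuron(OR_W, a1, a2)  # output: x1 XNOR x2
        print(x1, x2, round(a1), round(xnor))
```

With these weights $g(\pm 10)$ is already within about $5\times10^{-5}$ of 0 or 1, so the rounded outputs reproduce the truth table above.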

Multi-class classification

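A brief sketch of the standard setup: with $K$ classes, the output layer has $K$ units, so $h_\Theta(x)\in\mathbb{R}^K$, and the labels $y$ are re-coded as one-hot vectors. For $K=4$:

$y \in \left\{\begin{bmatrix}1\\0\\0\\0\end{bmatrix},\begin{bmatrix}0\\1\\0\\0\end{bmatrix},\begin{bmatrix}0\\0\\1\\0\end{bmatrix},\begin{bmatrix}0\\0\\0\\1\end{bmatrix}\right\}$

and we train the network so that $h_\Theta(x)\approx y$ for each class.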
