Recursive Neural Networks

Posted on Fri 10 October 2014 in old • 2 min read

These notes are gathered from various places. When I can, I give credits and links, but even if I don't, they are certainly not original ideas.

Sequence Learning in RNNs

An example of a sequence is a sentence in English. Sequence learning and transformation allow computers to translate such a sequence into another language.

When no target sequence exists, an RNN can instead be trained to predict the next element of the input sequence. This prediction task blurs the line between supervised and unsupervised learning.

Models with State

Autoregressive models predict the next input as a weighted sum of previous inputs:

\begin{equation*} input_t = w_1 \, input_{t-1} + w_2 \, input_{t-2} + \ldots \end{equation*}
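A minimal sketch of the idea, assuming an order-2 model whose weights are fit by least squares on a toy sequence (the sequence and lag order are illustrative, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.cumsum(rng.standard_normal(200))  # toy sequence (random walk)

# Lagged design matrix: each row is (x_{t-1}, x_{t-2}), target is x_t.
X = np.column_stack([x[1:-1], x[:-2]])
y = x[2:]
w, *_ = np.linalg.lstsq(X, y, rcond=None)  # fit the two lag weights

# One-step-ahead prediction of the next element.
pred = w[0] * x[-1] + w[1] * x[-2]
```

The model has no memory beyond its fixed window of lags, which is exactly the limitation that hidden state removes.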

Adding a hidden state makes it much easier to model more complex tasks, because the state can summarize the whole history rather than a fixed window:

\begin{equation*} input_t = f(hidden_t) + w_1 \, input_{t-1} + w_2 \, input_{t-2} + \ldots \end{equation*}

In an RNN, the hidden-state dynamics are nonlinear.
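A sketch of such a nonlinear hidden-state update, the standard vanilla-RNN step (dimensions and random weights are illustrative assumptions):

```python
import numpy as np

n_hidden, n_input = 4, 3  # hypothetical sizes for illustration
rng = np.random.default_rng(1)
W = rng.standard_normal((n_hidden, n_hidden)) * 0.1  # hidden-to-hidden weights
U = rng.standard_normal((n_hidden, n_input)) * 0.1   # input-to-hidden weights

def rnn_step(h, x):
    # Nonlinear update: the new hidden state mixes the old state
    # and the current input through a tanh squashing function.
    return np.tanh(W @ h + U @ x)

h = np.zeros(n_hidden)
for x in rng.standard_normal((5, n_input)):  # run over a short sequence
    h = rnn_step(h, x)
```

Because of the tanh, the state is a nonlinear function of the entire input history, which a linear autoregressive model cannot express.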

Similarity to Quantum Mechanics

In feed-forward neural nets, the hidden state is not directly observable; we can only infer it. Loosely similar to a quantum state?

Two Earlier Models

There are two general types of models worth mentioning.

Linear Dynamical Systems

Widely used in engineering. The hidden state has linear dynamics (with Gaussian noise), so it can be inferred exactly with Kalman filtering.

Hidden Markov Models

Stochastic models with N discrete hidden states; at any time step the state carries only log(N) bits of information. HMMs have efficient learning and prediction algorithms.

An important limitation of HMMs is their memory: the hidden state can carry only log(N) bits of information. A full-fledged linguistic application needs at least 100 bits of state, and that would require 2^{100} states. Infeasible.
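The arithmetic behind this limitation (the state counts are illustrative):

```python
import math

# An HMM's hidden state at one time step is one of N discrete values,
# so it carries log2(N) bits of information.
n_states = 1024
bits = math.log2(n_states)  # a 1024-state HMM remembers only 10 bits

# Conversely, remembering 100 bits of history would require
# 2**100 distinct states -- far beyond anything trainable.
needed_states = 2 ** 100
```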

Difference of RNNs from HMMs and LDSs

Unlike HMMs, RNNs have a distributed hidden state and complex nonlinear hidden units. They are also deterministic.

Behavior of RNN states:

- Oscillate. Good for motor control?
- Settle to point attractors. Good for retrieving memories?
- Chaotic. Bad for information processing.

RNNs could learn to implement lots of small programs to run in parallel.

A disadvantage of RNNs: they are hard to train.