Recurrent Neural Networks
These notes are gathered from various places. When I can, I give credits and links, but even if I don't, they are certainly not original ideas.
Sequence Learning in RNNs
An example of a sequence is a sentence of English words. Learning to transform one sequence into another lets computers translate such a sentence into another language.
When no target sequence exists, an RNN can instead be trained to predict the next element of the input sequence. This kind of prediction blurs the line between supervised and unsupervised learning.
Models with State
Autoregressive models predict the next input as a weighted sum of previous inputs; they have no hidden state.
$$input_t = w_1 \, input_{t-1} + w_2 \, input_{t-2} + \ldots$$
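A minimal sketch of such a prediction in NumPy (the weights and history values below are made up for illustration; in practice the weights would be fit to data):

```python
import numpy as np

def ar_predict(history, weights):
    """Autoregressive prediction: a weighted sum of the previous inputs.
    `history` holds the most recent inputs, newest first; `weights` are
    fixed coefficients (hypothetical here, normally fit by least squares)."""
    return float(np.dot(weights, history))

# Hypothetical example: predict the next value from the last 3 inputs.
weights = np.array([0.5, 0.3, 0.2])
history = np.array([1.0, 0.8, 0.6])   # input_{t-1}, input_{t-2}, input_{t-3}
print(ar_predict(history, weights))   # -> 0.86
```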
With hidden units it is much easier to model more complex behavior: the previous inputs first drive a set of hidden units, and the hidden units then predict the next input.
$$hidden_t = g(input_{t-1}, input_{t-2}, \ldots), \qquad input_t = f(hidden_t)$$
The hidden units are non-linear.
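A minimal sketch of this feed-forward generalization, assuming a single layer of tanh hidden units; all weights and sizes below are hypothetical:

```python
import numpy as np

def ffnn_predict(history, W_in, w_out):
    """Feed-forward generalization of the autoregressive model: the previous
    inputs are mapped through non-linear (tanh) hidden units before
    predicting the next input."""
    hidden = np.tanh(W_in @ history)   # non-linear hidden activations
    return float(w_out @ hidden)       # prediction of the next input

rng = np.random.default_rng(0)
history = np.array([1.0, 0.8, 0.6])        # input_{t-1}, input_{t-2}, input_{t-3}
W_in = rng.normal(size=(4, 3)) * 0.5       # 3 past inputs -> 4 hidden units
w_out = rng.normal(size=4) * 0.5           # hidden units -> next-input prediction
print(ffnn_predict(history, W_in, w_out))
```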
Similarity to Quantum Mechanics
In feed-forward neural nets the hidden state is not directly observable; we can only infer it. Is this similar to a quantum state?
Two Earlier Models
There are two earlier types of models with hidden state worth mentioning.
Linear Dynamical Systems
Widely used in engineering. The hidden state is real-valued and evolves with linear dynamics (plus Gaussian noise), so it can be estimated efficiently with Kalman filtering.
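A minimal sketch of a scalar Kalman-filter update under these assumptions (linear dynamics, Gaussian noise); all coefficients and observations below are made up for illustration:

```python
def kalman_step(x_est, p_est, z, a=1.0, q=0.01, h=1.0, r=0.1):
    """One predict/update step of a scalar Kalman filter.
    x_est, p_est: previous state estimate and its variance
    z: new noisy observation
    a, q: (assumed) linear dynamics coefficient and process noise variance
    h, r: (assumed) observation coefficient and observation noise variance."""
    # Predict: propagate the linear dynamics.
    x_pred = a * x_est
    p_pred = a * p_est * a + q
    # Update: blend the prediction with the new observation.
    k = p_pred * h / (h * p_pred * h + r)      # Kalman gain
    x_new = x_pred + k * (z - h * x_pred)
    p_new = (1 - k * h) * p_pred
    return x_new, p_new

# Hypothetical usage: track a constant signal of 1.0 from noisy readings.
x, p = 0.0, 1.0
for z in [1.1, 0.9, 1.05, 0.95]:
    x, p = kalman_step(x, p, z)
print(x)   # estimate converges toward 1.0
```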
Hidden Markov Models
Stochastic models with a discrete hidden state: N possible states can carry at most log(N) bits of information. HMMs have efficient learning and prediction algorithms based on dynamic programming.
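As a sketch of one such prediction routine, here is the standard forward algorithm for computing the probability of an observation sequence; the 2-state model below is hypothetical:

```python
import numpy as np

def hmm_forward(pi, A, B, observations):
    """Forward algorithm: dynamic-programming computation of the probability
    of an observation sequence under an HMM.
    pi: initial state distribution (N,)
    A:  transition matrix (N, N), A[i, j] = P(next=j | current=i)
    B:  emission matrix (N, M), B[i, k] = P(observe symbol k | state i)."""
    alpha = pi * B[:, observations[0]]       # joint prob. of state and first obs.
    for obs in observations[1:]:
        alpha = (alpha @ A) * B[:, obs]      # propagate states, re-weight by emission
    return alpha.sum()                       # total probability of the sequence

# Hypothetical 2-state, 2-symbol HMM.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
print(hmm_forward(pi, A, B, [0, 1, 0]))
```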
An important limitation of HMMs is their memory. With N states they can carry only log(N) bits of information about the sequence so far; a full-fledged linguistic application needs something like 100 bits of state, which would require 2^{100} states. Infeasible.
Difference of RNNs from HMMs and LDSs
Unlike HMMs, RNNs have a distributed hidden state and complex non-linear hidden units. Also, unlike both HMMs and linear dynamical systems, their hidden dynamics are deterministic.
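A minimal sketch of one such deterministic hidden-state update, assuming a simple Elman-style RNN with tanh units; all sizes and weights below are hypothetical:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One step of a simple (Elman-style) RNN: the new hidden state is a
    deterministic, non-linear function of the current input and the
    previous distributed hidden state."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Hypothetical sizes: 3-dimensional inputs, 5 hidden units.
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(5, 3)) * 0.5   # input -> hidden weights
W_hh = rng.normal(size=(5, 5)) * 0.5   # hidden -> hidden (recurrent) weights
b_h = np.zeros(5)

h = np.zeros(5)                          # initial hidden state
for x in rng.normal(size=(4, 3)):        # a short random input sequence
    h = rnn_step(x, h, W_xh, W_hh, b_h)
print(h)                                 # real-valued, distributed hidden state
```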
Possible behaviors of RNN states:
- Oscillate. Good for motor control?
- Settle to point attractors. Good for retrieving memories?
- Chaotic. Bad for information processing.
RNNs could learn to implement lots of small programs that run in parallel.
A disadvantage of RNNs: they are hard to train.