These notes are gathered from various places. When I can, I give credit and links, but even where I don't, the ideas here are certainly not original.

# Sequence Learning in RNNs

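An example of a sequence is a sentence in English, i.e. a sequence of words. Sequence learning lets computers transform one sequence into another, for example translating the sentence into another language.

When there is no separate target sequence, RNNs can instead be trained to predict the next element of the input sequence. This blurs the line between supervised and unsupervised learning: it uses supervised training methods, but the targets come from the input sequence itself.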

## Models with State

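Autoregressive models predict the next term in a sequence from a fixed number of previous terms, typically as a weighted sum:

$$\text{input}_t = w_1\,\text{input}_{t-1} + w_2\,\text{input}_{t-2} + \ldots + w_k\,\text{input}_{t-k}$$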
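A minimal sketch of an autoregressive predictor, assuming a linear model with one weight per past term (a plain unweighted sum is the special case where every weight is 1). The names `ar_predict` and `ar_generate` are hypothetical, chosen for illustration.

```python
def ar_predict(history, weights):
    """Predict the next term as a weighted sum of the last len(weights) terms.

    weights[0] multiplies the most recent term (input_{t-1}),
    weights[1] the one before it (input_{t-2}), and so on.
    """
    k = len(weights)
    if len(history) < k:
        raise ValueError("need at least len(weights) past terms")
    recent = history[-1:-k - 1:-1]  # last k terms, most recent first
    return sum(w * x for w, x in zip(weights, recent))


def ar_generate(seed, weights, n):
    """Extend a seed sequence by n terms, feeding each prediction back in."""
    seq = list(seed)
    for _ in range(n):
        seq.append(ar_predict(seq, weights))
    return seq
```

With weights `[1, 1]` the model reproduces the Fibonacci recurrence: `ar_generate([1, 1], [1, 1], 5)` gives `[1, 1, 2, 3, 5, 8, 13]`. Note that the model has no memory beyond its window of `k` terms.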

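Giving the model a hidden state with its own internal dynamics makes much more complex tasks achievable, because the state can carry information about the sequence for a long time. The next term then depends on the hidden state as well as on recent inputs:

$$\text{input}_t = f\left(\text{hidden}_t,\ \text{input}_{t-1},\ \text{input}_{t-2}, \ldots\right)$$

In RNNs, the hidden state is updated nonlinearly.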
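One common way to realize a model with nonlinear hidden state is a simple (Elman-style) recurrent net. The sketch below assumes a `tanh` hidden update and a linear readout that predicts the next input; the weight names `W_hh`, `W_xh`, `W_hy` and the helper functions are hypothetical, and plain Python lists stand in for matrices so the example has no dependencies.

```python
import math


def rnn_step(h_prev, x, W_hh, W_xh, b):
    """One hidden-state update: h_t = tanh(W_hh h_{t-1} + W_xh x_t + b)."""
    return [
        math.tanh(
            sum(w * h for w, h in zip(W_hh[i], h_prev))
            + sum(w * xi for w, xi in zip(W_xh[i], x))
            + b[i]
        )
        for i in range(len(h_prev))
    ]


def readout(h, W_hy, c):
    """Linear prediction of the next input from the current hidden state."""
    return [sum(w * hj for w, hj in zip(row, h)) + ci for row, ci in zip(W_hy, c)]


def run_rnn(inputs, h0, W_hh, W_xh, b, W_hy, c):
    """Feed a sequence through the RNN; return the final state and per-step predictions."""
    h, preds = h0, []
    for x in inputs:
        h = rnn_step(h, x, W_hh, W_xh, b)  # the state carries information forward
        preds.append(readout(h, W_hy, c))
    return h, preds
```

Unlike the autoregressive window, nothing here limits how far back an input can influence the state: with a self-connection `W_hh = [[1.0]]`, an input seen at the first step is still reflected in the state after later zero inputs.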

## Similarity to Quantum Mechanics

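As with the hidden units of a feed-forward neural net, this hidden state is not directly observable. If the dynamics are noisy, the best we can do is infer a probability distribution over the possible hidden states. Is this similar to a quantum state?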

## Two Earlier Models

There are two general types of models worth mentioning.

### Linear Dynamical Systems

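Widely used in engineering. The hidden state is real-valued and evolves with linear dynamics plus Gaussian noise, so the distribution over the hidden state given the observations stays Gaussian and can be computed efficiently with Kalman filtering.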
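A minimal sketch of Kalman filtering for the simplest case, assuming a scalar hidden state with dynamics coefficient `a`, observation coefficient `c`, process-noise variance `q` and observation-noise variance `r` (all parameter names are illustrative). Because the belief stays Gaussian, tracking its mean and variance is enough.

```python
def kalman_filter_1d(observations, a, c, q, r, m0, p0):
    """Scalar Kalman filter for the linear-Gaussian model
        h_t = a * h_{t-1} + process noise      (variance q)
        y_t = c * h_t     + observation noise  (variance r)
    starting from the prior h_0 ~ N(m0, p0).
    Returns the posterior mean and variance of h_t after each observation.
    """
    m, p = m0, p0
    means, variances = [], []
    for y in observations:
        # Predict: push the Gaussian belief through the linear dynamics.
        m_pred = a * m
        p_pred = a * a * p + q
        # Update: blend prediction and observation, weighted by the Kalman gain.
        k = p_pred * c / (c * c * p_pred + r)
        m = m_pred + k * (y - c * m_pred)
        p = (1 - k * c) * p_pred
        means.append(m)
        variances.append(p)
    return means, variances
```

With a static state (`a = 1`, `q = 0`), a nearly flat prior and unit observation noise, observing `4.0` then `6.0` gives a posterior mean close to `5.0` and variance close to `0.5`, i.e. the filter reduces to averaging the observations.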

### Hidden Markov Models

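Stochastic models with a discrete hidden state: at each step the model is in exactly one of $N$ states, so the state carries at most $\log_2 N$ bits of information. Transitions between states and the outputs they produce are both stochastic. HMMs have efficient algorithms for inference, prediction and learning.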
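One of those efficient algorithms is the forward algorithm, which computes the probability of an observation sequence by dynamic programming instead of summing over all $N^T$ state paths. A minimal sketch, assuming the HMM is given as plain probability tables (`start`, `trans`, `emit` are illustrative names):

```python
def hmm_forward(obs, start, trans, emit):
    """Forward algorithm: P(observation sequence) under a discrete HMM.

    start[i]    = P(state_0 = i)
    trans[i][j] = P(state_t = j | state_{t-1} = i)
    emit[i][o]  = P(observation o | state i)
    Runs in O(T * N^2) time for T observations and N states.
    """
    n = len(start)
    # alpha[i] = P(observations so far, current state = i)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
            for j in range(n)
        ]
    return sum(alpha)
```

Normalizing the final `alpha` vector would also give the distribution over the current hidden state, which is the inference step that is tractable for HMMs.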

An important *limitation* of HMMs is their *memory*. With $N$ states, an HMM can remember only $\log_2 N$ bits of information about what it has generated so far. A full-fledged linguistic application needs at least 100 bits of state, which would mean $2^{100}$ states. Infeasible.
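The arithmetic behind this limitation, as a tiny sketch (the function names are hypothetical):

```python
import math


def hmm_bits(num_states):
    """Information an HMM's hidden state can hold: log2(N) bits."""
    return math.log2(num_states)


def hmm_states_needed(bits):
    """Number of discrete states an HMM needs to hold `bits` bits."""
    return 2 ** bits
```

`hmm_bits(1024)` is `10.0`, while `hmm_states_needed(100)` is `1267650600228229401496703205376`, roughly $1.27 \times 10^{30}$ states. By contrast, a distributed state of just 100 binary units already has $2^{100}$ possible patterns.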

### How RNNs Differ from HMMs and LDSs

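Unlike HMMs and LDSs, RNNs combine a distributed hidden state, which lets them store a lot of information about the past efficiently, with nonlinear hidden units, which let them update that state in complicated ways. They are also *deterministic*: given the same inputs, the hidden state evolves the same way every time.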

RNN hidden states can behave in several ways:

- Oscillate. Good for motor control?
- Settle to point attractors. Good for retrieving memories?
- Behave chaotically. Bad for information processing.
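These behaviors can already be seen in toy recurrences. The sketch below iterates a single self-connected `tanh` unit, $h_{t+1} = \tanh(w\,h_t)$: a weak connection settles to a point attractor, and a strong negative one flips sign every step. A single unit like this is too simple to be chaotic, so for the last case we swap in the logistic map $x \mapsto 4x(1-x)$, a standard chaotic recurrence that is *not* an RNN unit. The weights and starting values are illustrative choices.

```python
import math


def iterate(f, h0, steps):
    """Return the trajectory h0, f(h0), f(f(h0)), ... of length steps + 1."""
    traj = [h0]
    for _ in range(steps):
        traj.append(f(traj[-1]))
    return traj


def tanh_unit(w):
    """Update rule of one self-connected tanh unit: h -> tanh(w * h)."""
    return lambda h: math.tanh(w * h)


# Point attractor: with |w| < 1 every trajectory is pulled toward h = 0.
settles = iterate(tanh_unit(0.5), 0.9, 50)

# Oscillation: a strong negative self-connection flips the sign every step,
# ending in a period-2 cycle between roughly +0.995 and -0.995.
oscillates = iterate(tanh_unit(-3.0), 0.1, 50)


# Chaos (stand-in, not an RNN unit): the logistic map at r = 4.
def logistic(x):
    return 4.0 * x * (1.0 - x)


# Two starts differing by 1e-10 soon follow completely different paths.
traj_a = iterate(logistic, 0.2, 100)
traj_b = iterate(logistic, 0.2 + 1e-10, 100)
```

The sensitivity of `traj_a` versus `traj_b` to a tiny initial difference is what makes chaotic dynamics bad for information processing: the state stops carrying reliable information about where it started.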

In principle, RNNs could learn to implement many small programs that each capture a nugget of knowledge, run in parallel, and interact to produce very complicated effects.

A disadvantage of RNNs: **RNNs are hard to train**.
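One widely cited reason is that gradients propagated back through many time steps (backpropagation through time) are products of many per-step factors, so they tend to vanish or explode. A toy sketch for a single self-connected `tanh` unit, assuming $h_t = \tanh(w\,h_{t-1})$; this illustrates the effect and is not a full account of why RNN training is hard.

```python
import math


def gradient_through_time(w, h0, steps):
    """d h_T / d h_0 for the scalar RNN h_t = tanh(w * h_{t-1}).

    By the chain rule this is the product over t of the per-step factors
    d h_t / d h_{t-1} = w * (1 - h_t**2).
    """
    h, grad = h0, 1.0
    for _ in range(steps):
        h = math.tanh(w * h)
        grad *= w * (1.0 - h * h)  # multiply in this step's local derivative
    return grad
```

With `h0 = 0.0` the state stays at zero and the gradient is exactly `w ** steps`, so `w = 1.5` over 50 steps multiplies it by roughly $6 \times 10^8$ (exploding). With `w = 0.5`, or with a large `w = 3.0` that saturates the unit, it falls below `1e-20` within 100 steps (vanishing).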