The fundamental building block of any language is its alphabet, which makes character-level RNNs the most basic form of recurrent language model. We implement one from scratch and, for comparison, also build models using the RNN, LSTM and GRU layers available in PyTorch.
All the models are trained on the Dino Names dataset, which contains 1536 dinosaur names. The data split and training parameters are listed below.
Parameter | Value |
---|---|
Training Set | 95% (1460/1536) |
Testing Set | 3% (46/1536) |
Validation Set | 2% (30/1536) |
Number of Epochs | 40 |
Learning Rate | 4 × 10⁻⁴ |
Hidden Dimensions | 64 |
Loss Function | Cross Entropy Loss |
Optimizer | AdamW |
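For reference, the split above could be reproduced along these lines; the filename and random seed are placeholders, not values taken from the notebooks:

```python
import random

# A minimal sketch of the split in the table above (1460 / 46 / 30 names).
# "dino_names.txt" is a placeholder filename for the Dino Names dataset.
with open("dino_names.txt") as f:
    names = [line.strip().lower() for line in f if line.strip()]

random.seed(42)                        # illustrative seed
random.shuffle(names)

train_names = names[:1460]             # 95% -> training
test_names  = names[1460:1460 + 46]    # 3%  -> testing
val_names   = names[1460 + 46:]        # 2%  -> validation (30 names)
```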
- Every character in the vocabulary (including the delimiter ".") is embedded using a One-Hot vector representation.
- A sequence of embeddings is fed to the model. This typically contains embeddings of all the characters of a word, except the delimiter.
- The output of the model is validated against the ground truth, i.e. a sequence of embeddings of all the characters of the word (including the delimiter) except the first.
- The training and validation losses are calculated and stochastic optimization is applied.
- This is repeated for every word in the corpus (a minimal sketch of one such training step is shown after this list).
- The cumulative training and validation losses for each epoch are recorded and plotted.
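For concreteness, one such training step on a single word might look roughly like this; the vocabulary, the example word, the stand-in `nn.RNN` layer and the output head are all illustrative placeholders rather than the notebooks' exact code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative vocabulary: lowercase letters plus the "." delimiter.
vocab = list("abcdefghijklmnopqrstuvwxyz") + ["."]
char_to_idx = {ch: i for i, ch in enumerate(vocab)}
VOCAB_SIZE = len(vocab)                                    # 27

def encode(word):
    """One-hot embed every character of `word` followed by the delimiter."""
    idx = torch.tensor([char_to_idx[ch] for ch in word + "."])
    return F.one_hot(idx, num_classes=VOCAB_SIZE).float(), idx

model = nn.RNN(VOCAB_SIZE, 64, batch_first=True)           # stand-in recurrent layer
head = nn.Linear(64, VOCAB_SIZE)                           # hidden state -> character logits
optimizer = torch.optim.AdamW(
    list(model.parameters()) + list(head.parameters()), lr=4e-4
)
criterion = nn.CrossEntropyLoss()

one_hot, idx = encode("tyrannosaurus")
inputs = one_hot[:-1].unsqueeze(0)     # every character except the delimiter
targets = idx[1:]                      # every character except the first (delimiter included)

hidden_states, _ = model(inputs)       # (1, seq_len, hidden_dim)
logits = head(hidden_states).squeeze(0)
loss = criterion(logits, targets)      # compare predictions with the shifted ground truth

optimizer.zero_grad()
loss.backward()
optimizer.step()
```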
This notebook implements a vanilla RNN from scratch, using only basic linear layers and activation functions. It aims to deepen the understanding of the general implementation paradigm for recurrent neural networks, covering every step from data preprocessing to sampling from the trained model using only basic Python and PyTorch functionality.
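A from-scratch RNN cell of this kind can be sketched roughly as follows; the class and attribute names are illustrative, and the notebook's actual implementation may differ in its details:

```python
import torch
import torch.nn as nn

class CharRNNFromScratch(nn.Module):
    """Rough sketch of a vanilla RNN cell built only from linear layers
    and a tanh activation (names and sizes are illustrative)."""

    def __init__(self, vocab_size, hidden_dim=64):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.in2hidden = nn.Linear(vocab_size + hidden_dim, hidden_dim)
        self.hidden2out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x_seq):
        # x_seq: (seq_len, vocab_size) one-hot embeddings of a single word
        h = torch.zeros(self.hidden_dim)
        logits = []
        for x_t in x_seq:
            # New hidden state from the current character and the previous state.
            h = torch.tanh(self.in2hidden(torch.cat([x_t, h])))
            logits.append(self.hidden2out(h))
        return torch.stack(logits)     # (seq_len, vocab_size) next-character logits
```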
This architecture employs the vanilla RNN layer available in PyTorch (`nn.RNN`) instead of writing the entire RNN model from scratch. This makes training at least 4 times faster, owing to the optimized data-flow design of PyTorch's built-in layers. The improvement in modelling performance, however, is quite modest due to the general drawbacks of vanilla RNNs, such as vanishing gradients and short temporal memory spans.
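The equivalent model wrapped around PyTorch's `nn.RNN` layer might look roughly like this (again, the class and attribute names are illustrative):

```python
import torch.nn as nn

class CharRNN(nn.Module):
    """Sketch of a character-level model built on PyTorch's nn.RNN layer."""

    def __init__(self, vocab_size, hidden_dim=64, num_layers=1):
        super().__init__()
        self.rnn = nn.RNN(vocab_size, hidden_dim,
                          num_layers=num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, h0=None):
        # x: (batch, seq_len, vocab_size) one-hot embeddings
        out, hn = self.rnn(x, h0)      # the loop over time steps happens inside nn.RNN
        return self.fc(out), hn        # next-character logits and the final hidden state
```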
In this notebook, we implement a single-layer Long Short-Term Memory (LSTM) network using PyTorch's built-in LSTM layer. This network has more than 3 times as many parameters as the vanilla RNN. The final convergent loss does not change much, but the sampled names appear noticeably closer to natural language.
The final notebook in the comparison uses Gated Recurrent Units (GRUs), a more computationally efficient alternative to LSTM units. The number of trainable parameters is almost 2.5 times that of the equivalent vanilla RNN, yet the performance is often on par with the LSTM network, making GRUs a friendlier choice for large-scale deployment.
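These relative parameter counts can be sanity-checked directly. Assuming a 27-character vocabulary (a-z plus the "." delimiter), a hidden size of 64 and a `Linear(64, 27)` output head, PyTorch's built-in layers reproduce the counts in the table below:

```python
import torch.nn as nn

# Count the parameters of a recurrent layer plus a Linear(64, 27) output head.
def count_params(recurrent_layer):
    head = nn.Linear(64, 27)
    return (sum(p.numel() for p in recurrent_layer.parameters())
            + sum(p.numel() for p in head.parameters()))

print(count_params(nn.RNN(27, 64, batch_first=True)))    # 7707  -> Vanilla RNN row
print(count_params(nn.LSTM(27, 64, batch_first=True)))   # 25563 -> LSTM row
print(count_params(nn.GRU(27, 64, batch_first=True)))    # 19611 -> GRU row
```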
Below is a table summarising the number of learnable parameters and the convergent losses achieved by each model.
Architecture | No. of Learnable Parameters | Training Loss | Validation Loss | Test Loss |
---|---|---|---|---|
RNN from scratch | 10856 | 2.0110 | 1.8492 | 1.7026 |
Vanilla RNN | 7707 | 1.8576 | 1.6835 | 1.5294 |
LSTM | 25563 | 1.7426 | 1.6065 | 1.4923 |
GRU | 19611 | 1.7202 | 1.5082 | 1.3935 |
In the notebooks (all except RNN from scratch), a parameter named "NUM_LAYERS" controls the number of stacked recurrent layers in the model. However, given the minuscule amount of data available and the modest objective at hand, multi-layer architectures seem futile: the extra computation outweighs any gain in performance.
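For instance, a 2-layer stacked GRU could be built along these lines, using PyTorch's `num_layers` argument directly (a sketch under the same 27-character, 64-hidden-unit assumptions as above):

```python
import torch.nn as nn

# Stacking is controlled by num_layers, analogous to setting NUM_LAYERS = 2
# in the notebooks.
NUM_LAYERS = 2
stacked_gru = nn.GRU(input_size=27, hidden_size=64,
                     num_layers=NUM_LAYERS, batch_first=True)

# The recurrent parameter count more than doubles relative to the
# single-layer GRU, while the dataset stays at 1536 short names.
print(sum(p.numel() for p in stacked_gru.parameters()))
```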