Dependencies

Python 3.7.2, requests, numpy, nltk

1. STT Error Dataset

cd make_stterror_data
python main.py --data_dir data/intent_chatbot/

Output:
- TTS audios, STT recovered texts, BLEU scores
- The stterror_data/chatbot/ directory was organized in such way to separate train and test in each TTS-STT combination

Examples of sentences with STT error

Corpus	TTS	iBLEU	Original	With STT error
Chatbot	gtts	0.4376	"how can i get from garching to milbertshofen?"	"how can i get from garching to melbourne open."
Chatbot	macsay	0.5042	"how can i get from garching to milbertshofen?"	"how can i get from garching to meal prep."

STT: Wit.ai

iBLEU = 1 - BLEU

Examples of sentences with Natural Human error

Original	With Error
"goonite sweet dreamz"	"Good night, sweet dreams."
"well i dunno..i didnt give him an ans yet"	"Well I don't know, I didn't give him an answer yet."
"u kno who am i talkin bout??"	"Do you know who I am talking about?"

In order for the model to be robust to missing data it also needs to be trained on sentences with missing words.
After making the incomplete dataset, there are two options
1. Make dataset with Complete and Incomplete Data
```
python make_joint_comp_inc_data.py
```
2. Make dataset with Incomplete Data (add target sentence to tsv file to train autoencoder)
```
python add_target_to_inc_data.py
```