Spoken Language Identification (LID) is broadly defined as recognizing the language of a given speech utterance. It has numerous applications in automated language and speech recognition,multilingual machine translations, speech-to-speech translations, and emergency call routing. Inthis homework, we will try to classify three languages (English, Hindi and Mandarin) from the spoken utterances that have been crowd-sourced from the class.
mapping = {'english': 0, 'hindi': 1, ' mandarin': 2}
Extract MFCC features from audio files, build up Recurrent Neural Network (GRU/LSTM) to train the model to output 3-class probability.
Mark silence audios with label -1, and omit them both in loss and accuracy measurement.
After training from scratch and dueling with overfitting, my model performed ~90% validation accuracy.