This is the implementation of our NAACL paper: [An Attentional Model for Speech Translation Without Transcription](http://aclweb.org/anthology/N/N16/N16-1109.pdf).
If you use this code, please cite the paper:

```
@InProceedings{duong-EtAl:2016:N16-1,
  author    = {Duong, Long and Anastasopoulos, Antonios and Chiang, David and Bird, Steven and Cohn, Trevor},
  title     = {An Attentional Model for Speech Translation Without Transcription},
  booktitle = {Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
  month     = {June},
  year      = {2016},
  address   = {San Diego, California},
  publisher = {Association for Computational Linguistics},
  pages     = {949--959},
  url       = {http://www.aclweb.org/anthology/N16-1109}
}
```
This implementation is based on the [C++ neural network library (CNN)](https://github.com/clab/cnn) with the development version of [Eigen](http://eigen.tuxfamily.org/). Note that you also need Boost. We made some modifications to CNN and release them together with this code.
```bash
sudo apt-get install libboost-all-dev
hg clone https://bitbucket.org/eigen/eigen/
git clone https://github.com/longdt219/speechtranslation.git
cd speechtranslation
mkdir build
cd build
cmake .. -DEIGEN3_INCLUDE_DIR=../eigen
make -j 10
```
This experiment translates directly from the phone sequence to words, similar to a machine translation problem. The data format is:

```
<s> source phones </s> ||| <s> target words </s>
```
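For illustration, a single line in this format might look like this (a made-up example, not taken from the attached data):

```
<s> h aw aa r y uw </s> ||| <s> how are you </s>
```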
We attach a tiny training dataset for demo purposes. To train the model:
```bash
./build/attentional_model/attentional --train data/train.attentional --devel data/dev.attentional --lstm --bidirectional -a 32 --hidden 32 --parameters model.phone --epochs 50 --coverage 0.05 --trainer sgd --layers 4 --giza --smoothsm 0.1
```
Some options:
- parameters: periodically save the parameters to this file so that learning can be resumed
- lstm: use an LSTM for the RNN (other options are GRU and RNN, where GRU uses gated recurrent units)
- coverage: use the coverage penalty described in the paper
- layers n: stack n LSTM layers on the target side
- giza: use the giza features described in the paper
- smoothsm: use the smoothed softmax described in the paper (see the sketch after this list)
- help: display details of the other options
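Our understanding of the smoothed softmax is a temperature-style rescaling of the attention scores before normalisation, which spreads attention over more speech frames; the Python sketch below only illustrates that idea (the function name, the NumPy dependency, and the exact formulation are our assumptions, not the released code):

```python
import numpy as np

def smoothed_softmax(scores, gamma=0.1):
    """Softmax over attention scores scaled by gamma (cf. --smoothsm 0.1).

    Scaling by gamma < 1 flattens the distribution, spreading attention
    mass over more source positions (frames)."""
    z = gamma * np.asarray(scores, dtype=float)
    z -= z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()
```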
For decoding, we need to initialise with the trained model and use the test data instead of the dev data:
```bash
./build/attentional_model/attentional --train data/train.attentional --devel data/test.attentional --lstm --bidirectional -a 32 --hidden 32 --initialise model.phone --epochs 50 --coverage 0.05 --trainer sgd --layers 4 --giza --smoothsm 0.1 --translation
```
The output will be the translations of the test set and of the first 200 training sentences, plus some output for the retrieval task.
For rescoring, we first need to extract the first 100 hypotheses from Moses; we assume they are in the file data/rescore.pairs:
```bash
./build/attentional_model/attentional --train data/train.attentional --devel data/test.attentional --lstm --bidirectional -a 32 --hidden 32 --initialise model.phone --epochs 50 --coverage 0.05 --trainer sgd --layers 4 --giza --smoothsm 0.1 --rescore --test data/rescore.pairs
```
The model will score each pair and append the score at the end. The final translation is the candidate with the lowest score.
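The selection step itself is simple; here is a minimal Python sketch, assuming (our assumption, since the exact output format is not documented here) that the scored file keeps the source ||| candidate pairs one per line with the model score appended as the last whitespace-separated token:

```python
# pick_best.py -- hypothetical helper, not part of the released code.
best = {}  # source sentence -> (score, candidate)
with open("data/rescore.pairs.scored") as f:  # hypothetical output file name
    for line in f:
        fields = line.strip().split("|||")
        source = fields[0].strip()
        tail = fields[-1].split()
        candidate = " ".join(tail[:-1])  # candidate words
        score = float(tail[-1])          # model score appended at the end
        if source not in best or score < best[source][0]:
            best[source] = (score, candidate)

for source, (score, candidate) in best.items():
    print(candidate)
```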
We use [SPRACHcore](http://www1.icsi.berkeley.edu/~dpwe/projects/sprach/sprachcore.html) to extract PLP features from the speech files with the following options:
```bash
./feacalc -hpfilter 100 -dither -domain cepstra -deltaorder 2 -plp 12 -sr 16000 -opformat ascii -o OUTPUTFILE INPUTFILE
```
Obviously, the sample rate (-sr) will differ depending on your data.
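To sanity-check the extracted features, you can load an ASCII PLP file as a matrix. The sketch below assumes feacalc's ASCII output is a plain whitespace-separated matrix with one frame per row (we have not verified this layout, and the file name is hypothetical):

```python
import numpy as np

frames = np.loadtxt("data/plp/utt001.plp")  # hypothetical file name
# With -plp 12 (12 coefficients plus c0) and -deltaorder 2, each frame
# should have 13 x 3 = 39 features.
print(frames.shape)  # (num_frames, num_features)
```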
For the demo, we added a tiny dataset extracted from TIMIT in the data folder. To train:
```bash
./build/attentional_model/attentional_plp --ttrain data/text/ --strain data/plp/ --lstm --bidirectional --align 32 --hidden 32 --parameters model.speech.plp --epochs 50 --coverage 0.05 --trainer sgd --layers 4 --giza --pyramid --smoothsm 0.1 --split data.split
```
Some options:
- strain: source folder storing all the PLP files; one PLP file represents one speech sentence
- ttrain: target folder where each file is a translation or transcription of the corresponding speech sentence. Note that files in strain and ttrain must have the same ID.
- pyramid: use the pyramidal structure described in the paper
- split: a file specifying the data split. It has 3 lines, each listing the files for train, dev and test respectively (see the example after this list).
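For example, a minimal data.split might look like this (the file IDs are hypothetical, and we assume whitespace-separated lists; line 1 = train, line 2 = dev, line 3 = test):

```
utt001 utt002 utt003 utt004
utt005
utt006
```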
To show the translations from the trained model for the test data (and some training data):
```bash
./build/attentional_model/attentional_plp --ttrain data/text/ --strain data/plp/ --lstm --bidirectional --align 32 --hidden 32 --initialise model.speech.plp --epochs 50 --coverage 0.05 --trainer sgd --layers 4 --giza --pyramid --smoothsm 0.1 --split data.split --translation
```
There are several useful debugging/output options, for example:
- display: shows the alignment matrix in TikZ format, which can be imported into (La)TeX:
```bash
./build/attentional_model/attentional_plp --ttrain data/text/ --strain data/plp/ --lstm --bidirectional --align 32 --hidden 32 --initialise model.speech.plp --epochs 50 --coverage 0.05 --trainer sgd --layers 4 --giza --pyramid --smoothsm 0.1 --split data.split --display
```
- verbose: output intermediate alignments/translations.