-
Notifications
You must be signed in to change notification settings - Fork 7
/
Copy pathreadme.txt
37 lines (31 loc) · 1.67 KB
/
readme.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
Kaldi Aligner: A simple script to create time alignment for given speech/transcription pairs.
This script also enrich the transcription using [laughter] and [noise] markers.
It does not use the forced-alignment instead it creates a bigram LM using the input transcription (after enriching the transcription with markers).
After creating a language model it create an HCLG graph and use Kaldi decoder to generate a lattice and finally use the lattice to obtain alignment information.
requirements:
1- Kaldi tool. From: https://github.com/kaldi-asr/kaldi
2- SRILM (also existed under Kaldi/tools). From: http://www.speech.sri.com/projects/srilm/
3- Python 3
4- bash (only tested under Linux)
After installing Kaldi and SRILM:
open path.sh and update export KALDI_ROOT=/home/amir/Projects/kaldi to your kaldi path.
Also make sure SRILM binaries (specifically ngram-count) is in the PATH.
Before running the aligner:
Download the pretrained Aspire chain model by running:
sh sownload_extract.sh
This script downloads the model and also run some commands for preparation.
Alternatively, you can train your own model. However, you might need to update the scripts accordingly.
Example:
bash align.sh example/trans.txt example/test.wav data/lang_chain/ out.ctm out_phone.ctm out_transid_seq.txt lpf.txt
cat out.ctm
test.wav 1 0.070 0.840 [noise]
test.wav 1 0.910 0.320 my
test.wav 1 1.240 0.300 name
test.wav 1 1.540 0.340 is
test.wav 1 1.880 0.300 [noise]
test.wav 1 2.180 0.780 [laughter]
test.wav 1 2.960 0.600 [noise]
test.wav 1 3.630 0.360 amir
test.wav 1 4.000 0.480 <unk>
test.wav 1 4.510 1.610 [noise]
Notice OOVs are replaced with <unk>. The scripts adds [noise]/[laughter] markers when needed.