- Mimic Recording Studio
- Recording Tips
- Providing your recording to Mycroft for training
- Contributions
- Where to get support and assistance
The Mycroft open source Mimic technologies are Text-to-Speech engines which take a piece of written text and convert it into spoken audio. The latest generation of this technology, Mimic 2, uses machine learning techniques to create a model which can speak a specific language, sounding like the voice on which it was trained.
The Mimic Recording Studio simplifies the collection of training data from individuals. Each individual's recordings can be used to produce a distinct voice for Mimic.
git clone https://github.com/MycroftAI/mimic-recording-studio.git
cd mimic-recording-studio
start-windows.bat
- Docker (community edition is fine)
- Docker Compose
Why Docker? To make this easy to set up and run across platforms.
- git clone https://github.com/MycroftAI/mimic-recording-studio.git
- cd mimic-recording-studio
- docker-compose up to build and run. (Note: you may need to use sudo docker-compose up, depending on your distribution.) Alternatively, you can build and run separately: docker-compose build, then docker-compose up.
- In your browser, go to http://localhost:3000
Note: The first execution of docker-compose up will take a while, as this command also builds the Docker containers. Subsequent executions of docker-compose up should be quicker to boot.
- Python 3.5+
- ffmpeg
cd backend/
pip install -r requirements.txt
python run.py
- node & npm
- create-react-app
- yarn - optional for faster build, install, and start
cd frontend/
npm install (alternatively, yarn install)
npm start (alternatively, yarn start)
A hosted version requiring zero setup is available online at http://mimic.mycroft.ai.
Audio is saved as WAV files to the backend/audio_file/{uuid}/ directory. The backend automatically trims the beginning and ending silence for all WAV files using ffmpeg.
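As an illustration of what trimming silence with ffmpeg can look like, the sketch below builds an ffmpeg command line from Python. It is not taken from the backend; the function name, the -50 dB threshold, and the choice of the silenceremove filter combined with areverse are assumptions made for the example.

```python
from typing import List


def build_trim_command(src: str, dst: str, threshold_db: int = -50) -> List[str]:
    """Return an ffmpeg argument list that strips leading and trailing silence.

    Hypothetical helper: the threshold and filter chain are illustrative
    choices, not the settings used by Mimic Recording Studio.
    """
    # Trim silence at the start, reverse the audio, trim again (this is the
    # original ending), then reverse back to restore the playback order.
    trim = f"silenceremove=start_periods=1:start_threshold={threshold_db}dB"
    audio_filter = ",".join([trim, "areverse", trim, "areverse"])
    # -y overwrites dst if it already exists; -af applies the audio filter chain.
    return ["ffmpeg", "-y", "-i", src, "-af", audio_filter, dst]


if __name__ == "__main__":
    # Print the command instead of running it, so no files are touched.
    print(" ".join(build_trim_command("raw.wav", "trimmed.wav")))
```

To actually run the command, the list could be passed to subprocess.run(..., check=True) on a machine where ffmpeg is installed.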
Metadata is also saved to backend/audio_file/{uuid}/. This file maps each WAV file name to the phrase spoken. The metadata file, along with the WAV files, is what you need to get started on training Mimic 2.
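A minimal sketch of loading such a mapping into Python follows. The on-disk layout of the metadata file is not described here, so the example assumes one entry per line with the WAV file name and the phrase separated by a pipe character; both the delimiter and the function name are hypothetical and should be adjusted to match the real file.

```python
from typing import Dict


def parse_metadata(text: str, delimiter: str = "|") -> Dict[str, str]:
    """Map WAV file names to phrases.

    Assumes (hypothetically) one 'name<delimiter>phrase' entry per line.
    """
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first delimiter only; everything after it is kept
        # as the phrase, including any further delimiter characters.
        wav_name, _, phrase = line.partition(delimiter)
        mapping[wav_name] = phrase
    return mapping


if __name__ == "__main__":
    sample = "0001.wav|Hello there.\n0002.wav|How are you today?\n"
    for name, phrase in parse_metadata(sample).items():
        print(name, "->", phrase)
```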
For now, we provide an English corpus, english_corpus.csv, which can be found in backend/prompt/. To use your own corpus, follow these steps:
- Create a csv file in the same format as english_corpus.csv using tabs (\t) as the delimiter.
- Add your corpus to the backend/prompt directory.
- Change the CORPUS environment variable in docker-compose.yml to your corpus name.
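Before swapping in a new corpus, it can help to check that every row has the same number of tab-separated columns as the reference file. The sketch below does only that; the function names are hypothetical, and it makes no assumption about what the columns in english_corpus.csv contain.

```python
import csv
import io
from typing import List


def read_tab_rows(text: str) -> List[List[str]]:
    """Parse tab-delimited text into rows, skipping blank lines."""
    # QUOTE_NONE keeps quotation marks inside phrases as literal characters.
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader if row]


def find_bad_rows(reference_text: str, candidate_text: str) -> List[int]:
    """Return 1-based positions (among non-blank rows) of candidate rows whose
    column count differs from the first row of the reference corpus."""
    expected = len(read_tab_rows(reference_text)[0])
    return [
        index
        for index, row in enumerate(read_tab_rows(candidate_text), start=1)
        if len(row) != expected
    ]


if __name__ == "__main__":
    reference = "first column\tsecond column\n"
    candidate = "ok\tok\nmissing a column\n"
    print(find_bad_rows(reference, candidate))
```

In practice the two strings would be read from english_corpus.csv and from the new corpus file in backend/prompt.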
Mimic Recording Studio can also be used to produce voice recordings for TTS voices in languages other than English. If you are building a corpus in another language, we encourage you to choose phrases which:
- occur in natural, everyday speech in the target language
- have a variety of string lengths
- cover a wide variety of phonemes (basic sounds)
IMPORTANT: For now, you must reset the sqlite database to use a new corpus. If you've recorded on another corpus and would like to save that data, you can simply rename your sqlite db, found in backend/db/, to another name. The backend will detect that mimicstudio.db is not there and create a new one for you. You may then continue recording data for your new corpus.
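The rename can of course be done by hand; the following is just one way to script it. The helper name and the label-based naming scheme are hypothetical, and the example assumes the backend is not writing to the database while the file is renamed.

```python
from pathlib import Path


def archive_database(db_dir: str, label: str) -> Path:
    """Rename mimicstudio.db to mimicstudio-<label>.db inside db_dir.

    Hypothetical helper; refuses to overwrite an existing archive.
    """
    db_path = Path(db_dir) / "mimicstudio.db"
    if not db_path.exists():
        raise FileNotFoundError(db_path)
    target = db_path.with_name(f"mimicstudio-{label}.db")
    if target.exists():
        raise FileExistsError(target)
    db_path.rename(target)
    return target


if __name__ == "__main__":
    # Example call, run from the repository root:
    # archive_database("backend/db", "english")
    pass
```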
The web UI is built using JavaScript and React, with create-react-app as a scaffolding tool. Refer to CRA.md to find out more about how to use create-react-app.
- Record and play audio
- Generate audio visualization
- Calculate and display metrics
The web service is built using Python, with Flask as the backend framework, gunicorn as the HTTP web server, and sqlite as the database.
- Process audio
- Serve corpus and metrics data
- Record info in database
- Record data to the file system
Docker is used to containerize both applications. By default, the frontend uses network port 3000 while the backend uses network port 5000. You can configure these in the docker-compose.yml file.
NOTE: If you are running docker-registry, this runs by default on port 5000, so you will need to change which port you use.
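To see whether something is already listening on one of these ports before starting the containers, a quick check along the following lines may help. This is a sketch using only the Python standard library; the function name is hypothetical, and it only tests for a TCP listener on the local machine.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an error code otherwise,
        # so a refused connection is reported as "not in use".
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for port in (3000, 5000):
        state = "in use" if port_in_use(port) else "free"
        print(f"port {port}: {state}")
```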
Creating a voice requires an achievable, but significant effort. An individual will need to record 15,000 - 20,000 phrases. In order to get the best possible Mimic voice, the recordings need to be clean and consistent. To that end, follow these recommendations:
- Record in a quiet environment with noise-dampening material. If your ears can hear outside noise, so can the microphone. For best results, even the sound of air conditioning blowing through a vent should be avoided. Bare walls create subtle echoes and reverberation. A sound dampening booth is ideal, but you can also create a homemade recording studio using soft materials such as acoustic foam in a closet. Comforters and mattresses can also be used effectively!
- Speak at a consistent volume and speed. Rushing through the phrases will only result in a lower quality voice.
- Use a quality microphone. To obtain consistent results, we recommend a headset microphone so your mouth is always the same distance from the mic.
- Avoid vocal fatigue. Record a maximum of 4 hours a day, taking a break every half hour.
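To get a rough sense of the time commitment, the phrase counts and the 4-hour daily limit above can be turned into a number of recording days. The pace of 300 phrases per hour used below is purely a hypothetical figure for illustration, and the function name is likewise made up; substitute your own measured pace.

```python
import math


def recording_days(total_phrases: int, phrases_per_hour: int,
                   hours_per_day: float = 4.0) -> int:
    """Estimate how many days are needed to record total_phrases.

    phrases_per_hour is an assumed pace, not a figure from the project.
    Breaks are not modelled separately; fold them into the pace.
    """
    return math.ceil(total_phrases / (phrases_per_hour * hours_per_day))


if __name__ == "__main__":
    for total in (15000, 20000):
        days = recording_days(total, phrases_per_hour=300)
        print(f"{total} phrases: about {days} days")
```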
We welcome your voice donations to Mycroft for use in Text-to-Speech applications. If you would like to provide your voice recordings, you must license them to us under the Creative Commons CC0 Public Domain license so that we can utilise them in TTS voices - which are derivative works. If you're ready to donate your voice recordings, email us at [email protected].
PRs are gladly accepted!
You can get help and support with Mimic Recording Studio at:
- The Mycroft Forum
- Mycroft Chat