Use a TTS voice across regions with the same language #170
This will also require an adjustment when ingesting existing settings files: they all have prio=1, but we want the preferred voices to have prio=2 and the derived entries for that same voice to have prio=1. |
Now that locales in the voice config XML are interpreted as language ranges, it should no longer be necessary to iterate through all available language+region combos. A single entry with just the language subtag (e.g. |
Ok can we close this issue? Prioritization should be covered by #169 . |
@marisa Here is an example of a voice configuration XML to demonstrate what I said about locales being language ranges now:
<config>
  <voice lang="en-IN" engine="google" name="en-IN-Standard-A" gender="female-adult" priority="1"/>
  <voice lang="en" engine="google" name="en-IN-Standard-A" gender="female-adult" priority="1"/>
  <voice lang="en-US" engine="google" name="en-US-Wavenet-J" gender="male-adult" priority="1"/>
  <voice lang="en" engine="google" name="en-US-Wavenet-J" gender="male-adult" priority="1"/>
</config>
Compared to what the current config looks like when you select the "en-IN-Standard-A" and "en-US-Wavenet-J" voices, the new config above has two additional voice mappings, for |
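To make "locales are interpreted as language ranges" concrete, here is a minimal sketch assuming prefix matching in the style of RFC 4647 basic filtering. The function name `range_matches` is hypothetical, and the matching Pipeline actually performs may differ in detail.

```python
def range_matches(language_range: str, language_tag: str) -> bool:
    """Return True if language_range matches language_tag.

    Follows RFC 4647 basic filtering: the comparison is case-insensitive,
    "*" matches everything, and otherwise the range must equal the tag or
    be a prefix of it that is immediately followed by "-".
    """
    rng = language_range.lower()
    tag = language_tag.lower()
    if rng == "*":
        return True
    return tag == rng or tag.startswith(rng + "-")


# An entry with lang="en" applies to any English document,
# while an entry with lang="en-IN" only applies to Indian English.
print(range_matches("en", "en-GB"))     # True
print(range_matches("en-IN", "en-GB"))  # False
print(range_matches("en-IN", "en-IN"))  # True
```

Under this reading, the `lang="en"` entries in the config above are what allow a voice to be used for a region it was not explicitly listed for.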
Expected behaviour of TTS in Pipeline app: Case 1: Case 2: Case 3: Case 4: |
Ok thanks for the info both of you; I am thinking about how to best present all this information. A user having to decide "I want to prioritize these voices, and out of the 3 English ones, I want this en-IN one as the one to use across other types of English" is a rather complicated question. I was playing with a table of voices to see how many there are for each language, you can see it here: Across 3 engines, there are 200+ English voices, not to mention 80 languages (not counting regions, just languages). We probably need some additional settings dialog screens to help make this more manageable. A side effect should be that screen-readers would get less overwhelmed by a giant list of voices. |
A good example of this is how we choose voices in NVDA. Instead of a table of all voices, we have a set of three combo boxes:
TTS, Language (including the dialect), and the list of voices.
The choice in the first combo box populates the second combo box. Similarly, the choice made in the second combo box populates the third combo box.
Updating the second combo box should wait until the choice in the first combo box is finalized. In other words, when we are in the first combo box, the arrow keys should expand the list of available choices and not update the dependent combo box on every up or down arrow key press.
|
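The cascading selection described above can be sketched as three dependent lookups. This is only an illustration: the `Voice` record, the function names, and the "example-engine" entry are made up, and a real list would come from the TTS engines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Voice:
    engine: str
    lang: str
    name: str


# Sample data; the last entry is purely hypothetical.
VOICES = [
    Voice("google", "en-IN", "en-IN-Standard-A"),
    Voice("google", "en-US", "en-US-Wavenet-J"),
    Voice("example-engine", "hi-IN", "example-hindi-voice"),
]


def engines(voices):
    """Options for the first combo box."""
    return sorted({v.engine for v in voices})


def languages(voices, engine):
    """Options for the second combo box, given the committed engine choice."""
    return sorted({v.lang for v in voices if v.engine == engine})


def voice_names(voices, engine, lang):
    """Options for the third combo box, given engine and language."""
    return sorted(v.name for v in voices if v.engine == engine and v.lang == lang)


print(engines(VOICES))
print(languages(VOICES, "google"))
print(voice_names(VOICES, "google", "en-IN"))
```

To respect the point about arrow keys, a UI built on this would call `languages` and `voice_names` only when a choice is committed, not on every highlighted option.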
Ok that's an idea to make selecting the voices easier. And then we need another way to look at all the selected voices in a language (not counting region) to choose one to use as the fallback for that language. We could also start out by having the user identify which languages they are interested in, and then add voices to each language (again language not counting region). |
Sure, selecting languages first would also be a good approach.
Thanks
Dipendra
|
@MDipendra @prashantverma2014 @bertfrees Could you look at this idea and let me know if it could work for the TTS settings dialog? You can pick preferred voices via drop down filters like what Dipendra described, and then you can pick one voice per language "group" (e.g. language regardless of region) to be the default. Suggestions welcome! Again, the voice list is hard coded, it's just a mock-up. https://665145ce30bad9a628e61770--pipeline-voices-table.netlify.app/voices2.html |
Dear Marisa,
I like this design.
A few suggestions and questions:
* The “Code” drop down can be renamed to “Dialect”, and its drop down
list can display the full name in addition to the language code.
For example: en-IN English (India)
* In the Voices list it is better to write all names in English. For
example, at present Hindi voice names are written in Hindi
script. The user may not have configured the screen reader to
speak different languages.
* What happens after I select one language and voice? I assume that I
should be able to select another language with a voice and it will be
listed in the table below. Currently this did not happen. I think
you can add buttons like “add a TTS voice” and “Reset/Delete” to this
screen so that users can set up more than one language with a
preferred voice for it.
Thanks,
Prashant
|
Thanks @prashantverma2014 for having a look! I will implement your suggestion for language name display. As for selecting a voice: if you pick one from the drop down, it should appear with a checkbox that says "select as a preferred voice". Then, in the table of preferred voices, you can pick one for each language group to be the default, e.g. one English voice for dialects that have no specific setting. If this isn't working, then maybe it's a browser issue; the actual UI should behave OK in this respect. As for voice names, we don't control that as far as I am aware. That info comes from the TTS engine directly. |
@marisademeglio I like the interface. I'm only not sure if marking one So I don't know whether or not it may be useful to have multiple |
This is something we can do automatically. Selecting voices for all Being able to select a default voice for a language will still be Nevertheless it seems too big of a change for this short development About what I said before:
Perhaps for now it is sufficient if we include gender/age in the |
That sounds like it could work - one thing though, does the endpoint return age info? I don't remember seeing it. |
Age and gender is actually combined in a single attribute "gender" in the web service, sorry for the confusion. Attribute can be I don't know how it is best presented in the UI. When age is specified In CSS, it is specified in combination with gender, but not in a single keyword: https://www.w3.org/TR/css-speech-1/#typedef-generic-voice. (Note that this is not the exact CSS syntax that is currently supported by Pipeline, but I want to become compatible with this syntax.) |
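Since the full set of allowed values is not listed in this thread, the following is only a sketch of how a UI could split the combined attribute for display. It assumes values shaped like the "female-adult" and "male-adult" seen in the config earlier in the thread; `GenderAge` and `split_gender_attribute` are hypothetical names.

```python
from typing import NamedTuple, Optional


class GenderAge(NamedTuple):
    gender: str
    age: Optional[str]


def split_gender_attribute(value: str) -> GenderAge:
    """Split a combined value such as "female-adult" into gender and age.

    Assumption: gender comes first and age, if present, follows the first "-".
    A value without "-" is treated as a gender with no age specified.
    """
    gender, _, age = value.partition("-")
    return GenderAge(gender, age or None)


print(split_gender_attribute("female-adult"))
print(split_gender_attribute("male"))
```

With the two parts separated, the dialog could show gender and age as separate columns or filters while still writing the combined value back to the config.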
Does this make sense?
3 English voices are "preferred": Ava, Ananya, and Aarav.
2 of them are "default" for English: Ananya and Ava.
1 of them is "high" priority, as indicated by the user: Ananya.
And the configuration looks like this:
I don't know what scenario the user is facing when they indicate normal/high priority as well as default=yes or default=no. But this represents what I've heard we need in the TTS settings, from this issue and also #169. Below is a screenshot of what the dialog currently looks like. A brief description is: Top of dialog: Bottom of dialog: |
The way I had understood we were going to do it, is that we were going to allow all preferred voices to be used across regions with the same language, and that "default" just was a different word for "higher priority". I didn't expect two settings. So, in my understanding, in the example with the three preferred English voices, setting the female Indian English voice Ananya as default (high priority) would result in:
This means:
What scenarios can we think of that require controlling which preferred voices can not be used across regions, or that require more than two levels of preference? |
Yes I have been confused by separating default vs priority and initially I had implemented what you described, e.g. "default" = higher priority in the config file. But then I went back and read through all the comments here and I was afraid I'd missed something because you mentioned you might still want 2 priority levels. But if that's not required in addition to default-ness then it gets way simpler which is great! |
All preferred TTS voices are entered into the config file as two entries: one with their language+region and one with just language (no region).
Users may pick a default voice for a language (regardless of region/gender/age). This is recorded with a higher priority.
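A minimal sketch of the two-entries-per-voice rule, producing lines in the same shape as the config example earlier in the thread. The function name `config_entries` is hypothetical, and it is an assumption that the default voice gets priority 2 on both of its entries; the thread does not spell out which entry carries the higher priority.

```python
from xml.sax.saxutils import quoteattr


def config_entries(voice, is_default=False):
    """Return the <voice> lines for one preferred voice.

    `voice` is a dict with the keys lang, engine, name and gender.
    Assumption: a default voice gets priority 2 on both of its entries,
    every other preferred voice gets priority 1.
    """
    priority = 2 if is_default else 1
    language_only = voice["lang"].split("-")[0]
    # dict.fromkeys drops the duplicate when the voice has no region subtag.
    langs = list(dict.fromkeys([voice["lang"], language_only]))
    return [
        "<voice lang={} engine={} name={} gender={} priority={}/>".format(
            quoteattr(lang),
            quoteattr(voice["engine"]),
            quoteattr(voice["name"]),
            quoteattr(voice["gender"]),
            quoteattr(str(priority)),
        )
        for lang in langs
    ]


voice = {
    "lang": "en-IN",
    "engine": "google",
    "name": "en-IN-Standard-A",
    "gender": "female-adult",
}
for line in config_entries(voice, is_default=True):
    print(line)
```

Note that later comments in this thread report a problem with how the default is prioritized, so treat the priority values here as a description of this comment only.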
Some suggestions:
One final thing: I noticed that while searching for a voice I'm kind of missing seeing the filtered voice list/table instantly, like we had previously. E.g. you used to be able to see all the English voices with corresponding accent and gender, without having to decide on an accent and gender first. That is not possible anymore. But I guess that is the sacrifice for having a simpler and more accessible interface. |
I noticed a small hitch: when you first select an engine to filter by, then a language, the gender/age and voice options are not updated. |
And does the "Clear default for English" work for you? |
Yeah, no issues. It doesn't work for you? Maybe delete your settings file and restart the app? But first post your settings file here and let me see what's going on with it. Minus any API keys ofc. |
Ah it seems to have an effect on settings.json and ttsConfig.xml, the UI is just not updated. I need to close and reopen the settings window. |
Unfortunately there is still an issue with the voice selection. It's due to my wrong advice 😞. I think it needs to be done as follows (but let me first double-check it):
The way we are doing it now results in the default voice being used even when there is a more suitable preferred voice. My bad 😞 |
Ok no problem to change it, so the ttsConfig XML file would look like this then?
Where Ananya is default and Ava, Ananya, and Aarav are preferred? |
No like this:
But I need to double-check it, don't want to be wrong this time. |
For a next release: it might be useful to be able to select one default voice for each language/gender combination. |
I confirm that we were able to test out the following cases and run the TTS
conversion successfully.
In the first case, the document was in US English and I set Indian English
as my preferred voice for English. The resulting audio recording was in
the Indian English voice.
In the second case, I had a document in US English and Hindi. We chose
Indian English as the preferred voice for English and chose no other
voice. The audio recording rendered the English in the Indian English
voice, and a Hindi voice was picked automatically for the Hindi text.
In the third case, we also added a Hindi voice as the preferred voice for
Hindi, and that voice was used in the recording.
However, we faced an issue with Save As DAISY regarding the correct markup
of Hindi text in the document, and had to mark up the Hindi text manually
inside the DTBook XML document to get the desired results. Without the
correct markup, the Hindi text was skipped and not read out.
|
Create the TTS config file with multiple voice entries for the same voice name, iterating through all available (in the voices list) language+region combos that include the voice's language.
This way, a preferred en-CA voice could get used for an en-IN document even if no en-IN voice is preferred.
See https://daisy-dev.slack.com/archives/C064GB8U9/p1700499741742109
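The proposal in this description can be sketched as follows: for one preferred voice, collect every locale in the voices list that shares its language subtag, and emit one config entry per locale. The function name `expand_across_regions` and the sample locale list are made up for illustration.

```python
def expand_across_regions(voice_lang, available_locales):
    """Return every available locale that shares voice_lang's language subtag."""
    language = voice_lang.split("-")[0].lower()
    return sorted(
        locale
        for locale in set(available_locales)
        if locale.split("-")[0].lower() == language
    )


# Hypothetical locales gathered from the voices list.
available = ["en-CA", "en-IN", "en-US", "hi-IN", "fr-CA"]
print(expand_across_regions("en-CA", available))  # ['en-CA', 'en-IN', 'en-US']
```

Each returned locale would become one entry for the same voice name, which is how a preferred en-CA voice could end up being used for an en-IN document.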