Use a TTS voice across regions with the same language #170

marisademeglio · 2023-11-21T00:55:10Z

Create the TTS config file with multiple voice entries for the same voice name, iterating through all the language+region combinations available in the voices list that include the voice's language.

This way, a preferred en-CA voice could be used for an en-IN document even if no en-IN voice is preferred.

See https://daisy-dev.slack.com/archives/C064GB8U9/p1700499741742109

marisademeglio · 2023-11-21T00:55:57Z

This will also require an adjustment when ingesting existing settings files: they all have prio = 1, but we want the preferred voices to have prio = 2 and the derived entries for that same voice to have prio = 1.
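The original proposal (one config entry per available language+region combination, plus the priority split from the follow-up comment) could be sketched in Python as below. This is a minimal sketch, not the Pipeline app's code: `VoiceEntry`, `expand_preferred_voice` and the sample voice name `example-en-CA-voice` are hypothetical, and the list of available locales is assumed to come from the voices list.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VoiceEntry:
    """One <voice> element of the TTS config (hypothetical model)."""
    engine: str
    name: str
    lang: str
    gender: str
    priority: int


def primary_language(locale: str) -> str:
    """Primary language subtag, e.g. 'en' for 'en-IN'."""
    return locale.split("-")[0].lower()


def expand_preferred_voice(voice: VoiceEntry, available_locales: list[str]) -> list[VoiceEntry]:
    """Derive one entry per available locale that shares the voice's language.

    The preferred entry gets priority 2 and the derived entries priority 1,
    whatever priority the incoming entry had (old settings files all use 1).
    """
    entries = [replace(voice, priority=2)]
    seen = {voice.lang.lower()}
    for locale in available_locales:
        if primary_language(locale) == primary_language(voice.lang) and locale.lower() not in seen:
            seen.add(locale.lower())
            entries.append(replace(voice, lang=locale, priority=1))
    return entries


# A preferred en-CA voice also becomes a candidate for en-IN and en-US.
canadian = VoiceEntry("google", "example-en-CA-voice", "en-CA", "female-adult", 1)
for entry in expand_preferred_voice(canadian, ["en-CA", "en-IN", "en-US", "fr-CA", "hi-IN"]):
    print(entry.lang, entry.priority)
```

Note how the number of entries grows with the number of regions per language, which is what the language-range approach discussed next avoids.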

bertfrees · 2024-05-13T22:08:01Z

Now that locales in the voice config XML are interpreted as language ranges, it should no longer be necessary to iterate through all the available language+region combinations. A single entry with just the language subtag (e.g. en) has the same effect. I might still want to have two priority levels, though.
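Interpreting a locale as a language range can be illustrated with the basic filtering scheme from RFC 4647 (BCP 47). Whether Pipeline matches ranges exactly this way is an assumption; the sketch only shows why a single `en` entry covers `en-IN`, `en-GB` and so on, while an `en-IN` entry no longer covers plain `en`. The function name is hypothetical.

```python
def matches_language_range(language_range: str, tag: str) -> bool:
    """RFC 4647 basic filtering.

    A range matches a tag if it equals the tag or is a prefix of it
    followed by '-'. Matching is case-insensitive and '*' matches any tag.
    """
    r, t = language_range.lower(), tag.lower()
    if r == "*":
        return True
    return t == r or t.startswith(r + "-")


for tag in ["en", "en-IN", "en-GB"]:
    print(tag, matches_language_range("en", tag), matches_language_range("en-IN", tag))
```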

marisademeglio · 2024-05-20T14:54:01Z

OK, can we close this issue? Prioritization should be covered by #169.

bertfrees · 2024-05-20T19:45:22Z

@marisa Here is an example of a voice configuration XML to demonstrate what I said about locales being language ranges now:

<config>
  <voice lang="en-IN" engine="google" name="en-IN-Standard-A" gender="female-adult" priority="1"/>
  <voice lang="en" engine="google" name="en-IN-Standard-A" gender="female-adult" priority="1"/>
  <voice lang="en-US" engine="google" name="en-US-Wavenet-J" gender="male-adult" priority="1"/>
  <voice lang="en" engine="google" name="en-US-Wavenet-J" gender="male-adult" priority="1"/>
</config>

Compared to what the current config looks like when you select the "en-IN-Standard-A" and "en-US-Wavenet-J" voices, the new config above has two additional voice mappings, for lang="en". These are needed because a voice for a specific region is no longer automatically applied to locales without that region subtag. Since en is equivalent to en-*, one of these new mappings will be used for "en" dialects other than "en-IN" and "en-US". Note that it is still unpredictable which voice will be chosen for e.g. "en-GB", unless a gender has been specified in CSS. This is where the priority attribute would help to make one of the voices more preferred (#169).
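Applying range matching to the example config shows the effect described here: for `en-GB` both voices are candidates, and only a gender request narrows it down to one. This is a hypothetical sketch; it only lists candidates, deliberately does not model how Pipeline ranks tied candidates, and leaves out the engine attribute.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Voice:
    lang: str      # language range, as in the lang attribute
    name: str
    gender: str
    priority: int


# The four mappings from the example config (engine omitted).
CONFIG = [
    Voice("en-IN", "en-IN-Standard-A", "female-adult", 1),
    Voice("en", "en-IN-Standard-A", "female-adult", 1),
    Voice("en-US", "en-US-Wavenet-J", "male-adult", 1),
    Voice("en", "en-US-Wavenet-J", "male-adult", 1),
]


def matches_language_range(language_range: str, tag: str) -> bool:
    """RFC 4647 basic filtering: 'en' matches 'en' and 'en-*'."""
    r, t = language_range.lower(), tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")


def candidates(config: list[Voice], lang: str, gender: str | None = None) -> list[str]:
    """Names of voices with a mapping whose range matches lang (and gender, if requested)."""
    return sorted({
        v.name for v in config
        if matches_language_range(v.lang, lang) and (gender is None or v.gender == gender)
    })


print(candidates(CONFIG, "en-GB"))                  # two equally ranked candidates
print(candidates(CONFIG, "en-GB", "female-adult"))  # a gender request narrows it to one
```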

MDipendra · 2024-05-22T07:40:24Z

Expected behaviour of TTS in the Pipeline app:

Case 1:
In the Pipeline app, we select an Indian English voice.
The document can have any dialect of English in its lang attribute, for example US English or a mix of US English and Australian English.
Then:
The recording is done in the chosen Indian English voice.

Case 2:
In the Pipeline app, we select an Indian English voice.
The document has text in any dialect of English and in Hindi.
Then:
English text is recorded in the chosen Indian English voice and Hindi text is recorded in one of the Hindi voices.

Case 3:
In the Pipeline app, we select Indian English and Australian English voices.
The document has text in Indian English and US English.
Then:
Since there is a match for Indian English and no match for US English, the entire document is recorded in the Indian English voice and the Australian English voice is ignored.

Case 4:
In the Pipeline app, we select Indian English, Nigerian English and US English voices.
The document has text in Indian English, Nigerian English and US English.
Then:
Text is recorded in the respective dialects.

marisademeglio · 2024-05-23T01:01:23Z

OK, thanks for the info, both of you. I am thinking about how best to present all this information.

A user having to decide "I want to prioritize these voices, and out of the 3 English ones, I want this en-IN one as the one to use across other types of English" is a rather complicated question.

I was playing with a table of voices to see how many there are for each language, you can see it here:
https://664e9474fae363742287c120--pipeline-voices-table.netlify.app/

Across 3 engines, there are 200+ English voices, not to mention 80 languages (not counting regions, just languages).

We probably need some additional settings dialog screens to make this more manageable. A side benefit would be that screen-reader users are less overwhelmed by a giant list of voices.

MDipendra · 2024-05-23T04:54:57Z

A good example of this is how we choose voices in NVDA. Instead of a table of all voices, we have a set of three combo boxes: TTS engine, language (including the dialect), and the list of voices. The choice in the first combo box populates the second combo box. Similarly, the choice made in the second combo box populates the third. Updating of the second combo box should wait until the choice in the first combo box is finalized. In other words, when we are in the first combo box, the arrow keys should move through the list of available choices and not update the dependent combo box on every up/down arrow key press.
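The cascading combo boxes can be modelled as three filters over the voice list, each computed only from the choices committed above it. A minimal sketch with a hypothetical `Voice` record and sample data (the real list comes from the TTS engines); deferring the update until a choice is committed, rather than on every arrow key press, is a UI event-handling concern that the sketch does not cover.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Voice:
    engine: str
    lang: str
    name: str


# Hypothetical sample of a voice list.
VOICES = [
    Voice("azure", "en-IN", "Ananya"),
    Voice("azure", "en-IN", "Aarav"),
    Voice("azure", "en-US", "Ava"),
    Voice("google", "en-IN", "en-IN-Standard-A"),
    Voice("google", "en-US", "en-US-Wavenet-J"),
]


def engine_choices(voices: list[Voice]) -> list[str]:
    """Options for the first combo box."""
    return sorted({v.engine for v in voices})


def language_choices(voices: list[Voice], engine: str) -> list[str]:
    """Options for the second combo box, computed once an engine is committed."""
    return sorted({v.lang for v in voices if v.engine == engine})


def voice_choices(voices: list[Voice], engine: str, lang: str) -> list[str]:
    """Options for the third combo box, computed once a language is committed."""
    return sorted(v.name for v in voices if v.engine == engine and v.lang == lang)


print(engine_choices(VOICES))
print(language_choices(VOICES, "azure"))
print(voice_choices(VOICES, "azure", "en-IN"))
```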

marisademeglio · 2024-05-23T17:45:04Z

Ok that's an idea to make selecting the voices easier. And then we need another way to look at all the selected voices in a language (not counting region) to choose one to use as the fallback for that language.

We could also start out by having the user identify which languages they are interested in, and then add voices to each language (again language not counting region).

MDipendra · 2024-05-24T05:18:10Z

Sure, selecting languages first would also be a good approach. Thanks, Dipendra

marisademeglio · 2024-05-25T02:01:47Z

@MDipendra @prashantverma2014 @bertfrees Could you look at this idea and let me know if it could work for the TTS settings dialog? You can pick preferred voices via drop-down filters like what Dipendra described, and then you can pick one voice per language "group" (i.e. the language regardless of region) to be the default.

Suggestions welcome!

Again, the voice list is hard coded, it's just a mock-up.

https://665145ce30bad9a628e61770--pipeline-voices-table.netlify.app/voices2.html

prashantverma2014 · 2024-05-25T05:25:25Z

Dear Marisa, I like this design. A few suggestions and questions:

* The “Code” drop-down could be renamed “Dialect”, and its list could show the full language name in addition to the language code, for example “en-IN English (India)”.
* In the Voices list it would be better to write all names in English. At present, for example, Hindi voice names are written in Hindi script. The user may not have configured the screen reader to speak different languages.
* What happens after I select one language and voice? I assume that I should be able to select another language with a voice and have it listed in the table below, but this does not currently happen. I think you could add buttons like “Add a TTS voice” and “Reset/Delete” on this screen so that users can set up more than one language with a preferred voice for it.

Thanks,
Prashant

marisademeglio · 2024-05-25T06:03:48Z

Thanks @prashantverma2014 for having a look!

I will implement your suggestion for language name display.

As for selecting a voice: if you pick one from the drop-down, it should appear with a checkbox that says "select as a preferred voice".

Then, in the table of preferred voices, you can pick one for each language group to be the default, e.g. one English voice for dialects that have no specific setting.

But if this isn't working, then maybe it's a browser issue. The actual UI should behave OK in this respect.

As for voice names, we don't control them as far as I am aware. That info comes directly from the TTS engine.

bertfrees · 2024-05-25T10:45:05Z

@marisademeglio I like the interface. I'm just not sure whether marking one voice as the default for a language is going to be enough. What will it mean to be the default? There is region, but there is also gender and age. Note that Pipeline attaches more importance to gender and age than to region/accent when selecting voices.

So I don't know whether it would be useful to have multiple "defaults" per language. We could, e.g., allow one default per language/gender/age combination? I think that might make sense.
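A default per language/gender/age combination could be stored as a mapping keyed on the primary language subtag plus the gender value used in the voice config (e.g. `female-adult`, which encodes both gender and age). This is a hypothetical sketch of the data structure, not existing Pipeline code; `DefaultVoices` and `default_key` are illustrative names.

```python
from __future__ import annotations


def default_key(lang: str, gender: str) -> tuple[str, str]:
    """Key for a default: primary language subtag plus the gender/age value."""
    return (lang.split("-")[0].lower(), gender)


class DefaultVoices:
    """At most one default voice per language/gender/age combination."""

    def __init__(self) -> None:
        self._by_key: dict[tuple[str, str], str] = {}

    def set_default(self, voice_name: str, lang: str, gender: str) -> None:
        # Replaces any previous default for the same combination.
        self._by_key[default_key(lang, gender)] = voice_name

    def lookup(self, lang: str, gender: str) -> str | None:
        return self._by_key.get(default_key(lang, gender))


defaults = DefaultVoices()
defaults.set_default("Ava", "en-US", "female-adult")
defaults.set_default("Ananya", "en-IN", "female-adult")  # replaces Ava for English/female-adult
defaults.set_default("Aarav", "en-IN", "male-adult")
print(defaults.lookup("en-GB", "female-adult"))  # Ananya
print(defaults.lookup("en-GB", "male-adult"))    # Aarav
```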

bertfrees · 2024-05-25T11:09:21Z

> Case 3: In the Pipeline app, we select Indian English and Australian English voices. The document has text in Indian English and US English. Then: since there is a match for Indian English and no match for US English, the entire document is recorded in the Indian English voice and the Australian English voice is ignored.

This is something we can do automatically. Voices are already selected for all sentences before the sentences are narrated, so this might be feasible.

Being able to select a default voice for a language will still be needed, though, for other use cases. But as far as I can see both features are compatible, so that shouldn't be a problem.
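Since voices for all sentences are selected before narration, a document-level pass could implement Case 3. The sketch below is hypothetical and assumes normalized locale tags: exact locale matches keep their preferred voice, and other dialects of the same language reuse the voice that matched exactly elsewhere in the document. Cases 1 and 2 rely on a language-level default and are not modelled here; `None` marks sentences that would need that fallback.

```python
from __future__ import annotations


def primary_language(tag: str) -> str:
    return tag.split("-")[0].lower()


def assign_voices(sentence_langs: list[str], preferred: dict[str, str]) -> list[str | None]:
    """Pick a voice per sentence after looking at the whole document.

    preferred maps a locale (e.g. 'en-IN') to a preferred voice name.
    Sentences with an exact locale match use that voice. Other sentences
    reuse the first exactly matched voice (in document order) for the same
    language. None means no preferred voice applies.
    """
    used_for_language: dict[str, str] = {}
    for lang in sentence_langs:  # first pass: collect exact matches
        if lang in preferred:
            used_for_language.setdefault(primary_language(lang), preferred[lang])
    return [
        preferred[lang] if lang in preferred else used_for_language.get(primary_language(lang))
        for lang in sentence_langs
    ]


# Case 3: Indian and Australian English voices selected; document has en-IN and en-US text.
print(assign_voices(["en-IN", "en-US", "en-IN"],
                    {"en-IN": "Indian voice", "en-AU": "Australian voice"}))
```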

Nevertheless, it seems too big a change for this short development sprint, and it would need extensive testing. So I think this is something for a later release.

About what I said before:

> There is region, but there is also gender and age. Note that Pipeline attaches more importance to gender and age than to region/accent when selecting voices.

Perhaps for now it is sufficient to include gender/age in the interface. That might make it clear to users that "default" does not mean "when there is no exact match for a given dialect", but rather "when there is no exact match for a given age/gender/dialect".

marisademeglio · 2024-05-26T00:12:05Z

That sounds like it could work. One thing, though: does the endpoint return age info? I don't remember seeing it.

bertfrees · 2024-05-26T22:59:32Z

Age and gender are actually combined in a single attribute, "gender", in the web service; sorry for the confusion. The attribute can be * (neutral), male-adult, male-child, male-elderly, female-child, female-adult or female-elderly.

I don't know how best to present it in the UI. When age is specified in CSS, it is specified in combination with gender, but not as a single keyword: https://www.w3.org/TR/css-speech-1/#typedef-generic-voice. (Note that this is not the exact CSS syntax that Pipeline currently supports, but I want to become compatible with this syntax.)
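For a UI with separate gender and age controls, the combined attribute could be split and rejoined as below. A minimal sketch assuming exactly the seven values listed above; the function names are hypothetical, and mapping to the CSS generic-voice keywords is deliberately left out.

```python
from __future__ import annotations

GENDERS = ("male", "female")
AGES = ("adult", "child", "elderly")


def split_gender_attribute(value: str) -> tuple[str | None, str | None]:
    """Split the combined attribute into (gender, age).

    'female-elderly' -> ('female', 'elderly'); '*' (neutral) -> (None, None).
    """
    if value == "*":
        return (None, None)
    gender, sep, age = value.partition("-")
    if not sep or gender not in GENDERS or age not in AGES:
        raise ValueError(f"unexpected gender value: {value!r}")
    return (gender, age)


def join_gender_attribute(gender: str | None, age: str | None) -> str:
    """Inverse of split_gender_attribute."""
    if gender is None and age is None:
        return "*"
    if gender not in GENDERS or age not in AGES:
        raise ValueError(f"unexpected combination: {gender!r}, {age!r}")
    return f"{gender}-{age}"


print(split_gender_attribute("female-elderly"))
print(join_gender_attribute("male", "child"))
```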

marisademeglio · 2024-05-31T00:38:24Z

Does this make sense?

3 English voices are "preferred": Ava, Ananya, and Aarav.

2 of them are "default" for English: Ananya and Ava

1 of them is "high" priority, as indicated by the user: Ananya

and the configuration looks like this:

<config>
  <voice engine="azure" name="Ananya" lang="en-IN" gender="female-adult" priority="2"/>
  <voice engine="azure" name="Ananya" lang="en" gender="female-adult" priority="4"/>
  <voice engine="azure" name="Ava" lang="en-US" gender="female-adult" priority="1"/>
  <voice engine="azure" name="Ava" lang="en" gender="female-adult" priority="3"/>
  <voice engine="azure" name="Aarav" lang="en-IN" gender="male-adult" priority="1"/>
</config>

I don't know what scenario the user is facing when they indicate normal/high priority as well as default=yes or default=no.

But this represents what I've heard we need in the TTS settings, from this issue and also #169.

Below is a screenshot of what the dialog currently looks like. A brief description is:

Top of dialog:
Series of drop down boxes for finding and adding voices to the preferred voices table

Bottom of dialog:
Preferred voices table, with voice info for each and options to make a voice the "default", to set its priority (high/normal) and to remove it from the list.

bertfrees · 2024-05-31T14:12:23Z

The way I had understood it, we were going to allow all preferred voices to be used across regions with the same language, and "default" was just a different word for "higher priority". I didn't expect two settings.

So, in my understanding, in the example with the three preferred English voices, setting the female Indian English voice Ananya as default (high priority) would result in:

<config>
  <voice engine="azure" name="Ananya" lang="en-IN" gender="female-adult" priority="2"/>
  <voice engine="azure" name="Ananya" lang="en" gender="female-adult" priority="2"/>
  <voice engine="azure" name="Ava" lang="en-US" gender="female-adult" priority="1"/>
  <voice engine="azure" name="Ava" lang="en" gender="female-adult" priority="1"/>
  <voice engine="azure" name="Aarav" lang="en-IN" gender="male-adult" priority="1"/>
  <voice engine="azure" name="Aarav" lang="en" gender="male-adult" priority="1"/>
</config>

This means:

Aarav is used when a male English voice is requested.
Ava is used when a female American English voice is requested.
Ananya is used when any other (female) English voice (e.g. British English) is requested.

What scenarios can we think of that require controlling which preferred voices cannot be used across regions, or that require more than two levels of preference?

marisademeglio · 2024-05-31T16:26:03Z

Yes, I was confused by the separation of default vs. priority. Initially I had implemented what you described, i.e. "default" = higher priority in the config file. But then I went back and reread all the comments here, and I was afraid I'd missed something because you mentioned you might still want two priority levels.

But if that's not required in addition to default-ness, then it gets much simpler, which is great!

All preferred TTS voices are entered into the config file as two entries: one with their language+region and one with just the language (no region).

Users may pick a default voice for a language (regardless of region/gender/age). This is recorded with a higher priority.
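This scheme can be sketched with Python's standard `xml.etree.ElementTree`; with Ananya as the English default it reproduces the config from the May 31 comment above (priority 2 for both Ananya entries, 1 for the rest). `PreferredVoice` and `build_config` are hypothetical names, and this is the scheme that was revised later in this thread.

```python
from __future__ import annotations

from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class PreferredVoice:
    engine: str
    name: str
    lang: str
    gender: str


def build_config(preferred: list[PreferredVoice], defaults: set[PreferredVoice]) -> ET.Element:
    """Two entries per preferred voice: its own locale and its bare language.

    Voices chosen as a default for their language get priority 2, others 1.
    """
    config = ET.Element("config")
    for voice in preferred:
        priority = "2" if voice in defaults else "1"
        language = voice.lang.split("-")[0]
        # dict.fromkeys dedupes while keeping order, in case lang has no region.
        for lang in dict.fromkeys([voice.lang, language]):
            ET.SubElement(config, "voice", engine=voice.engine, name=voice.name,
                          lang=lang, gender=voice.gender, priority=priority)
    return config


ananya = PreferredVoice("azure", "Ananya", "en-IN", "female-adult")
ava = PreferredVoice("azure", "Ava", "en-US", "female-adult")
aarav = PreferredVoice("azure", "Aarav", "en-IN", "male-adult")
config = build_config([ananya, ava, aarav], defaults={ananya})
ET.indent(config)  # pretty-printing, Python 3.9+
print(ET.tostring(config, encoding="unicode"))
```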

bertfrees · 2024-06-04T10:41:04Z

Some suggestions:

In the voices dialog, you capitalize the engine name. Engines also have a "display name". I will make that available in the web API.
The gender/age options could maybe be presented better also. Instead of "Female-adult" etc., maybe something like "Female adult" (or just "Female" or "Woman"), "Female elderly" (or "Old woman"), "Female child" (or "Girl"), "Neutral" (or "Gender neutral"), ...

In fact "Unknown" would be more accurate than "Neutral". Voices marked "neutral" (or "*", same thing) match any requested gender, so it effectively acts as a wildcard. But in practice this category is used primarily for voices whose gender we can't determine automatically, notably the macOS voices. We should probably make two separate categories.

(Actually it might be useful to be able to select the gender for which the macOS voice is to be used when it is selected as a preferred voice.)

Perhaps another possibility would be to have two options, one for gender and one for age, although that might become a bit much.
The list of preferred voices basically represents the voice config XML. In that regard, I thought it might be a good idea to present it in such a way that it makes it more obvious how the voice selection will work. This is currently left up to the user to guess, and I think it's pretty intuitive so should be fine, but still... The user might not realize at first that an American English preferred voice will be used for any English content (unless another preferred voice is selected as default). So perhaps each preferred voice could have a column or ℹ button that tells you in plain English when the voice will be used, similar to how I did it in my comment above.
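The label and wildcard suggestions could look like this. A hypothetical sketch: the labels are one of the options proposed above, showing `*` as "Unknown" follows the remark about macOS voices, and treating `*` as matching any requested gender follows the description of neutral voices. The function names are illustrative.

```python
# Display labels for the gender/age values (one of the options suggested above).
GENDER_LABELS = {
    "*": "Unknown",  # '*' mostly marks voices whose gender could not be determined
    "male-adult": "Male adult",
    "male-child": "Male child",
    "male-elderly": "Male elderly",
    "female-adult": "Female adult",
    "female-child": "Female child",
    "female-elderly": "Female elderly",
}


def gender_label(value: str) -> str:
    """Human-readable label; unrecognized values are shown unchanged."""
    return GENDER_LABELS.get(value, value)


def voice_matches_gender(voice_gender: str, requested: str) -> bool:
    """A voice marked '*' acts as a wildcard and matches any requested gender."""
    return voice_gender == "*" or voice_gender == requested


print(gender_label("female-child"))                        # Female child
print(voice_matches_gender("*", "male-elderly"))           # True
print(voice_matches_gender("female-adult", "male-adult"))  # False
```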

One final thing: I noticed that while searching for a voice I'm kind of missing seeing the filtered voice list/table instantly, like we had previously. E.g. you used to be able to see all the English voices with corresponding accent and gender, without having to decide on an accent and gender first. That is not possible anymore. But I guess that is the sacrifice for having a simpler and more accessible interface.

bertfrees · 2024-06-04T10:44:59Z

I noticed a small hitch: when you first select an engine to filter by, then a language, the gender/age and voice options are not updated.

marisademeglio · 2024-06-04T17:43:50Z

Oh I can't reproduce that at all, I just tried a few different combos.

initially showing all voices:

select an engine and get fewer voices:

bertfrees · 2024-06-04T19:38:59Z

And does the "Clear default for English" work for you?

marisademeglio · 2024-06-04T19:42:48Z

Yeah, no issues. It doesn't work for you? Maybe delete your settings file and restart the app? But first post your settings file here and let me see what's going on with it. Minus any API keys ofc.

bertfrees · 2024-06-04T19:52:37Z

Ah, it seems to have an effect on settings.json and ttsConfig.xml; the UI is just not updated. I need to close and reopen the settings window.

bertfrees · 2024-06-04T20:15:16Z

Unfortunately there is still an issue with the voice selection. It's due to my wrong advice 😞.

I think it needs to be done as follows (but let me first double-check it):

* Preferred: priority 1
* Default: second voice with lang = primary language + priority 2

The way we are doing it now results in the default voice being used even when there is a more suitable preferred voice. My bad 😞

marisademeglio · 2024-06-04T20:18:45Z

OK, no problem to change it. So the ttsConfig XML file would look like this, then?

<config>
  <voice engine="azure" name="Ananya" lang="en-IN" gender="female-adult" priority="1"/>
  <voice engine="azure" name="Ananya" lang="en" gender="female-adult" priority="2"/>
  <voice engine="azure" name="Ava" lang="en-US" gender="female-adult" priority="1"/>
  <voice engine="azure" name="Ava" lang="en" gender="female-adult" priority="1"/>
  <voice engine="azure" name="Aarav" lang="en-IN" gender="male-adult" priority="1"/>
  <voice engine="azure" name="Aarav" lang="en" gender="male-adult" priority="1"/>
</config>

Where Ananya is default and Ava, Ananya, and Aarav are preferred?

bertfrees · 2024-06-04T20:23:10Z

No, like this:

<config>
    <voice engine="azure" name="Ananya" lang="en-IN" gender="female-adult" priority="2"/>
    <voice engine="azure" name="Ananya" lang="en" gender="female-adult" priority="2"/>
    <voice engine="azure" name="Ava" lang="en-US" gender="female-adult" priority="1"/>
    <voice engine="azure" name="Aarav" lang="en-IN" gender="male-adult" priority="1"/>
</config>

But I need to double-check it, don't want to be wrong this time.
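The revised scheme can be sketched the same way: every preferred voice gets one entry for its own locale, and the default voice gets priority 2 plus an extra entry for its primary language. With Ananya as default this reproduces the XML above. The names are hypothetical, and the scheme itself was still to be double-checked at this point in the discussion.

```python
from __future__ import annotations

from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class PreferredVoice:
    engine: str
    name: str
    lang: str
    gender: str


def add_voice(config: ET.Element, voice: PreferredVoice, lang: str, priority: str) -> None:
    ET.SubElement(config, "voice", engine=voice.engine, name=voice.name,
                  lang=lang, gender=voice.gender, priority=priority)


def build_config_v2(preferred: list[PreferredVoice], defaults: set[PreferredVoice]) -> ET.Element:
    """One entry per preferred voice (priority 1); a default voice gets
    priority 2 and an extra entry for its primary language."""
    config = ET.Element("config")
    for voice in preferred:
        is_default = voice in defaults
        add_voice(config, voice, voice.lang, "2" if is_default else "1")
        language = voice.lang.split("-")[0]
        if is_default and language != voice.lang:
            add_voice(config, voice, language, "2")
    return config


ananya = PreferredVoice("azure", "Ananya", "en-IN", "female-adult")
ava = PreferredVoice("azure", "Ava", "en-US", "female-adult")
aarav = PreferredVoice("azure", "Aarav", "en-IN", "male-adult")
config = build_config_v2([ananya, ava, aarav], defaults={ananya})
ET.indent(config)  # pretty-printing, Python 3.9+
print(ET.tostring(config, encoding="unicode"))
```

Compared with the previous scheme, non-default voices no longer get a bare-language entry, so a more suitable regional voice is not outranked by the default's language-wide entry for its own locale.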

bertfrees · 2024-06-10T20:48:32Z

For a next release: it might be useful to be able to select one default voice for each language/gender combination.

MDipendra · 2024-06-10T23:39:49Z

I confirm that we were able to test three cases and run the TTS conversion successfully. In the first case, the document was in US English and I set Indian English as my preferred voice for English. The resulting audio recording was in the Indian English voice. In the second case, the document was in US English and Hindi. We chose Indian English as the preferred voice for English and chose no other voice. The recording used the Indian English voice for English and chose a Hindi voice on its own to record the Hindi text. In the third case we also added a Hindi voice as the preferred voice for Hindi, and that voice was used in the recording. However, we faced an issue with Save As DAISY not marking up the Hindi text in the document correctly, and had to mark up the Hindi text manually inside the DTBook XML document to get the desired results. Without the correct markup, the Hindi text was not read out and was skipped.

marisademeglio mentioned this issue Nov 21, 2023: Add TTS voice ranking #169 (Closed)

marisademeglio mentioned this issue Apr 15, 2024: feat(settings): new settings migration utility for evolutions #214 (Merged)

bertfrees added this to Pipeline 2 | June 2024 release May 13, 2024

marisademeglio added this to the 1.4 milestone May 20, 2024

marisademeglio added a commit that referenced this issue May 31, 2024: fix(#169, #170): default voices and voice prioritization (a0cb27b)

bertfrees moved this to In Progress in Pipeline 2 | June 2024 release May 31, 2024

bertfrees assigned marisademeglio May 31, 2024

marisademeglio closed this as completed Jun 1, 2024

github-project-automation bot moved this from In Progress to Done in Pipeline 2 | June 2024 release Jun 1, 2024

This was referenced Jun 4, 2024:
Settings dialog, TTS voices page: display names of TTS engines #228 (Closed)
Settings dialog, TTS voices page: presentation of gender/age info #229 (Open)

marisademeglio mentioned this issue Jun 4, 2024: Settings dialog, TTS voices page: presentation of preferred voices #230 (Open)

marisademeglio added a commit that referenced this issue Jun 4, 2024: fix(#170): tts config file tweak (23d44a1)

marisademeglio mentioned this issue Jun 11, 2024: Settings dialog, TTS voices page: select default for language + gender #236 (Open)