Use a TTS voice across regions with the same language #170

Closed
marisademeglio opened this issue Nov 21, 2023 · 30 comments

@marisademeglio
Member

Create the TTS config file with multiple voice entries for the same voice name, iterating through all available language+region combinations (from the voices list) that include the voice's language.

This way, a preferred en-CA voice could get used for an en-IN document even if no en-IN voice is preferred.

See https://daisy-dev.slack.com/archives/C064GB8U9/p1700499741742109

@marisademeglio
Member Author

This will also require an adjustment when ingesting existing settings files: they all have priority 1, but we want the preferred voices to have priority 2 and the derived entries for those same voices to have priority 1.
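The proposal in the two comments above can be sketched as follows. This is a hypothetical helper, not Pipeline app code: VoiceEntry, expand_across_regions and the example voice name are illustrative assumptions, and the list of available locales is assumed to come from the voices list.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class VoiceEntry:
    # Mirrors the attributes of a <voice> element in the TTS config file
    engine: str
    name: str
    lang: str
    gender: str
    priority: int


def expand_across_regions(preferred: VoiceEntry,
                          available_locales: Iterable[str]) -> List[VoiceEntry]:
    """Return the preferred entry (priority 2) plus one derived entry (priority 1)
    for every other available locale sharing the voice's primary language."""
    primary = preferred.lang.split("-")[0].lower()
    entries = [VoiceEntry(preferred.engine, preferred.name, preferred.lang,
                          preferred.gender, 2)]
    for locale in sorted(set(available_locales)):
        same_language = locale.split("-")[0].lower() == primary
        if same_language and locale.lower() != preferred.lang.lower():
            entries.append(VoiceEntry(preferred.engine, preferred.name, locale,
                                      preferred.gender, 1))
    return entries


if __name__ == "__main__":
    # Hypothetical en-CA voice and a hypothetical list of locales from the voices list
    voice = VoiceEntry("google", "en-CA-Example", "en-CA", "female-adult", 1)
    for entry in expand_across_regions(voice, ["en-CA", "en-IN", "en-GB", "fr-CA"]):
        print(entry.lang, entry.priority)
```

With en-IN among the available locales, the en-CA voice gets a derived en-IN entry, which is the scenario in the issue description.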

@bertfrees
Member

Now that locales in the voice config XML are interpreted as language ranges, it should no longer be necessary to iterate through all available language+region combinations. A single entry with just the language subtag (e.g. en) has the same effect. I might still want to have two priority levels, though.
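The comment doesn't show Pipeline's matching code. As a sketch, assuming "language range" means basic filtering as defined in RFC 4647 (a range matches a tag if it equals the tag or is a prefix of it ending at a subtag boundary, and * matches everything), the rule looks like this; matches_range is a hypothetical name.

```python
def matches_range(language_range: str, tag: str) -> bool:
    """RFC 4647 basic filtering: case-insensitive prefix match on whole subtags."""
    r = language_range.lower()
    t = tag.lower()
    if r == "*":
        return True
    return t == r or t.startswith(r + "-")


if __name__ == "__main__":
    for tag in ["en", "en-IN", "en-GB", "eng", "fr-CA"]:
        print(tag, matches_range("en", tag))
```

So a single lang="en" entry covers en-IN, en-GB and plain en, while a lang="en-IN" entry does not match content tagged only en.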

@marisademeglio
Copy link
Member Author

Ok, can we close this issue? Prioritization should be covered by #169.

@marisademeglio marisademeglio added this to the 1.4 milestone May 20, 2024
@bertfrees
Member

@marisa Here is an example of a voice configuration XML to demonstrate what I said about locales being language ranges now:

<config>
  <voice lang="en-IN" engine="google" name="en-IN-Standard-A" gender="female-adult" priority="1"/>
  <voice lang="en" engine="google" name="en-IN-Standard-A" gender="female-adult" priority="1"/>
  <voice lang="en-US" engine="google" name="en-US-Wavenet-J" gender="male-adult" priority="1"/>
  <voice lang="en" engine="google" name="en-US-Wavenet-J" gender="male-adult" priority="1"/>
</config>

Compared to what the current config looks like when you select the "en-IN-Standard-A" and "en-US-Wavenet-J" voices, the new config above has two additional voice mappings, for lang="en". These are needed because a voice for a specific region is no longer automatically applied to locales without that region subtag. Since "en" is equivalent to "en-*", one of these new mappings will be used for English dialects other than "en-IN" and "en-US". Note that it is still unpredictable which voice will be chosen for e.g. "en-GB", unless a gender has been specified in CSS. This is where the priority attribute would help to make one of the voices preferred (#169).
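To see why en-GB is ambiguous, the config above can be queried for candidate voices. The filter below is a simplified model (language-range match plus an optional gender filter, with * treated as a wildcard), not Pipeline's actual voice selection code; candidates and matches_range are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

CONFIG = """<config>
  <voice lang="en-IN" engine="google" name="en-IN-Standard-A" gender="female-adult" priority="1"/>
  <voice lang="en" engine="google" name="en-IN-Standard-A" gender="female-adult" priority="1"/>
  <voice lang="en-US" engine="google" name="en-US-Wavenet-J" gender="male-adult" priority="1"/>
  <voice lang="en" engine="google" name="en-US-Wavenet-J" gender="male-adult" priority="1"/>
</config>"""


def matches_range(language_range: str, tag: str) -> bool:
    # Basic filtering: equal, or a prefix ending at a subtag boundary
    r, t = language_range.lower(), tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")


def candidates(config_xml: str, lang: str,
               gender: Optional[str] = None) -> List[Tuple[str, str, int]]:
    """List (name, lang, priority) for every <voice> whose lang range matches `lang`
    and, if a gender is given, whose gender equals it (or is the wildcard "*")."""
    result = []
    for voice in ET.fromstring(config_xml).iter("voice"):
        if not matches_range(voice.get("lang"), lang):
            continue
        if gender is not None and voice.get("gender") not in (gender, "*"):
            continue
        result.append((voice.get("name"), voice.get("lang"), int(voice.get("priority"))))
    return result


if __name__ == "__main__":
    print(candidates(CONFIG, "en-GB"))
    print(candidates(CONFIG, "en-GB", gender="male-adult"))
```

Without a gender, en-GB yields two candidates with equal priority, which is the unpredictability described above; a gender filter or a higher priority on one of them breaks the tie.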

@MDipendra

Expected behaviour of TTS in Pipeline app:

Case 1:
In the Pipeline app, we select an Indian English voice.
The document can have any dialect of English as its lang attribute, e.g. US English, or a mix of US English and Australian English.
Then:
Recording is done in the chosen Indian English voice.

Case 2:
In the Pipeline app, we select an Indian English voice.
The document has text in any dialect of English and in Hindi.
Then:
English text is recorded in the chosen Indian English voice and Hindi text is recorded in one of the Hindi voices.

Case 3:
In the Pipeline app, we select Indian English and Australian English voices.
The document has text in Indian English and US English.
Then:
Since there is a match for Indian English and no match for US English, the entire document is recorded in the Indian English voice and the Australian English voice is ignored.

Case 4:
In the Pipeline app, we select Indian English, Nigerian English and US English voices.
The document has text in Indian English, Nigerian English and US English.
Then:
Text in each dialect is recorded in the voice for that dialect.
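The four cases can be read as one document-level rule. A sketch, assuming the selected voices are given as a mapping from locale to voice name, that language tags are already in canonical case, and that None means "no selected voice; fall back to some voice of that language" (Case 2). plan_voices and the alphabetical tie-break are assumptions, not Pipeline behavior.

```python
from typing import Dict, Iterable, Optional


def plan_voices(selected: Dict[str, str],
                document_langs: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each language tag used in the document to one of the selected voices.

    selected: locale of each voice chosen in the app -> voice name (hypothetical shape).
    A tag whose language has no selected voice at all maps to None.
    """
    doc = set(document_langs)
    plan: Dict[str, Optional[str]] = {}
    for tag in sorted(doc):
        if tag in selected:                      # exact dialect match (Case 4)
            plan[tag] = selected[tag]
            continue
        primary = tag.split("-")[0]
        same_language = [loc for loc in selected if loc.split("-")[0] == primary]
        # Prefer a selected voice whose dialect also occurs in the document (Case 3),
        # otherwise any selected voice of the same language (Case 1)
        in_document = [loc for loc in same_language if loc in doc]
        pool = sorted(in_document or same_language)
        plan[tag] = selected[pool[0]] if pool else None   # no selected voice (Case 2)
    return plan


if __name__ == "__main__":
    # Case 3
    print(plan_voices({"en-IN": "Indian English voice", "en-AU": "Australian English voice"},
                      ["en-IN", "en-US"]))
```

When none of the selected dialects occurs in the document, the alphabetical tie-break is arbitrary; that is the gap a per-language default voice would fill.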

@marisademeglio
Member Author

marisademeglio commented May 23, 2024

Ok thanks for the info both of you; I am thinking about how to best present all this information.

A user having to decide "I want to prioritize these voices, and out of the 3 English ones, I want this en-IN one as the one to use across other types of English" is a rather complicated question.

I was playing with a table of voices to see how many there are for each language, you can see it here:
https://664e9474fae363742287c120--pipeline-voices-table.netlify.app/

Across 3 engines, there are 200+ English voices, not to mention 80 languages (not counting regions, just languages).

We probably need some additional settings dialog screens to help make this more manageable. A side effect should be that screen-readers would get less overwhelmed by a giant list of voices.

@MDipendra

MDipendra commented May 23, 2024 via email

@marisademeglio
Member Author

Ok that's an idea to make selecting the voices easier. And then we need another way to look at all the selected voices in a language (not counting region) to choose one to use as the fallback for that language.

We could also start out by having the user identify which languages they are interested in, and then add voices to each language (again language not counting region).

@MDipendra

MDipendra commented May 24, 2024 via email

@marisademeglio
Member Author

@MDipendra @prashantverma2014 @bertfrees Could you look at this idea and let me know if it could work for the TTS settings dialog? You can pick preferred voices via drop-down filters like what Dipendra described, and then you can pick one voice per language "group" (i.e. language regardless of region) to be the default.

Suggestions welcome!

Again, the voice list is hard coded, it's just a mock-up.

https://665145ce30bad9a628e61770--pipeline-voices-table.netlify.app/voices2.html

@prashantverma2014
Collaborator

prashantverma2014 commented May 25, 2024 via email

@marisademeglio
Member Author

thanks @prashantverma2014 for having a look!

I will implement your suggestion for language name display.

As for selecting a voice, if you pick one from the drop down, it should appear with a checkbox that says "select as a preferred voice"

then in the table of preferred voices you can pick one for each language group to be the default, e.g. one English voice for dialects that have no specific setting.

but if this isn't working then maybe it's a browser issue. The actual UI should behave ok in this respect.

as for voice names, we don't control that as far as I am aware. That info comes from the TTS engine directly.

@bertfrees
Member

@marisademeglio I like the interface. I'm just not sure whether marking one voice as the default for a language is going to be enough. What will it mean to be the default? There is region, but there is also gender and age. Note that Pipeline attaches more importance to gender and age than to region/accent when selecting voices.

So I don't know whether it may be useful to have multiple "defaults" per language. We could e.g. allow one default per language/gender/age combination? I think that might make sense.
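One way to model "one default per language/gender/age combination" is a lookup keyed on those three values. A sketch with hypothetical names (DefaultVoices, set_default, default_for); it only illustrates the data structure, not how Pipeline would consume it.

```python
from typing import Dict, Optional, Tuple

# (primary language subtag, gender, age) -> voice name
Key = Tuple[str, str, str]


class DefaultVoices:
    def __init__(self) -> None:
        self._defaults: Dict[Key, str] = {}

    @staticmethod
    def _key(lang: str, gender: str, age: str) -> Key:
        # Defaults apply to a language regardless of region: keep only the primary subtag
        return (lang.split("-")[0].lower(), gender, age)

    def set_default(self, lang: str, gender: str, age: str, voice: str) -> None:
        # A new default for the same combination replaces the previous one
        self._defaults[self._key(lang, gender, age)] = voice

    def default_for(self, lang: str, gender: str, age: str) -> Optional[str]:
        return self._defaults.get(self._key(lang, gender, age))


if __name__ == "__main__":
    defaults = DefaultVoices()
    defaults.set_default("en-IN", "female", "adult", "Ananya")
    defaults.set_default("en-IN", "male", "adult", "Aarav")
    print(defaults.default_for("en-GB", "female", "adult"))
```

Keying on the primary subtag means a default chosen from an en-IN voice also answers requests for en-GB or en-US, while a different gender or age gets its own, independent default.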

@bertfrees
Member

bertfrees commented May 25, 2024

> Case 3: In the Pipeline app, we select Indian English and Australian English voices. The document has text in Indian English and US English. Then: Since there is a match for Indian English and no match for US English, the entire document is recorded in the Indian English voice and the Australian English voice is ignored.

This is something we can do automatically. Voices for all sentences are already selected before the sentences are narrated, so this might be feasible.

Being able to select a default voice for a language will still be needed, though, for other use cases. But both features are compatible AFAICS, so that shouldn't be a problem.

Nevertheless, it seems too big a change for this short development sprint, and it would need extensive testing. So I think this is something for a following release.

About what I said before:

> There is region, but there is also gender and age. Note that Pipeline attaches more importance to gender and age than to region/accent when selecting voices.

Perhaps for now it is sufficient if we include gender/age in the interface. That might make it clear to users that "default" does not mean "when there is no exact match for a given dialect", but rather "when there is no exact match for a given age/gender/dialect".

@marisademeglio
Member Author

That sounds like it could work - one thing though, does the endpoint return age info? I don't remember seeing it.

@bertfrees
Member

Age and gender are actually combined in a single attribute, "gender", in the web service; sorry for the confusion. The attribute can be * (neutral), male-adult, male-child, male-elderly, female-child, female-adult or female-elderly.

I don't know how it is best presented in the UI. When age is specified in CSS, it is specified in combination with gender, but not as a single keyword: https://www.w3.org/TR/css-speech-1/#typedef-generic-voice. (Note that this is not the exact CSS syntax that is currently supported by Pipeline, but I want to become compatible with this syntax.)
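Since the web service packs both values into one gender attribute, a UI that shows gender and age separately first has to split it. A minimal sketch based on the values listed above; split_gender is a hypothetical name, and mapping * to "no gender, no age" is an assumption.

```python
from typing import Optional, Tuple

# Values of the web service's "gender" attribute, as listed in the comment above
KNOWN_VALUES = {
    "*",
    "male-adult", "male-child", "male-elderly",
    "female-adult", "female-child", "female-elderly",
}


def split_gender(value: str) -> Tuple[Optional[str], Optional[str]]:
    """Split e.g. "female-elderly" into ("female", "elderly").

    "*" (neutral / wildcard) carries neither a gender nor an age, so it maps to (None, None).
    """
    if value not in KNOWN_VALUES:
        raise ValueError(f"unexpected gender value: {value!r}")
    if value == "*":
        return (None, None)
    gender, age = value.split("-", 1)
    return (gender, age)


if __name__ == "__main__":
    for v in ["female-elderly", "male-child", "*"]:
        print(v, split_gender(v))
```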

@marisademeglio
Member Author

Does this make sense?

3 English voices are "preferred": Ava, Ananya, and Aarav.

2 of them are "default" for English: Ananya and Ava

1 of them is "high" priority, as indicated by the user: Ananya

and the configuration looks like this:

<config>
    <voice engine="azure" name="Ananya" lang="en-IN" gender="female-adult" priority="2"/>
    <voice engine="azure" name="Ananya" lang="en" gender="female-adult" priority="4"/>
    <voice engine="azure" name="Ava" lang="en-US" gender="female-adult" priority="1"/>
    <voice engine="azure" name="Ava" lang="en" gender="female-adult" priority="3"/>
    <voice engine="azure" name="Aarav" lang="en-IN" gender="male-adult" priority="1"/>
</config>

I don't know in what scenario a user would need to indicate both normal/high priority and default=yes or default=no.

But this represents what I've heard we need in the TTS settings, from this issue and also #169.

Below is a screenshot of what the dialog currently looks like. A brief description is:

Top of dialog:
Series of drop down boxes for finding and adding voices to the preferred voices table

Bottom of dialog:
Preferred voices table, with voice info for each and options to make a voice the "default", to set its priority (high/normal) and to remove it from the list.

Screenshot 2024-05-30 at 17 33 58

@bertfrees
Member

The way I had understood we were going to do it, is that we were going to allow all preferred voices to be used across regions with the same language, and that "default" just was a different word for "higher priority". I didn't expect two settings.

So, in my understanding, in the example with the three preferred English voices, setting the female Indian English voice Ananya as default (high priority) would result in:

<config>
    <voice engine="azure" name="Ananya" lang="en-IN" gender="female-adult" priority="2"/>
    <voice engine="azure" name="Ananya" lang="en" gender="female-adult" priority="2"/>
    <voice engine="azure" name="Ava" lang="en-US" gender="female-adult" priority="1"/>
    <voice engine="azure" name="Ava" lang="en" gender="female-adult" priority="1"/>
    <voice engine="azure" name="Aarav" lang="en-IN" gender="male-adult" priority="1"/>
    <voice engine="azure" name="Aarav" lang="en" gender="male-adult" priority="1"/>
</config>

This means:

  • Aarav is used when a male English voice is requested.
  • Ava is used when a female American English voice is requested.
  • Ananya is used when any other (female) English voice (e.g. British English) is requested.
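The three bullets can be reproduced by a toy ranking: keep voices whose gender matches, then prefer the most specific matching language range (en-US over en), and only then the higher priority. This models the behavior described in the bullets, not Pipeline's actual selection algorithm (a later comment in this thread reports that the real behavior differs); pick_voice and specificity are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import Optional

CONFIG = """<config>
  <voice engine="azure" name="Ananya" lang="en-IN" gender="female-adult" priority="2"/>
  <voice engine="azure" name="Ananya" lang="en" gender="female-adult" priority="2"/>
  <voice engine="azure" name="Ava" lang="en-US" gender="female-adult" priority="1"/>
  <voice engine="azure" name="Ava" lang="en" gender="female-adult" priority="1"/>
  <voice engine="azure" name="Aarav" lang="en-IN" gender="male-adult" priority="1"/>
  <voice engine="azure" name="Aarav" lang="en" gender="male-adult" priority="1"/>
</config>"""


def specificity(language_range: str, tag: str) -> int:
    """Number of subtags in the range if it matches the tag (basic filtering), else 0."""
    r, t = language_range.lower(), tag.lower()
    if t == r or t.startswith(r + "-"):
        return r.count("-") + 1
    return 0


def pick_voice(config_xml: str, lang: str, gender: str) -> Optional[str]:
    best_name, best_key = None, None
    for voice in ET.fromstring(config_xml).iter("voice"):
        if voice.get("gender") not in (gender, "*"):
            continue                      # gender mismatch: not a candidate
        spec = specificity(voice.get("lang"), lang)
        if spec == 0:
            continue                      # language range does not match
        key = (spec, int(voice.get("priority", "1")))
        if best_key is None or key > best_key:
            best_name, best_key = voice.get("name"), key
    return best_name


if __name__ == "__main__":
    print(pick_voice(CONFIG, "en-GB", "male-adult"))
    print(pick_voice(CONFIG, "en-US", "female-adult"))
    print(pick_voice(CONFIG, "en-GB", "female-adult"))
```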

What scenarios can we think of that require controlling which preferred voices cannot be used across regions, or that require more than two levels of preference?

@marisademeglio
Member Author

Yes, I have been confused by the separation of default vs. priority. Initially I had implemented what you described, i.e. "default" = higher priority in the config file. But then I went back and read through all the comments here, and I was afraid I'd missed something because you mentioned you might still want two priority levels.

But if that's not required in addition to default-ness then it gets way simpler which is great!

marisademeglio added a commit that referenced this issue Jun 1, 2024
All preferred TTS voices are entered into the config file as two entries: one with their language+region and one with just language (no region)

Users may pick a default voice for a language (regardless of region/gender/age). This is recorded with a higher priority.
@github-project-automation github-project-automation bot moved this from In Progress to Done in Pipeline 2 | June 2024 release Jun 1, 2024
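The scheme in the commit message above can be sketched as a config generator. PreferredVoice and build_config are hypothetical names; the attribute order and the priority values (2 for the default, 1 otherwise) follow bertfrees's example config earlier in the thread.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class PreferredVoice:
    engine: str
    name: str
    lang: str          # language+region, e.g. "en-IN"
    gender: str        # web-service value, e.g. "female-adult"
    is_default: bool = False


def build_config(voices: List[PreferredVoice]) -> ET.Element:
    """Two <voice> entries per preferred voice: one for its language+region and one
    for the bare language subtag; the default voice gets the higher priority."""
    root = ET.Element("config")
    for v in voices:
        priority = "2" if v.is_default else "1"
        primary = v.lang.split("-")[0]
        langs = [v.lang] if primary == v.lang else [v.lang, primary]
        for lang in langs:
            ET.SubElement(root, "voice", engine=v.engine, name=v.name, lang=lang,
                          gender=v.gender, priority=priority)
    return root


if __name__ == "__main__":
    config = build_config([
        PreferredVoice("azure", "Ananya", "en-IN", "female-adult", is_default=True),
        PreferredVoice("azure", "Ava", "en-US", "female-adult"),
        PreferredVoice("azure", "Aarav", "en-IN", "male-adult"),
    ])
    print(ET.tostring(config, encoding="unicode"))
```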
@bertfrees
Member

bertfrees commented Jun 4, 2024

Some suggestions:

  • In the voices dialog, you capitalize the engine name. Engines also have a "display name". I will make that available in the web API.

  • The gender/age options could maybe be presented better also. Instead of "Female-adult" etc., maybe something like "Female adult" (or just "Female" or "Woman"), "Female elderly" (or "Old woman"), "Female child" (or "Girl"), "Neutral" (or "Gender neutral"), ...

    In fact "Unknown" would be more accurate than "Neutral". Voices marked "neutral" (or "*", same thing) match any requested gender, so it's effectively like a wildcard. But in practice this category is used primarily for voices for which we can't automatically determine the gender, notably the macOS voices. We should probably make two separate categories.

    (Actually it might be useful to be able to select the gender for which the macOS voice is to be used when it is selected as a preferred voice.)

    Perhaps another possibility would be to have two options, one for gender and one for age, although that might become a bit much.

  • The list of preferred voices basically represents the voice config XML. In that regard, I thought it might be a good idea to present it in such a way that it makes it more obvious how the voice selection will work. This is currently left up to the user to guess, and I think it's pretty intuitive so should be fine, but still... The user might not realize at first that an American English preferred voice will be used for any English content (unless another preferred voice is selected as default). So perhaps each preferred voice could have a column or ℹ button that tells you in plain English when the voice will be used, similar to how I did it in my comment above.
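The readable gender/age labels and the plain-English note per preferred voice suggested above can be sketched together. The label format and the wording of the note are assumptions modeled on these suggestions and on the three bullets in the earlier comment; Preferred, gender_label and explain are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Preferred:
    name: str
    lang: str          # e.g. "en-US"
    gender: str        # web-service value, e.g. "female-adult" or "*"
    is_default: bool = False


def gender_label(value: str) -> str:
    """"female-adult" -> "Female adult"; the wildcard "*" -> "Unknown"."""
    if value == "*":
        return "Unknown"
    return value.replace("-", " ").capitalize()


def explain(voice: Preferred, language_has_default: bool) -> str:
    """Plain-English note on when a preferred voice is used (hypothetical wording)."""
    primary = voice.lang.split("-")[0]
    if voice.gender == "*":
        text = f"{voice.name} is used for {voice.lang} text regardless of the requested gender"
    else:
        who = gender_label(voice.gender).lower()
        text = f"{voice.name} is used when a {who} voice is requested for {voice.lang} text"
    if voice.is_default:
        return text + f", and for other {primary} dialects without a more specific preferred voice."
    if language_has_default:
        return text + "."
    return text + f", and may also be used for other {primary} dialects."


if __name__ == "__main__":
    print(explain(Preferred("Ava", "en-US", "female-adult"), language_has_default=False))
```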

One final thing: I noticed that while searching for a voice I'm kind of missing seeing the filtered voice list/table instantly, like we had previously. E.g. you used to be able to see all the English voices with corresponding accent and gender, without having to decide on an accent and gender first. That is not possible anymore. But I guess that is the sacrifice for having a simpler and more accessible interface.

@bertfrees
Member

I noticed a small hitch: when you first select an engine to filter by, then a language, the gender/age and voice options are not updated.

@marisademeglio
Member Author

Oh I can't reproduce that at all, I just tried a few different combos.

initially showing all voices:
Screenshot 2024-06-04 at 10 40 08

select an engine and get fewer voices:
Screenshot 2024-06-04 at 10 40 31

@bertfrees
Member

And does the "Clear default for English" work for you?

@marisademeglio
Member Author

Yeah, no issues. It doesn't work for you? Maybe delete your settings file and restart the app? But first post your settings file here and let me see what's going on with it. Minus any API keys ofc.

@bertfrees
Member

Ah, it seems to have an effect on settings.json and ttsConfig.xml; the UI is just not updated. I need to close and reopen the settings window.

@bertfrees
Member

Unfortunately there is still an issue with the voice selection. It's due to my wrong advice 😞.

I think it needs to be done as follows (but let me first double-check it):

  • Preferred: priority 1
  • Default: second voice with lang = primary language + priority 2

The way we are doing it now results in the default voice being used even when there is a more suitable preferred voice. My bad 😞

@marisademeglio
Member Author

Ok no problem to change it, so the ttsConfig XML file would look like this then?

<config>
    <voice engine="azure" name="Ananya" lang="en-IN" gender="female-adult" priority="1"/>
    <voice engine="azure" name="Ananya" lang="en" gender="female-adult" priority="2"/>
    <voice engine="azure" name="Ava" lang="en-US" gender="female-adult" priority="1"/>
    <voice engine="azure" name="Ava" lang="en" gender="female-adult" priority="1"/>
    <voice engine="azure" name="Aarav" lang="en-IN" gender="male-adult" priority="1"/>
    <voice engine="azure" name="Aarav" lang="en" gender="male-adult" priority="1"/>
</config>

Where Ananya is default and Ava, Ananya, and Aarav are preferred?

@bertfrees
Member

No like this:

<config>
    <voice engine="azure" name="Ananya" lang="en-IN" gender="female-adult" priority="2"/>
    <voice engine="azure" name="Ananya" lang="en" gender="female-adult" priority="2"/>
    <voice engine="azure" name="Ava" lang="en-US" gender="female-adult" priority="1"/>
    <voice engine="azure" name="Aarav" lang="en-IN" gender="male-adult" priority="1"/>
</config>
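The layout in this example (the default voice gets a language+region entry and a bare-language entry, both at priority 2; every other preferred voice gets only its language+region entry at priority 1) can be generated as below. It's a sketch with hypothetical names, and the scheme itself is one bertfrees says he still needs to double-check.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class PreferredVoice:
    engine: str
    name: str
    lang: str          # language+region, e.g. "en-US"
    gender: str        # web-service value, e.g. "female-adult"
    is_default: bool = False


def add_voice(parent: ET.Element, v: PreferredVoice, lang: str, priority: str) -> None:
    ET.SubElement(parent, "voice", engine=v.engine, name=v.name, lang=lang,
                  gender=v.gender, priority=priority)


def build_config(voices: List[PreferredVoice]) -> ET.Element:
    root = ET.Element("config")
    for v in voices:
        if v.is_default:
            # Default voice: regional entry plus a bare-language entry, both priority 2
            add_voice(root, v, v.lang, "2")
            primary = v.lang.split("-")[0]
            if primary != v.lang:
                add_voice(root, v, primary, "2")
        else:
            # Other preferred voices: regional entry only, priority 1
            add_voice(root, v, v.lang, "1")
    return root


if __name__ == "__main__":
    config = build_config([
        PreferredVoice("azure", "Ananya", "en-IN", "female-adult", is_default=True),
        PreferredVoice("azure", "Ava", "en-US", "female-adult"),
        PreferredVoice("azure", "Aarav", "en-IN", "male-adult"),
    ])
    print(ET.tostring(config, encoding="unicode"))
```

Compared with the previous layout, non-default voices no longer get a bare-language entry, so only the default voice covers dialects that have no preferred voice of their own.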

But I need to double-check it, don't want to be wrong this time.

@bertfrees
Member

For a next release: it might be useful to be able to select one default voice for each language/gender combination.

@MDipendra

MDipendra commented Jun 10, 2024 via email
