Distribute another file that contains the localized names of everything #8381

Open
bhousel opened this issue Jul 10, 2023 · 3 comments
Labels
considering Not Actionable - still considering if this is something we want

Comments

@bhousel
Member

bhousel commented Jul 10, 2023

I was chatting with @1ec5 about the issue of localizing the names that we use for presets. It currently affects the flags in NSI, and would also affect some of the new categories we are considering adding, like Species (#8324) or Religions (#5960 et al).

In summary: we currently have a Display Name property for each item in NSI, and this is used for the name of the preset that gets displayed in iD or JOSM. These strings exist only in the language that we think the user would be using. We don't offer any localization of these strings.

It would be useful to let users who are searching for a preset type one of its other names, such as a name in another language or an alternate name, and still find it. So we'd need some other source of data for the different names an item could be known by.

Wikidata already provides this, to some extent: labels can be entered in many different languages, and an "also known as" field (aliases) is available too. There are also some properties that track the common names things are known by, like P1843 (taxon common name).
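
As a rough sketch of what gathering these names could look like, the snippet below collects labels, aliases, and P1843 values from a single Wikidata entity document and groups them by language code. It assumes the entity JSON shape returned by Wikidata's `wbgetentities` API. The `collect_names` function and the sample entity are made up for illustration; they are not NSI code, and the sample values are not a copy of any real item.

```python
from collections import defaultdict


def collect_names(entity: dict) -> dict[str, list[str]]:
    """Gather every name Wikidata offers for one entity, grouped by language code."""
    names: dict[str, list[str]] = defaultdict(list)

    def add(lang: str, value: str) -> None:
        # Keep first-seen order and skip empty strings and exact duplicates.
        if value and value not in names[lang]:
            names[lang].append(value)

    # Labels: at most one per language.
    for lang, label in entity.get("labels", {}).items():
        add(lang, label["value"])

    # Aliases ("also known as"): a list per language.
    for lang, aliases in entity.get("aliases", {}).items():
        for alias in aliases:
            add(lang, alias["value"])

    # P1843 statements hold monolingual text: {"text": ..., "language": ...}.
    for claim in entity.get("claims", {}).get("P1843", []):
        snak = claim.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # "novalue"/"somevalue" snaks carry no datavalue
        value = snak["datavalue"]["value"]
        add(value["language"], value["text"])

    return dict(names)


# Hand-written sample in the entity JSON shape (illustrative values only).
sample_entity = {
    "labels": {
        "en": {"language": "en", "value": "Norway maple"},
        "de": {"language": "de", "value": "Spitzahorn"},
    },
    "aliases": {
        "en": [{"language": "en", "value": "Acer platanoides"}],
    },
    "claims": {
        "P1843": [
            {"mainsnak": {"snaktype": "value", "datavalue": {
                "type": "monolingualtext",
                "value": {"text": "Norway maple", "language": "en"}}}},
            {"mainsnak": {"snaktype": "value", "datavalue": {
                "type": "monolingualtext",
                "value": {"text": "érable plane", "language": "fr"}}}},
        ],
    },
}

names = collect_names(sample_entity)
```

Duplicates across the three sources are dropped per language, so a common name that merely repeats the label does not appear twice.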

We haven't tried to tackle localization in NSI yet, but I'm wondering whether we could just gather up all these names and languages in another sidecar file and distribute it alongside the files we already gather - so that consumers that want to be more locale-aware can use this to improve their user experience.
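
To make the sidecar idea a bit more concrete, here is one possible shape, sketched under assumptions: a JSON object keyed by item id, mapping each language code to the names known in that language. The ids, the layout, and the `search_presets` helper are all hypothetical; nothing here is a format NSI has adopted.

```python
import json

# Hypothetical sidecar content: item id -> language code -> known names.
SIDECAR_JSON = """
{
  "example-maple-id": {
    "en": ["Norway maple"],
    "de": ["Spitzahorn"]
  },
  "example-coffee-id": {
    "en": ["Starbucks"],
    "zh": ["星巴克"]
  }
}
"""


def search_presets(sidecar: dict, query: str, languages: list[str]) -> list[str]:
    """Return ids of items with a name containing `query` in any of `languages`."""
    needle = query.casefold()
    hits = []
    for item_id, names_by_lang in sidecar.items():
        candidates = (
            name
            for lang in languages
            for name in names_by_lang.get(lang, [])
        )
        # Case-insensitive substring match against every candidate name.
        if any(needle in name.casefold() for name in candidates):
            hits.append(item_id)
    return hits


sidecar = json.loads(SIDECAR_JSON)
german_hits = search_presets(sidecar, "spitz", ["de", "en"])
english_hits = search_presets(sidecar, "maple", ["en"])
no_hits = search_presets(sidecar, "maple", ["de"])
```

A consumer that wants to be more locale-aware would pass the user's preferred languages; a consumer that ignores the sidecar keeps working exactly as it does today.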

Open Question: Would we use these gathered names as another source of alternate matchNames? I don't know, maybe.

Would it be only one NSI entry? What would be the preset’s name (since this is the name suggestion index)? If the preset is simply named Acer platanoides, no one but a botanist would find it. If we name it “Norway maple”, then only English speakers would find it, while Spanish speakers in Spain would see English all over the preset list.

Originally posted by @1ec5 in #8324 (comment)

Some examples:

Starbucks: https://www.wikidata.org/wiki/Q37158


Norway Maple: https://www.wikidata.org/wiki/Q26745

@bhousel bhousel added the considering Not Actionable - still considering if this is something we want label Jul 10, 2023
@1ec5
Member

1ec5 commented Jul 10, 2023

For context, relying on Wikidata labels and properties would be somewhat unconventional for an OSM-related software project compared to the more common approach of soliciting project-specific translations on a system like Transifex or Translatewiki.net. But there is some prior art, such as the highway shield legend in ZeLonewolf/openstreetmap-americana#632.

For NSI, the biggest advantage to relying on Wikidata would be reducing what would otherwise be a very significant burden on volunteer translators. Besides, most of these translations would go to waste, never seen by anyone. Moreover, Wikidata items are supposed to correspond one-to-one with NSI entries, so we’re leaving a lot of valid translations on the table at the moment. (Sometimes they don’t correspond one-to-one, but that’s a bigger problem that these labels would surface, justifiably in my opinion.)

One thing to watch out for is that Wikidata has a different naming convention for labels than we do for presets. For example, Wikidata expects labels to be capitalized only when necessary, so that a data consumer can insert “smoke tree” in a sentence instead of a more jarring “Smoke tree”. By contrast, in the default American English localization, we currently prefer title case: openstreetmap/id-tagging-schema#473. (Some other languages like French and Spanish prefer sentence case.) NSI will need to recase the labels itself to keep people from seeing the wrong case and annoying the Wikidata community with “tagging for the editor” edits, as the Americana project initially did after landing its Wikidata-powered legend.
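
A minimal sketch of the recasing step, assuming a simple per-locale policy: title case for the locales listed in a table, sentence case for everything else. The function names and the `TITLE_CASE_LOCALES` table are hypothetical, and the title-casing here is deliberately naive (it capitalizes every word, including short words like “of” that a real style guide might leave lowercase).

```python
def sentence_case(label: str) -> str:
    """Uppercase only the first character; leave the rest as Wikidata has it."""
    return label[:1].upper() + label[1:]


def title_case(label: str) -> str:
    """Uppercase the first character of each space-separated word."""
    return " ".join(word[:1].upper() + word[1:] for word in label.split(" "))


# Hypothetical policy table: locales whose presets use title case.
# Every other locale falls back to sentence case.
TITLE_CASE_LOCALES = {"en", "en-US"}


def recase(label: str, locale: str) -> str:
    """Recase a Wikidata label for display as a preset name in `locale`."""
    if locale in TITLE_CASE_LOCALES:
        return title_case(label)
    return sentence_case(label)


english_name = recase("smoke tree", "en-US")
french_name = recase("érable plane", "fr")
```

Only the first character of each affected word is changed, so capitalization that Wikidata already has inside a label (for example in a proper noun) is left alone.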

@LaoshuBaby
Collaborator

LaoshuBaby commented Jul 11, 2023

Would that mean we need to maintain a list of "what languages are commonly used in what countries/regions"? Will it bring too many breaking changes?

@1ec5
Member

1ec5 commented Jul 11, 2023

I’m not sure why such a list would be necessary. The build script would pull in all the labels that Wikidata has for a given operator or flag’s item, then produce a separate sidecar file for each language. It would be up to the client to choose the file appropriate to the language, similar to how interface localization works today.

@osmlab osmlab deleted a comment Jul 14, 2023
@osmlab osmlab deleted a comment Jul 14, 2023