Skip to content

Don't ignore localized strings that are the same as original#307

Open
felipeborges wants to merge 1 commit into
hughsie:mainfrom
felipeborges:dont-ignore-localized-strings
Open

Don't ignore localized strings that are the same as original#307
felipeborges wants to merge 1 commit into
hughsie:mainfrom
felipeborges:dont-ignore-localized-strings

Conversation

@felipeborges

Copy link
Copy Markdown

Avoiding storing identical strings is a clever measure to keep the
appstream file small and clean, but it ignores translation differences
between languages and their specific locales.

Applications tend to "fallback" from missing locales by picking a
translation of the same language but from a different locale.
For instance, when missing a "pt_BR" translation, some apps will
pick the "pt" translation instead. That usually works but there
are some cases when it doesn't, such as for international words:
Brazilian Portuguese (pt_BR) tends to use them, while European
Portuguese (pt) has a translation for everything.

This way, "GNOME Boxes" gets translated to "Caixas GNOME" in
European Portuguese (pt) but the same "GNOME Boxes" name is
expected in Brazilian Portuguese (pt_BR).

This was initially reported as a Flatpak issue in
https://lists.freedesktop.org/archives/flatpak/2019-May/001578.html
Because OP was seeing the wrong translations in Flathub and
GNOME Software.

Avoiding storing identical strings is a clever measure to keep the
appstream file small and clean, but it ignores translation differences
between languages and their specific locales.

Applications tend to "fallback" from missing locales by picking a
translation of the same language but from a different locale.
For instance, when missing a "pt_BR" translation, some apps will
pick the "pt" translation instead. That usually works but there
are some cases when it doesn't, such as for international words:
Brazilian Portuguese (pt_BR) tends to use them, while European
Portuguese (pt) has a translation for everything.

This way, "GNOME Boxes" gets translated to "Caixas GNOME" in
European Portuguese (pt) but the same "GNOME Boxes" name is
expected in Brazilian Portuguese (pt_BR).

This was initially reported as a Flatpak issue in
https://lists.freedesktop.org/archives/flatpak/2019-May/001578.html
Because OP was seeing the wrong translations in Flathub and
GNOME Software.
@hughsie

hughsie commented May 23, 2019

Copy link
Copy Markdown
Owner

How much does this affect the size of something like the Fedora metadata? Certainly en_GB is 99.9% the same as C although I do concede that gzip should dedupe this effectively. @kalev what do you think?

@kalev

kalev commented May 23, 2019

Copy link
Copy Markdown
Collaborator

I'd say it's probably best to go for correctness, even if it increases the metadata size. Let's see how much bigger it gets and back this change out if it's unacceptably large?

@rffontenelle

Copy link
Copy Markdown
Contributor

How about keeping localized strings equal original only when the fallback language ('pt', in Felipe's example) translation is presented? If absent, there will be no fallback and no problem for other languages (e.g. pt_BR). This way, lines wouldn't be added without being required and file size may not grow too much.

@felipeborges

Copy link
Copy Markdown
Author

How about keeping localized strings equal original only when the fallback language ('pt', in Felipe's example) translation is presented? If absent, there will be no fallback and no problem for other languages (e.g. pt_BR). This way, lines wouldn't be added without being required and file size may not grow too much.

My understanding is that the fallback is done in the application's side. All consumers of appstream.xml will read from it (GNOME Software, Flathub, other-software-store-thingies) and they are the ones using this heuristic to fallback missing translations.

@hughsie

hughsie commented May 24, 2019

Copy link
Copy Markdown
Owner

and they are the ones using this heuristic to fallback missing translations

Old versions of gnome-software use appstream-glib to get the "best" translation, so it should be enough to fix it here for those older versions. For newer versions of gnome-software we parse it directly with libxmlb and so any fix would need to be copied there. I'm going to do some benchmarks compiling the entire fedora repo today with the dedepe functionality and without. It does take some time...

@hughsie

hughsie commented May 24, 2019

Copy link
Copy Markdown
Owner

I'm guessing this wasn't tested :) The XML file is the same size, even post decompression. I think this patch needs to drop AS_NODE_INSERT_FLAG_DEDUPE_LANG -- I'm just running the generator again with that dropped to see what it does to the XML file.

@rffontenelle

Copy link
Copy Markdown
Contributor

Hey there, is there anything blocking this PR? Now and then Brazilians ask about this issue.

@hughsie

hughsie commented Dec 12, 2019

Copy link
Copy Markdown
Owner

Hey there, is there anything blocking this PR?

Well, there's CI failure and the fact that the patch doesn't work, but other than that...

continue;

/* avoid storing identical strings */
data_localized = data->cdata;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove the data_localized variable.

continue;
g_hash_table_insert (hash,
as_ref_string_ref (xml_lang != NULL ? xml_lang : xml_lang_c),
(gpointer) data_localized);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of data_localized, you need to use data->cdata

@igaldino

Copy link
Copy Markdown
Contributor

Hi @felipeborges, I tried to update your branch but I don't have rights, so I just commented it.
Changes should be enough to pass test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants