
Any-Latin; Latin-ASCII in replace_non_ascii() #71

Open
dustinstoltz wants to merge 1 commit into trinker:master from dustinstoltz:fix-replace-non-ascii-transliteration

@dustinstoltz

… scripts

Fixes #64

Previously, replace_non_ascii() called stri_trans_general(x, 'latin-ascii'), which transliterates only Latin-script characters. Text in non-Latin scripts (Cyrillic, CJK, Devanagari, etc.) was either left as unconverted byte sequences or, with remove.nonconverted = TRUE, stripped entirely.

It now uses the compound transliterator ID 'Any-Latin; Latin-ASCII', which first transliterates any script to Latin and then converts Latin to ASCII. The change is backwards compatible because Any-Latin is a no-op for input that is already in Latin script.
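
To illustrate the difference between the two transliterator IDs, here is a minimal sketch, assuming stringi is installed. The helper `to_ascii_sketch()` is hypothetical: it only approximates the pipeline described above (transliterate, turn leftover non-ASCII bytes into `<xx>` codes with `iconv(..., sub = "byte")`, then drop those codes as `remove.nonconverted` does) and is not textclean's actual source.

```r
library(stringi)

# Hypothetical helper approximating replace_non_ascii(); NOT textclean's source.
# `id` is the ICU transliterator ID passed to stri_trans_general().
to_ascii_sketch <- function(x, id, remove.nonconverted = TRUE) {
  x <- stri_trans_general(x, id)
  # Any character still outside ASCII becomes "<xx>" hex byte codes
  x <- iconv(x, from = "UTF-8", to = "ASCII", sub = "byte")
  if (remove.nonconverted) {
    # Drop the leftover byte codes, mimicking remove.nonconverted = TRUE
    x <- gsub("<[0-9A-Fa-f]{2}>", "", x)
  }
  x
}

cyrillic <- "\u041f\u0440\u0438\u0432\u0435\u0442"  # Cyrillic "Privet" (hello)
latin    <- "caf\u00e9"                               # "cafe" with an acute e

# Old ID vs. compound ID on the same inputs
old_cyr   <- to_ascii_sketch(cyrillic, "latin-ascii")
new_cyr   <- to_ascii_sketch(cyrillic, "Any-Latin; Latin-ASCII")
old_latin <- to_ascii_sketch(latin, "latin-ascii")
new_latin <- to_ascii_sketch(latin, "Any-Latin; Latin-ASCII")

print(c(old_cyr = old_cyr, new_cyr = new_cyr,
        old_latin = old_latin, new_latin = new_latin))
```

Under these assumptions, 'latin-ascii' passes the Cyrillic string through untouched, so the byte-removal step deletes it completely, while the compound ID yields a non-empty ASCII romanization. The Latin input comes out as `cafe` with either ID, which is the backwards-compatibility claim.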

Development

Successfully merging this pull request may close these issues.

Could "Any-Latin; Latin-ASCII" be added to replace_non_ascii() to address logographics/cyrillic/devanagari?

1 participant