Website localization, Title Capitalization and avoiding duplicates

I'm working on i18n/l10n for a Python/Django-based website.

My wish is to minimize the number of strings and, if possible, avoid having the same text with only case differences. For example, I don't want to keep "Your followers", "your followers" and "Your Followers": this violates DRY, and I fear they will quickly get out of sync.

Given that Django loves lowercase in model field titles, a lot of my strings are all-lowercase, except for proper nouns. For example:

from django.db import models
from django.utils.translation import gettext_lazy as _

class User(models.Model):
    ...
    # In my understanding, Django wants me to use "registration date",
    # not "Registration date" or "Registration Date" here.
    registration_date = models.DateField(_("registration date"), ...)

    # But "Skype" is a proper noun and we want it capitalized.
    # Note, in some languages it won't be the first word,
    # e.g. "nome de usuário Skype" in Portuguese.
    skype_username = models.CharField(_("Skype username"), ...)
    ...

However, the designer wants Each Word's First Letter Capitalized in most page and table titles/headers. So I thought I'd keep the lowercase strings and apply the {{ ...|title }} template filter.

But translators say it's bad to capitalize prepositions in some languages, and even in English it doesn't look good. So the site should say "Terms of Service" and "Política de Privacidade", not "Terms Of Service" or "Política De Privacidade". And in French (which we don't target right now, but I'm sure we will someday), the capitalization rules look even more complicated than a simple list of "don't touch these" words (elisions like "l'", and so on).
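For reference, recent Django versions implement the |title filter as Python's str.title() followed by two regex fix-ups (lowercasing a letter capitalized right after an apostrophe or a digit). The sketch below re-implements that logic in plain Python so the problem can be reproduced without a Django project; django_like_title is a hypothetical name, and the exact regexes may differ between Django versions:

```python
import re


def django_like_title(value: str) -> str:
    """Approximation of Django's |title filter: str.title() plus two fix-ups."""
    t = value.title()
    # Undo capitalization after an apostrophe that follows a lowercase letter,
    # e.g. "John'S" -> "John's".
    t = re.sub(r"([a-z])'([A-Z])", lambda m: m.group(0).lower(), t)
    # Undo capitalization of a letter directly after a digit, e.g. "1St" -> "1st".
    return re.sub(r"\d([A-Z])", lambda m: m.group(0).lower(), t)


for s in ["your followers", "terms of service",
          "política de privacidade", "conditions d'utilisation"]:
    print(f"{s!r} -> {django_like_title(s)!r}")
```

Every word gets capitalized, prepositions included ("Terms Of Service", "Política De Privacidade", "Conditions D'Utilisation"). Note also that str.title() lowercases the rest of each word, so a brand such as "YouTube" becomes "Youtube", which is one more way proper nouns suffer under blanket title-casing.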

So I wonder what the recommended approach for this kind of thing is, one that keeps the headaches to a minimum.

It seems that my options are:

  1. Find a solution for language-aware string capitalization that'd not capitalize prepositions. Is there anything readily available out there? I'm not sure I want to write one myself, given that I'm not proficient in some languages we target.
  2. Ignore Django's rules and store Capitalized Versions of Strings in the translation database, then lowercase them as necessary. This would have issues with proper nouns and given names, though.
  3. Store multiple versions of the same text (with varying capitalization) in translation files. I really wish to avoid this.
  4. Something else I haven't thought of?
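Option 1 could start out as something like the following minimal sketch. It assumes hand-maintained, per-language lists of "small words" to keep lowercase; SMALL_WORDS and smart_title are hypothetical names, and the word lists are illustrative and incomplete rather than taken from any style guide:

```python
# Hypothetical per-language "small words" kept lowercase inside titles.
# Illustrative only: real style rules differ and these lists are incomplete.
SMALL_WORDS = {
    "en": {"a", "an", "and", "as", "at", "but", "by", "for",
           "in", "of", "on", "or", "the", "to"},
    "pt": {"a", "as", "da", "das", "de", "do", "dos", "e",
           "em", "na", "nas", "no", "nos", "o", "os", "para"},
}


def smart_title(text: str, lang: str) -> str:
    """Capitalize each word except the language's small words (never the first word)."""
    small = SMALL_WORDS.get(lang, set())  # unknown language: capitalize everything
    out = []
    for i, word in enumerate(text.split(" ")):
        if i > 0 and word.lower() in small:
            out.append(word.lower())
        else:
            # Upper-case only the first character and leave the rest untouched,
            # so inner capitals (e.g. "YouTube") survive, unlike str.title().
            out.append(word[:1].upper() + word[1:])
    return " ".join(out)


print(smart_title("terms of service", "en"))       # Terms of Service
print(smart_title("nome de usuário Skype", "pt"))  # Nome de Usuário Skype
```

Even this toy version shows the maintenance burden: every language needs its own list, and French elisions are not handled at all (with no "fr" list, "conditions d'utilisation" becomes "Conditions D'utilisation").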

I suppose this is a reasonably common case, and many fellow programmers have already encountered something like it. I'd appreciate any advice on how to deal with it.

Asked by drdaeman, Sep 07 '25 12:09


1 Answer

I might not have the perfect solution for your problem, but here are some thoughts I think are worth sharing:

  • "Given that Django loves lowercase in model field titles, a lot of strings I have are all-lowercase, except for proper nouns".
    I think you are confused here. Django doesn't like or dislike any particular capitalization; that is entirely up to you. The only thing Django does is auto-generate a field's verbose name from the field name whenever you omit the verbose_name argument. When verbose names are auto-generated (i.e. you didn't explicitly provide your own verbose_name wrapped in a gettext() call), they are not localizable.

  • Don't take what your designers say for granted: they generally design with English UIs in mind.

  • Generally speaking, leave capitalization up to localizers: they are the best people to judge how capitalization should work in a given context. When you say "Find a solution for language-aware string capitalization that'd not capitalize prepositions" you are making too many assumptions about the target languages: each very likely has its own grammar and style rules, and some might not even have prepositions!

  • Provide as many comments and as much context as possible for localizers. Localizing a button is not the same as localizing a header or a tooltip message.
    In Django you can achieve this with comments starting with "Translators:" and with pgettext() for providing context markers.

  • Don't try to be too clever by applying regular programming techniques to your source text. DRY might not be the right thing to do here.
    Let me explain: even if you manage to merge all source strings that differ only in capitalization, you can't simply rest easy, because you might have introduced more problems than you had before.
    For example, suppose you have "view" and "View". If you blindly merge them, localizers will be given a single string to translate, and that can be a problem: depending on the context and grammatical role, "view" can be translated differently into other languages, since it can be a verb, a noun, etc. The previous point applies here.

  • In general I believe this problem can be addressed elsewhere in your i18n/l10n workflow.
    You can potentially pre-translate your PO files (one example here, there are probably more) thus re-using already existing translations and pre-filling empty translations as fuzzy. The final decision is left up to localizers: they can simply remove the fuzzy mark if they are happy with it, or adjust the text accordingly.
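The pre-translation idea in the last point can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the tool the answer refers to: it uses a simplified, hypothetical in-memory model of PO entries (Entry, prefill_case_variants and to_po are made-up names; a real project would read and write its .po files with a proper library). An untranslated string that differs from a translated one only in case gets that translation copied in and is flagged "#, fuzzy" for review. Matches are keyed on msgctxt as well as msgid, so strings given different contexts via pgettext() (such as "view" the noun and "View" the verb) are never merged:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Entry:
    """Hypothetical, simplified PO entry (no plurals, comments or references)."""
    msgid: str
    msgstr: str = ""
    msgctxt: Optional[str] = None
    fuzzy: bool = False


def prefill_case_variants(entries: List[Entry]) -> List[Entry]:
    """Fill empty entries from a reviewed entry differing only in case; mark them fuzzy."""
    known: Dict[Tuple[Optional[str], str], str] = {}
    for e in entries:
        if e.msgstr and not e.fuzzy:  # only reuse reviewed translations
            known.setdefault((e.msgctxt, e.msgid.casefold()), e.msgstr)
    for e in entries:
        if not e.msgstr:
            guess = known.get((e.msgctxt, e.msgid.casefold()))
            if guess is not None:
                e.msgstr = guess
                e.fuzzy = True  # the localizer decides whether to keep or adjust it
    return entries


def _quote(s: str) -> str:
    # Minimal PO escaping: backslashes, double quotes and newlines.
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n") + '"'


def to_po(entries: List[Entry]) -> str:
    """Render entries in PO syntax (simplified)."""
    lines = []
    for e in entries:
        if e.fuzzy:
            lines.append("#, fuzzy")
        if e.msgctxt is not None:
            lines.append(f"msgctxt {_quote(e.msgctxt)}")
        lines.append(f"msgid {_quote(e.msgid)}")
        lines.append(f"msgstr {_quote(e.msgstr)}")
        lines.append("")
    return "\n".join(lines)


entries = [
    Entry("your followers", "seus seguidores"),
    Entry("Your Followers"),
    Entry("view", "visualização", msgctxt="noun"),
    Entry("View", msgctxt="verb"),
]
print(to_po(prefill_case_variants(entries)))
```

Running it marks "Your Followers" as a fuzzy copy of "seus seguidores", leaving the localizer to decide on capitalization, while the verb "View" stays untranslated because its context differs.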

Answered by julen, Sep 09 '25 13:09