Incorrect use of python-format and c-format #78

Open
clj opened this issue Mar 7, 2016 · 1 comment


clj commented Mar 7, 2016

As far as I can tell, Lingua is marking extracted strings with the wrong format flags.

Given the following input file:

print(_('Hello World!'))

print(_('Hello {name}!').format(name=name))

print(_('Hello %s!') % name)

And running the command:
pot-create hello_world.py -o hello.pot

I get the following .pot file:

# ... snip ...
#, fuzzy
msgid ""
msgstr ""
# ... snip ...
"Generated-By: Lingua 4.8.1\n"

#: ./hello_world.py:1
msgid "Hello World!"
msgstr ""

#: ./hello_world.py:3
#, python-format
msgid "Hello {name}!"
msgstr ""

#: ./hello_world.py:5
#, c-format
msgid "Hello %s!"
msgstr ""

Looking at the gettext source code, I believe that the string from source line 3, which was marked with the python-format flag, should instead use the python-brace-format flag, and that the string from source line 5, which was marked with the c-format flag, should instead use the python-format flag.
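The proposed mapping can be sketched as a small classifier. This is a hypothetical helper written for illustration, not Lingua's actual extractor code: it assumes that a msgid containing `{...}` replacement fields is brace-formatted and that one containing `%` directives is old-style, and it does not try to handle strings that mix both styles.

```python
import re
import string
from typing import Optional

# Old-style directive: '%', optional (mapping key), flags, width, precision,
# length modifier, conversion type. '%%' is a literal percent sign.
_PERCENT_RE = re.compile(
    r"%(?:\([^)]*\))?[#0\- +]*(?:\*|\d+)?(?:\.(?:\*|\d+))?[hlL]?[diouxXeEfFgGcrsa%]"
)


def has_percent_directives(msgid: str) -> bool:
    """True if msgid contains a real %-directive (a bare '%%' does not count)."""
    return any(m.group() != "%%" for m in _PERCENT_RE.finditer(msgid))


def has_brace_fields(msgid: str) -> bool:
    """True if msgid contains at least one {}-style replacement field."""
    try:
        return any(
            field is not None
            for _, field, _, _ in string.Formatter().parse(msgid)
        )
    except ValueError:
        # Unbalanced braces: not a valid brace format string.
        return False


def guess_format_flag(msgid: str) -> Optional[str]:
    """Hypothetical helper: choose the gettext flag for a Python msgid."""
    if has_brace_fields(msgid):
        return "python-brace-format"
    if has_percent_directives(msgid):
        return "python-format"
    return None  # plain string, no flag needed


for msgid in ("Hello World!", "Hello {name}!", "Hello %s!"):
    print(repr(msgid), guess_format_flag(msgid))
```

Under these assumptions the three strings from the example file would get no flag, python-brace-format and python-format respectively.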

In particular, consider the gettext source files that define the different formats:

  • format-python.c
    The comment block towards the top of that file describes % (old) style string formatting. Specifically:

    Any string or Unicode string can act as format string via the '%' operator, implemented in stringobject.c and unicodeobject.c.

    although I believe that the String Formatting Operations section referred to in the comment can now be found at https://docs.python.org/2/library/stdtypes.html#string-formatting-operations.

  • format-python-brace.c
    The comment block towards the top of that file describes {} (new) style formatting. Specifically:

    Python brace format strings are defined by PEP3101 together with 'format' method of string class.

  • format-c.c
    This is for C format strings, which are similar to, but not exactly like, old-style Python format strings. One example difference is the conversion type r (see String Formatting Operations), which formats a value using repr() and is not a conversion type available in C's printf. It is therefore probably not a good idea to use this format type for old-style Python format strings.
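The difference between the three styles is easy to see in Python itself. The short illustration below uses a hypothetical `name` value; it shows that `%r` is an ordinary old-style Python conversion (it calls repr(), so the result is quoted), even though C's printf has no such conversion.

```python
name = "World"  # hypothetical value for illustration

# Old-style (%) formatting, which gettext calls python-format.
# %s uses str(), %r uses repr(); C's printf has no %r conversion.
s_form = "Hello %s!" % name
r_form = "Hello %r!" % name

# New-style formatting (PEP 3101), which gettext calls python-brace-format.
brace_form = "Hello {name}!".format(name=name)

print(s_form)      # Hello World!
print(r_form)      # Hello 'World'!
print(brace_form)  # Hello World!
```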

Using the incorrect format flags means that the format checks performed by gettext's msgfmt command (--check-format) give incorrect results. Given the following 'translation' of the above .pot file:

#, fuzzy
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"
"Generated-By: Lingua 4.8.1\n"

#: ./hello_world.py:1
msgid "Hello World!"
msgstr ""

#: ./hello_world.py:3
#, python-format
msgid "Hello {name}!"
msgstr "100% {name}!"

#: ./hello_world.py:5
#, c-format
msgid "Hello %s!"
msgstr "Hello %r!"

Running msgfmt produces the following output:

msgfmt hello.po --check-format
hello.po:14: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: In the directive number 1, the character '{' is not a valid conversion specifier.
hello.po:19: 'msgstr' is not a valid C format string, unlike 'msgid'. Reason: In the directive number 1, the character 'r' is not a valid conversion specifier.
/usr/local/Cellar/gettext/0.19.7/bin/msgfmt: found 2 fatal errors
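The first error can be reproduced with Python's own formatting rules. As a brace format string, "100% {name}!" is a perfectly good translation; treated as an old-style (python-format) string, however, the `%` is followed by a space (a valid flag) and then `{`, which is not a valid conversion character. That is essentially the complaint msgfmt reports. A small sketch:

```python
msgstr = "100% {name}!"

# Interpreted as a brace format string, the translation works.
print(msgstr.format(name="World"))  # 100% World!

# Interpreted as an old-style % format string, '% {' is parsed as a
# directive with the space flag and the invalid conversion '{'.
try:
    msgstr % {"name": "World"}
except ValueError as exc:
    print("ValueError:", exc)
```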

Changing the file to use the correct format specifier flags:

#, fuzzy
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"
"Generated-By: Lingua 4.8.1\n"

#: ./hello_world.py:1
msgid "Hello World!"
msgstr ""

#: ./hello_world.py:3
#, python-brace-format
msgid "Hello {name}!"
msgstr "100% {name}!"

#: ./hello_world.py:5
#, python-format
msgid "Hello %s!"
msgstr "Hello %r!"

will, given the same msgfmt command as above, produce no errors.
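With the python-brace-format flag, the check that matters is whether the translation uses the same replacement fields as the msgid. The following is a rough, hypothetical approximation of that idea that compares the sets of field names; it is not msgfmt's exact rule set, which is implemented in format-python-brace.c and may differ.

```python
import string


def brace_field_names(s: str) -> set:
    """Return the names of the {}-style replacement fields in s."""
    return {
        field
        for _, field, _, _ in string.Formatter().parse(s)
        if field is not None
    }


def brace_fields_match(msgid: str, msgstr: str) -> bool:
    """Hypothetical check: msgstr must use exactly the msgid's fields."""
    return brace_field_names(msgid) == brace_field_names(msgstr)


print(brace_fields_match("Hello {name}!", "100% {name}!"))  # True
print(brace_fields_match("Hello {name}!", "Hallo {nom}!"))  # False
```

Note that the literal `%` in "100% {name}!" is irrelevant here, which is why the python-brace-format entry passes.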

I would be happy to supply a patch for this if there is agreement that the format string flags should be corrected.

Cheers,
Christian

wichert (Owner) commented Nov 10, 2019

Matching gettext is definitely a useful goal. If you can supply a patch to do that, I'd be happy to merge it.
