
Redesign load_po to use Lex/Yacc-style state transitions, to ease #1199


Closed
wants to merge 13 commits

Conversation

hoangduytran

@hoangduytran hoangduytran commented Mar 15, 2025

This eases maintenance and also speeds up parsing with an optional multiprocessing mode. It includes debugging support, aborting on invalid input in both multiprocessing and linear processing modes, plus a custom hook into the system's exception handling to suppress stack-frame traces. Note that the number of batches is computed as (CPU cores - 2) / batch division reduction. Loading a 10.8 MB PO file takes only about 1.7 seconds in multiprocessing mode:
Using 11 batch(es) (cpu_count: 24)
number of batches: 11, batch_size: 3222
1.623452367
Execution of run took 0:00:01.62.
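
For reference, a minimal sketch of the batch-count formula described above; batch_division_reduction is a hypothetical name for the divisor, not necessarily the PR's identifier:

import multiprocessing

def get_batch_count(batch_division_reduction: int = 2) -> int:
    # number of batches = (CPU cores - 2) / batch division reduction,
    # clamped to at least one batch. With cpu_count 24 and a reduction
    # of 2, this gives the 11 batches shown in the log above.
    cores = multiprocessing.cpu_count()
    return max(1, (cores - 2) // batch_division_reduction)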

…ntenance and also speed up parsing with an optional multiprocessing mode, including debugging, aborting on invalid input in both multiprocessing and linear modes, plus a custom hook into the system's exception handling to suppress stack-frame traces.
…olean values, allowing combinations of

true_set = {"true", "1", "yes", "y", "t", "on"}

(Note: upper-case and mixed-case spellings are allowed, including 1, 0, True, False.)
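
As a rough illustration (parse_bool is a hypothetical name, not necessarily the one in this PR), such a parser might look like:

true_set = {"true", "1", "yes", "y", "t", "on"}

def parse_bool(value) -> bool:
    # Case-insensitive, so "True", "YES", "On", 1 and True all count as
    # true; anything outside the set (e.g. "false", 0) counts as false.
    return str(value).strip().lower() in true_set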
…ction name in the messages, using inspect to work out the name of the function that DEBUG_LOG was called from.
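
A minimal sketch of that technique, assuming DEBUG_LOG has roughly this shape:

import inspect

def DEBUG_LOG(msg: str) -> None:
    # inspect.stack()[1] is the caller's frame; report its function name
    # and line number alongside the message.
    caller = inspect.stack()[1]
    print(f"[{caller.function}:{caller.lineno}] {msg}")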
@akx
Member

akx commented Mar 17, 2025

Oh... at first sight, this looks a whole lot more complex than the parser we have right now.

Could you please explain why this would be necessary, to begin with? (What prompted you to start writing this?) What are the improvements over the current implementation (PoFileParser)?

PRs are of course in general welcome, but a 1300 line PR out of the blue is a bit daunting! 😅

Member

@akx akx left a comment


I tried this locally using https://projects.blender.org/blender/blender-manual-translations/raw/branch/main/fi/LC_MESSAGES/blender_manual.po as the .po file to parse.

After making some changes (e.g. imports from a local_logging package I do not have, or a dependency on sphinx_intl (?)), this benchmarks to be about 3x slower than the current implementation on master on my machine, and prints out a whole lot of "Invalid continuation line for state 128" errors.

With multiprocessing=True passed to load_po, the resulting catalog is empty.

Secondly, I see there's a lot of reliance on shared global state, which is generally not a great idea – in particular, this couldn't necessarily be safely run in e.g. a web server context (if someone needed to e.g. extract messages from a PO file for, say, a translation service).

All in all, I might suggest that if you need this for some other project, you could maintain it separately – and on the other hand, now that I look at our PO file parsing code, there are things that we could optimize even without overhauling the entire thing.

@AA-Turner
Contributor

@akx FYI, there is discussion in sphinx-doc/sphinx-intl#118. I'm equally confused about the motivation behind this series of PRs.


@hoangduytran
Author

hoangduytran commented Mar 19, 2025

Thank you all for your comments and findings. On my machine, this is the code I'm using to load and write a PO file, taken from here -- https://translate.blender.org/projects/blender-ui/ui/vi/ -- click Files -> Download translation. The file is about 7.6 MB (the version on my machine). The code is this:

    def testPO(self):        
        file_path = os.environ['BLENDER_PO_FILE']        
        cat: Catalog = c.load_po(file_path, multiprocessing=0)
        dev_home = os.environ['DEV']
        out_file = os.path.join(dev_home, 'test_write.po')
        c.dump_po(out_file, cat, line_width=4096)

I use my local pyenv and swap the old and new versions in and out when testing. These are the results I got today: the top run, 4.68 seconds, is the current load_po(). I then swapped the old env for the new one, reran the same test, and the code took only 3.09 seconds. That is without running the loading in multiprocessing mode. With multiprocessing on, I got 2.26 seconds for both tasks, read and write.


    def testPO(self):        
        file_path = os.environ['BLENDER_PO_FILE']        
        cat: Catalog = c.load_po(file_path, multiprocessing=1)
        dev_home = os.environ['DEV']
        out_file = os.path.join(dev_home, 'test_write.po')
        c.dump_po(out_file, cat, line_width=4096)


(pyenv) hoang@hoangs-Mac-Pro ~ % test.py        
4.675346039
Execution of run took 0:00:04.68.
(pyenv) hoang@hoangs-Mac-Pro ~ % test.py
3.093110572
Execution of run took 0:00:03.09.
(pyenv) hoang@hoangs-Mac-Pro ~ % test.py 
Using 11 batch(es) (cpu_count: 24)
number of batches: 11, batch_size: 3222
2.264604558
Execution of run took 0:00:02.26.
(pyenv) hoang@hoangs-Mac-Pro ~ % 

I am using Python 3.9.6 on macOS Monterey 12.7.6. I have made some more corrections and added a few more features to this file over the last few days; if you want, I can update to this new code. I don't know why you have problems running it; when I run it on my machine, it works fine.

What sparked all this is that I'm writing a little server to load several large PO files and use them as a kind of dictionary when translating. The server runs on FastAPI and the client is the VS Code community version. This lets me write a little VS Code extension that makes HTTP calls to the server to automate tasks, like checking whether a word is in the dictionary, or finding where a highlighted keyword occurs across several thousand source files, all with a single keystroke (a minimal sketch of such an endpoint follows below). After using this model for nearly two years, I got worked up over the loading speed, and that pushed me to rewrite load_po(). I sometimes load PO files individually, and there multiprocessing mode benefits me. When loading many PO files, I use a multiprocessing pool to load them concurrently, with each file loaded linearly, which proves much faster; but when loading a single file, multiprocessing helps much more.
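
Not part of this PR, but for context, a minimal sketch of the kind of lookup endpoint described above, assuming FastAPI (all names hypothetical):

from fastapi import FastAPI

app = FastAPI()

# msgid -> msgstr mapping, filled from the loaded PO catalogs at startup.
DICTIONARY: dict = {}

@app.get("/lookup/{word}")
def lookup(word: str):
    # The editor extension calls this to check whether a word is known.
    return {"word": word, "translation": DICTIONARY.get(word)}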

I compared the written copy against the original, and in the VS Code diff it shows no difference whatsoever. (Unfortunately I don't know how to overcome the 'This private-user-images.githubusercontent.com page can't be found' issue to let you see the screenshot.)

I agree the code appears more long-winded, but I wanted to write a version that is maintainable and readable; plus, you can just turn on debug=1 and see what the code is doing and which state it is parsing. I had made a much faster, more cryptic version, but decided not to pursue it, as it's not as maintainable as this one, though this one became a little slower after I introduced the enumerations.

…rds, plus adding handling of previous records with the '#|' prefix. Printing these out is still treated as user comments, as the current code doesn't match the specification yet.
@hoangduytran
Author

hoangduytran commented Mar 19, 2025

I tried this locally using https://projects.blender.org/blender/blender-manual-translations/raw/branch/main/fi/LC_MESSAGES/blender_manual.po as the .po file to parse.

After making some changes (e.g. imports from a local_logging package I do not have, or a dependency on sphinx_intl (?)), this benchmarks to be about 3x slower than the current implementation on master on my machine, and prints out a whole lot of "Invalid continuation line for state 128" errors.

With multiprocessing=True passed to load_po, the resulting catalog is empty.

Secondly, I see there's a lot of reliance on shared global state, which is generally not a great idea – in particular, this couldn't necessarily be safely run in e.g. a web server context (if someone needed to e.g. extract messages from a PO file for, say, a translation service).

All in all, I might suggest that if you need this for some other project, you could maintain it separately – and on the other hand, now that I look at our PO file parsing code, there are things that we could optimize even without overhauling the entire thing.

Can you please tell me which OS you are testing on, so I can try to replicate your testing environment and see exactly what the problem is? Please test with the latest source code I just updated and let me know what else I need to check. Thank you so much; I know it's about collaboration here, not individuality. Thank you for your assistance. I am just a translator (and retired programmer), and I bet that if I'm struggling with the speed, many other translators feel the same.

…corrected continuation-line handling for obsolete lines, which also start with "#~".
…ate the scope, plus fixing the obsolete line ValueError exception.
@hoangduytran
Author

hoangduytran commented Mar 20, 2025

OK, I have corrected some more. Using your test file (blender_manual.po) above, the resulting runs are below. The test code is:

    def testPO(self):
        home_dir = os.environ['HOME']
        file_path = os.path.join(home_dir, "Downloads/blender_manual.po")
        # file_path = os.environ['BLENDER_PO_FILE']
        cat: Catalog = c.load_po(file_path, multiprocessing=1)
        print(f'loaded {len(cat)} records!')
        dev_home = os.environ['DEV']
        out_file = os.path.join(dev_home, 'test_manual_write.po')
        c.dump_po(out_file, cat, line_width=4096)

The speed of loading:

Linear Mode

% test.py
loaded 39271 records!
7.5229632010000005
Execution of run took 0:00:07.52.

Multiprocessing mode

% test.py
Using 11 batch(es) (cpu_count: 24)
number of batches: 11, batch_size: 5268
loaded 39271 records!
4.607690345
Execution of run took 0:00:04.61.

However, there is one issue that I haven't yet figured out how to solve correctly: a problem in the grammar of the header. The key:


"Language-Team: Finnish <https://translate.blender.org/projects/blender-"
"manual/manual/fi/>\n"

has been split into two lines, by the 'normalize' routine I think. Ideally it must be joined with the previous line to make sense. With the current line-by-line processing model, I don't yet know how to solve this while still maintaining the correct grammar structure of the header. The common intuition is to join with the previous line when an error occurs, assuming it's one part that was broken up by the 'normalize' routine; but to me, only human eyes can work this out clearly, whereas a computer works on assumptions, and that's dangerous. That's why I need to think it over; or, if you have an idea how to solve this effectively while maintaining the correct logic, please do share (see also the sketch after the header example below). I had to move the "manual..." part up to the line above it and remove two double quotes before running the tests.


# SOME DESCRIPTIVE TITLE.
# Copyright (C) : This page is licensed under a CC-BY-SA 4.0 Int. License
# This file is distributed under the same license as the Blender 2.92 Manual
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2020.
#
msgid ""
msgstr ""
"Project-Id-Version: Blender 2.92 Manual 2.92\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-03-17 16:53+0100\n"
"PO-Revision-Date: 2025-02-24 16:41+0000\n"
"Last-Translator: Anonymous <[email protected]>\n"
"Language: fi\n"
"Language-Team: Finnish <https://translate.blender.org/projects/blender-"
"manual/manual/fi/>\n" ******* >>>> the problem line here
"Plural-Forms: nplurals=2; plural=n != 1;\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.17.0\n"

There is another problem with the comment lines when denormalizing with the line width set to a huge value, like 4096; something is wrong somewhere, and I will take a look at it tomorrow with the debugger.

@akx
Member

akx commented Mar 20, 2025

Can you please tell me which OS you are testing on, so I can try to replicate your testing environment and see exactly what the problem is?

Python 3.13.2, MacBook Pro (M2 Max).

The code I'm trying with is

import os

from babel.messages.pofile import read_po
from babel.messages.poparse import load_po


def fun():
    method = os.environ["METHOD"]
    if method == "new":
        cat = load_po("blender_manual.po", multiprocessing=False)
    elif method == "old":
        with open("blender_manual.po", encoding="utf-8") as f:
            cat = read_po(f)
    else:
        raise NotImplementedError("...")
    assert cat
    print(cat, len(cat))


if __name__ == '__main__':
    for _x in range(3):
        fun()

The assert and print are there so I can verify that both methods result in the same number of messages.

With 3767daa, running env METHOD=old python readpobench_redesign_vs_orig.py now gives me the continuation error you mentioned earlier.

Using the previous version, 4e1ee45, with the local_logging and sphinx_intl imports removed, benchmarking with hyperfine:

$ hyperfine --warmup 2 'env METHOD=new python readpobench_redesign_vs_orig.py' 'env METHOD=old python readpobench_redesign_vs_orig.py'
Benchmark 1: env METHOD=new python readpobench_redesign_vs_orig.py
  Time (mean ± σ):      4.204 s ±  0.029 s    [User: 4.003 s, System: 0.193 s]
  Range (min … max):    4.162 s …  4.264 s    10 runs

Benchmark 2: env METHOD=old python readpobench_redesign_vs_orig.py
  Time (mean ± σ):      1.402 s ±  0.005 s    [User: 1.327 s, System: 0.073 s]
  Range (min … max):    1.392 s …  1.413 s    10 runs

Summary
  env METHOD=old python readpobench_redesign_vs_orig.py ran
    3.00 ± 0.02 times faster than env METHOD=new python readpobench_redesign_vs_orig.py

Using blender-ui-ui-vi.po as the file:

$ hyperfine --warmup 2 'env METHOD=new python readpobench_redesign_vs_orig.py' 'env METHOD=old python readpobench_redesign_vs_orig.py'
Benchmark 1: env METHOD=new python readpobench_redesign_vs_orig.py
  Time (mean ± σ):      1.775 s ±  0.012 s    [User: 1.679 s, System: 0.093 s]
  Range (min … max):    1.751 s …  1.789 s    10 runs

Benchmark 2: env METHOD=old python readpobench_redesign_vs_orig.py
  Time (mean ± σ):     823.5 ms ±   7.2 ms    [User: 778.7 ms, System: 41.9 ms]
  Range (min … max):   809.3 ms … 832.8 ms    10 runs

Summary
  env METHOD=old python readpobench_redesign_vs_orig.py ran
    2.15 ± 0.02 times faster than env METHOD=new python readpobench_redesign_vs_orig.py

@akx
Member

akx commented Mar 20, 2025

After using this model for nearly two years, I got worked up over the loading speed, and that pushed me to rewrite load_po().

Sounds like your translation memory app could benefit from simply not loading PO files as much – as you've noticed, parsing text formats tends to be slow. I assume those .po files won't change very much for your app's use, so you could just store more optimized formats of them (and recreate the optimized format when the original has changed, naturally).

For instance, https://gist.github.com/akx/01a75f13324b7c36ad22c2b96c05df51 outputs

load 0.4362940788269043
opt and write 1.2938261032104492
load pickle 0.07299518585205078

on my machine, suggesting loading the catalog from a pickle file is 6 times faster than parsing the PO file. (Running that gist does require a small patch to Babel, since our 2007-vintage FixedOffsetTimezone objects don't pickle well – I'll make a PR to get rid of them.)
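
A minimal sketch of that caching idea (names hypothetical; assumes the catalog pickles cleanly, which per the note above needs the FixedOffsetTimezone fix):

import os
import pickle

from babel.messages.pofile import read_po

def load_catalog_cached(po_path):
    # Reuse the pickled catalog unless the .po file has changed since
    # the pickle was written; otherwise re-parse and refresh the cache.
    cache_path = po_path + ".pickle"
    if (os.path.exists(cache_path)
            and os.path.getmtime(cache_path) >= os.path.getmtime(po_path)):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    with open(po_path, encoding="utf-8") as f:
        catalog = read_po(f)
    with open(cache_path, "wb") as f:
        pickle.dump(catalog, f)
    return catalog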

Even better, since your app sounds like it's basically doing database lookups, have you considered loading the catalog contents to a database? You could then do lookups like SELECT * FROM messages WHERE msgstr LIKE '%foo%' – I made a little proof of concept of this sort of thing, and on a totally unindexed SQLite database of 588,499 messages across 5,015 catalogs, that lookup takes about 0.1 seconds on my machine...
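
A rough sketch of that lookup idea with the standard-library sqlite3 module (schema and names are illustrative, not the proof of concept mentioned above):

import sqlite3

conn = sqlite3.connect("messages.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS messages (catalog TEXT, msgid TEXT, msgstr TEXT)"
)

def insert_catalog(name, catalog):
    # catalog is a babel Catalog; skip the header (empty msgid).
    # Plural entries have tuple ids/strings and would need extra handling.
    rows = [(name, m.id, m.string) for m in catalog
            if isinstance(m.id, str) and m.id]
    conn.executemany("INSERT INTO messages VALUES (?, ?, ?)", rows)
    conn.commit()

def lookup(term):
    sql = "SELECT catalog, msgid, msgstr FROM messages WHERE msgstr LIKE ?"
    return conn.execute(sql, ("%" + term + "%",)).fetchall()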

@hoangduytran
Author

import os

from babel.messages.pofile import read_po
from babel.messages.poparse import load_po


def fun():
    method = os.environ["METHOD"]
    if method == "new":
        cat = load_po("blender_manual.po", multiprocessing=False)
    elif method == "old":
        with open("blender_manual.po", encoding="utf-8") as f:
            cat = read_po(f)
    else:
        raise NotImplementedError("...")
    assert cat
    print(cat, len(cat))


if __name__ == '__main__':
    for _x in range(3):
        fun()

Thank you for the test code. I'm learning new ways of doing things from you. Thank you again.

@hoangduytran
Author

hoangduytran commented Mar 20, 2025

Summary
env METHOD=old python readpobench_redesign_vs_orig.py ran
3.00 ± 0.02 times faster than env METHOD=new python readpobench_redesign_vs_orig.py

This is impossible; I am using a rather old machine, a 2010 Mac Pro 5,1 with an NVMe drive, and I bet many people out there are still stuck with old machines. It appears to me that either there are some tricks or some buffering is going on somewhere. I just use the @benchmark code like this:

def benchmark(func: Callable[..., Any]) -> Callable[..., Any]:
    def format_td(seconds, digits=2):
        isec, fsec = divmod(round(seconds*10**digits), 10**digits)
        return f'{datetime.timedelta(seconds=isec)}.{fsec:0{digits}.0f}'

    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start_time = perf_counter()
        value = func(*args, **kwargs)
        end_time = perf_counter()
        run_time = end_time - start_time
        # format_time = time.strftime('%H:%M:%S', time.localtime(seconds=run_time/1000.00))
        # execution_time = datetime.timedelta(seconds=run_time)
        print(run_time)
        time_str = format_td(run_time, 2)
        print(f'Execution of {func.__name__} took {time_str}.')
        return value
    return wrapper

and put @benchmark on top of my main test routine to measure the execution time.

@hoangduytran
Author

hoangduytran commented Mar 20, 2025

I assume those .po files won't change very much for your app's use

No, they change frequently; that's why I do not put them in Redis. But your suggestion has made me think about this again. I might try a Redis DB and see if the timing and ease of use improve. Thank you again.

@akx
Member

akx commented Mar 20, 2025

No, they change frequently;

How frequently? Seconds? Minutes? :)

Redis might not be a great storage here, since it's in-memory and quite simple. I would suggest SQLite, PostgreSQL, even DuckDB these days.

@akx
Member

akx commented Mar 20, 2025

Summary
env METHOD=old python readpobench_redesign_vs_orig.py ran
3.00 ± 0.02 times faster than env METHOD=new python readpobench_redesign_vs_orig.py

This is impossible; I am using a rather old machine, a 2010 Mac Pro 5,1 with an NVMe drive, and I bet many people out there are still stuck with old machines.

I don't know what to tell you – with a script like this that adapts your @benchmark decorator:

from time import perf_counter
from typing import Any, Callable

from babel.messages.pofile import read_po
from babel.messages.poparse import load_po


def benchmark(func: Callable[..., Any]) -> Callable[..., Any]:
    def wrapper(*args: Any) -> Any:
        start_time = perf_counter()
        value = func(*args)
        end_time = perf_counter()
        run_time = end_time - start_time
        print(f'{func.__name__} {args}:'.ljust(40), f'{int(run_time*1000):5d}')
        return value

    return wrapper


@benchmark
def load_po_new(filename):
    return load_po(filename, multiprocessing=False)


@benchmark
def read_po_old(filename):
    with open(filename, encoding="utf-8") as f:
        return read_po(f)


def main():
    for _ in range(3):
        for filename in ("blender-ui-ui-vi.po", "blender_manual.po"):
            for fn in (load_po_new, read_po_old):
                fn(filename)


if __name__ == '__main__':
    main()

I get results like

(babel) ~/b/babel ((4e1ee45a) *) $ uv run --python=3.9 --no-project python bench1.py 2>/dev/null
load_po_new ('blender-ui-ui-vi.po',):      617
read_po_old ('blender-ui-ui-vi.po',):      528
load_po_new ('blender_manual.po',):       1473
read_po_old ('blender_manual.po',):        850
load_po_new ('blender-ui-ui-vi.po',):      930
read_po_old ('blender-ui-ui-vi.po',):      585
load_po_new ('blender_manual.po',):       1749
read_po_old ('blender_manual.po',):        986
load_po_new ('blender-ui-ui-vi.po',):     1129
read_po_old ('blender-ui-ui-vi.po',):      730
load_po_new ('blender_manual.po',):       1922
read_po_old ('blender_manual.po',):       1088
(babel) ~/b/babel ((4e1ee45a) *) $ uv run --python=3.13 --no-project python bench1.py 2>/dev/null
load_po_new ('blender-ui-ui-vi.po',):      451
read_po_old ('blender-ui-ui-vi.po',):      285
load_po_new ('blender_manual.po',):       1088
read_po_old ('blender_manual.po',):        535
load_po_new ('blender-ui-ui-vi.po',):      610
read_po_old ('blender-ui-ui-vi.po',):      398
load_po_new ('blender_manual.po',):       1386
read_po_old ('blender_manual.po',):        650
load_po_new ('blender-ui-ui-vi.po',):      623
read_po_old ('blender-ui-ui-vi.po',):      504
load_po_new ('blender_manual.po',):       1484
read_po_old ('blender_manual.po',):        736

@hoangduytran
Author

hoangduytran commented Mar 20, 2025

Yes, removing the first part:

    # return poparse.load_po(filename, **kwargs)
    # pre-read to get charset
    with io.open(filename, 'rb') as f:
        cat = pofile.read_po(f)
    charset = cat.charset or 'utf-8'

from

def load_po(filename, **kwargs):

in sphinx_intl, where it pre-reads the entire file just to get its charset, has improved the running speed, and my code is no longer faster. But I'm glad you're now aware of the downfall of the old code; I hope this fix will benefit all users out there.

You only need this:

# -*- coding: utf-8 -*-

import os
import io
import re

from babel.messages import pofile, mofile, poparse

CONTENT_TYPE_CHARSET_PATTERN = re.compile(b"Content-Type: [^;]+; charset=([^\r\n]+)")

def get_char_set(filename: str) -> str:
    """
    Open a file in binary mode, search its content for a Content-Type header that specifies the charset,
    and return the corresponding encoding. If no charset is found, it defaults to 'utf-8'.

    Parameters:
        filename (str): The path to the file to be read.

    Returns:
        str: The charset found in the file or 'utf-8' if not present.

    Workflow:
      1. Open the file in binary mode to correctly handle non-UTF8 encoded files.
      2. Read the entire file content.
      3. Search for the Content-Type header using the precompiled regex pattern.
      4. If found, decode the captured charset value from ASCII (replacing errors) and strip
         any extraneous newline characters or quotes.
      5. If not found, return 'utf-8' as the default encoding.
    """
    with open(filename, "rb") as f:
        content = f.read()
    m = re.search(CONTENT_TYPE_CHARSET_PATTERN, content)
    if m:
        # Decode the captured charset value using ASCII decoding.
        match_part = m.group(1).decode("ascii", errors="replace")
        # Strip potential newline and quote characters from the decoded string.
        char_code = match_part.strip('\\n"').strip()
        return char_code
    else:
        return "utf-8"

def load_po(filename, **kwargs):
    """read po/pot file and return catalog object

    :param unicode filename: path to po/pot file
    :param kwargs: keyword arguments to forward to babel's read_po call
    :return: catalog object
    """
    # return poparse.load_po(filename, **kwargs)
    # # pre-read to get charset
    # with io.open(filename, 'rb') as f:
    #     cat = pofile.read_po(f)
    # charset = cat.charset or 'utf-8'

    charset = get_char_set(filename)

    # To decode lines by babel, read po file as binary mode and specify charset for
    # read_po function.
    with io.open(filename, 'rb') as f:  # FIXME: encoding VS charset
        return pofile.read_po(f, charset=charset, **kwargs)

in sphinx_intl/catalog.py.
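
A quick usage check (file name hypothetical): get_char_set reads the charset once from the raw bytes, and read_po then decodes with it:

cat = load_po("blender_manual.po")
print(cat.charset, len(cat))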

Another suggestion is to remove the TextWrapper from the comment lines, especially the header strings, to avoid ambiguities. Doing it for msgid and msgstr is OK. The result I got yesterday, like this:

#: ../../manual/addons/3d_view/vr_scene_inspection.rst:16 ../../manual/addons/animation/copy_global_transform.rst:17 ../../manual/addons/import_export/anim_bvh.rst:17 ../../manual/addons/import_export/curve_svg.rst:17 ../../manual/addons/import_export/mesh_uv_layout.rst:13 ../../manual/addons/import_export/scene_fbx.rst:13 ../../manual/addons/import_export/scene_gltf2.rst:13 ../../manual/addons/node/node_wrangler.rst:16 ../../manual/addons/rigging/rigify/introduction.rst:53 ../../manual/addons/system/ui_translations.rst:10
#, fuzzy
msgid "Enabling Add-on"
msgstr "Edistynyt"

is really annoying, when the original is like this:

#: ../../manual/addons/3d_view/vr_scene_inspection.rst:16
#: ../../manual/addons/animation/copy_global_transform.rst:17
#: ../../manual/addons/import_export/anim_bvh.rst:17
#: ../../manual/addons/import_export/curve_svg.rst:17
#: ../../manual/addons/import_export/mesh_uv_layout.rst:13
#: ../../manual/addons/import_export/scene_fbx.rst:13
#: ../../manual/addons/import_export/scene_gltf2.rst:13
#: ../../manual/addons/node/node_wrangler.rst:16
#: ../../manual/addons/rigging/rigify/introduction.rst:53
#: ../../manual/addons/system/ui_translations.rst:10
#, fuzzy
msgid "Enabling Add-on"
msgstr "Edistynyt"

is much cleaner.

@hoangduytran
Author

hoangduytran commented Mar 20, 2025

How frequently? Seconds? Minutes? :)

Redis might not be a great storage here, since it's in-memory and quite simple. I would suggest SQLite, PostgreSQL, even DuckDB these days.

I would say in minutes, so might be I will take a look at these DB. I cannot forget this experience, when I was working, my boss asked me to test the company's codes with DB and flat files, I ran 10,000 samples over the codes to test both approaches and DB proved to be 7 times slower than flat files. This probably affected me and I tried to avoid using DB since.

@hoangduytran
Author

I think simplifying this code:

    def _format_comment(comment, prefix=''):
        for line in comment:
            yield f"#{prefix} {line.strip()}\n"

        # for line in comment_wrapper.wrap(comment):
        #     yield f"#{prefix} {line.strip()}\n"

in

def generate_po(
    catalog: Catalog,
    *,
    ignore_obsolete: bool = False,
    include_lineno: bool = True,
    include_previous: bool = False,
    no_location: bool = False,
    omit_header: bool = False,
    sort_by: Literal["message", "location"] | None = None,
    width: int = 76,
) -> Iterable[str]:

in pofile.py, and changing this:

yield from _format_comment(' '.join(locs), prefix=':')

to this only:

yield from _format_comment(locs, prefix=':')

in the same generate_po routine, works.

I now got this back:

#: ../../manual/addons/3d_view/vr_scene_inspection.rst:16
#: ../../manual/addons/animation/copy_global_transform.rst:17
#: ../../manual/addons/import_export/anim_bvh.rst:17
#: ../../manual/addons/import_export/curve_svg.rst:17
#: ../../manual/addons/import_export/mesh_uv_layout.rst:13
#: ../../manual/addons/import_export/scene_fbx.rst:13
#: ../../manual/addons/import_export/scene_gltf2.rst:13
#: ../../manual/addons/node/node_wrangler.rst:16
#: ../../manual/addons/rigging/rigify/introduction.rst:53
#: ../../manual/addons/system/ui_translations.rst:10
msgid "Enabling Add-on"
msgstr "Trình Bổ Sung đã được Bật"

(for Vietnamese)

@hoangduytran
Author

I think this can be closed now.
