Column wrapping may break ANSI escape codes #307

Closed
devdanzin opened this issue Jan 12, 2024 · 2 comments · Fixed by #308

Comments

@devdanzin

When creating a table with maxcolwidths, ANSI escape codes sometimes get wrongly split.

Here's an example, increasing the length of the "0123..." sequence to show the issue:

>>> print(tabulate.tabulate(tabular_data=(('01234 (\x1b[32mabcdefghij\x1b[0m)', 'XX'),), maxcolwidths=11, tablefmt="grid"))
+-------------+----+
| 01234 (abcd | XX |  # Correctly broken up, colors work on both lines
| efghij)     |    |
+-------------+----+
>>> print(tabulate.tabulate(tabular_data=(('012345 (\x1b[32mabcdefghij\x1b[0m)', 'XX'),), maxcolwidths=11, tablefmt="grid"))
+-------------+----+
| 012345 ( XX | 
| 2mabcdefghi |    |
| j)          |    |
+-------------+----+
>>> print(tabulate.tabulate(tabular_data=(('0123456 (\x1b[32mabcdefghij\x1b[0m)', 'XX'),), maxcolwidths=11, tablefmt="grid"))
+-------------+----+
| 0123456 ( XX |
| 32mabcdefgh |    |
| ij)         |    |
+-------------+----+
>>> print(tabulate.tabulate(tabular_data=(('01234567 (\x1b[32mabcdefghij\x1b[0m)', 'XX'),), maxcolwidths=11, tablefmt="grid"))
+-------------+----+
| 01234567 (� | XX |
| [32mabcdefg |    |
| hij)        |    |
+-------------+----+
>>> print(tabulate.tabulate(tabular_data=(('012345678 (\x1b[32mabcdefghij\x1b[0m)', 'XX'),), maxcolwidths=11, tablefmt="grid"))
+-------------+----+
| 012345678 ( | XX |  # Correctly broken up, colors work on both lines
| abcdefghij) |    |
+-------------+----+

We can see how the ANSI escape code is broken by looking at the repr, e.g. '| 0123456 (\x1b[ | XX |', '| 32mabcdefgh | |' for the "0123456" case.
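
The same check can be automated when writing tests for this. The sketch below is only a rough heuristic, assuming the cells contain nothing but SGR color codes (ESC [ ... m); `SGR` and `has_broken_escape` are hypothetical names, not part of tabulate. It removes every complete color code and reports a line as broken if an ESC byte is still left over.

```python
import re

# Assumption: only SGR color codes (ESC [ digits/semicolons m) are expected.
SGR = re.compile(r"\x1b\[[0-9;]*m")


def has_broken_escape(line: str) -> bool:
    """Return True if an ESC byte remains after removing complete SGR codes."""
    return "\x1b" in SGR.sub("", line)


# Row taken from the repr of the "0123456" case: the code is cut after "\x1b[".
print(has_broken_escape("| 0123456 (\x1b[ | XX |"))
# An intact colored cell for comparison.
print(has_broken_escape("| (\x1b[32mabcdefghij\x1b[0m) |"))
```

Note that only the line holding the start of the escape code is flagged; the continuation line ('| 32mabcdefgh | |') contains no ESC byte and looks like plain text.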

Tested on Windows Terminal with both PowerShell and WSL, on Python 3.11 and 3.12.

Thank you for this wonderful library!

@devdanzin

The issue is that _CustomTextWrap._handle_long_word doesn't take ANSI escape codes into account when breaking up words.

There is a simple but incomplete fix: add len(_ansi_codes.search(chunk).group()) to i so when we run cur_line.append(chunk[: i - 1]) it actually copies the whole thing, normal chars and escape codes included.

But it's incomplete: the escape code may only appear after the split point, in which case we'd copy extra printable characters into the line. I'm working on creating tests for these cases and a proper fix.
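
To illustrate what "taking escape codes into account" means, here is a minimal sketch of an ANSI-aware split. It is not the fix that went into tabulate, just one possible approach under stated assumptions: `CSI` is a simplified pattern covering CSI sequences only (tabulate's own `_ansi_codes` also handles hyperlinks), every printable character is assumed to be one column wide, and `split_visible` is a hypothetical helper name.

```python
import re
from typing import Tuple

# Assumption: simplified pattern for CSI sequences only
# (ESC [ parameter bytes, intermediate bytes, final byte).
CSI = re.compile(r"\x1b\[[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e]")


def split_visible(chunk: str, width: int) -> Tuple[str, str]:
    """Split chunk so the head shows at most `width` printable characters,
    never cutting inside an escape sequence."""
    pos = 0      # index into chunk
    visible = 0  # printable characters consumed so far
    while pos < len(chunk):
        match = CSI.match(chunk, pos)
        if match:
            # Escape sequences occupy no columns: copy them whole.
            pos = match.end()
            continue
        if visible == width:
            break
        visible += 1
        pos += 1
    return chunk[:pos], chunk[pos:]


head, tail = split_visible("(\x1b[32mabcdefghij\x1b[0m)", 3)
print(repr(head))  # '(\x1b[32mab'
print(repr(tail))  # 'cdefghij\x1b[0m)'
```

Walking the string once and skipping whole escape sequences avoids both failure modes described above: a code is never cut in half, and a code located after the split point does not inflate the number of printable characters copied.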

@devdanzin

You can create a file with:

from typing import List

import tabulate

strip_ansi = tabulate._strip_ansi  # type: ignore
ansi_codes = tabulate._ansi_codes  # type: ignore


def handle_long_word(
    self, reversed_chunks: List[str], cur_line: List[str], cur_len: int, width: int
):
    """
    Handle a chunk of text that is too long to fit in any line.
    Fixed version of tabulate._CustomTextWrap._handle_long_word that avoids a
    wrapping bug (https://github.com/astanin/python-tabulate/issues/307) where
    ANSI escape codes would be broken up in the middle.
    """
    # Figure out when indent is larger than the specified width, and make
    # sure at least one character is stripped off on every pass
    if width < 1:
        space_left = 1
    else:
        space_left = width - cur_len

    # If we're allowed to break long words, then do so: put as much
    # of the next chunk onto the current line as will fit.
    if self.break_long_words:
        # Tabulate Custom: Build the string up piece-by-piece in order to
        # take each character's width into account
        chunk = reversed_chunks[-1]
        i = 1
        # Only count printable characters, so strip_ansi first, index later.
        while len(strip_ansi(chunk)[:i]) <= space_left:
            i = i + 1
        # Consider escape codes when breaking words up
        total_escape_len = 0
        last_group = 0
        if ansi_codes.search(chunk) is not None:
            for group, _, _, _ in ansi_codes.findall(chunk):
                escape_len = len(group)
                if group in chunk[last_group : i + total_escape_len + escape_len - 1]:
                    total_escape_len += escape_len
                    found = ansi_codes.search(chunk[last_group:])
                    last_group += found.end()
        cur_line.append(chunk[: i + total_escape_len - 1])
        reversed_chunks[-1] = chunk[i + total_escape_len - 1 :]

    # Otherwise, we have to preserve the long word intact.  Only add
    # it to the current line if there's nothing already there --
    # that minimizes how much we violate the width constraint.
    elif not cur_line:
        cur_line.append(reversed_chunks.pop())

    # If we're not allowed to break long words, and there's already
    # text on the current line, do nothing.  Next time through the
    # main loop of _wrap_chunks(), we'll wind up here again, but
    # cur_len will be zero, so the next line will be entirely
    # devoted to the long word that we can't handle right now.

Then you import handle_long_word from that file and monkeypatch tabulate after you import it, but before you use it:

from some_file import handle_long_word
import tabulate

tabulate._CustomTextWrap._handle_long_word = handle_long_word 

# Use tabulate.tabulate() here and it should be fixed.

Hope this helps, please let me know if it doesn't work or you find any new issues.
