Column wrapping may break ANSI escape codes #307

devdanzin · 2024-01-12T16:01:14Z

When creating a table with maxcolwidths, ANSI escape codes sometimes get wrongly split.

Here's an example, increasing the length of the "0123..." sequence to show the issue:

>>> print(tabulate.tabulate(tabular_data=(('01234 (\x1b[32mabcdefghij\x1b[0m)', 'XX'),), maxcolwidths=11, tablefmt="grid"))
+-------------+----+
| 01234 (abcd | XX |  # Correctly broken up, colors work on both lines
| efghij)     |    |
+-------------+----+
>>> print(tabulate.tabulate(tabular_data=(('012345 (\x1b[32mabcdefghij\x1b[0m)', 'XX'),), maxcolwidths=11, tablefmt="grid"))
+-------------+----+
| 012345 (\x1b[3 | XX |
| 2mabcdefghi |    |
| j)          |    |
+-------------+----+
>>> print(tabulate.tabulate(tabular_data=(('0123456 (\x1b[32mabcdefghij\x1b[0m)', 'XX'),), maxcolwidths=11, tablefmt="grid"))
+-------------+----+
| 0123456 (\x1b[ | XX |
| 32mabcdefgh |    |
| ij)         |    |
+-------------+----+
>>> print(tabulate.tabulate(tabular_data=(('01234567 (\x1b[32mabcdefghij\x1b[0m)', 'XX'),), maxcolwidths=11, tablefmt="grid"))
+-------------+----+
| 01234567 (\x1b | XX |
| [32mabcdefg |    |
| hij)        |    |
+-------------+----+
>>> print(tabulate.tabulate(tabular_data=(('012345678 (\x1b[32mabcdefghij\x1b[0m)', 'XX'),), maxcolwidths=11, tablefmt="grid"))
+-------------+----+
| 012345678 ( | XX |  # Correctly broken up, colors work on both lines
| abcdefghij) |    |
+-------------+----+

We can see how the ANSI escape code is broken by looking at the repr, e.g. '| 0123456 (\x1b[ | XX |', '| 32mabcdefgh | |' for the "0123456" case.
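A quick way to flag affected lines without reading each repr by eye is sketched below. It is only an illustration: `has_broken_escape` and the `SGR` pattern are hypothetical names, not part of tabulate, and the pattern assumes only color/style codes of the form ESC `[` digits `m` matter here.

```python
import re

# Assumption: only SGR (color/style) codes such as "\x1b[32m" are of interest.
# This is a deliberately narrow pattern, not tabulate's own _ansi_codes regex.
SGR = re.compile(r"\x1b\[[0-9;]*m")


def has_broken_escape(line: str) -> bool:
    """Return True if an ESC character is left once complete SGR codes are removed."""
    return "\x1b" in SGR.sub("", line)


# The two rows from the "0123456" case, as shown by their repr.
rows = ["| 0123456 (\x1b[ | XX |", "| 32mabcdefgh |    |"]
for row in rows:
    print(repr(row), has_broken_escape(row))
```

Only the row that still carries the ESC character is flagged; the leftover "32m" on the following row contains no ESC, so it needs the previous row for context.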

Tested on Windows Terminal with both PowerShell and WSL, on Python 3.11 and 3.12.

Thank you for this wonderful library!

devdanzin · 2024-01-12T23:43:59Z

The issue is that _CustomTextWrap._handle_long_word doesn't take ANSI escape codes into account when breaking up words.

There is a simple but incomplete fix: add len(_ansi_codes.search(chunk).group()) to i, so that cur_line.append(chunk[: i - 1]) copies everything that belongs on the line, escape codes as well as printable characters.

It's incomplete because the escape code may appear after the split point, in which case extending i would pull spurious printable characters onto the line. I'm working on tests for these cases and a proper fix.
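To make the idea concrete, here is a minimal sketch of splitting a string after a given number of visible characters while keeping escape codes whole. It is not the code that went into tabulate: `split_visible` and `SGR` are hypothetical names, the pattern only covers color/style codes, and every printable character is assumed to be one column wide.

```python
import re

# Assumption: only SGR codes like "\x1b[32m" occur; wide characters are ignored.
SGR = re.compile(r"\x1b\[[0-9;]*m")


def split_visible(text: str, width: int) -> tuple[str, str]:
    """Split text after `width` visible characters without cutting an escape code."""
    visible = 0
    pos = 0
    while pos < len(text) and visible < width:
        match = SGR.match(text, pos)
        if match:
            # An escape code takes no columns: step over it in one piece.
            pos = match.end()
        else:
            pos += 1
            visible += 1
    return text[:pos], text[pos:]


head, tail = split_visible("(\x1b[32mabcdefghij\x1b[0m)", 3)
print(repr(head))  # '(\x1b[32mab'
print(repr(tail))  # 'cdefghij\x1b[0m)'
```

Because escape codes are skipped as a unit and never counted, a code that sits after the split point simply stays in the remainder, which is the case the simple fix gets wrong. The sketch does not re-apply the active color at the start of the next line.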

devdanzin · 2024-01-23T03:44:25Z

You can create a file with:

from typing import List

import tabulate

strip_ansi = tabulate._strip_ansi  # type: ignore
ansi_codes = tabulate._ansi_codes  # type: ignore


def handle_long_word(
    self, reversed_chunks: List[str], cur_line: List[str], cur_len: int, width: int
):
    """
    Handle a chunk of text that is too long to fit in any line.
    Fixed version of tabulate._CustomTextWrap._handle_long_word that avoids a
    wrapping bug (https://github.com/astanin/python-tabulate/issues/307) where
    ANSI escape codes would be broken up in the middle.
    """
    # Figure out when indent is larger than the specified width, and make
    # sure at least one character is stripped off on every pass
    if width < 1:
        space_left = 1
    else:
        space_left = width - cur_len

    # If we're allowed to break long words, then do so: put as much
    # of the next chunk onto the current line as will fit.
    if self.break_long_words:
        # Tabulate Custom: Build the string up piece-by-piece in order to
        # take each character's width into account
        chunk = reversed_chunks[-1]
        i = 1
        # Only count printable characters, so strip_ansi first, index later.
        while len(strip_ansi(chunk)[:i]) <= space_left:
            i = i + 1
        # Consider escape codes when breaking words up
        total_escape_len = 0
        last_group = 0
        if ansi_codes.search(chunk) is not None:
            for group, _, _, _ in ansi_codes.findall(chunk):
                escape_len = len(group)
                if group in chunk[last_group : i + total_escape_len + escape_len - 1]:
                    total_escape_len += escape_len
                    found = ansi_codes.search(chunk[last_group:])
                    last_group += found.end()
        cur_line.append(chunk[: i + total_escape_len - 1])
        reversed_chunks[-1] = chunk[i + total_escape_len - 1 :]

    # Otherwise, we have to preserve the long word intact.  Only add
    # it to the current line if there's nothing already there --
    # that minimizes how much we violate the width constraint.
    elif not cur_line:
        cur_line.append(reversed_chunks.pop())

    # If we're not allowed to break long words, and there's already
    # text on the current line, do nothing.  Next time through the
    # main loop of _wrap_chunks(), we'll wind up here again, but
    # cur_len will be zero, so the next line will be entirely
    # devoted to the long word that we can't handle right now.

Then you import handle_long_word from that file and monkeypatch tabulate after you import it, but before you use it:

from some_file import handle_long_word
import tabulate

tabulate._CustomTextWrap._handle_long_word = handle_long_word 

# Use tabulate.tabulate() here and it should be fixed.

Hope this helps, please let me know if it doesn't work or you find any new issues.

This was referenced Jan 15, 2024

Fix column wrapping breaking ANSI escape codes (fixes #307) #308

Merged

Fix wrapping of colored values tonybaloney/wily#229

Open

astanin closed this as completed in #308 Sep 27, 2024

astanin closed this as completed in bbc5ff1 Sep 27, 2024
