Column wrapping may break ANSI escape codes #307
Comments
The issue is that _CustomTextWrap._handle_long_word doesn't take ANSI escape codes into account when breaking up words. There is a simple fix, but it is incomplete: escape codes may also appear after the split point, so spurious normal characters would be included in the line. I'm working on creating tests for these cases and a proper fix.
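To make the failure mode concrete, here is a minimal, self-contained sketch of splitting a string by visible width without cutting an escape code in half. It is not tabulate's implementation: `split_visible` and `ANSI_SGR` are hypothetical names, and the regex assumes only SGR color sequences such as `\x1b[32m` occur.

```python
import re

# Assumption: only SGR sequences (ESC [ digits/semicolons m) are considered.
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def split_visible(text, width):
    """Split text after `width` visible characters, keeping escape codes whole."""
    pos = 0      # index into the raw string
    visible = 0  # printable characters consumed so far
    while pos < len(text) and visible < width:
        match = ANSI_SGR.match(text, pos)
        if match:
            # A whole escape sequence has zero visible width: skip it intact.
            pos = match.end()
        else:
            pos += 1
            visible += 1
    return text[:pos], text[pos:]


colored = "(\x1b[32mabcdefgh\x1b[0m)"

# Naive slicing counts the escape bytes and cuts the sequence in the middle.
naive_head = colored[:3]

# The width-aware split keeps "\x1b[32m" whole and still yields 3 visible chars.
head, tail = split_visible(colored, 3)
```

The sketch only shows where the cut should land. It does not re-emit the active color on the following line or reset it at the end of the current one, which a full fix may also need to consider.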
You can create a file with:

    from typing import List

    import tabulate

    strip_ansi = tabulate._strip_ansi  # type: ignore
    ansi_codes = tabulate._ansi_codes  # type: ignore


    def handle_long_word(
        self, reversed_chunks: List[str], cur_line: List[str], cur_len: int, width: int
    ):
        """
        Handle a chunk of text that is too long to fit in any line.

        Fixed version of tabulate._CustomTextWrap._handle_long_word that avoids a
        wrapping bug (https://github.com/astanin/python-tabulate/issues/307) where
        ANSI escape codes would be broken up in the middle.
        """
        # Figure out when indent is larger than the specified width, and make
        # sure at least one character is stripped off on every pass
        if width < 1:
            space_left = 1
        else:
            space_left = width - cur_len

        # If we're allowed to break long words, then do so: put as much
        # of the next chunk onto the current line as will fit.
        if self.break_long_words:
            # Tabulate Custom: Build the string up piece-by-piece in order to
            # take each character's width into account
            chunk = reversed_chunks[-1]
            i = 1
            # Only count printable characters, so strip_ansi first, index later.
            while len(strip_ansi(chunk)[:i]) <= space_left:
                i = i + 1
            # Consider escape codes when breaking words up
            total_escape_len = 0
            last_group = 0
            if ansi_codes.search(chunk) is not None:
                for group, _, _, _ in ansi_codes.findall(chunk):
                    escape_len = len(group)
                    if group in chunk[last_group : i + total_escape_len + escape_len - 1]:
                        total_escape_len += escape_len
                        found = ansi_codes.search(chunk[last_group:])
                        last_group += found.end()
            cur_line.append(chunk[: i + total_escape_len - 1])
            reversed_chunks[-1] = chunk[i + total_escape_len - 1 :]

        # Otherwise, we have to preserve the long word intact. Only add
        # it to the current line if there's nothing already there --
        # that minimizes how much we violate the width constraint.
        elif not cur_line:
            cur_line.append(reversed_chunks.pop())

        # If we're not allowed to break long words, and there's already
        # text on the current line, do nothing. Next time through the
        # main loop of _wrap_chunks(), we'll wind up here again, but
        # cur_len will be zero, so the next line will be entirely
        # devoted to the long word that we can't handle right now.

Then you import it and patch tabulate:

    from some_file import handle_long_word
    import tabulate

    tabulate._CustomTextWrap._handle_long_word = handle_long_word
    # Use tabulate.tabulate() here and it should be fixed.

Hope this helps, please let me know if it doesn't work or you find any new issues.
When creating a table with maxcolwidths, ANSI escape codes sometimes get wrongly split. Here's an example, increasing the length of the "0123..." sequence to show the issue:

We can see how the ANSI escape code is broken by looking at the repr, e.g. '| 0123456 (\x1b[ | XX |', '| 32mabcdefgh | |' for the "0123456" case.

Tested on Windows Terminal with both PowerShell and WSL, on Python 3.11 and 3.12.
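Instead of reading the repr by eye, a broken escape code can be flagged programmatically. This is a rough sketch under assumptions: `has_broken_escape` and `COMPLETE_SGR` are hypothetical names, only SGR color sequences are treated as complete, and the two rows are the strings quoted in this report rather than fresh tabulate output.

```python
import re

# Assumption: a complete sequence looks like ESC [ digits/semicolons m.
COMPLETE_SGR = re.compile(r"\x1b\[[0-9;]*m")


def has_broken_escape(line):
    """Return True if an ESC character is left over once complete codes are removed."""
    return "\x1b" in COMPLETE_SGR.sub("", line)


# Rows quoted in the report for the "0123456" case.
rows = ['| 0123456 (\x1b[ | XX |', '| 32mabcdefgh | |']
flags = [has_broken_escape(row) for row in rows]

# An intact colored string should not be flagged.
intact_flag = has_broken_escape("\x1b[32mabcdefgh\x1b[0m")
```

Only the row holding the dangling `\x1b[` is flagged. The second row carries the orphaned `32m` as ordinary text, so catching it as well would need a check that spans adjacent rows.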
Thank you for this wonderful library!