Bug fix for issue #318 | `intfmt`: error for strings which contain integer values #319

airvzxf · 2024-04-13T06:31:15Z

Bug fix for issue #318 | `intfmt`: error for strings which contain integer values

Description.

Resolved the problem with the intfmt argument.

Added tests for these changes.
Test results: 308 passed, 2 skipped in 1.10s.

devdanzin · 2024-04-13T11:49:39Z

tabulate/__init__.py

@@ -1229,6 +1229,8 @@ def _format(val, valtype, floatfmt, intfmt, missingval="", has_invisible=True):
    if valtype is str:
        return f"{val}"
    elif valtype is int:
+        if isinstance(val, str):
+            intfmt = ""


Maybe convert val to int instead, similarly to what happens when valtype is float? That way, an int in a string would still get formatted.

Done as requested.

devdanzin · 2024-04-13T11:50:05Z

Thank you for identifying the issue and proposing a bug fix!

I was wondering that maybe we should rather convert val to int instead, similarly to what happens when valtype is float. That way, an int in a string would still get formatted, like what happens for a float. What do you think?

Also, I see you added a nice test for colored ints. If we choose to format ints passed as string, I think it would be nice to format them when the string has color too, same as happens for float. I'd be willing to provide a new PR for that if we decide it's an improvement worth having.

airvzxf · 2024-04-13T17:05:05Z

I was wondering that maybe we should rather convert val to int instead, similarly to what happens when valtype is float. That way, an int in a string would still get formatted, like what happens for a float. What do you think?

Also, I see you added a nice test for colored ints. If we choose to format ints passed as string, I think it would be nice to format them when the string has color too, same as happens for float. I'd be willing to provide a new PR for that if we decide it's an improvement worth having.

In fact, it was one of my first solutions, but this breaks some unit tests. I made several changes before arriving at the proposed solution, but the most important thing is that after all these experiments, I discovered that tabulate may need deeper refactoring.

Unit test

Here are the results of the unit tests with this change:

elif valtype is int:
    # if isinstance(val, str):
    #     intfmt = ""
    return format(int(val), intfmt)

Unit tests results:

==================================================================== FAILURES ====================================================================
_______________________________________________________ test_intfmt_with_string_as_integer _______________________________________________________
                                                                                                                                                  
    def test_intfmt_with_string_as_integer():                                                                                                     
        "Output: integer format"                                                                                                                  
        result = tabulate([[82642], ["1500"], [2463]], intfmt=",", tablefmt="plain")                                                              
        expected = "82,642\n  1500\n 2,463"                                                                                                       
>       assert_equal(expected, result)                                                                                                            
                                                                                                                                                  
test/test_output.py:2646:                                                                                                                         
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
                                                                                                                                                  
expected = '82,642\n  1500\n 2,463', result = '82,642\n 1,500\n 2,463'                                                          10:14:37 [140/612]
                                                                                                                                                  
    def assert_equal(expected, result):                                                                                                           
        print("Expected:\n%s\n" % expected)                                                                                                       
        print("Got:\n%s\n" % result)                                                                                                              
>       assert expected == result                                                                                                                 
E       AssertionError                                                                                                                            
                                                                                                                                                  
test/common.py:8: AssertionError                                                                                                                  
-------------------------------------------------------------- Captured stdout call --------------------------------------------------------------
Expected:                                                                                                                                         
82,642                                                                                                                                            
  1500                                                                                                                                            
 2,463                                                                                                                                            
                                                                                                                                                  
Got:                                                                                                                                              
82,642                                                                                                                                            
 1,500                                                                                                                                            
 2,463                                                                                                                                            
                                                                                                                                                  
____________________________________________________________ test_intfmt_with_colors _____________________________________________________________
                                                                                                                                                  
    def test_intfmt_with_colors():                                                                                                                
        "Regression: Align ANSI-colored values as if they were colorless."                                                                        
        colortable = [                                                                                                                            
            ("abcd", 42, "\x1b[31m42\x1b[0m"),                                                                                                    
            ("elfy", 1010, "\x1b[32m1010\x1b[0m"),                                                                                                
        ]                                                                                                                                         
        colorheaders = ("test", "\x1b[34mtest\x1b[0m", "test")                                                                                    
>       formatted = tabulate(colortable, colorheaders, "grid", intfmt=",")                                                                        
                                                                                                                                                  
test/test_output.py:2664:                                                                                                                         
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tabulate/__init__.py:2198: in tabulate                                                                                                            
    cols = [                                                                                                                                      
tabulate/__init__.py:2199: in <listcomp>                                                                                        10:14:37 [105/612]
    [_format(v, ct, fl_fmt, int_fmt, miss_v, has_invisible) for v in c]                                                                           
tabulate/__init__.py:2199: in <listcomp>                                                                                                          
    [_format(v, ct, fl_fmt, int_fmt, miss_v, has_invisible) for v in c]                                                                           
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
                                                                                                                                                  
val = '\x1b[31m42\x1b[0m', valtype = <class 'int'>, floatfmt = 'g', intfmt = ',', missingval = '', has_invisible = True                           
                                                                                                                                                  
    def _format(val, valtype, floatfmt, intfmt, missingval="", has_invisible=True):                                                               
        """Format a value according to its type.                                                                                                  
                                                                                                                                                  
        Unicode is supported:                                                                                                                     
                                                                                                                                                  
        >>> hrow = ['\u0431\u0443\u043a\u0432\u0430', '\u0446\u0438\u0444\u0440\u0430'] ; \                                                       
            tbl = [['\u0430\u0437', 2], ['\u0431\u0443\u043a\u0438', 4]] ; \                                                                      
            good_result = '\\u0431\\u0443\\u043a\\u0432\\u0430      \\u0446\\u0438\\u0444\\u0440\\u0430\\n-------  -------\\n\\u0430\\u0437       
      2\\n\\u0431\\u0443\\u043a\\u0438           4' ; \                                                                                           
            tabulate(tbl, headers=hrow) == good_result                                                                                            
        True                                                                                                                                      
                                                                                                                                                  
        """  # noqa                                                                                                                               
        if val is None:                                                                                                                           
            return missingval                                                                                                                     
                                                                                                                                                  
        if valtype is str:                                                                                                                        
            return f"{val}"                                                                                                                       
        elif valtype is int:                                                                                                                      
            # if isinstance(val, str):                                                                                                            
            #     intfmt = ""                                                                                                                     
>           return format(int(val), intfmt)                                                                                                       
E           ValueError: invalid literal for int() with base 10: '\x1b[31m42\x1b[0m'                                                               
                                                                                                                                                  
tabulate/__init__.py:1234: ValueError                                                                                                             
________________________________________________________ test_alignment_of_colored_cells _________________________________________________________
                                                                                                                                                  
    def test_alignment_of_colored_cells():                                                                                       10:14:37 [70/612]
        "Regression: Align ANSI-colored values as if they were colorless."                                                                        
        colortable = [                                                                                                                            
            ("test", 42, "\x1b[31m42\x1b[0m"),                                                                                                    
            ("test", 101, "\x1b[32m101\x1b[0m"),                                                                                                  
        ]                                                                                                                                         
        colorheaders = ("test", "\x1b[34mtest\x1b[0m", "test")                                                                                    
>       formatted = tabulate(colortable, colorheaders, "grid")                                                                                    
                                                                                                                                                  
test/test_regression.py:30:                                                                                                                       
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tabulate/__init__.py:2198: in tabulate                                                                                                            
    cols = [                                                                                                                                      
tabulate/__init__.py:2199: in <listcomp>                                                                                                          
    [_format(v, ct, fl_fmt, int_fmt, miss_v, has_invisible) for v in c]                                                                           
tabulate/__init__.py:2199: in <listcomp>                                                                                                          
    [_format(v, ct, fl_fmt, int_fmt, miss_v, has_invisible) for v in c]                                                                           
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
                                                                                                                                                  
val = '\x1b[31m42\x1b[0m', valtype = <class 'int'>, floatfmt = 'g', intfmt = '', missingval = '', has_invisible = True                            
                                                                                                                                                  
    def _format(val, valtype, floatfmt, intfmt, missingval="", has_invisible=True):                                                               
        """Format a value according to its type.                                                                                                  
                                                                                                                                                  
        Unicode is supported:                                                                                                                     
                                                                                                                                                  
        >>> hrow = ['\u0431\u0443\u043a\u0432\u0430', '\u0446\u0438\u0444\u0440\u0430'] ; \                                                       
            tbl = [['\u0430\u0437', 2], ['\u0431\u0443\u043a\u0438', 4]] ; \                                                                      
            good_result = '\\u0431\\u0443\\u043a\\u0432\\u0430      \\u0446\\u0438\\u0444\\u0440\\u0430\\n-------  -------\\n\\u0430\\u0437       
      2\\n\\u0431\\u0443\\u043a\\u0438           4' ; \                                                                                           
            tabulate(tbl, headers=hrow) == good_result                                                                                            
        True                                                                                                                                      
                                                                                                                                                  
        """  # noqa                                                                                                                               
        if val is None:                                                                                                                           
            return missingval                                                                                                                     

        if valtype is str:                                                                                                                        
            return f"{val}"                                                                                                                       
        elif valtype is int:                                                                                                                      
            # if isinstance(val, str):                                                                                                            
            #     intfmt = ""                                                                                                                     
>           return format(int(val), intfmt)                                                                                                       
E           ValueError: invalid literal for int() with base 10: '\x1b[31m42\x1b[0m'                                                               
                                                                                                                                                  
tabulate/__init__.py:1234: ValueError                                                                                                             
________________________________________________________ test_ansi_color_bold_and_fgcolor ________________________________________________________
                                                                                                                                                  
    def test_ansi_color_bold_and_fgcolor():                                                                                                       
        "Regression: set ANSI color and bold face together (issue #65)"                                                                           
        table = [["1", "2", "3"], ["4", "\x1b[1;31m5\x1b[1;m", "6"], ["7", "8", "9"]]                                                             
>       result = tabulate(table, tablefmt="grid")                                                                                                 
                                                                                                                                                  
test/test_regression.py:374:                                                                                                                      
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tabulate/__init__.py:2198: in tabulate                                                                                                            
    cols = [                                                                                                                                      
tabulate/__init__.py:2199: in <listcomp>                                                                                                          
    [_format(v, ct, fl_fmt, int_fmt, miss_v, has_invisible) for v in c]                                                                           
tabulate/__init__.py:2199: in <listcomp>                                                                                                          
    [_format(v, ct, fl_fmt, int_fmt, miss_v, has_invisible) for v in c]                                                                           
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
                                                                                                                                                  
val = '\x1b[1;31m5\x1b[1;m', valtype = <class 'int'>, floatfmt = 'g', intfmt = '', missingval = '', has_invisible = True                          
                                                                                                                                                  
    def _format(val, valtype, floatfmt, intfmt, missingval="", has_invisible=True):                                                               
        """Format a value according to its type.                                                                                                  
                                                                                                                                                  
        Unicode is supported:                                                                                                                     
                                                                                                                                                  
        >>> hrow = ['\u0431\u0443\u043a\u0432\u0430', '\u0446\u0438\u0444\u0440\u0430'] ; \
            tbl = [['\u0430\u0437', 2], ['\u0431\u0443\u043a\u0438', 4]] ; \
            good_result = '\\u0431\\u0443\\u043a\\u0432\\u0430      \\u0446\\u0438\\u0444\\u0440\\u0430\\n-------  -------\\n\\u0430\\u0437             2\\n\\u0431\\u0443\\u043a\\u0438           4' ; \
            tabulate(tbl, headers=hrow) == good_result
        True
    
        """  # noqa
        if val is None:
            return missingval
    
        if valtype is str:
            return f"{val}"
        elif valtype is int:
            # if isinstance(val, str):
            #     intfmt = ""
>           return format(int(val), intfmt)
E           ValueError: invalid literal for int() with base 10: '\x1b[1;31m5\x1b[1;m'

tabulate/__init__.py:1234: ValueError
============================================================ short test summary info =============================================================
FAILED test/test_output.py::test_intfmt_with_string_as_integer - AssertionError
FAILED test/test_output.py::test_intfmt_with_colors - ValueError: invalid literal for int() with base 10: '\x1b[31m42\x1b[0m'
FAILED test/test_regression.py::test_alignment_of_colored_cells - ValueError: invalid literal for int() with base 10: '\x1b[31m42\x1b[0m'
FAILED test/test_regression.py::test_ansi_color_bold_and_fgcolor - ValueError: invalid literal for int() with base 10: '\x1b[1;31m5\x1b[1;m'
==================================================== 4 failed, 304 passed, 2 skipped in 2.06s ====================================================

Conclusions

Description

If we made this change, the strings which contain integers will be formatted. It is not bad, but I was thinking that if I want some number as a string is because maybe I would rather not format as an integer. It will be a hard decision for the owners and maintainers of this Python package.

In the test test_intfmt_with_string_as_integer, the data [[82642], ["1500"], [2463]] is transformed to 82,642 | 1,500 | 2,463.

We need to explore the way to solve the colored numbers. I'll research in the code if there is something that I can reuse or create a new functionality to detect the colored number and format the integer.

Taking advantage of this talk, I would like to recommend that each data value be treated independently, since apparently if it encounters at least one floating number, it takes all numbers as floats, even if they are not.

@pytest.mark.skip(reason="It detects all values as floats but there are strings and integers.")
def test_intfmt_with_string_with_floats():
    "Output: integer format"
    result = tabulate([[82000.38], ["1500.47"], ["2463"], [92165]], intfmt=",", tablefmt="plain")
    expected = "82000.4\n 1500.47\n 2,463\n92,165"
    assert_equal(expected, result)

For this, data [[82000.38], ["1500.47"], ["2463"], [92165]] all the numbers were assigned as a float.

val: 82000.38                                                                                                                                     
    val: <class 'float'>                                                                                                                          
valtype: <class 'float'>                                                                                                                          
                                                                                                                                                  
val: 1500.47                                                                                                                                      
    val: <class 'str'>                                                                                                                            
valtype: <class 'float'>

val: 2463
    val: <class 'str'>
valtype: <class 'float'>

val: 92165
    val: <class 'int'>
valtype: <class 'float'>

Expected:
82000.4
 1500.47
 2,463
92,165

Got:
82000.4
 1500.47
 2463
92165

Furthermore, I noticed that the float 82000.38 is rounded to 82000.4, but not the float 1500.47. Is this an expected behavior?

test_intfmt_with_string_as_integer

tabulate([[82642], ["1500"], [2463]], intfmt=",", tablefmt="plain")

Expected:
82,642
1500
2,463

Got:
82,642
1,500
2,463

test_intfmt_with_colors

tabulate(colortable, colorheaders, "grid", intfmt=",")

ValueError: invalid literal for int() with base 10: \x1b[31m42\x1b[0m

--

test_alignment_of_colored_cells

colortable = [
    ("test", 42, "\x1b[31m42\x1b[0m"),
    ("test", 101, "\x1b[32m101\x1b[0m"),
]
colorheaders = ("test", "\x1b[34mtest\x1b[0m", "test")
formatted = tabulate(colortable, colorheaders, "grid")

ValueError: invalid literal for int() with base 10: \x1b[31m42\x1b[0m

test_ansi_color_bold_and_fgcolor

ValueError: invalid literal for int() with base 10: \x1b[1;31m5\x1b[1;m

Final note

I look forward to your reply or guide to start working on solutions. Otherwise, I could make code, but I really don't know if it aligns with where you want to go.

airvzxf · 2024-04-13T17:34:53Z

It would be the solution for the colored numbers, at least at Regular Expression level.

Regular expression: (\\x[0-9a-fA-F]+\[\d+m)([0-9\.]+)(\\.*)

For this example

\x1b[31m42\x1b[0m
\x1b[32m1010\x1b[0m

Match for these examples:

\x1b[31m   42   \x1b[0m
\x1b[32m   1010   \x1b[0m

airvzxf · 2024-04-13T22:11:35Z

In the above commit: e45cac1; I added the validation and formatting when the number is colored.

airvzxf · 2024-04-13T22:21:33Z

Here other option to format the strings which contain integers. For example, in this data [[82642], ["1500"], [2463]] the result should be 82,642 | 1,500 | 2,463 rather than 82,642 | 1500 | 2,463.

The only change that needs to be performed is adding some elif:

            elif val.isdigit():
                val = format(int(val), intfmt)

In this code:

    if valtype is str:
        return f"{val}"
    elif valtype is int:
        if isinstance(val, str):
            val_striped = val.encode('unicode_escape').decode('utf-8')
            colored = re.search(r'(\\[xX]+[0-9a-fA-F]+\[\d+[mM]+)([0-9.]+)(\\.*)$', val_striped)
            if colored:
                total_groups = len(colored.groups())
                if total_groups == 3:
                    digits = colored.group(2)
                    if digits.isdigit():
                        val_new = colored.group(1) + format(int(digits), intfmt) + colored.group(3)
                        val = val_new.encode('utf-8').decode('unicode_escape')
            elif val.isdigit():
                val = format(int(val), intfmt)
            intfmt = ""
        return format(val, intfmt)

This is the result of the test test_intfmt_with_string_as_integer.

result = tabulate([[82642], ["1500"], [2463]], intfmt=",", tablefmt="plain")
expected = "82,642\n  1500\n 2,463"

Expected:
82,642
  1500
 2,463

Got:
82,642
 1,500
 2,463

Note:

If you think this change is important and relevant to your package, please let me know, so I can add this change to the pull requests.

airvzxf

I made the suggested changes.

airvzxf · 2024-04-13T22:22:48Z

tabulate/__init__.py

@@ -1229,6 +1229,8 @@ def _format(val, valtype, floatfmt, intfmt, missingval="", has_invisible=True):
    if valtype is str:
        return f"{val}"
    elif valtype is int:
+        if isinstance(val, str):
+            intfmt = ""


Done as requested.

Fixed the problem with the intfmt argument.

92cb709

devdanzin reviewed Apr 13, 2024

View reviewed changes

Added validation for integer numbers format when the number is colored.

e45cac1

Updated the import of the PyTest jump decorator.

23b3cc6

airvzxf commented Apr 18, 2024

View reviewed changes

astanin merged commit 12a805b into astanin:master Sep 26, 2024

airvzxf deleted the bugfix/318/intfmt-string-values branch September 26, 2024 15:22

airvzxf mentioned this pull request Sep 27, 2024

Bug: intfmt error for strings which contain integer values #318

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Bug fix for issue #318 | `intfmt`: error for strings which contain integer values #319

Bug fix for issue #318 | `intfmt`: error for strings which contain integer values #319

Uh oh!

airvzxf commented Apr 13, 2024

Uh oh!

devdanzin Apr 13, 2024

Uh oh!

airvzxf Apr 13, 2024

Uh oh!

devdanzin commented Apr 13, 2024 •

edited

Loading

Uh oh!

airvzxf commented Apr 13, 2024

Uh oh!

airvzxf commented Apr 13, 2024

Uh oh!

airvzxf commented Apr 13, 2024

Uh oh!

airvzxf commented Apr 13, 2024

Uh oh!

airvzxf left a comment

Uh oh!

airvzxf Apr 13, 2024

Uh oh!

Uh oh!

Bug fix for issue #318 | intfmt: error for strings which contain integer values #319

Bug fix for issue #318 | intfmt: error for strings which contain integer values #319

Uh oh!

Conversation

airvzxf commented Apr 13, 2024

Bug fix for issue #318 | intfmt: error for strings which contain integer values

Description.

Uh oh!

devdanzin Apr 13, 2024

Choose a reason for hiding this comment

Uh oh!

airvzxf Apr 13, 2024

Choose a reason for hiding this comment

Uh oh!

devdanzin commented Apr 13, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

airvzxf commented Apr 13, 2024

Unit test

Conclusions

Description

test_intfmt_with_string_as_integer

test_intfmt_with_colors

test_alignment_of_colored_cells

test_ansi_color_bold_and_fgcolor

Final note

Uh oh!

airvzxf commented Apr 13, 2024

Uh oh!

airvzxf commented Apr 13, 2024

Uh oh!

airvzxf commented Apr 13, 2024

Note:

Uh oh!

airvzxf left a comment

Choose a reason for hiding this comment

Uh oh!

airvzxf Apr 13, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Bug fix for issue #318 | `intfmt`: error for strings which contain integer values #319

Bug fix for issue #318 | `intfmt`: error for strings which contain integer values #319

Bug fix for issue #318 | `intfmt`: error for strings which contain integer values

devdanzin commented Apr 13, 2024 •

edited

Loading