Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug fix for issue #318 | intfmt: error for strings which contain integer values #319

Merged
merged 3 commits into from
Sep 26, 2024

Conversation

airvzxf
Copy link
Contributor

@airvzxf airvzxf commented Apr 13, 2024

Bug fix for issue #318 | intfmt: error for strings which contain integer values

Description.

Resolved the problem with the intfmt argument.

Added tests for these changes.
Test results: 308 passed, 2 skipped in 1.10s.

@@ -1229,6 +1229,8 @@ def _format(val, valtype, floatfmt, intfmt, missingval="", has_invisible=True):
if valtype is str:
return f"{val}"
elif valtype is int:
if isinstance(val, str):
intfmt = ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe convert val to int instead, similarly to what happens when valtype is float? That way, an int in a string would still get formatted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done as requested.

@devdanzin
Copy link
Contributor

devdanzin commented Apr 13, 2024

Thank you for identifying the issue and proposing a bug fix!

I was wondering that maybe we should rather convert val to int instead, similarly to what happens when valtype is float. That way, an int in a string would still get formatted, like what happens for a float. What do you think?

Also, I see you added a nice test for colored ints. If we choose to format ints passed as string, I think it would be nice to format them when the string has color too, same as happens for float. I'd be willing to provide a new PR for that if we decide it's an improvement worth having.

@airvzxf
Copy link
Contributor Author

airvzxf commented Apr 13, 2024

I was wondering that maybe we should rather convert val to int instead, similarly to what happens when valtype is float. That way, an int in a string would still get formatted, like what happens for a float. What do you think?

Also, I see you added a nice test for colored ints. If we choose to format ints passed as string, I think it would be nice to format them when the string has color too, same as happens for float. I'd be willing to provide a new PR for that if we decide it's an improvement worth having.

In fact, it was one of my first solutions, but this breaks some unit tests. I made several changes before arriving at the proposed solution, but the most important thing is that after all these experiments, I discovered that tabulate may need deeper refactoring.

Unit test

Here are the results of the unit tests with this change:

elif valtype is int:
    # if isinstance(val, str):
    #     intfmt = ""
    return format(int(val), intfmt)

Unit tests results:

==================================================================== FAILURES ====================================================================
_______________________________________________________ test_intfmt_with_string_as_integer _______________________________________________________
                                                                                                                                                  
    def test_intfmt_with_string_as_integer():                                                                                                     
        "Output: integer format"                                                                                                                  
        result = tabulate([[82642], ["1500"], [2463]], intfmt=",", tablefmt="plain")                                                              
        expected = "82,642\n  1500\n 2,463"                                                                                                       
>       assert_equal(expected, result)                                                                                                            
                                                                                                                                                  
test/test_output.py:2646:                                                                                                                         
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
                                                                                                                                                  
expected = '82,642\n  1500\n 2,463', result = '82,642\n 1,500\n 2,463'                                                          10:14:37 [140/612]
                                                                                                                                                  
    def assert_equal(expected, result):                                                                                                           
        print("Expected:\n%s\n" % expected)                                                                                                       
        print("Got:\n%s\n" % result)                                                                                                              
>       assert expected == result                                                                                                                 
E       AssertionError                                                                                                                            
                                                                                                                                                  
test/common.py:8: AssertionError                                                                                                                  
-------------------------------------------------------------- Captured stdout call --------------------------------------------------------------
Expected:                                                                                                                                         
82,642                                                                                                                                            
  1500                                                                                                                                            
 2,463                                                                                                                                            
                                                                                                                                                  
Got:                                                                                                                                              
82,642                                                                                                                                            
 1,500                                                                                                                                            
 2,463                                                                                                                                            
                                                                                                                                                  
____________________________________________________________ test_intfmt_with_colors _____________________________________________________________
                                                                                                                                                  
    def test_intfmt_with_colors():                                                                                                                
        "Regression: Align ANSI-colored values as if they were colorless."                                                                        
        colortable = [                                                                                                                            
            ("abcd", 42, "\x1b[31m42\x1b[0m"),                                                                                                    
            ("elfy", 1010, "\x1b[32m1010\x1b[0m"),                                                                                                
        ]                                                                                                                                         
        colorheaders = ("test", "\x1b[34mtest\x1b[0m", "test")                                                                                    
>       formatted = tabulate(colortable, colorheaders, "grid", intfmt=",")                                                                        
                                                                                                                                                  
test/test_output.py:2664:                                                                                                                         
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tabulate/__init__.py:2198: in tabulate                                                                                                            
    cols = [                                                                                                                                      
tabulate/__init__.py:2199: in <listcomp>                                                                                        10:14:37 [105/612]
    [_format(v, ct, fl_fmt, int_fmt, miss_v, has_invisible) for v in c]                                                                           
tabulate/__init__.py:2199: in <listcomp>                                                                                                          
    [_format(v, ct, fl_fmt, int_fmt, miss_v, has_invisible) for v in c]                                                                           
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
                                                                                                                                                  
val = '\x1b[31m42\x1b[0m', valtype = <class 'int'>, floatfmt = 'g', intfmt = ',', missingval = '', has_invisible = True                           
                                                                                                                                                  
    def _format(val, valtype, floatfmt, intfmt, missingval="", has_invisible=True):                                                               
        """Format a value according to its type.                                                                                                  
                                                                                                                                                  
        Unicode is supported:                                                                                                                     
                                                                                                                                                  
        >>> hrow = ['\u0431\u0443\u043a\u0432\u0430', '\u0446\u0438\u0444\u0440\u0430'] ; \                                                       
            tbl = [['\u0430\u0437', 2], ['\u0431\u0443\u043a\u0438', 4]] ; \                                                                      
            good_result = '\\u0431\\u0443\\u043a\\u0432\\u0430      \\u0446\\u0438\\u0444\\u0440\\u0430\\n-------  -------\\n\\u0430\\u0437       
      2\\n\\u0431\\u0443\\u043a\\u0438           4' ; \                                                                                           
            tabulate(tbl, headers=hrow) == good_result                                                                                            
        True                                                                                                                                      
                                                                                                                                                  
        """  # noqa                                                                                                                               
        if val is None:                                                                                                                           
            return missingval                                                                                                                     
                                                                                                                                                  
        if valtype is str:                                                                                                                        
            return f"{val}"                                                                                                                       
        elif valtype is int:                                                                                                                      
            # if isinstance(val, str):                                                                                                            
            #     intfmt = ""                                                                                                                     
>           return format(int(val), intfmt)                                                                                                       
E           ValueError: invalid literal for int() with base 10: '\x1b[31m42\x1b[0m'                                                               
                                                                                                                                                  
tabulate/__init__.py:1234: ValueError                                                                                                             
________________________________________________________ test_alignment_of_colored_cells _________________________________________________________
                                                                                                                                                  
    def test_alignment_of_colored_cells():                                                                                       10:14:37 [70/612]
        "Regression: Align ANSI-colored values as if they were colorless."                                                                        
        colortable = [                                                                                                                            
            ("test", 42, "\x1b[31m42\x1b[0m"),                                                                                                    
            ("test", 101, "\x1b[32m101\x1b[0m"),                                                                                                  
        ]                                                                                                                                         
        colorheaders = ("test", "\x1b[34mtest\x1b[0m", "test")                                                                                    
>       formatted = tabulate(colortable, colorheaders, "grid")                                                                                    
                                                                                                                                                  
test/test_regression.py:30:                                                                                                                       
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tabulate/__init__.py:2198: in tabulate                                                                                                            
    cols = [                                                                                                                                      
tabulate/__init__.py:2199: in <listcomp>                                                                                                          
    [_format(v, ct, fl_fmt, int_fmt, miss_v, has_invisible) for v in c]                                                                           
tabulate/__init__.py:2199: in <listcomp>                                                                                                          
    [_format(v, ct, fl_fmt, int_fmt, miss_v, has_invisible) for v in c]                                                                           
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
                                                                                                                                                  
val = '\x1b[31m42\x1b[0m', valtype = <class 'int'>, floatfmt = 'g', intfmt = '', missingval = '', has_invisible = True                            
                                                                                                                                                  
    def _format(val, valtype, floatfmt, intfmt, missingval="", has_invisible=True):                                                               
        """Format a value according to its type.                                                                                                  
                                                                                                                                                  
        Unicode is supported:                                                                                                                     
                                                                                                                                                  
        >>> hrow = ['\u0431\u0443\u043a\u0432\u0430', '\u0446\u0438\u0444\u0440\u0430'] ; \                                                       
            tbl = [['\u0430\u0437', 2], ['\u0431\u0443\u043a\u0438', 4]] ; \                                                                      
            good_result = '\\u0431\\u0443\\u043a\\u0432\\u0430      \\u0446\\u0438\\u0444\\u0440\\u0430\\n-------  -------\\n\\u0430\\u0437       
      2\\n\\u0431\\u0443\\u043a\\u0438           4' ; \                                                                                           
            tabulate(tbl, headers=hrow) == good_result                                                                                            
        True                                                                                                                                      
                                                                                                                                                  
        """  # noqa                                                                                                                               
        if val is None:                                                                                                                           
            return missingval                                                                                                                     

        if valtype is str:                                                                                                                        
            return f"{val}"                                                                                                                       
        elif valtype is int:                                                                                                                      
            # if isinstance(val, str):                                                                                                            
            #     intfmt = ""                                                                                                                     
>           return format(int(val), intfmt)                                                                                                       
E           ValueError: invalid literal for int() with base 10: '\x1b[31m42\x1b[0m'                                                               
                                                                                                                                                  
tabulate/__init__.py:1234: ValueError                                                                                                             
________________________________________________________ test_ansi_color_bold_and_fgcolor ________________________________________________________
                                                                                                                                                  
    def test_ansi_color_bold_and_fgcolor():                                                                                                       
        "Regression: set ANSI color and bold face together (issue #65)"                                                                           
        table = [["1", "2", "3"], ["4", "\x1b[1;31m5\x1b[1;m", "6"], ["7", "8", "9"]]                                                             
>       result = tabulate(table, tablefmt="grid")                                                                                                 
                                                                                                                                                  
test/test_regression.py:374:                                                                                                                      
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tabulate/__init__.py:2198: in tabulate                                                                                                            
    cols = [                                                                                                                                      
tabulate/__init__.py:2199: in <listcomp>                                                                                                          
    [_format(v, ct, fl_fmt, int_fmt, miss_v, has_invisible) for v in c]                                                                           
tabulate/__init__.py:2199: in <listcomp>                                                                                                          
    [_format(v, ct, fl_fmt, int_fmt, miss_v, has_invisible) for v in c]                                                                           
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
                                                                                                                                                  
val = '\x1b[1;31m5\x1b[1;m', valtype = <class 'int'>, floatfmt = 'g', intfmt = '', missingval = '', has_invisible = True                          
                                                                                                                                                  
    def _format(val, valtype, floatfmt, intfmt, missingval="", has_invisible=True):                                                               
        """Format a value according to its type.                                                                                                  
                                                                                                                                                  
        Unicode is supported:                                                                                                                     
                                                                                                                                                  
        >>> hrow = ['\u0431\u0443\u043a\u0432\u0430', '\u0446\u0438\u0444\u0440\u0430'] ; \
            tbl = [['\u0430\u0437', 2], ['\u0431\u0443\u043a\u0438', 4]] ; \
            good_result = '\\u0431\\u0443\\u043a\\u0432\\u0430      \\u0446\\u0438\\u0444\\u0440\\u0430\\n-------  -------\\n\\u0430\\u0437             2\\n\\u0431\\u0443\\u043a\\u0438           4' ; \
            tabulate(tbl, headers=hrow) == good_result
        True
    
        """  # noqa
        if val is None:
            return missingval
    
        if valtype is str:
            return f"{val}"
        elif valtype is int:
            # if isinstance(val, str):
            #     intfmt = ""
>           return format(int(val), intfmt)
E           ValueError: invalid literal for int() with base 10: '\x1b[1;31m5\x1b[1;m'

tabulate/__init__.py:1234: ValueError
============================================================ short test summary info =============================================================
FAILED test/test_output.py::test_intfmt_with_string_as_integer - AssertionError
FAILED test/test_output.py::test_intfmt_with_colors - ValueError: invalid literal for int() with base 10: '\x1b[31m42\x1b[0m'
FAILED test/test_regression.py::test_alignment_of_colored_cells - ValueError: invalid literal for int() with base 10: '\x1b[31m42\x1b[0m'
FAILED test/test_regression.py::test_ansi_color_bold_and_fgcolor - ValueError: invalid literal for int() with base 10: '\x1b[1;31m5\x1b[1;m'
==================================================== 4 failed, 304 passed, 2 skipped in 2.06s ====================================================

Conclusions

Description

If we made this change, the strings which contain integers will be formatted. It is not bad, but I was thinking that if I want some number as a string is because maybe I would rather not format as an integer. It will be a hard decision for the owners and maintainers of this Python package.

In the test test_intfmt_with_string_as_integer, the data [[82642], ["1500"], [2463]] is transformed to 82,642 | 1,500 | 2,463.


We need to explore the way to solve the colored numbers. I'll research in the code if there is something that I can reuse or create a new functionality to detect the colored number and format the integer.


Taking advantage of this talk, I would like to recommend that each data value be treated independently, since apparently if it encounters at least one floating number, it takes all numbers as floats, even if they are not.

@pytest.mark.skip(reason="It detects all values as floats but there are strings and integers.")
def test_intfmt_with_string_with_floats():
    "Output: integer format"
    result = tabulate([[82000.38], ["1500.47"], ["2463"], [92165]], intfmt=",", tablefmt="plain")
    expected = "82000.4\n 1500.47\n 2,463\n92,165"
    assert_equal(expected, result)

For this, data [[82000.38], ["1500.47"], ["2463"], [92165]] all the numbers were assigned as a float.

val: 82000.38                                                                                                                                     
    val: <class 'float'>                                                                                                                          
valtype: <class 'float'>                                                                                                                          
                                                                                                                                                  
val: 1500.47                                                                                                                                      
    val: <class 'str'>                                                                                                                            
valtype: <class 'float'>

val: 2463
    val: <class 'str'>
valtype: <class 'float'>

val: 92165
    val: <class 'int'>
valtype: <class 'float'>

Expected:
82000.4
 1500.47
 2,463
92,165

Got:
82000.4
 1500.47
 2463
92165

Furthermore, I noticed that the float 82000.38 is rounded to 82000.4, but not the float 1500.47. Is this an expected behavior?


test_intfmt_with_string_as_integer

tabulate([[82642], ["1500"], [2463]], intfmt=",", tablefmt="plain")

Expected:
82,642
1500
2,463

Got:
82,642
1,500
2,463


test_intfmt_with_colors

tabulate(colortable, colorheaders, "grid", intfmt=",")

ValueError: invalid literal for int() with base 10: \x1b[31m42\x1b[0m

--

test_alignment_of_colored_cells

colortable = [
    ("test", 42, "\x1b[31m42\x1b[0m"),
    ("test", 101, "\x1b[32m101\x1b[0m"),
]
colorheaders = ("test", "\x1b[34mtest\x1b[0m", "test")
formatted = tabulate(colortable, colorheaders, "grid")

ValueError: invalid literal for int() with base 10: \x1b[31m42\x1b[0m


test_ansi_color_bold_and_fgcolor

ValueError: invalid literal for int() with base 10: \x1b[1;31m5\x1b[1;m

Final note

I look forward to your reply or guide to start working on solutions. Otherwise, I could make code, but I really don't know if it aligns with where you want to go.

@airvzxf
Copy link
Contributor Author

airvzxf commented Apr 13, 2024

It would be the solution for the colored numbers, at least at Regular Expression level.

Regular expression: (\\x[0-9a-fA-F]+\[\d+m)([0-9\.]+)(\\.*)

For this example

\x1b[31m42\x1b[0m
\x1b[32m1010\x1b[0m

Match for these examples:

\x1b[31m   42   \x1b[0m
\x1b[32m   1010   \x1b[0m

@airvzxf
Copy link
Contributor Author

airvzxf commented Apr 13, 2024

In the above commit: e45cac1; I added the validation and formatting when the number is colored.

@airvzxf
Copy link
Contributor Author

airvzxf commented Apr 13, 2024

Here other option to format the strings which contain integers. For example, in this data [[82642], ["1500"], [2463]] the result should be 82,642 | 1,500 | 2,463 rather than 82,642 | 1500 | 2,463.

The only change that needs to be performed is adding some elif:

            elif val.isdigit():
                val = format(int(val), intfmt)

In this code:

    if valtype is str:
        return f"{val}"
    elif valtype is int:
        if isinstance(val, str):
            val_striped = val.encode('unicode_escape').decode('utf-8')
            colored = re.search(r'(\\[xX]+[0-9a-fA-F]+\[\d+[mM]+)([0-9.]+)(\\.*)$', val_striped)
            if colored:
                total_groups = len(colored.groups())
                if total_groups == 3:
                    digits = colored.group(2)
                    if digits.isdigit():
                        val_new = colored.group(1) + format(int(digits), intfmt) + colored.group(3)
                        val = val_new.encode('utf-8').decode('unicode_escape')
            elif val.isdigit():
                val = format(int(val), intfmt)
            intfmt = ""
        return format(val, intfmt)

This is the result of the test test_intfmt_with_string_as_integer.

result = tabulate([[82642], ["1500"], [2463]], intfmt=",", tablefmt="plain")
expected = "82,642\n  1500\n 2,463"
Expected:
82,642
  1500
 2,463

Got:
82,642
 1,500
 2,463

Note:

If you think this change is important and relevant to your package, please let me know, so I can add this change to the pull requests.

Copy link
Contributor Author

@airvzxf airvzxf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the suggested changes.

@@ -1229,6 +1229,8 @@ def _format(val, valtype, floatfmt, intfmt, missingval="", has_invisible=True):
if valtype is str:
return f"{val}"
elif valtype is int:
if isinstance(val, str):
intfmt = ""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done as requested.

@astanin astanin merged commit 12a805b into astanin:master Sep 26, 2024
1 check failed
@airvzxf airvzxf deleted the bugfix/318/intfmt-string-values branch September 26, 2024 15:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants