Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Truncate long local variable values with unittest utils #109

Closed
wants to merge 4 commits into from
Closed

Truncate long local variable values with unittest utils #109

wants to merge 4 commits into from

Conversation

vegarsti
Copy link

@vegarsti vegarsti commented Apr 18, 2019

Addresses #75.

The issue suggests using something from unittest.util. This util has _common_shorten_repr, which, given a dictionary, possibly truncates the values in it. We call this directly on the dictionary with local variables. Very open to that not being the optimal solution, but it should start the conversation.

Example use

Make test_shortened_values.py, which is the example docs/examples/extra_locals.py, slightly modified, so the file path becomes long:

import os
import marbles.core


class FileTestCase(marbles.core.TestCase):

    def test_file_size(self):
        file_name = __file__ + __file__  # noqa: F841

        self.assertEqual(os.path.getsize(__file__), 0)


if __name__ == '__main__':
    marbles.core.main()

Now, running this, we get

» python -m marbles test_shortened_values.py
F
======================================================================
FAIL: test_file_size (test_shortened_values.FileTestCase)
----------------------------------------------------------------------
marbles.core.marbles.ContextualAssertionError: 269 != 0

Source (/Users/vegard/Documents/dev/marbles/test_shortened_values.py):
      9
 >   10 self.assertEqual(os.path.getsize(__file__), 0)
     11
Locals:
	file_name = "'/Use[54 chars]py/Users/vegard/Documents/dev/marbles/test_shortened_values.py'"


----------------------------------------------------------------------
Ran 1 test in 0.001s

FAILED (failures=1)

Notice the [54 chars] in there. Is this somewhat in the right direction, @thejunglejane?

I have not added tests, yet, so would love some pointers as to how that might be done. (And I signed and sent a CLA.)

@vegarsti
Copy link
Author

vegarsti commented Apr 18, 2019

I'm not at all sure that this does what you requested...
I'm seeing that there exist tests for multiline locals, and this would, I think, truncate most of those.

Copy link
Contributor

@thejunglejane thejunglejane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's figure out as a group how we want to truncate multiline locals and then decide what to do with the tests.

@@ -339,7 +339,10 @@ def _format_local(cls, name, value):

@classmethod
def _format_locals(cls, locals_):
return '\n'.join(cls._format_local(k, v) for k, v in locals_.items())
local_variable_names, local_variable_values = zip(*locals_.items())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, thanks for submitting your CLA 😄

A couple of notes:

Empty Locals Dict
If locals_ is an empty dict, this variable assignment is going to fail:

>>> empty_dict = dict()
>>> k, v = zip(*emtpy_dict.items())
----------------------------------------------------------------------
Traceback (most recent call last):
      1 empty_dict = dict()
----> 2 k, v = zip(*empty_dict.items())
ValueError: not enough values to unpack (expected 2, got 0)

Variables and Variable Names
These lines are a bit too long, which is causing the flake8 build step to fail. Could we use more concise variable names, and make sure to run the linter locally before pushing? You can see the docs on how to lint with flake8 and/or tox locally here.

_common_shorten_repr
I'm reluctant to rely on an internal function. What do you think about using unittest.util.safe_repr with short=True instead of _common_shorten_repr? If you already considered this and chose _common_shorten_repr anyway, I'd love to hear your thought process (and see it captured in an inline comment).

Move Logic Into _format_local
What do you think about moving this logic into _format_local classmethod instead? I think that would be simpler and is a more natural place to do this shortening. If we do the shortening there, we don't have to worry about unzipping and zipping the keys and values in locals_ (because we're operating on one key, value pair at a time), and can avoid the issue where locals_ is an empty dictionary entirely. We could also handle shortening multi-line locals and single-line locals separately (we already process them differently in _format_local anyway), which we might want to do. @leifwalsh, want to chime in?

Copy link
Author

@vegarsti vegarsti Apr 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks for the feedback!
I totally missed the part about linting, thanks for telling me. I will address that!

I definitely agree that it will be cleaner to put the logic in _format_local!

Do I suggest specific code changes in here or rather push a commit for it?

Oh, and safe_repr looks like a better choice, I agree, I didn't consider that.

@vegarsti
Copy link
Author

vegarsti commented Apr 24, 2019

I think I addressed your comments in this change.
I used unittest.safe_repr out of the box, and on a line by line basis for multiline representations.
Out of the box, though, a full truncated string[1] ends up at around 105 in length.
That might be a bit too much?
safe_repr uses an internal _MAX_LENGTH, though, which is 80, so it's sadly not as simple as passing a keyword length argument to make it shorter.

What do you think?

[1]: from the start of the line in the terminal, so including the tab size

Thanks for taking the time!

@@ -330,13 +330,18 @@ def formattedMsg(self): # mimic unittest's name for standardMsg

@classmethod
def _format_local(cls, name, value):
def truncate(s):
return unittest.util.safe_repr(s, short=True)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up doing this to make the lines shorter. Not sure if this is good or not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer not defining nested functions like this. There are a couple of other ways to shorten the lines here:

  1. Import unittest.util directly so you can call util.safe_repr (which is a good deal shorter than unittest.util.safe_repr)
from unittest import util
  1. Use line breaking and even more concise variable names!
short_lines = [unittest.util.safe_repr(line, short=True)
               for line in value_str.split('\n')]
short_value_str = '\n'.join(short_lines)
value_str = textwrap.indent(short_value_str, '\t\t')

I want to think more about what exactly we want to do with multiline locals, but since you asked about this piece of code specifically I figured I'd show you these options.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, that makes sense. Thanks!
Are there conventions on when/how to line break? I know many use black to auto format the code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP8 has a section about breaking long lines, but it doesn't have a ton of examples. The Hitchiker's Guide to Python has a short section on Line Continuations, too, but again, not a ton of examples.

One of the best things you can do is simply to read Python code and see how long lines are broken. You're more than welcome to read more of marbles the source code to get a sense of our line-breaking style.

You can also use black, and see how it breaks up long lines in different scenarios. We don't use black to format marbles, so running black on the whole codebase may change a lot of things outside of the specific code you're changing here.

Copy link
Author

@vegarsti vegarsti Apr 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks!

@vegarsti
Copy link
Author

vegarsti commented Apr 25, 2019 via email

@thejunglejane
Copy link
Contributor

Thanks again for submitting this PR. Building this out has helped us think through what we actually want to do here, which is super valuable.

Now that I have it in my hands, I think we were too quick in proposing a solution (using string truncating from unittest.util), and didn’t spend enough time describing the problem. The experience we are trying to improve is the case where there are complex locals with long reprs that look ugly in the output: arrays, dictionaries, pandas DataFrames, etc. (These are common among marbles tests because marbles was originally written to write unit tests for datasets.) “Ugly” usually means really long lines that wrap the entire terminal (I can make an example when I’m not typing on my phone). These can actually become a real problem if they get too large, think megabytes of data in a table.

But marbles, at the end of the day, is all about preserving information. Truncating a repr takes information away. I think what we want to do is reformat complex locals like the ones described above to make them easier to read, but try as much as we can not to take information away. The decisions about what information to include and exclude should stay with the test author.

@vegarsti
Copy link
Author

I see! It definitely makes sense that you do not want to truncate, maybe especially as the code author will have no impact on how that is done.

@leifwalsh
Copy link
Contributor

#96 might be a more interesting way to approach this problem. The idea here is to find a way to register repr handlers for common types (list, dict, set, numpy/pandas objects), possibly using reprlib. I think the current way list is handled, you just get a very long single line that textwrap wraps, but not necessarily on object boundaries, with #96, we could try to do something smarter. We could also do things like always sorting sets so they’re easier to visually compare. Would you be interested in trying to work through that idea with us?

@vegarsti
Copy link
Author

vegarsti commented Apr 28, 2019

Yeah, that would be fun! Sounds interesting, and sounds like a good way to do it.

@leifwalsh leifwalsh deleted the branch twosigma:master May 28, 2022 23:59
@leifwalsh leifwalsh closed this May 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants