Truncate long local variable values with unittest utils #109

vegarsti · 2019-04-18T18:15:28Z

Addresses #75.

The issue suggests using something from unittest.util. This util has _common_shorten_repr, which, given a dictionary, possibly truncates the values in it. We call this directly on the dictionary with local variables. Very open to that not being the optimal solution, but it should start the conversation.

Example use

Make test_shortened_values.py, which is the example docs/examples/extra_locals.py, slightly modified, so the file path becomes long:

import os
import marbles.core


class FileTestCase(marbles.core.TestCase):

    def test_file_size(self):
        file_name = __file__ + __file__  # noqa: F841

        self.assertEqual(os.path.getsize(__file__), 0)


if __name__ == '__main__':
    marbles.core.main()

Now, running this, we get

» python -m marbles test_shortened_values.py
F
======================================================================
FAIL: test_file_size (test_shortened_values.FileTestCase)
----------------------------------------------------------------------
marbles.core.marbles.ContextualAssertionError: 269 != 0

Source (/Users/vegard/Documents/dev/marbles/test_shortened_values.py):
      9
 >   10 self.assertEqual(os.path.getsize(__file__), 0)
     11
Locals:
	file_name = "'/Use[54 chars]py/Users/vegard/Documents/dev/marbles/test_shortened_values.py'"


----------------------------------------------------------------------
Ran 1 test in 0.001s

FAILED (failures=1)

Notice the [54 chars] in there. Is this somewhat in the right direction, @thejunglejane?

I have not added tests, yet, so would love some pointers as to how that might be done. (And I signed and sent a CLA.)

vegarsti · 2019-04-18T18:23:26Z

I'm not at all sure that this does what you requested...
I'm seeing that there exist tests for multiline locals, and this would, I think, truncate most of those.

thejunglejane

Let's figure out as a group how we want to truncate multiline locals and then decide what to do with the tests.

thejunglejane · 2019-04-24T15:19:17Z

marbles/core/marbles/core/marbles.py

@@ -339,7 +339,10 @@ def _format_local(cls, name, value):

    @classmethod
    def _format_locals(cls, locals_):
-        return '\n'.join(cls._format_local(k, v) for k, v in locals_.items())
+        local_variable_names, local_variable_values = zip(*locals_.items())


First, thanks for submitting your CLA 😄

A couple of notes:

Empty Locals Dict
If locals_ is an empty dict, this variable assignment is going to fail:

>>> empty_dict = dict() >>> k, v = zip(*emtpy_dict.items()) ---------------------------------------------------------------------- Traceback (most recent call last): 1 empty_dict = dict() ----> 2 k, v = zip(*empty_dict.items()) ValueError: not enough values to unpack (expected 2, got 0)

Variables and Variable Names
These lines are a bit too long, which is causing the flake8 build step to fail. Could we use more concise variable names, and make sure to run the linter locally before pushing? You can see the docs on how to lint with flake8 and/or tox locally here.

_common_shorten_repr
I'm reluctant to rely on an internal function. What do you think about using unittest.util.safe_repr with short=True instead of _common_shorten_repr? If you already considered this and chose _common_shorten_repr anyway, I'd love to hear your thought process (and see it captured in an inline comment).

Move Logic Into _format_local
What do you think about moving this logic into _format_local classmethod instead? I think that would be simpler and is a more natural place to do this shortening. If we do the shortening there, we don't have to worry about unzipping and zipping the keys and values in locals_ (because we're operating on one key, value pair at a time), and can avoid the issue where locals_ is an empty dictionary entirely. We could also handle shortening multi-line locals and single-line locals separately (we already process them differently in _format_local anyway), which we might want to do. @leifwalsh, want to chime in?

Awesome, thanks for the feedback!
I totally missed the part about linting, thanks for telling me. I will address that!

I definitely agree that it will be cleaner to put the logic in _format_local!

Do I suggest specific code changes in here or rather push a commit for it?

Oh, and safe_repr looks like a better choice, I agree, I didn't consider that.

vegarsti · 2019-04-24T20:55:46Z

I think I addressed your comments in this change.
I used unittest.safe_repr out of the box, and on a line by line basis for multiline representations.
Out of the box, though, a full truncated string[1] ends up at around 105 in length.
That might be a bit too much?
safe_repr uses an internal _MAX_LENGTH, though, which is 80, so it's sadly not as simple as passing a keyword length argument to make it shorter.

What do you think?

[1]: from the start of the line in the terminal, so including the tab size

Thanks for taking the time!

vegarsti · 2019-04-24T21:00:22Z

marbles/core/marbles/core/marbles.py

@@ -330,13 +330,18 @@ def formattedMsg(self):  # mimic unittest's name for standardMsg

    @classmethod
    def _format_local(cls, name, value):
+        def truncate(s):
+            return unittest.util.safe_repr(s, short=True)


I ended up doing this to make the lines shorter. Not sure if this is good or not.

I prefer not defining nested functions like this. There are a couple of other ways to shorten the lines here:

Import unittest.util directly so you can call util.safe_repr (which is a good deal shorter than unittest.util.safe_repr)

from unittest import util

Use line breaking and even more concise variable names!

short_lines = [unittest.util.safe_repr(line, short=True) for line in value_str.split('\n')] short_value_str = '\n'.join(short_lines) value_str = textwrap.indent(short_value_str, '\t\t')

I want to think more about what exactly we want to do with multiline locals, but since you asked about this piece of code specifically I figured I'd show you these options.

Great, that makes sense. Thanks!
Are there conventions on when/how to line break? I know many use black to auto format the code.

PEP8 has a section about breaking long lines, but it doesn't have a ton of examples. The Hitchiker's Guide to Python has a short section on Line Continuations, too, but again, not a ton of examples.

One of the best things you can do is simply to read Python code and see how long lines are broken. You're more than welcome to read more of marbles the source code to get a sense of our line-breaking style.

You can also use black, and see how it breaks up long lines in different scenarios. We don't use black to format marbles, so running black on the whole codebase may change a lot of things outside of the specific code you're changing here.

Got it, thanks!

vegarsti · 2019-04-25T05:44:32Z

Great, thanks!

…

On Wed, 24 Apr 2019 at 23:27, Jane Adams ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In marbles/core/marbles/core/marbles.py <#109 (comment)>: > @@ -330,13 +330,18 @@ def formattedMsg(self): # mimic unittest's name for standardMsg @classmethod def _format_local(cls, name, value): + def truncate(s): + return unittest.util.safe_repr(s, short=True) I prefer not defining nested functions like this. There are a couple of other ways to shorten the lines here: 1. Import unittest.util directly so you can call util.safe_repr (which is a good deal shorter than unittest.util.safe_repr) from unittest import util 1. Use line breaking and even more concise variable names! short_lines = [unittest.util.safe_repr(line, short=True) for line in value_str.split('\n')] short_value_str = '\n'.join(short_lines) value_str = textwrap.indent(short_value_str, '\t\t') I want to think more about what exactly we want to do with multiline locals, but since you asked about this piece of code specifically I figured I'd show you these options. — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#109 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABLPSNI2HEMMJHDVVUYAXQTPSDGCVANCNFSM4HG7HRAQ> .

thejunglejane · 2019-04-28T14:49:49Z

Thanks again for submitting this PR. Building this out has helped us think through what we actually want to do here, which is super valuable.

Now that I have it in my hands, I think we were too quick in proposing a solution (using string truncating from unittest.util), and didn’t spend enough time describing the problem. The experience we are trying to improve is the case where there are complex locals with long reprs that look ugly in the output: arrays, dictionaries, pandas DataFrames, etc. (These are common among marbles tests because marbles was originally written to write unit tests for datasets.) “Ugly” usually means really long lines that wrap the entire terminal (I can make an example when I’m not typing on my phone). These can actually become a real problem if they get too large, think megabytes of data in a table.

But marbles, at the end of the day, is all about preserving information. Truncating a repr takes information away. I think what we want to do is reformat complex locals like the ones described above to make them easier to read, but try as much as we can not to take information away. The decisions about what information to include and exclude should stay with the test author.

vegarsti · 2019-04-28T14:52:35Z

I see! It definitely makes sense that you do not want to truncate, maybe especially as the code author will have no impact on how that is done.

leifwalsh · 2019-04-28T14:52:43Z

#96 might be a more interesting way to approach this problem. The idea here is to find a way to register repr handlers for common types (list, dict, set, numpy/pandas objects), possibly using reprlib. I think the current way list is handled, you just get a very long single line that textwrap wraps, but not necessarily on object boundaries, with #96, we could try to do something smarter. We could also do things like always sorting sets so they’re easier to visually compare. Would you be interested in trying to work through that idea with us?

vegarsti · 2019-04-28T15:02:55Z

Yeah, that would be fun! Sounds interesting, and sounds like a good way to do it.

Truncate long local variable values with unittest utils

4700b9b

thejunglejane requested changes Apr 24, 2019

View reviewed changes

vegarsti added 2 commits April 24, 2019 22:16

Truncate one-line values with safe_repr with short keyword instead

f1f1082

Truncate ,ultiline values line by line as oneline

3296f32

vegarsti commented Apr 24, 2019

View reviewed changes

Use better variable names and remove nested function

691f28c

leifwalsh mentioned this pull request Apr 28, 2019

Custom local variable display hook #96

Open

leifwalsh deleted the branch twosigma:master May 28, 2022 23:59

leifwalsh closed this May 28, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Truncate long local variable values with unittest utils #109

Truncate long local variable values with unittest utils #109

vegarsti commented Apr 18, 2019 •

edited

Loading

vegarsti commented Apr 18, 2019 •

edited

Loading

thejunglejane left a comment

thejunglejane Apr 24, 2019

vegarsti Apr 24, 2019 •

edited

Loading

vegarsti commented Apr 24, 2019 •

edited

Loading

vegarsti Apr 24, 2019

thejunglejane Apr 24, 2019

vegarsti Apr 25, 2019

thejunglejane Apr 25, 2019

vegarsti Apr 25, 2019 •

edited

Loading

vegarsti commented Apr 25, 2019 via email

thejunglejane commented Apr 28, 2019

vegarsti commented Apr 28, 2019

leifwalsh commented Apr 28, 2019

vegarsti commented Apr 28, 2019 •

edited

Loading

Truncate long local variable values with unittest utils #109

Truncate long local variable values with unittest utils #109

Conversation

vegarsti commented Apr 18, 2019 • edited Loading

Example use

vegarsti commented Apr 18, 2019 • edited Loading

thejunglejane left a comment

Choose a reason for hiding this comment

thejunglejane Apr 24, 2019

Choose a reason for hiding this comment

vegarsti Apr 24, 2019 • edited Loading

Choose a reason for hiding this comment

vegarsti commented Apr 24, 2019 • edited Loading

vegarsti Apr 24, 2019

Choose a reason for hiding this comment

thejunglejane Apr 24, 2019

Choose a reason for hiding this comment

vegarsti Apr 25, 2019

Choose a reason for hiding this comment

thejunglejane Apr 25, 2019

Choose a reason for hiding this comment

vegarsti Apr 25, 2019 • edited Loading

Choose a reason for hiding this comment

vegarsti commented Apr 25, 2019 via email

thejunglejane commented Apr 28, 2019

vegarsti commented Apr 28, 2019

leifwalsh commented Apr 28, 2019

vegarsti commented Apr 28, 2019 • edited Loading

vegarsti commented Apr 18, 2019 •

edited

Loading

vegarsti commented Apr 18, 2019 •

edited

Loading

vegarsti Apr 24, 2019 •

edited

Loading

vegarsti commented Apr 24, 2019 •

edited

Loading

vegarsti Apr 25, 2019 •

edited

Loading

vegarsti commented Apr 28, 2019 •

edited

Loading