Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Darker changes lines when the previous lines got changed #388

Open
wkentaro opened this issue Aug 29, 2022 · 5 comments · May be fixed by #391
Open

Darker changes lines when the previous lines got changed #388

wkentaro opened this issue Aug 29, 2022 · 5 comments · May be fixed by #391
Labels
enhancement New feature or request

Comments

@wkentaro
Copy link

Describe the bug

  • Darker changes lines when the previous lines got changed.
  • It happens when there are trailing closing parenthesis of the function call.

To Reproduce

darker_tests.zip

unzip darker_tests.zip
cd darker_tests
git --no-pager diff
darker -l 79 spam.py --diff

Expected behavior

  • Only the line that was changed by editing gets updated. (in the below screenshot, only the first call of func1)

Screenshots
image

Environment (please complete the following information):

  • OS: macOS
  • Python version: 3.9.10
  • Git version [e.g. 2.36.0]
% git --version
git version 2.32.1 (Apple Git-133)
hub version refs/heads/master
  • Darker version [e.g. 1.5.0]
% darker --version
1.5.0
  • Black version [e.g. 22.3.0]
% black --version
black, 22.1.0 (compiled: yes)
@wkentaro
Copy link
Author

This is due to difflib not splitting these 2 as different chunks.

In [3]: list(opcodes_to_chunks(opcodes, src, dst))
Out[3]:
[(1,
  ('def func1(a_long_long, b_long_long, c_long_long, d_long_long):',
   '    return a_long_long + b_long_long + c_long_long + d_long_long',
   '',
   '',
   'def main():'),
  ('def func1(a_long_long, b_long_long, c_long_long, d_long_long):',
   '    return a_long_long + b_long_long + c_long_long + d_long_long',
   '',
   '',
   'def main():')),
 (6,
  ('    e = func1(a_long_long=999, b_long_long=2000, c_long_long=3000, d_long_long=4000)',
   '    f = func1(a_long_long=1001, b_long_long=2001, c_long_long=3001, d_long_long=4001)'),
  ('    e = func1(',
   '        a_long_long=999, b_long_long=2000, c_long_long=3000, d_long_long=4000',
   '    )',
   '    f = func1(',
   '        a_long_long=1001, b_long_long=2001, c_long_long=3001, d_long_long=4001',
   '    )')),
 (8,
  ('    print(e, f)', '', '', 'if __name__ == "__main__":', '    main()'),
  ('    print(e, f)', '', '', 'if __name__ == "__main__":', '    main()'))]

@akaihola
Copy link
Owner

Thanks @wkentaro, this is a great summary and example case for a tricky problem. Since difflib doesn't understand the Python syntax, it's impossible for it to reliably tell where statement boundaries are.

In #221, I attempted a more involved approach by digging into the internals of Black and getting a stream of original/reformatted chunks directly from there, but I very quickly ran into a dead end. There was a fundamental reason why this can't be done in a foolproof way even with Black having access to the AST. Unfortunately I didn't write down the details of it, but it would probably be obvious if we went through it again. It had something to do with indented blocks and the extents of AST nodes.

If you're interested to ponder about the problem, we could continue discussion here or maybe in a call.

@akaihola akaihola added the enhancement New feature or request label Aug 29, 2022
@akaihola
Copy link
Owner

Black has evolved quite a bit since I worked on #221, but essentially I tried to reimplement what today is in black._format_str_once() and turn it into a generator of (<original line(s)>, <reformatted line(s)>) pairs instead of collecting all reformatted lines into a string.

I was hoping to get a more granular list of chunks than what difflib can give us by comparing the results of reformatting to the original.

But as said, I failed (or at least was defeated by something that seemed insurmountable).

@akaihola
Copy link
Owner

Also, having to reimplement parts of Black internals would of course make maintenance of Darker vastly heavier for obvious reasons – at least if we'd like to allow users to benefit from features of new Black versions going forward.

@akaihola
Copy link
Owner

Oh, I think I now remember a very very difficult case: non-standard indentation.

Let's say you have this file:

if True:
 for x in range(5):
  try:
   print(10 / x)
  except ZeroDivisionError:
   print("Can't divide")

Currently difflib bundles all the reindented lines into one chunk, and if you e.g. change only the fourth line to print(20/x), Darker will simply still reformat the whole file.

The dream with more granular handling would of course be that the "wrong" indentation be retained and other reformatting applied. A naïve implementation would simply cause the indentation to not match the surrounding code and be either invalid or not match the original AST. The outermost indentation level could perhaps be reverted in a post-processing step to match surrounding code, but somehow I feel there are going to be nasty edge cases where this would break apart.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Development

Successfully merging a pull request may close this issue.

2 participants