Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid too long Git command lines on Windows [WinError 206] #542

Merged
merged 7 commits into from
Mar 5, 2024

Conversation

Svenito
Copy link
Collaborator

@Svenito Svenito commented Feb 28, 2024

As per #490 when running darker on Windows with a large codebase, subprocess fails to run with a large list of files.

This occurs because git_get_modified_python_files is called with a list of all files that are returned by filter_python_files. For a large project this is a long list of files.

What I attempted to do in this commit is to only process changed files, seeing as this is all we care about anyway. The filtering of the files is moved earlier in the process and is passed the directory or paths set on the commandline, instead of the list of files returned from the filter_python_files call. The resulting set is then intersected with all the files Black would process to generate a list of files that should be processed.

While this alleviates the symptom, it could potentially still break if a large amount of files are modified between the current and previous revisions. That said, this is still an improvement on passing the entire list of files to git_get_modified_python_files.

@Svenito
Copy link
Collaborator Author

Svenito commented Feb 28, 2024

Note: this isn't passing tests yet. I realise that my changes are munging the file paths and changing behaviour. Working on a better change

@akaihola akaihola self-requested a review February 28, 2024 20:56
@akaihola akaihola added bug Something isn't working question Further information is requested maybe invalid? Can't reproduce, or seems already fixed, or need more information labels Feb 28, 2024
@akaihola
Copy link
Owner

Also, as mentioned in #490, are you sure #515 doesn't already fix the problem?

@Svenito
Copy link
Collaborator Author

Svenito commented Feb 28, 2024

I will test this properly tomorrow but looking at the diff it’s unlikely. I don’t have a problem with gitignore. The issue is that the list of files passed to the git diff call is too large and causes the command to exceed the max length for windows. All those files are Python files so are expected to be there.
This is a large Django project with a lot of files in sub directory.

@Svenito
Copy link
Collaborator Author

Svenito commented Feb 29, 2024

Just tried with the latest master and the error persists. To elaborate, the list of files passed to git_get_modified_python_files is a set with 1514 entries. It would make sense to reorder how the list of files to check is created. First fetch all the modified files, and then filter that by the files that black would check. This is what I am trying to do with this PR

@Svenito Svenito force-pushed the cmd-length-limit-error branch from 096acb8 to 63a5b67 Compare February 29, 2024 13:31
src/darker/git.py Outdated Show resolved Hide resolved
src/darker/__main__.py Outdated Show resolved Hide resolved
src/darker/__main__.py Outdated Show resolved Hide resolved
src/darker/__main__.py Outdated Show resolved Hide resolved
src/darker/__main__.py Outdated Show resolved Hide resolved
@akaihola akaihola changed the title Attempt to fix [WinError 206] Avoid too long Git command lines on Windows [WinError 206] Mar 5, 2024
@akaihola akaihola added this to the Darker 2.0.0 - Drop py37 milestone Mar 5, 2024
@akaihola
Copy link
Owner

akaihola commented Mar 5, 2024

@Svenito I will now rebase this on master, fix any conflicts and merge. Thanks a lot for your hard work!

Svenito added 7 commits March 5, 2024 22:32
On windows the command can _only_ be 32762 characters. This limit is reached when processing a large project. Instead of passing the whole list of files to the `git_get_modified_python_files` function, pass the paths instead with the repo root as cwd.
Use the black_compat equivalent instead
The test creates a bunch of files in a git repo which causes the subprocess call for getting modified files to exceed the max for Windows
see [the docs](http://msdn.microsoft.com/en-us/library/ms682425%28VS.85%29.aspx)

Also filter out the files that have been modified so that the list only includes the files that are supposed to be processed.
this reduces the number of files for the subprocess test from 1100 to 210. A bit more sane.

Unfortunately we also fall foul of the Windows path length limit here, so we can't have any number of directories.

Also mark the test for this bug to be skipped on non windows systems as it's not an issue there.
Use a single nested directory, not one for each file

Refactor getting changed files to reduce number of temporary variables
Rename the large file list count test to long command test.
Don't skip the long command test on other platforms
@akaihola akaihola force-pushed the cmd-length-limit-error branch from 9fd18cb to c8c61e5 Compare March 5, 2024 20:32
@Svenito
Copy link
Collaborator Author

Svenito commented Mar 5, 2024

Lovely. Thank you. Glad to see it going in.

@akaihola akaihola merged commit 91afe0f into akaihola:master Mar 5, 2024
20 of 21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working maybe invalid? Can't reproduce, or seems already fixed, or need more information question Further information is requested
Projects
Development

Successfully merging this pull request may close these issues.

2 participants