Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Git subprocess call error when too many files in directory, even if gitignore'd #490

Closed
gesslerpd opened this issue Apr 28, 2023 · 7 comments · Fixed by #515
Closed

Git subprocess call error when too many files in directory, even if gitignore'd #490

gesslerpd opened this issue Apr 28, 2023 · 7 comments · Fixed by #515
Assignees
Labels
bug Something isn't working question Further information is requested

Comments

@gesslerpd
Copy link

Describe the bug
On windows, when a repo has too many files in it (e.g. many virtual environments or installed dependencies installed) running darker results in the following error:

$ python -m darker -i . --rev origin/main --check
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\admin\Documents\flatbracket\venv\Lib\site-packages\darker\__main__.py", line 639, in <module>
    RETVAL = main_with_error_handling()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\admin\Documents\flatbracket\venv\Lib\site-packages\darker\__main__.py", line 631, in main_with_er
    return main()
           ^^^^^^
  File "C:\Users\admin\Documents\flatbracket\venv\Lib\site-packages\darker\__main__.py", line 575, in main
    changed_files_to_reformat = git_get_modified_python_files(
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\admin\Documents\flatbracket\venv\Lib\site-packages\darker\git.py", line 474, in git_get_modified_
    changed_paths = _git_diff_name_only(revrange.rev1, revrange.rev2, paths, cwd)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\admin\Documents\flatbracket\venv\Lib\site-packages\darker\git.py", line 434, in _git_diff_name_on
    lines = _git_check_output_lines(diff_cmd, cwd)
  File "C:\Users\admin\Documents\flatbracket\venv\Lib\site-packages\darker\git.py", line 343, in _git_check_output
    return check_output(  # nosec
           ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\admin\AppData\Local\Programs\Python\Python311\Lib\subprocess.py", line 466, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\admin\AppData\Local\Programs\Python\Python311\Lib\subprocess.py", line 548, in run
    with Popen(*popenargs, **kwargs) as process:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\admin\AppData\Local\Programs\Python\Python311\Lib\subprocess.py", line 1024, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\admin\AppData\Local\Programs\Python\Python311\Lib\subprocess.py", line 1493, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [WinError 206] The filename or extension is too long

The above error isn't due to a long filename as the WinError suggests, seems to be due to the amount of files passed to the subprocess call.

To Reproduce
Steps to reproduce the behavior:

  1. In any git repo, add venv*/ to the .gitignore file
  2. Create 6 new virtualenvs using python -m venv venv1 venv2 venv3 venv4 venv5 venv6
  3. In a fresh virtualenv, pip install darker isort -U
  4. Run Darker with darker .
  5. See error / wrong behavior

Expected behavior
Should not error out and not pass such a large list of files to git subprocess call (especially if they are gitignore'd)

Environment (please complete the following information):

  • OS: Windows
  • Python version: 3.11.1
  • Git version: 2.39.2.windows.1
  • Darker version: 1.7.1
  • Black version: 23.1.0

Additional context:

The files_to_process variable in __main__.py:562 contains files from all ignored venv's and exceeds the amount of process call arguments that are allowed by windows.

@akaihola
Copy link
Owner

akaihola commented May 1, 2023

Thanks @gesslerpd for the detailed report and analysis! Do you agree with my reasoning:

  • The obvious solution would be to make multiple Git calls and ensure the list of files isn't too long.
  • As a separate issue, .gitignored files could be eliminated from the list before calling Git (see also exclude files/directories, how? #323). It seems currently they are eventually ignored by Git.

Do you know whether the limit in Windows is on the

  • number of arguments, or
  • the number of characters on the resulting command line?

Also, I wonder what the exact limit is and whether it's similar across Windows versions.

@akaihola akaihola self-assigned this May 1, 2023
@akaihola akaihola added bug Something isn't working question Further information is requested labels May 1, 2023
@gesslerpd
Copy link
Author

gesslerpd commented May 8, 2023

Sorry for the delay, I was able to do some tests and it seems like the limit is to do with the amount of information in the args in total rather than the number of args, approximately 2 ^ 15 - 6 = 32762.

Another thing I noticed is that when I have a venv/ no files from it end up in the paths list but when I have other ignored virtualenv's then paths from those end up in the variable to be passed to subprocess.

I was initially thinking along the lines of your second bullet as a solution or possibly there's a way to only input the git --revision information to the git subprocess to avoid large path lists altogether but not familiar enough with the codebase to tell. Note: black itself honors gitignores with its default settings.

@gesslerpd
Copy link
Author

Found this that seems to support my findings: https://devblogs.microsoft.com/oldnewthing/20031210-00/?p=41553

@akaihola
Copy link
Owner

when I have a venv/ no files from it end up in the paths list but when I have other ignored virtualenv's then paths from those end up in the variable to be passed to subprocess

I must be misunderstanding something. Do you mean that gitignored paths are included but non-gitignored venv/ is not? How can that be, if src/ or mypackage/ does get included? Darker doesn't care about names of directories.

black itself honors gitignores with its default settings

There must then be code in Black which we could re-use in Darker. (Note: We'll probably want to vendor instead of import useful code from Black due to #251, #304 and #309. There are existing helper imports from Black which should be vendored as well.)

@jedie
Copy link
Contributor

jedie commented Aug 9, 2023

Maybe it's just, because black will not use the .gitignore, see: #323 (comment) ??

Fix: #515

Svenito added a commit to Svenito/darker that referenced this issue Feb 28, 2024
As per akaihola#490 when running darker on Windows with a large codebase
subprocess fails to run with a large list of files.

What I attempted to do in this commit is to only process changed files,
seeing as this is all we care about anyway. The filtering of the files
is moved earlier in the process and is passed the directory or paths set
on the commandline, instead of the list of files returned from the
`filter_python_files` call. The resulting set is then intersected with
all the files Black would process to generate a list of files that
should be processed.

While this alleviates the symptom, it could potentially still break if a
large amount of files are modified in between the current and previous
revisions.
@akaihola
Copy link
Owner

@gesslerpd & @Svenito, could you check whether #515 fixes this problem?

@gesslerpd
Copy link
Author

Latest release 1.7.3 fixes it, thanks! @akaihola

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working question Further information is requested
Projects
Development

Successfully merging a pull request may close this issue.

3 participants