Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Getting traceback calls & no report is generated. #2908

Open
kiranravindran90 opened this issue Apr 4, 2022 · 28 comments
Open

Getting traceback calls & no report is generated. #2908

kiranravindran90 opened this issue Apr 4, 2022 · 28 comments
Labels

Comments

@kiranravindran90
Copy link

kiranravindran90 commented Apr 4, 2022

Getting traceback calls & no report is generated.

How To Reproduce

While scanning source code files, after encountering a .sh file, this error seemed to be popping out (image below).
MicrosoftTeams-image

Many Traceback calls were followed by a "Multiprocessing Context Timeout Error"

System configuration

  • OS - Windows
  • ScanCode Version - 30.1.0
  • Installation - Source download
@pombredanne
Copy link
Member

@kiranravindran90 Thanks: do you have instructions so we can reproduce your setup? including a full trace?

@kiranravindran90
Copy link
Author

@pombredanne Find the full trace & instruction details.


Scan Instruction :
scancode -clpi -n 2 --timeout 3600 --json .json

Package : (~15GB)
Linux based Flashwares & source code.

File types :
.bin, .bct, .dat, .dtb, .img, .cpio

Trace :
Removing temporary files...done.
Traceback (most recent call last):
File "C:................\scancode-toolkit-30.1.0_py39-windows\scancode-toolkit-30.1.0\Scripts\scancode-script.py", line 33, in
sys.exit(load_entry_point('scancode-toolkit', 'console_scripts', 'scancode')())
File "C:................\scancode-toolkit-30.1.0_py39-windows\scancode-toolkit-30.1.0\lib\site-packages\click\core.py", line 1137, in call
return self.main(*args, **kwargs)
File "C:................\scancode-toolkit-30.1.0_py39-windows\scancode-toolkit-30.1.0\lib\site-packages\commoncode\cliutils.py", line 69, in main
return click.Command.main(
File "C:................\scancode-toolkit-30.1.0_py39-windows\scancode-toolkit-30.1.0\lib\site-packages\click\core.py", line 1062, in main
rv = self.invoke(ctx)
File "C:................\scancode-toolkit-30.1.0_py39-windows\scancode-toolkit-30.1.0\lib\site-packages\click\core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "C:................\scancode-toolkit-30.1.0_py39-windows\scancode-toolkit-30.1.0\lib\site-packages\click\core.py", line 763, in invoke
return __callback(*args, **kwargs)
File "C:................\scancode-toolkit-30.1.0_py39-windows\scancode-toolkit-30.1.0\lib\site-packages\click\decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "c:................\scancode-toolkit-30.1.0_py39-windows\scancode-toolkit-30.1.0\src\scancode\cli.py", line 451, in scancode
success, _results = run_scan(
File "c:................\scancode-toolkit-30.1.0_py39-windows\scancode-toolkit-30.1.0\src\scancode\cli.py", line 887, in run_scan
scan_success = run_scanners(
File "c:................\scancode-toolkit-30.1.0_py39-windows\scancode-toolkit-30.1.0\src\scancode\cli.py", line 1125, in run_scanners
scan_success = scan_codebase(
File "c:................\scancode-toolkit-30.1.0_py39-windows\scancode-toolkit-30.1.0\src\scancode\cli.py", line 1236, in scan_codebase
scan_timings) = next(scans)
File "C:................\scancode-toolkit-30.1.0_py39-windows\scancode-toolkit-30.1.0\lib\site-packages\click_termui_impl.py", line 116, in next
return next(iter(self))
File "C:................\scancode-toolkit-30.1.0_py39-windows\scancode-toolkit-30.1.0\lib\site-packages\commoncode\cliutils.py", line 172, in generator
for rv in self.iter:
File "c:................\scancode-toolkit-30.1.0_py39-windows\scancode-toolkit-30.1.0\src\scancode\pool.py", line 52, in wrap
return func(self, timeout=timeout or 3600)
File "C:\Program Files\Python39\lib\multiprocessing\pool.py", line 865, in next
raise TimeoutError from None
multiprocessing.context.TimeoutError


@Jeeppler
Copy link

This is related to: #2160

@Jeeppler
Copy link

Jeeppler commented Sep 14, 2022

I am facing the same issue.

OS: Linux (Debian)
Python: 3.9
Scancode version: 30.1.0
Installation: download (container Dockerfile is public)

Timeout option set to: --timeout 18000 (5 * 3600 = 18000; 3600 seconds are one hour)

However, the scan still stops after 3600 seconds (60 minutes). I looked at https://github.com/nexB/scancode-toolkit/blob/develop/src/scancode/pool.py#L52. However, I could not really understand how it works.

The first, what is this line actually doing:

func(self, timeout=timeout or 3600)

What exactly does the statement timeout or 3600 does?

Second, what is the purpose of the wrapped method/function?

@pombredanne
Copy link
Member

@Jeeppler there is a new release of ScanCode and I would like to see if you are also experiencing the issue there. Can you try the latest 31.x?
But there is no application archive for Python 3.9 ... only 3.8 for now.
Let me review your Dockerfile at https://github.com/mercedes-benz/sechub/blob/develop/sechub-pds-solutions/scancode/docker/ScanCode-Debian.dockerfile

in https://github.com/nexB/scancode-toolkit/blob/6007301cca4eff424202abf197467ee2f004e139/src/scancode/pool.py#L52
The first, what is this line actually doing:
func(self, timeout=timeout or 3600)
What exactly does the statement timeout or 3600 does?

timeout=timeout or 3600 will assign the timeout argument with the value of the timeout variable if it has a value or with 3600 seconds (aka. 1 hour) otherwise if timeout is None or 0. It essentially provides a default timeout if none was provided.

Second, what is the purpose of the wrapped method/function? in https://github.com/nexB/scancode-toolkit/blob/6007301cca4eff424202abf197467ee2f004e139/src/scancode/pool.py#L39

I reckon it feels a bit arcane. This is a function that returns a function such that calling the inner function will always be subject to a timeout. The code is there because of historical multiprocessing bugs that have been existing in the past and a lot of trial and error to get things working correctly (obviously not enough!):

... long story made short: there are likely some changes and fixes in the code of Python that mean we may not need anymore some of this machinery. The difficult thing here is that we support Windows, macOS and Linux and they have subtly different bugs in their Python implementations at times and not-so-subtle differences on how they each handle forking and spawing processes. We support all of them :|

If you think you could help track down these issues, and have a reliably failing test case, I could may be try to simplify the code in a branch and let you test this?

@Jeeppler
Copy link

Jeeppler commented Oct 6, 2022

Can you try the latest 31.x?
But there is no application archive for Python 3.9 ... only 3.8 for now.

Since Debian 11 comes with Python 3.9 it would be nice to have an archive for Python 3.9 otherwise I would have to install ScanCode via pip. Are there any problems installing ScanCode via pip rather than the archive?

If you think you could help track down these issues, and have a reliably failing test case, I could may be try to simplify the code in a branch and let you test this?

I would love to help fix the bug, it includes coding something as well.

However, I was unable so far to find public test cases/files to test it with. The files I use, I am not allowed to share.
I tried it with ISO images of Linux Distributions, for example with the one from Rocky Linux. Those images are huge, but the scan takes only a few minutes.

In case you have an idea what type of binaries could work let me know. I am willing to test them.

@Jeeppler
Copy link

Jeeppler commented Oct 6, 2022

Related to: #2703

@Jeeppler
Copy link

Jeeppler commented Oct 6, 2022

Can I disable the timeout if I set it to 0? Or does it set the fallback in this case to 3600 seconds?

@Jeeppler
Copy link

Jeeppler commented Oct 7, 2022

I was able to reproduce the issue with the Armbian image of the Jetson Nano:

Armbian_22.08.1_Jetson-nano_jammy_current_5.18.19.img

The image takes more than an hour to scan and on Debian 11 with Scancode-Toolkit version: 30.1.0. I get the following output:

/tools/scancode-toolkit-30.1.0/src/cluecode/copyrights.py:3382: FutureWarning: Possible set difference at position 3
  remove_tags = re.compile(
 Setup plugin: scan:info...
 Setup plugin: scan:licenses...
 Setup plugin: scan:copyrights...
 Setup plugin: scan:packages...
 Setup plugin: scan:emails...
 Setup plugin: scan:urls...
 Setup plugin: output:spdx-tv...
Collect file inventory...
Scan files for: info, licenses, copyrights, packages, emails, urls with 2 process(es)...
Removing temporary files...done.
Traceback (most recent call last):
  File "/tools/scancode-toolkit-30.1.0/bin/scancode", line 33, in <module>
    sys.exit(load_entry_point('scancode-toolkit', 'console_scripts', 'scancode')())
  File "/tools/scancode-toolkit-30.1.0/lib/python3.9/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/tools/scancode-toolkit-30.1.0/lib/python3.9/site-packages/commoncode/cliutils.py", line 69, in main
    return click.Command.main(
  File "/tools/scancode-toolkit-30.1.0/lib/python3.9/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/tools/scancode-toolkit-30.1.0/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/tools/scancode-toolkit-30.1.0/lib/python3.9/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "/tools/scancode-toolkit-30.1.0/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/tools/scancode-toolkit-30.1.0/src/scancode/cli.py", line 451, in scancode
    success, _results = run_scan(
  File "/tools/scancode-toolkit-30.1.0/src/scancode/cli.py", line 887, in run_scan
    scan_success = run_scanners(
  File "/tools/scancode-toolkit-30.1.0/src/scancode/cli.py", line 1125, in run_scanners
    scan_success = scan_codebase(
  File "/tools/scancode-toolkit-30.1.0/src/scancode/cli.py", line 1236, in scan_codebase
    scan_timings) = next(scans)
  File "/tools/scancode-toolkit-30.1.0/lib/python3.9/site-packages/click/_termui_impl.py", line 116, in __next__
    return next(iter(self))
  File "/tools/scancode-toolkit-30.1.0/lib/python3.9/site-packages/click/_termui_impl.py", line 328, in generator
    for rv in self.iter:
  File "/tools/scancode-toolkit-30.1.0/src/scancode/pool.py", line 52, in wrap
    return func(self, timeout=timeout or 3600)
  File "/usr/lib/python3.9/multiprocessing/pool.py", line 865, in next
    raise TimeoutError from None
multiprocessing.context.TimeoutError

I set --timeout 18000 (5 hours). However, the scan only takes a hour and then I get the multiprocessing.context.TimeoutError.

@pombredanne
Copy link
Member

@Jeeppler Thanks for digging further.
https://redirect.armbian.com/jetson-nano/Jammy_current is 350MB of xzipped data is massive. So I guess that we should rather skipp scanning such an image for anything but file information. We can do it just based on the size, with a default and a possible new CLI option?

What should be the default max size above which it does not make sense to attempt scanning? 100MB? 200MB?

@pombredanne
Copy link
Member

(FWIW, the uncompressed disk image above is 1.7GB ;) )

@pombredanne
Copy link
Member

Can I disable the timeout if I set it to 0? Or does it set the fallback in this case to 3600 seconds?

you can but that may make all fail. You can make it super short as 0.001 second if you want

@Jeeppler
Copy link

Jeeppler commented Oct 7, 2022

So I guess that we should rather skipp scanning such an image for anything but file information

No, no skipping of the file. The requirement is to scan files in that size or bigger for license information. It takes hours, but Scancode can find license information in files of that size.

In case, you have more questions about the use case you can reach me via email using the email address located in this file: https://github.com/mercedes-benz/sechub/blob/develop/MAINTAINERS.md

@Jeeppler
Copy link

Jeeppler commented Oct 7, 2022

The problem is, that even if I set the --timeout above 3600 I get the multiprocessing.context.TimeoutError after exactly one hour (3600 seconds = 1 hour). My expectation, would be that --timeout 18000 will increase the timeout duration to 5 hours and the scan might work.

I encountered the problem under Linux. It apparently works under Windows.

@Jeeppler
Copy link

Jeeppler commented Oct 7, 2022

I ran the test again with the new version of scancode under Ubuntu 22.04 (Mint 21).

Setup:

$ python3 --version
Python 3.10.6

$ scancode --version
ScanCode version: 31.2.1
ScanCode Output Format version: 2.0.0
SPDX License list version: 3.17

I installed scancode via pip.

$ time scancode --timeout 3800 --verbose --strip-root --copyright --license --package --email --url --info --spdx-tv scan.txt Armbian_22.08.1_Jetson-nano_jammy_current_5.18.19.img
 Setup plugin: scan:info...
 Setup plugin: scan:licenses...
 Setup plugin: scan:copyrights...
 Setup plugin: scan:packages...
 Setup plugin: scan:emails...
 Setup plugin: scan:urls...
 Setup plugin: output:spdx-tv...
Collect file inventory...
Scan files for: info, licenses, copyrights, packages, emails, urls with 1 process(es)...
Removing temporary files...done.
Traceback (most recent call last):
  File "/home/user/.local/bin/scancode", line 8, in <module>
    sys.exit(scancode())
  File "/home/user/.local/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/user/.local/lib/python3.10/site-packages/commoncode/cliutils.py", line 70, in main
    return click.Command.main(
  File "/home/user/.local/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/user/.local/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/user/.local/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/user/.local/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/user/.local/lib/python3.10/site-packages/scancode/cli.py", line 452, in scancode
    success, _results = run_scan(
  File "/home/user/.local/lib/python3.10/site-packages/scancode/cli.py", line 905, in run_scan
    scan_success = run_scanners(
  File "/home/user/.local/lib/python3.10/site-packages/scancode/cli.py", line 1144, in run_scanners
    scan_success = scan_codebase(
  File "/home/user/.local/lib/python3.10/site-packages/scancode/cli.py", line 1252, in scan_codebase
    scan_timings) = next(scans)
  File "/home/user/.local/lib/python3.10/site-packages/click/_termui_impl.py", line 116, in __next__
    return next(iter(self))
  File "/home/user/.local/lib/python3.10/site-packages/click/_termui_impl.py", line 328, in generator
    for rv in self.iter:
  File "/home/user/.local/lib/python3.10/site-packages/scancode/pool.py", line 52, in wrap
    return func(self, timeout=timeout or 3600)
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 868, in next
    raise TimeoutError from None
multiprocessing.context.TimeoutError

real	60m43,960s
user	4m13,170s
sys	3m25,289s

Scancode was running 60 minutes, before raising the multiprocessing.context.TimeoutError. However, I specified a timeout of --timeout 3800, therefore scancode should have ran for 63 minutes.

@pombredanne
Copy link
Member

@Jeeppler Thanks.... clearly we need to revisit the timeout story! Thank you for all these efforts.

@pombredanne
Copy link
Member

@Jeeppler BTW https://github.com/mercedes-benz/sechub looks awesome!
You may find https://github.com/nexb/vulnerablecode and aboutcode-org/vulnerablecode#801 of some interest.

@pombredanne
Copy link
Member

@Jeeppler

Since Debian 11 comes with Python 3.9 it would be nice to have an archive for Python 3.9

OK. Let's have a issue to track this. Do you mind to enter one? This should be easy enough now that we automated the whole release build. An alternative could be an appimage of sorts. See also #2836

otherwise I would have to install ScanCode via pip. Are there any problems installing ScanCode via pip rather than the archive?

This should work fine and is tested on each build using the latest version of all deps with https://github.com/nexB/scancode-toolkit/blob/ded56e9120f5fdfb9a1a0309130bb4305a66aacb/azure-pipelines.yml#L194
But I cannot gaurantee that all the versions of the dependencies that otherwise satisfy the constraints work, as we do retest previously released versions with newer deps: that would be practically difficult and resource hungry.

If you want to have the deps "same-as-the-app", then something such as this may be better as it will use the same versions as the app:

pip install --constraint https://raw.githubusercontent.com/nexB/scancode-toolkit/v31.2.1/requirements.txt scancode-toolkit==31.2.1

@Jeeppler
Copy link

Jeeppler commented Oct 11, 2022

@pombredanne

Thanks for explaining what the difference between the package archive and the pip installation is. I did not know about the --constraint flag, that is interesting.

Furthermore:

@Jeeppler
Copy link

Jeeppler commented Nov 2, 2022

I did another test with the Armbian image: Armbian_22.08.1_Jetson-nano_jammy_current_5.18.19.img. What I observed was that, first the scancode process has a waiting channel of 0, meaning it is running. The CPU usage is very high. One core will be at 100% of utilization. Indicating to me: scancode does it's work.

After a few minutes (approx. 3-5), all the memory and the entire swap partition are used. See attached screenshots.

scancode_memory_cpu_usage

Then suddenly, CPU utilization and memory drop. Looking at the processes. The scancode process waiting channel will be stuck at pipe_read. I am not sure what happened with scancode:

  • Is it still working?
  • Did a sub-process crash?
  • How can I find out more?

Scan command used:

time scancode --timeout 36000 --verbose --strip-root --license --spdx-tv Armbian.txt Armbian_22.08.1_Jetson-nano_jammy_current_5.18.19.img

@Jeeppler
Copy link

Jeeppler commented Nov 8, 2022

I see the same pattern with a 64 GB machine.

Is there are way to profile/investigate it?

@Jeeppler
Copy link

Jeeppler commented Nov 18, 2022

I was able to confirm: pool.py#L52 is the problem. We overwrote it with a fixed value: https://github.com/mercedes-benz/sechub/blob/develop/sechub-pds-solutions/scancode/docker/Scancode-Debian.dockerfile#L52. Now, it does work for us.

I am not sure what the issue is with Armbian_22.08.1_Jetson-nano_jammy_current_5.18.19.img and the memory usage. However, Scancode does need a lot of memory to scan a large file.

@pombredanne
Copy link
Member

@Jeeppler Very nicely finding. Thank you.
This may be related to #3143 by @nnobelis
@nnobelis Do you mind to mind this patch on your big scan?
@Jeeppler Do you mind to submit a patch/PR?

@pombredanne
Copy link
Member

I am not sure what the issue is with Armbian_22.08.1_Jetson-nano_jammy_current_5.18.19.img and the memory usage. However, Scancode does need a lot of memory to scan a large file.

Each whole file is loaded in memory indeed

@Jeeppler
Copy link

Jeeppler commented Nov 21, 2022

@pombredanne I am happy to submit a patch/PR and fix this issue.

However, my current solution is to overwrite pool.py with brute force. I would have to look at the code in depth to understand it better. I probably will have more question along the way about the Scancode code base.

My expectation is, that the fix allows the user to set the value using the --timeout parameter. I do not consider hard coding the timeout value to 10 hours as a long term solution.

@nnobelis
Copy link

@pombredanne No sorry I cannot test the changes unless there is a ScanCode release of it (I get ScanCode from PIP when building the Docker image).

@nnobelis
Copy link

Thanks @Jeeppler. I will have a look when I have capacity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants