
Helix infrastructure script failures on Linux arm #39244


Closed
BruceForstall opened this issue Jul 14, 2020 · 7 comments
Labels
area-Infrastructure, untriaged

Comments

@BruceForstall
Contributor

Consider:

https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-master-f166106694b64041af/System.Security.Cryptography.Algorithms.Tests/console.c43f3c2f.log?sv=2019-02-02&se=2020-08-02T07%3A37%3A26Z&sr=c&sp=rl&sig=xGk3T6I7LB3HjJTLqeuiEWQ3ohh6Gbeynr3XVrIndxA%3D

(from test run:
https://dev.azure.com/dnceng/public/_build/results?buildId=726692&view=ms.vss-test-web.build-test-results-tab&runId=22502646&resultId=145303&paneView=dotnet-dnceng.dnceng-build-release-tasks.helix-test-information-tab
)

I see several script failures (especially Python failures). It's not clear to me how fatal these are: do we not get proper reporting, etc.? In addition to the failures, there are lots of warnings.

E.g.:

+ /home/helixbot/.azdo-env/bin/python -c import azure.devops
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'azure'
+ /home/helixbot/.azdo-env/bin/python -c import future
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'future'
+ /home/helixbot/.azdo-env/bin/python -m pip install future==0.17.1
WARNING: The directory '/home/helixbot/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
+ python -B /root/helix/work/correlation/xunit-reporter.py
Traceback (most recent call last):
  File "/root/helix/work/correlation/xunit-reporter.py", line 6, in <module>
    import helix.azure_storage
  File "/root/helix/scripts/helix/azure_storage.py", line 6, in <module>
    from azure.storage.blob import BlobClient, ContentSettings, ExponentialRetry
  File "/root/helix/scripts/azure/storage/blob/__init__.py", line 11, in <module>
    from ._blob_client import BlobClient
  File "/root/helix/scripts/azure/storage/blob/_blob_client.py", line 24, in <module>
    from ._shared.encryption import generate_blob_encryption_data
  File "/root/helix/scripts/azure/storage/blob/_shared/encryption.py", line 19, in <module>
    from cryptography.hazmat.primitives.padding import PKCS7
  File "/root/helix/scripts/cryptography/hazmat/primitives/padding.py", line 13, in <module>
    from cryptography.hazmat.bindings._padding import lib
ImportError: /root/helix/scripts/cryptography/hazmat/bindings/_padding.abi3.so: wrong ELF class: ELFCLASS64
+ exit 1
+ export _commandExitCode=1
+ chmod -R 777 /home/helixbot/dotnetbuild/dumps
+ exit 1

@MattGal can you please route?

@ghost

ghost commented Jul 14, 2020

Tagging subscribers to this area: @ViktorHofer
Notify danmosemsft if you want to be subscribed.

@Dotnet-GitSync-Bot added the untriaged label Jul 14, 2020
@MattGal
Member

MattGal commented Jul 14, 2020

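@ulisesh FYI. We've seen this before when the mapped-in Python bits include a native component built for a different architecture than the Docker container. Here, `wrong ELF class: ELFCLASS64` means a 64-bit shared library is being loaded by a 32-bit Python.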

This doesn't make your test fail, and it doesn't stop test results from being reported to Azure DevOps; the only effect is that the XUnit results won't be ingested into Kusto. We can definitely explore this in a core-eng issue, though, as xunit-reporter is not deprecated, much as some might want it to be.
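
To illustrate the architecture mismatch behind the `wrong ELF class: ELFCLASS64` error in the log above, here is a minimal Python sketch that reads the class byte from an ELF header and compares it with the running interpreter's pointer width. The function names are hypothetical and this is not part of the Helix scripts; it only shows the kind of check that would flag a 64-bit `.so` before a 32-bit Python tries to import it.

```python
import struct

ELF_MAGIC = b"\x7fELF"
# e_ident[EI_CLASS] is byte 4 of the ELF header: 1 = ELFCLASS32, 2 = ELFCLASS64.
ELF_CLASS_BITS = {1: 32, 2: 64}


def elf_bits(header: bytes):
    """Return 32 or 64 for an ELF header, or None if the bytes are not ELF."""
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return None
    return ELF_CLASS_BITS.get(header[4])


def interpreter_bits() -> int:
    """Pointer width of the running Python, in bits."""
    return struct.calcsize("P") * 8


def native_module_matches(path: str) -> bool:
    """True if the shared object at `path` has the same ELF class as this Python."""
    with open(path, "rb") as f:
        return elf_bits(f.read(5)) == interpreter_bits()
```

Run against a file such as `_padding.abi3.so` from the traceback, `native_module_matches` would return `False` on a 32-bit arm interpreter if the file is a 64-bit build, which is the situation the loader reports.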

@MattGal
Member

MattGal commented Jul 14, 2020

Filed dotnet/arcade#5786 to track updating xunit-reporter to use venv instead of direct python invocation.

@ViktorHofer
Member

Closing as this is tracked by the linked issue.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 8, 2020
@jozkee
Member

jozkee commented Feb 18, 2021

@MattGal
Member

MattGal commented Feb 18, 2021

ModuleNotFoundError: No module named 'azure'
ModuleNotFoundError: No module named 'future'

I am consistently hitting the errors reported here on #46101

run: https://dev.azure.com/dnceng/public/_build/results?buildId=990999&view=logs&j=f0c437f4-e81d-5464-8b65-dcbe7e70316d&t=42aef4d2-6bf8-50cc-612b-d0d7fef2403f

helix log: https://helixre8s23ayyeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-46101-merge-6304fdb9d579499197/Common.Tests/console.3a7513c3.log?sv=2019-07-07&se=2021-03-10T01%3A20%3A38Z&sr=c&sp=rl&sig=sIsDilf2iVRQW5gkivi1P5gkwrDWE8MuHbtoYFYcFpw%3D

Should I log a new issue for that or can this one be reopened?

The errors you pasted are expected and come from these lines. They are the output of the script checking whether the modules are already installed, so that it can "pip install" them if they are not.
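
The probe-then-install pattern described here can be sketched in Python roughly as follows. This is an illustration only, not the actual Helix script: the helper names are hypothetical, and the only package pin used is the `future==0.17.1` that appears in the log.

```python
import importlib.util
import subprocess
import sys


def missing_modules(names):
    """Return the module names that this interpreter cannot locate."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # A dotted name whose parent package is absent raises
            # ModuleNotFoundError instead of returning None.
            found = False
        if not found:
            missing.append(name)
    return missing


def pip_install(requirement):
    """Install a requirement with the same interpreter that ran the probe."""
    subprocess.check_call([sys.executable, "-m", "pip", "install", requirement])


# Example use (not executed here):
#   if missing_modules(["future"]):
#       pip_install("future==0.17.1")
```

The shell version in the log probes with `python -c import <module>`, so a missing module prints a `ModuleNotFoundError` traceback before the install step runs. That traceback is noise from the probe, not a failure of the work item.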

Judging from the log you posted, there's nothing wrong with Azure DevOps reporting; rather, the test crashed:

Collecting future==0.17.1
  Downloading future-0.17.1.tar.gz (829 kB)
Building wheels for collected packages: future
  Building wheel for future (setup.py): started
  Building wheel for future (setup.py): finished with status 'done'
  Created wheel for future: filename=future-0.17.1-py3-none-any.whl size=488729 sha256=a0e2cf3806859ef43d806a0fee776b5710c76203f8a5ae460cd4c56729f27c8f
  Stored in directory: c:\helix\work\workitem\pip\cache\wheels\16\4c\84\8a6161d44282ede60ed233d090156c6109a7ab865e49c1c9f6
Successfully built future
Installing collected packages: future
Successfully installed future-0.17.1
 Thu 02/18/2021- 1:29:50.04
2021-02-18 01:29:50,378: INFO: 1740: run(82): main: Main thread starting 10 workers
Worker 0: starting...
Worker 1: starting...
Worker 2: starting...
Worker 3: starting...
Worker 4: starting...
Worker 5: starting...
Worker 6: starting...
Worker 7: starting...
Worker 8: starting...
Worker 9: starting...
2021-02-18 01:29:50,385: INFO: 1740: run(89): main: Beginning reading of test results.
2021-02-18 01:29:50,385: INFO: 1740: run(98): main: Uploading results in batches of size 1000
2021-02-18 01:29:50,399: INFO: 1740: run(103): main: Main thread finished queueing batches
2021-02-18 01:29:50,399: INFO: 1740: run(107): main: Main thread exiting
Searching 'C:\helix\work\workitem\..' for log files
Generated log list: 
Searching 'C:\helix\work\workitem' for test results files
Searching 'C:\helix\work\workitem\uploads' for test results files
No results file found in any of the following formats: xunit, junit, trx
 Thu 02/18/2021- 1:29:50.42
Did not find dumps, skipping dump docs generation.

[END EXECUTION]
Exit Code:-1073740791

Exit Code:-1073740791 (0xC0000409, STATUS_STACK_BUFFER_OVERRUN) is the operative bit here. Crashing with a buffer overrun is not good.
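
For readers unsure how the negative exit code maps to a status name: Windows reports the 32-bit NTSTATUS as a signed integer, so masking it to 32 bits recovers the familiar hex value. A small sketch follows; the lookup table is a hand-picked assumption covering only a few well-known codes.

```python
# A few well-known NTSTATUS values; this table is illustrative, not exhaustive.
KNOWN_STATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}


def to_ntstatus(exit_code: int) -> int:
    """Reinterpret a signed 32-bit process exit code as an unsigned NTSTATUS."""
    return exit_code & 0xFFFFFFFF


def describe_exit_code(exit_code: int) -> str:
    """Format an exit code as hex, with its name when it is in KNOWN_STATUS."""
    status = to_ntstatus(exit_code)
    name = KNOWN_STATUS.get(status, "unknown")
    return f"0x{status:08X} ({name})"


print(describe_exit_code(-1073740791))  # prints: 0xC0000409 (STATUS_STACK_BUFFER_OVERRUN)
```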

@MattGal
Member

MattGal commented Feb 18, 2021

@jozkee another thing to note: the log you posted wasn't even from an ARM machine:

Console log: 'Common.Tests' from job 6304fdb9-d579-4991-97d9-3ec74269c0d4 (windows.10.amd64.serverrs5.open.rt) using docker image mcr.microsoft.com/dotnet-buildtools/prereqs:nanoserver-1809-helix-amd64-08e8e40-20200107182504 on a0023ZE

Given that your run was a 100% failure, with this call stack on every work item and every amd64 machine, this is more serious. (EDIT: This seems to be confined to your run, at least; hopefully it was just a fluke in payload construction.)
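
When triaging many logs, the queue name in the console log header is a quick way to tell which architecture a work item ran on. The sketch below parses a header line in the shape quoted above. The regular expression is inferred from that single line, so treat the format as an assumption rather than a documented Helix contract; the function names and the arm queue name in the comments are hypothetical.

```python
import re

# Pattern inferred from the one header line quoted in this thread (an assumption).
HEADER_RE = re.compile(
    r"Console log: '(?P<workitem>[^']+)' from job (?P<job>[0-9a-f-]+) "
    r"\((?P<queue>[^)]+)\)"
    r"(?: using docker image (?P<image>\S+))?"
    r" on (?P<machine>\S+)"
)


def parse_header(line: str):
    """Return the header fields as a dict, or None if the line does not match."""
    match = HEADER_RE.search(line)
    return match.groupdict() if match else None


def looks_like_arm(queue: str) -> bool:
    """Heuristic: some dot-separated part of the queue name starts with 'arm'."""
    return any(part.startswith("arm") for part in queue.lower().split("."))


line = (
    "Console log: 'Common.Tests' from job 6304fdb9-d579-4991-97d9-3ec74269c0d4 "
    "(windows.10.amd64.serverrs5.open.rt) using docker image "
    "mcr.microsoft.com/dotnet-buildtools/prereqs:"
    "nanoserver-1809-helix-amd64-08e8e40-20200107182504 on a0023ZE"
)
fields = parse_header(line)
# fields["queue"] is "windows.10.amd64.serverrs5.open.rt", so looks_like_arm is False.
# A hypothetical queue such as "linux.arm32.example" would return True.
```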
