Skip to content

Sporadic AV in at System.IO.Enumeration.FileSystemEnumerator`1.MoveNext() #46381

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
danmoseley opened this issue Dec 24, 2020 · 5 comments
Closed
Labels
area-System.IO untriaged New issue has not been triaged by the area owner

Comments

@danmoseley
Copy link
Member

eg https://helixre8s23ayyeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-master-9543f189ca2847e3a5/System.IO.FileSystem.Tests/console.2ae12e65.log?sv=2019-07-07&se=2021-01-06T04%3A00%3A30Z&sr=c&sp=rl&sig=DxMutXwZlXIl9LwVIff%2BJoldSZe7zOlgvGkI13JsYCc%3D

  Starting:    System.IO.FileSystem.Tests (parallel test collections = on, max threads = 2)
    System.IO.Tests.File_GetSetTimes.TimesIncludeMillisecondPart_LowTempRes [SKIP]
      Condition(s) not met: "LowTemporalResolution"
Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at System.IO.Enumeration.FileSystemEnumerator`1[[System.__Canon, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].MoveNext()
   at System.Linq.Enumerable+SelectEnumerableIterator`2[[System.__Canon, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.__Canon, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].ToArray()
   at System.Linq.Buffer`1[[System.__Canon, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]]..ctor(System.Collections.Generic.IEnumerable`1<System.__Canon>)
   at System.Linq.OrderedEnumerable`1+<GetEnumerator>d__17[[System.__Canon, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].MoveNext()
   at System.Linq.Enumerable+SelectIPartitionIterator`2[[System.__Canon, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.__Canon, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].MoveNext()
   at Xunit.Sdk.AssertEqualityComparer`1[[System.__Canon, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].CheckIfEnumerablesAreEqual(System.__Canon, System.__Canon)
   at Xunit.Sdk.AssertEqualityComparer`1[[System.__Canon, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].Equals(System.__Canon, System.__Canon)
   at Xunit.Assert.Equal[[System.__Canon, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]](System.__Canon, System.__Canon, System.Collections.Generic.IEqualityComparer`1<System.__Canon>)
   at Xunit.Assert.Equal[[System.__Canon, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]](System.Collections.Generic.IEnumerable`1<System.__Canon>, System.Collections.Generic.IEnumerable`1<System.__Canon>)
   at System.IO.Tests.FSAssert.EqualWhenOrdered[[System.__Canon, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]](System.Collections.Generic.IEnumerable`1<System.__Canon>, System.Collections.Generic.IEnumerable`1<System.__Canon>)
   at System.IO.Tests.Enumeration.ExampleTests.TestGetFilesWithExtensions()

using this query, it has failed ~43 times in 15 days, which is 1-5% of runs

https://engsrvprod.kusto.windows.net/engineeringdata?query=H4sIAAAAAAAEAG2Qy2rDMBBF9wb%2fw5BVDH6qKS0BZ9HQgLtoKU3JWrGmWEkkBY2M69KPr52nS6uFxDD3zj2albHbwqEi8L0k%2bYamQouwkFpShQLyHAR36KTCMUtZGmUsyu4gnUwZm07SmN1M0u4EvvfHOgNtmnGU3YoAfg%2b3ErXYtc9cIZDj1lEjXQWjt5Y6kng0GDaUdixnSfESL%2bQOT9USydHA9fgp3dwIhFkOGbvvGxsjNWylFrnUGi08mTWB0f1biKvzwXJdVockix%2bUVMgFJYp3MXZ0%2fMPemg2WDl5rrLHnCi95ISzbfXcPqcPLQkKYG01mh%2b9WhsfgEFan9R8hyFgH6%2fa6Q4FUHhq1UtzKL4TS1NqNg171D4Hv%2bd4PE0T9xNIBAAA%3d&web=0

WorkItems
| where Finished > now(-15d)
| where FriendlyName == "System.IO.FileSystem.Tests"
| where ExitCode >= 128
| join kind=inner Jobs on JobId
| where Branch == "refs/heads/master"
| sort by Finished desc
| summarize count(), any(ConsoleUri) by QueueName, ExitCode

QueueName ExitCode count_ any_ConsoleUri
ubuntu.1804.amd64.open.rt 134 15 https://helixre8s23ayyeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-master-448954047d8643a1a8/System.IO.FileSystem.Tests/console.dcb0bd30.log?sv=2019-07-07&se=2021-01-06T03%3A49%3A54Z&sr=c&sp=rl&sig=1%2B7n7Oa5rfSkbxJo6O1HEv%2FzdTrFRWJ13Mh9wqBYqn0%3D
osx.1013.amd64.open 134 12 https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-master-5ea73605af684ce1ae/System.IO.FileSystem.Tests/console.8a2d3775.log?sv=2019-07-07&se=2021-01-06T03%3A47%3A24Z&sr=c&sp=rl&sig=uEw17bd0EOy6cqg%2B55TPONgxS1YdWoXdMel8p55Uvek%3D
ubuntu.1604.amd64.open.rt 134 14 https://helixre8s23ayyeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-master-e8d9bc08dc2441d7a0/System.IO.FileSystem.Tests/console.547b5a51.log?sv=2019-07-07&se=2021-01-06T01%3A52%3A38Z&sr=c&sp=rl&sig=DG1seb%2FgKbIzrNgJNwQwHSbwyf0EG5DTpgG8jpxfwZk%3D
osx.1013.amd64.open 139 2 https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-master-d4522874b9bb414cb7/System.IO.FileSystem.Tests/console.34c81ffa.log?sv=2019-07-07&se=2021-01-05T16%3A21%3A50Z&sr=c&sp=rl&sig=4zJU9E%2BwRaaR8CJm20wM%2B18RLJQK5afOVE4eYjW82xo%3D

unfortunately we aren't getting dumps. On Ubuntu it can't find one, on OSX it doesn't mention looking.

----- end Thu Dec 17 04:21:35 UTC 2020 ----- exit code 134 ----------------------------------------------------------
exit code 134 means SIGABRT Abort. Managed or native assert, or runtime check such as heap corruption, caused call to abort(). Core dumped.
Waiting a few seconds for any dump to be written..
Looking around for any Linux dump..
... found no dump in /root/helix/work/workitem
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.IO untriaged New issue has not been triaged by the area owner labels Dec 24, 2020
@danmoseley
Copy link
Member Author

Not failing on macOS 10.14+

QueueName ExitCode count_
ubuntu.1804.amd64.open.rt 0 653
windows.10.amd64.server19h1.open.rt 0 264
windows.10.amd64.open.rt 0 594
ubuntu.1604.amd64.open.rt 0 1396
windows.7.amd64.open.rt 0 131
sles.12.amd64.open.rt 0 231
windows.10.amd64.server20h1.open.rt 0 148
centos.7.amd64.open.rt 0 232
sles.15.amd64.open.rt 0 232
windows.10.amd64.serverrs5.open.rt 0 395
redhat.7.amd64.open.rt 0 241
debian.9.amd64.open.rt 0 364
windows.81.amd64.open.rt 0 132
windows.10.arm64.open 0 102
ubuntu.1804.armarch.open 0 825
osx.1013.amd64.open 0 319
osx.1015.amd64.open 0 257
osx.1014.amd64.open 0 234
windows.10.arm64v8.open 0 127
ubuntu.1804.amd64.open.rt 134 15
osx.1013.amd64.open 134 12
ubuntu.1604.amd64.open.rt 134 14
osx.1013.amd64.open 139 2
ubuntu.1804.amd64.open.rt 1 1

@danmoseley
Copy link
Member Author

danmoseley commented Jan 2, 2021

@MattGal @safern looks like on some machines it can't write out the dump info:
(ubuntu.1604.amd64.open.rt) using docker image mcr.microsoft.com/dotnet-buildtools/prereqs:alpine-3.12-helix-20200602002622-e06dc59.
https://helixre8s23ayyeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-master-e8d9bc08dc2441d7a0/System.IO.FileSystem.Tests/console.547b5a51.log?sv=2019-07-07&se=2021-01-06T01%3A52%3A38Z&sr=c&sp=rl&sig=DG1seb%2FgKbIzrNgJNwQwHSbwyf0EG5DTpgG8jpxfwZk%3D

+ export 'PYTHONPATH='
+ python /root/helix/work/correlation/gen-debug-dump-docs.py -buildid 924344 -workitem 2338ff9f-07f8-497f-96bc-db66b8c15fd8 -jobid e8d9bc08-dc24-41d7-a0d3-cb82de31f861 -outdir /root/helix/work/workitem/uploads -templatedir /root/helix/work/correlation -dumpdir /home/helixbot/dotnetbuild/dumps
Traceback (most recent call last):
  File "/root/helix/work/correlation/gen-debug-dump-docs.py", line 90, in <module>
    with open(source_file, 'r+') as f:
OSError: [Errno 30] Read-only file system: '/root/helix/work/correlation/debug-dump-template.md'

(ubuntu.1804.armarch.open) using docker image mcr.microsoft.com/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440
https://helixre8s23ayyeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-master-90f9370a472347658b/System.Numerics.Vectors.Tests/console.57db4140.log?sv=2019-07-07&se=2021-01-22T09%3A42%3A53Z&sr=c&sp=rl&sig=9FOefHBBGJloPYkomXAhjcWwH50K6q4PbfSG5cwQYbE%3D

+ python /root/helix/work/correlation/gen-debug-dump-docs.py -buildid 935945 -workitem 3543257d-c8bd-484b-8071-8f823c472b87 -jobid 90f9370a-4723-4765-8bef-1ad4b05bf138 -outdir /root/helix/work/workitem/uploads -templatedir /root/helix/work/correlation -dumpdir /home/helixbot/dotnetbuild/dumps
Traceback (most recent call last):
  File "/root/helix/work/correlation/gen-debug-dump-docs.py", line 90, in <module>
    with open(source_file, 'r+') as f:
OSError: [Errno 30] Read-only file system: '/root/helix/work/correlation/debug-dump-template.md'

which is incidentally followed by

+ python -B /root/helix/work/correlation/xunit-reporter.py
Traceback (most recent call last):
  File "/root/helix/work/correlation/xunit-reporter.py", line 6, in <module>
    import helix.azure_storage
  File "/root/helix/scripts/helix/azure_storage.py", line 6, in <module>
    from azure.storage.blob import BlobClient, ContentSettings, ExponentialRetry
  File "/root/helix/scripts/azure/storage/blob/__init__.py", line 10, in <module>
    from ._blob_client import BlobClient
  File "/root/helix/scripts/azure/storage/blob/_blob_client.py", line 25, in <module>
    from ._shared.encryption import generate_blob_encryption_data
  File "/root/helix/scripts/azure/storage/blob/_shared/encryption.py", line 19, in <module>
    from cryptography.hazmat.primitives.padding import PKCS7
  File "/root/helix/scripts/cryptography/hazmat/primitives/padding.py", line 13, in <module>
    from cryptography.hazmat.bindings._padding import lib
ImportError: /root/helix/scripts/cryptography/hazmat/bindings/_padding.abi3.so: wrong ELF class: ELFCLASS64

@danmoseley
Copy link
Member Author

This particular issue seems to have gone away by itself though...

https://engsrvprod.kusto.windows.net/engineeringdata?query=H4sIAAAAAAAEAGWS30%2fCMBDH30n4Hy48jaSwHxI1xPEgkQQfNEYMj6aspytsLel1gRn%2feDuGdMY%2btNnd7T7f%2b7ZrbXZLiyVBvxeG33DI0SAspJKUo4A0BcEtWllikERJNIqTUXwD0WSaJNNJNE6uJpFbw37v368zUPoQjK4jMQSuhE%2fctYnYxf9CjUQlivqJlwhkubF0kDaHwWtNTuF40IF0S53G35Ll83ghCzx%2frZAsDbqIh6O0cy0QZinEyW3TcKulgp1UIpVKoYFHvSHQqjmXwhPvDVdZfmIZ%2fKAwRy4oLLkDmTNib%2fQWMwsvFVbYKGMXHoNVvXd7Vze7OMJgrhXpAt%2bMZC2Ywfp8Ma0I0sbCpvYmCqTslKjKkhv5hZDpStlg2FRtpAp8c2e0VxKGEIbMa2yu4BuMU%2bVmL6TCLHfGw8n4oM50UZWKIG27vzM4tiEX8QBCNxalftiaHyWlhf5076Lf%2bwFEZ8s2YwIAAA%3d%3d&web=0

WorkItems
| where Finished > now(-60d) and Finished < now(-1d)
| where FriendlyName == "System.IO.FileSystem.Tests"
| join kind=inner Jobs on JobId
| where Branch == "refs/heads/master"
| sort by Finished desc
| summarize count() by bin(Finished, 1d), ExitCode
| render linechart with (ycolumns = count_, xcolumn = Finished, series=ExitCode, yaxis=log)

image

@MattGal
Copy link
Member

MattGal commented Jan 4, 2021

@MattGal @safern looks like on some machines it can't write out the dump info:
(ubuntu.1604.amd64.open.rt) using docker image mcr.microsoft.com/dotnet-buildtools/prereqs:alpine-3.12-helix-20200602002622-e06dc59.
https://helixre8s23ayyeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-master-e8d9bc08dc2441d7a0/System.IO.FileSystem.Tests/console.547b5a51.log?sv=2019-07-07&se=2021-01-06T01%3A52%3A38Z&sr=c&sp=rl&sig=DG1seb%2FgKbIzrNgJNwQwHSbwyf0EG5DTpgG8jpxfwZk%3D

+ export 'PYTHONPATH='
+ python /root/helix/work/correlation/gen-debug-dump-docs.py -buildid 924344 -workitem 2338ff9f-07f8-497f-96bc-db66b8c15fd8 -jobid e8d9bc08-dc24-41d7-a0d3-cb82de31f861 -outdir /root/helix/work/workitem/uploads -templatedir /root/helix/work/correlation -dumpdir /home/helixbot/dotnetbuild/dumps
Traceback (most recent call last):
  File "/root/helix/work/correlation/gen-debug-dump-docs.py", line 90, in <module>
    with open(source_file, 'r+') as f:
OSError: [Errno 30] Read-only file system: '/root/helix/work/correlation/debug-dump-template.md'

Two things going on here: first is dotnet/arcade#5786 which we've discussed separately.

The second is that you shouldn't be modifying files in your correlation directory. It's difficult to prevent this outside a docker container, but it's trivially easy to make it the law inside one since we're just mounting the correlation payload folder as a volume. When you opened the file you opened it read/write despite it only being used for reading.

I went ahead and made a PR removing the + here: #46538 which should solve that part of the errors.

@danmoseley
Copy link
Member Author

Thank you @MattGal.

@ghost ghost locked as resolved and limited conversation to collaborators Feb 3, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.IO untriaged New issue has not been triaged by the area owner
Projects
None yet
Development

No branches or pull requests

3 participants