Build failing on tip : "./build.sh --subsetCategory CoreCLR -c Release -arch x64" #34649


Closed
sdmaclea opened this issue Apr 7, 2020 · 34 comments · Fixed by #114005
Labels
area-Infrastructure-coreclr in-pr There is an active PR which will close this issue when it is merged
Milestone

Comments

@sdmaclea
Contributor

sdmaclea commented Apr 7, 2020

  error: Could not read profile /home/stmaclea/.nuget/packages/optimization.linux-x64.pgo.coreclr/99.99.99-master-20200228.3/data/clrjit.profdata: Unsupported instrumentation profile format version
  [ 75%] Building CXX object src/utilcode/CMakeFiles/utilcode_dac.dir/configuration.cpp.o
  [ 76%] Building CXX object src/utilcode/CMakeFiles/utilcode_dac.dir/collections.cpp.o
  1 error generated.
  make[2]: *** [src/jit/standalone/CMakeFiles/clrjit.dir/__/alloc.cpp.o] Error 1
  make[1]: *** [src/jit/standalone/CMakeFiles/clrjit.dir/all] Error 2
  src/jit/standalone/CMakeFiles/clrjit.dir/build.make:79: recipe for target 'src/jit/standalone/CMakeFiles/clrjit.dir/__/alloc.cpp.o' failed
  CMakeFiles/Makefile2:2423: recipe for target 'src/jit/standalone/CMakeFiles/clrjit.dir/all' failed
  make[1]: *** Waiting for unfinished jobs....
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-Infrastructure-coreclr untriaged New issue has not been triaged by the area owner labels Apr 7, 2020
@janvorli
Member

janvorli commented Apr 7, 2020

I have seen this problem in the past and in my case, doing a clean build (git clean -xdf executed in the root of the repo and then rebuild of everything) fixed the problem. Maybe I had to delete the .nuget folder too.

@sdmaclea
Contributor Author

sdmaclea commented Apr 7, 2020

I did try ./build.sh --clean, which didn't resolve the issue.

Perhaps --clean also needs to be fixed.

@hoyosjs
Member

hoyosjs commented Apr 7, 2020

Clean hasn't worked in a while or been supported I believe, at least in coreclr. I am not sure why it wasn't removed.

@ViktorHofer
Member

FWIW clean is working fine but it just cleans the artifacts folder.

@ViktorHofer
Member

I'm talking about .\build.cmd/sh -clean: https://github.com/dotnet/runtime/blob/master/eng/common/build.ps1#L123-L129

@hoyosjs
Member

hoyosjs commented Apr 7, 2020

The old coreclr clean used to do a bit more, like cleaning the package cache, as it used to be directory based. We decided to defer that logic to nuget/dotnet nuget.

@sdmaclea
Contributor Author

sdmaclea commented Apr 7, 2020

It seems strange that we are using caches outside of the build directory, i.e. ~/.nuget. It seems to make the build less robust. Perhaps CI doesn't do this?

@ViktorHofer
Member

The old coreclr clean used to do a bit more, like cleaning the package cache as it used to be directory based.

Clearing the nuget cache should be done via either dotnet nuget or nuget. There's no need to replicate that logic in dotnet/runtime: dotnet nuget locals -c global-packages. Do you have specific suggestion how to improve the current -clean target?

Perhaps CI doesn't do this?

Right, CI doesn't use the user wide cache but instead puts all packages under runtime/.packages.

@hoyosjs
Member

hoyosjs commented Apr 7, 2020

Indeed, ci generates a local .packages folder. Ah, Viktor got to it first :)

Do you have specific suggestion how to improve the current -clean target?

I honestly believe that what we are doing is the right thing. In this case it even feels like deleting the cached version of the optimization package is enough.
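
As an illustration of "deleting the cached version of the optimization package", here is a minimal shell sketch. It assumes the user-wide cache lives under ~/.nuget/packages (or wherever the NUGET_PACKAGES environment variable points); the helper name remove_cached_pgo_package is hypothetical and not part of the repo. The package is restored again on the next build, so this only helps when the cached copy itself is stale or damaged.

```shell
# Sketch only: remove the cached PGO optimization package, not the whole NuGet cache.
# Assumption: packages are cached under ~/.nuget/packages unless NUGET_PACKAGES is set.

# Hypothetical helper. $1 (optional): root of the NuGet package cache.
remove_cached_pgo_package() {
    pgo_packages_root="${1:-${NUGET_PACKAGES:-$HOME/.nuget/packages}}"
    pgo_package_dir="$pgo_packages_root/optimization.linux-x64.pgo.coreclr"
    if [ -d "$pgo_package_dir" ]; then
        # Deletes every cached version of this one package.
        rm -rf -- "$pgo_package_dir"
        echo "removed $pgo_package_dir"
    else
        echo "nothing to remove at $pgo_package_dir"
    fi
}
```

Calling remove_cached_pgo_package with no arguments targets the default cache location; other cached packages are left alone.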

@sdmaclea
Contributor Author

sdmaclea commented Apr 7, 2020

I honestly believe that what we are doing is the right thing. In this case it even feels like deleting the cached version of the optimization package is enough.

Why?

I see several things we are doing wrong:

  • The build is not reproducible.
  • The build depends on global machine state.
  • The dependency is undocumented.
  • The --clean operation doesn't clean everything and there is no --deep-clean option
  • The dev build is different from the CI build. The dev experience is effectively untested by CI.

@sdmaclea
Contributor Author

sdmaclea commented Apr 8, 2020

I retried the build like this and it still failed.

git clean -xdf
rm -rf ~/.nuget
./build.sh --subsetCategory CoreCLR -c Release -arch x64

With the same error.

@jashook
Contributor

jashook commented May 11, 2020

Did we ever get to the root cause?

@jashook jashook removed the untriaged New issue has not been triaged by the area owner label May 11, 2020
@sdmaclea
Contributor Author

sdmaclea commented May 11, 2020

No, I never figured out the root cause. I just reverted to an older, previously working commit and forgot about it.

When I later updated to tip, the issue was resolved.

I don't think I could repro this anymore... Tempted to close because it cannot be reproed.

@ViktorHofer
Member

Ok let's close this then. Feel free to reopen.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 9, 2020
@trylek
Member

trylek commented Sep 4, 2021

I'm still hitting this when trying to build CoreCLR on Ubuntu in release mode; the last time was literally 2 minutes ago. I'll be happy to assist any follow-up investigation by collecting artifacts, logs or anything else. @davidwrighton / @janvorli, have you got any suggestions as to how to continue the investigation? Is this an infra problem, such as pulling down a PGO data package that is too old, or is it a compiler problem? Which component consumes the optimization data on Linux? Do we need to install a newer clang version, cmake, or something else? Is there any tooling that would let me dump the PGO data and share its properties?

Thanks a lot

Tomas

@trylek trylek reopened this Sep 4, 2021
@trylek
Member

trylek commented Sep 4, 2021

  [ 42%] Building CXX object jit/CMakeFiles/clrjit.dir/alloc.cpp.o
  error: Could not read profile /home/trylek/.nuget/packages/optimization.linux-x64.pgo.coreclr/1.0.0-prerelease.21451.4/data/coreclr.profdata: Unsupported instrumentation profile format version

@ViktorHofer ViktorHofer added this to the 7.0.0 milestone Sep 7, 2021
@akoeplinger
Member

I'm hitting this in #61001 as well; I had to work around it by passing /p:NativeOptimizationDataSupported=false to build.sh.

@davidwrighton
Member

I'll take a look at fixing the code flow here

@davidwrighton
Member

But my guess is that you have a version of clang that isn't able to consume the profile data we produce. Are you building with the same version of clang that is used in the official docker build images?

@akoeplinger
Member

In #61001 I was updating from Ubuntu 16.04 to 18.04, so it's possible that a different clang version is being used.

@agocke agocke removed this from the 7.0.0 milestone Jul 28, 2022
@agocke agocke added this to the Future milestone Jul 28, 2022
@dotnet dotnet unlocked this conversation Nov 24, 2023
ManickaP pushed a commit to ManickaP/runtime that referenced this issue Nov 29, 2023
…ages for enterprise linux pipeline (dotnet#63014)

* Use Ubuntu 18.04 1ES pools and dotnet-buildtools-prereq docker images for enterprise linux pipeline

Ubuntu 16.04 is no longer available on Azure Pipelines, move to 18.04 on 1ES pool and the Docker images from dotnet-buildtools-prereq.

* Workaround dotnet#34649

* Use servicing 1ES pool

* Fix property name for turning off PGO data

It was changed in main with dotnet@4682098

* Fix property value

Co-authored-by: Alexander Köplinger <[email protected]>
@lambdageek
Member

Hitting this on Debian trixie/sid while trying to build release/9.0-rc2 with ./build.sh clr+libs -c Release:

...
  error: Error in reading profile /home/aleksey/.nuget/packages/optimization.linux-x64.pgo.coreclr/1.0.0-prerelease.24409.2/data/coreclr.profdata: unsupported instrumentation profile format version
...
$ clang --version
Debian clang version 16.0.6 (27+b1)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

@am11
Member

am11 commented Sep 18, 2024

$ clang --version

The relevant version shows up in the cmake config logs. Do you see 16 in those logs too?

@krwq
Member

krwq commented Mar 26, 2025

Started hitting this suddenly after rebasing. Current commit: 62bd39a and around 2 weeks ago when it was still working: 97a8af9.

Building with: ./build.sh -rc Release -s clr+libs on Ubuntu

  • git clean -fdx + clearing ~/.nuget - didn't help
$ clang --version
Ubuntu clang version 18.1.3 (1ubuntu1)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

Relevant piece of build log:

  [ 50%] Building CXX object jit/CMakeFiles/clrjit_universal_arm64_x64.dir/cmake_pch.hxx.pch
  [ 50%] Building CXX object jit/CMakeFiles/clrjit_win_x64_x64.dir/cmake_pch.hxx.pch
  error: Error in reading profile /home/krwq/.nuget/packages/optimization.linux-x64.pgo.coreclr/1.0.0-prerelease.25171.2/data/coreclr.profdata: unsupported instrumentation profile format version
  make[2]: *** [jit/CMakeFiles/clrjit.dir/build.make:77: jit/CMakeFiles/clrjit.dir/cmake_pch.hxx.pch] Error 1
  make[1]: *** [CMakeFiles/Makefile2:4338: jit/CMakeFiles/clrjit.dir/all] Error 2

@akoeplinger
Member

Might be related to the updated packages from f9fc62a?

@akoeplinger
Member

I see that the images were updated to use the mcr.microsoft.com/dotnet-buildtools/prereqs:azurelinux-3.0-net10.0-opt-amd64 container in https://dev.azure.com/dnceng/internal/_git/dotnet-optimization/pullrequest/47824, which uses clang 20 now since dotnet/dotnet-buildtools-prereqs-docker#1369.

Presumably that creates a newer version of the profile format?

@am11
Member

am11 commented Mar 26, 2025

The workaround for local development is to just rename the profdata file: mv /home/krwq/.nuget/packages/optimization.linux-x64.pgo.coreclr/1.0.0-prerelease.25171.2/data/coreclr.profdata{,_}. We should probably add an explicit option to disable pgo_support in build.sh, as it's a supported scenario in inner builds:

# If we don't have profile data available, gracefully fall back to a non-PGO opt build
if(NOT EXISTS ${ProfilePath})
message("PGO data file NOT found: ${ProfilePath}")

@akoeplinger
Member

You can already pass /p:NativeOptimizationDataSupported=false as a workaround. We probably need to add some detection to the build scripts so that they ignore the data if it's incompatible.
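
One possible shape for such detection, as a rough shell sketch rather than what the build scripts actually do: ask llvm-profdata to read the file and fall back to the existing property when it cannot. This assumes the llvm-profdata on PATH comes from the same LLVM release as the clang used for the build, since an llvm-profdata older than the producer is expected to reject the file in the same way clang does. The helper names profdata_is_readable and pgo_build_args are hypothetical.

```shell
# Sketch only: decide whether to disable PGO data based on whether the local
# LLVM tooling can parse the profile. Not the repo's actual detection logic.

# Hypothetical helper. Succeeds when the tool can read the profile.
# $1: path to a .profdata file
# $2 (optional): llvm-profdata executable; assumed to match the clang in use
profdata_is_readable() {
    pgo_file="$1"
    pgo_tool="${2:-llvm-profdata}"
    [ -f "$pgo_file" ] || return 1
    # `llvm-profdata show` parses the file; a missing tool also counts as unreadable.
    "$pgo_tool" show "$pgo_file" >/dev/null 2>&1
}

# Hypothetical helper. Prints the extra build.sh argument when the profile
# cannot be read, and prints nothing when it can.
pgo_build_args() {
    if ! profdata_is_readable "$@"; then
        printf '%s\n' "/p:NativeOptimizationDataSupported=false"
    fi
}
```

For example, ./build.sh -s clr+libs -c Release $(pgo_build_args path/to/coreclr.profdata) would add the property only when the profile is unreadable. Running llvm-profdata show on the file by hand is also a way to dump and share its properties.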

@krwq
Member

krwq commented Mar 26, 2025

would upgrading clang to 20 fix this locally? (I already do the workaround but would prefer permanent fix)

@am11
Member

am11 commented Mar 26, 2025

would upgrading clang to 20 fix this locally? (I already do the workaround but would prefer permanent fix)

Yup, it would match the CI environment, which appears to be working. On Ubuntu/Debian, one quick way to upgrade the llvm toolchain is curl -sSL https://apt.llvm.org/llvm.sh | bash -s - 20 all. build.sh has a mechanism to pick the latest clang if multiple installations are found (just delete the artifacts/ dir after installing the new toolchain).

@krwq
Member

krwq commented Mar 27, 2025

In that case, would it be reasonable to just exit the build early with an error saying that clang-20 is required, and update the build instructions? I think this would be more reasonable than an error most people don't understand. Or does it need to match the specific version the CI is using? (Either way, I think it's more reasonable to require a specific version of something than to fail with a random error.)

@akoeplinger
Member

Looks like we already have a check to skip PGO if clang is older than 16, that needs to be bumped:

if((CMAKE_CXX_COMPILER_ID MATCHES "Clang") AND (NOT CMAKE_CXX_COMPILER_VERSION VERSION_LESS 16))

Would be nice if the PGO package could record the clang version so we don't need to manually keep it in sync.
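
Until the version is recorded in the package, a local pre-check along these lines could catch the mismatch before the build starts. This is a sketch under assumptions: the minimum of 20 is inferred from the note above that the PGO data is now produced with clang 20, and the helper names are hypothetical. It parses the same `clang --version` output quoted earlier in this thread.

```shell
# Sketch only: compare the local clang major version with the version assumed
# to have produced the PGO data. Not the repo's CMake check.

# Hypothetical helper. Reads `clang --version` output on stdin and prints the
# major version, e.g. "18" for "Ubuntu clang version 18.1.3 (1ubuntu1)".
clang_major_from_output() {
    sed -n 's/.*clang version \([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Hypothetical helper. Succeeds when the major version is at least the minimum.
# $1: detected clang major version (may be empty if parsing failed)
# $2: minimum major version assumed to read the PGO data (20 here, an assumption)
clang_supports_pgo_data() {
    clang_major="$1"
    clang_min="$2"
    [ -n "$clang_major" ] && [ "$clang_major" -ge "$clang_min" ]
}
```

A typical use would be clang_supports_pgo_data "$(clang --version | clang_major_from_output)" 20, passing /p:NativeOptimizationDataSupported=false to build.sh when it fails.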

@akoeplinger
Member

@LoopedBard3 @caaavik-msft how difficult would adding the clang version be?

@LoopedBard3
Member

We are not aware of any way to add the clang version to the PGO package, but we are also not very familiar with using packages in this way. If there is a good way to do this, we don't see any problem with doing it. Otherwise, if there is a different recommended way to flow this kind of information, we can likely do that as well.

@reflectronic
Contributor

The Codespaces build is failing because of this: https://github.com/dotnet/runtime/actions/runs/14099208201/job/39492189942

akoeplinger added a commit that referenced this issue Mar 28, 2025
Contributes to #34649 and fixes the Codespaces prebuild.
@dotnet-policy-service dotnet-policy-service bot added the in-pr There is an active PR which will close this issue when it is merged label Mar 28, 2025
@github-actions github-actions bot locked and limited conversation to collaborators Apr 28, 2025