Skip to content

Create combined GPU tarball and zip file package #8827

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 37 commits into from
Aug 25, 2021

Conversation

chilo-ms
Copy link
Contributor

@chilo-ms chilo-ms commented Aug 24, 2021

Integrate TensorRT EP libraries into existing GPU tarball and zip files.
Following three Artifacts are the final results of combined GPU package:
onnxruntime-linux-x64-gpu
onnxruntime-win-x64-gpu
onnxruntime-java-gpu

previous onnxruntime-xxx-gpu only contains cuda ep library, therefore rename to onnxruntime-xxx-cuda and act as intermediate artifact.

Also, this PR makes small code refactor on copying trt ep libraries for nuget package in c-api-artifacts-package-and-publish-steps-windows.yml.

@chilo-ms chilo-ms requested review from snnn and jywu-msft August 24, 2021 08:22
@chilo-ms chilo-ms requested a review from a team as a code owner August 24, 2021 08:22
@jywu-msft
Copy link
Member

even though the Zip-Nuget-Java Packaging Pipeline reports successful , there are still failures ? Is this expected behavior?

Check failure on line 128 in Build log

@azure-pipelinesazure-pipelines
/ Zip-Nuget-Java Packaging Pipeline
Build log #L128
Error running codesign job: 1 of 1
Check failure on line 129 in Build log

@azure-pipelinesazure-pipelines
/ Zip-Nuget-Java Packaging Pipeline
Build log #L129
GuardianErrorExitCodeException: codesign completed with an Error exit code: 254. No available target files were found and the tool was set to fail under this condition
Check failure on line 132 in Build log

@azure-pipelinesazure-pipelines
/ Zip-Nuget-Java Packaging Pipeline
Build log #L132
Error running tool 1 of 1: codesign
Check failure on line 133 in Build log

@azure-pipelinesazure-pipelines
/ Zip-Nuget-Java Packaging Pipeline
Build log #L133
Error running codesign job: 1 of 1

@chilo-ms
Copy link
Contributor Author

even though the Zip-Nuget-Java Packaging Pipeline reports successful , there are still failures ? Is this expected behavior?

Check failure on line 128 in Build log

@azure-pipelinesazure-pipelines
/ Zip-Nuget-Java Packaging Pipeline
Build log #L128
Error running codesign job: 1 of 1
Check failure on line 129 in Build log

@azure-pipelinesazure-pipelines
/ Zip-Nuget-Java Packaging Pipeline
Build log #L129
GuardianErrorExitCodeException: codesign completed with an Error exit code: 254. No available target files were found and the tool was set to fail under this condition
Check failure on line 132 in Build log

@azure-pipelinesazure-pipelines
/ Zip-Nuget-Java Packaging Pipeline
Build log #L132
Error running tool 1 of 1: codesign
Check failure on line 133 in Build log

@azure-pipelinesazure-pipelines
/ Zip-Nuget-Java Packaging Pipeline
Build log #L133
Error running codesign job: 1 of 1

I saw similar log errors on other build here:
https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=168830&view=results

@snnn
Copy link
Member

snnn commented Aug 24, 2021

I have a PR that has conflict with yours. #8754 I removed bundle_dlls_gpu.bat and put the logic in tools/nuget/generate_nuspec_for_native_nuget.py, for splitting out PDB files to a separated package, otherwise we almost hit the nuget.org 250MB file size limit.

@snnn
Copy link
Member

snnn commented Aug 25, 2021

The "No available target files were found and the tool" can be temporarily ignored. We can address it later.

@@ -357,7 +453,7 @@ jobs:
- task: BatchScript@1
displayName: 'Bundle Native NuGet and other binaries'
inputs:
filename: $(Build.SourcesDirectory)\tools\ci_build\github\windows\bundle_dlls_gpu.bat
filename: $(Build.SourcesDirectory)\tools\ci_build\github\windows\bundle_dlls_gpu_nuget.bat
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this line needs be changed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bat file is mainly for bundling libraries for nuget package, simply rename it for clarity.
Also, we need to bundle libraries for GPU zip file, so the name "bundle_dlls_gpu.bat" is better for this case.

@@ -3,20 +3,20 @@ REM Licensed under the MIT License.

REM for available runtime identifiers, see https://github.com/dotnet/corefx/blob/release/3.1/pkg/Microsoft.NETCore.Platforms/runtime.json
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this: we make this file solely for the new jobs you are adding? And remove the nuget part in this file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I can remove this link.

snnn
snnn previously approved these changes Aug 25, 2021
@@ -130,14 +133,14 @@ jobs:
displayName: 'Download Pipeline Artifact - Win x64'
inputs:
buildType: 'current'
artifactName: 'drop-onnxruntime-java-win-gpu-x64'
artifactName: 'onnxruntime-java-win-x64-cuda'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep the prefix unchanged. The cpu one is still "drop-onnxruntime-java-win-x64". If you don't change that, it's inconsistent. But if you change that, you will need to change a lot of places in the c-api-cpu.yml. Given we only have 3 days left, I think it's not a good time to do this.

Copy link
Contributor Author

@chilo-ms chilo-ms Aug 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the Artifact names that are related to "GPU" and kept cpu one unchanged.

Copy link
Contributor Author

@chilo-ms chilo-ms Aug 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following three Artifacts are the final results of combined GPU package:
onnxruntime-linux-x64-gpu
onnxruntime-win-x64-gpu
onnxruntime-java-gpu

(previous onnxruntime-xxx-gpu only contains cuda ep library, therefore I rename to onnxruntime-xxx-cuda and act as intermediate artifact)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think there is something wrong with onnxruntime-win-x64-gpu zip
the copy zip file step finds 0 files.

Starting: Copy zip file to: D:\a_work\1\a

Task : Copy files
Description : Copy files from a source folder to a target folder using patterns matching file paths (not folder paths)
Version : 2.190.1
Author : Microsoft Corporation
Help : https://docs.microsoft.com/azure/devops/pipelines/tasks/utility/copy-files

found 0 files
Finishing: Copy zip file to: D:\a_work\1\

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like I said, it is inconsistent. The GPU part of jar packaging looks for "onnxruntime-java-win-x64-cuda", but CPU looks for "drop-onnxruntime-java-win-x64". And you'd need to modify the scripts to be compatible with both.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we identified the issue, it's somewhere else.
But yes, it would be good to keep the naming convention as consistent as possible

@jywu-msft
Copy link
Member

Integrate TensorRT EP libraries into existing GPU tarball and zip files.
The tarball and zip files will be generated and publish under following Azure Pipeline Artifacts:

  • onnxruntime-win-combined-gpu-x64
  • onnxruntime-linux-x64-combined-gpu

Note: Due to original Artifacts onnxruntime-linux-x64-gpu/onnxruntime-win-gpu-x64 act as intermediate file, as well as Azure Pipeline Artifact can't be overwritten, we're going to use the name "combined" as new Artifact name for combined GPU package.

Also, this PR makes small code refactor on copying trt ep libraries for nuget package in c-api-artifacts-package-and-publish-steps-windows.yml.

please edit the description to reflect the final implemenentation.
(the artifacts will not be named -combined-

displayName: 'Copy zip file to: $(Build.ArtifactStagingDirectory)'
inputs:
SourceFolder: '$(Build.BinariesDirectory)\zip-artifacts'
Contents: 'onnxruntime-win-x64-gpu-*.zip'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something wrong here? It's not finding a file. for the copy.

@@ -0,0 +1,23 @@
REM Copyright (c) Microsoft Corporation. All rights reserved.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: I will delete this file in my PR #8754

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the logic will need to be moved to new location then?

Copy link
Member

@snnn snnn Aug 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Instead of injecting the generated DLLs into a nuget package, we should put them in nuspec file and let nuget packaging tool handle it. For two reasons:

  1. It's less risky. We don't manually modify the package
  2. Nuget tool can help us split PDB files out.

@jywu-msft jywu-msft merged commit 32ecbf4 into master Aug 25, 2021
@jywu-msft jywu-msft deleted the trt_ep_tarball_and_zip branch August 25, 2021 20:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants