Nuget PDB files should not be included in the nuget #2742

Closed
YanYas opened this issue Dec 26, 2019 · 15 comments
Labels: feature request (request for unsupported feature or enhancement)

Comments

@YanYas

YanYas commented Dec 26, 2019

I appreciate that debug symbols are important for developers, but symbol packages can be published separately, which saves a lot of space when the NuGet package is reused, as mentioned here.
A guide to publishing symbol packages is available here:
https://docs.microsoft.com/en-us/nuget/create-packages/symbol-packages-snupkg
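
As a rough illustration of what the linked guide describes, a managed SDK-style project can ask `dotnet pack` to emit a separate `.snupkg` next to the `.nupkg`. This is a minimal sketch, not taken from the onnxruntime build; the target framework is an assumption, and the approach applies to portable PDBs from managed code, not to native C/C++ symbols.

```xml
<!-- Sketch of a managed project file; the TargetFramework value is an assumption. -->
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>netstandard2.0</TargetFramework>
    <!-- Emit a separate symbol package (.snupkg) alongside the .nupkg on pack. -->
    <IncludeSymbols>true</IncludeSymbols>
    <SymbolPackageFormat>snupkg</SymbolPackageFormat>
  </PropertyGroup>
</Project>
```

With these properties set, the main package no longer needs to carry the PDB files, and consumers who want symbols fetch them from the symbol server.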

@snnn added the "contributions welcome" and "feature request" labels on Dec 26, 2019
@nietras
Contributor

nietras commented Jan 2, 2020

Just a quick note, we prefer having these symbols part of the package. It makes a lot of things easier. The overhead is also not that big compared to the size of dependencies.

@YanYas
Author

YanYas commented Jan 4, 2020

Hi and happy new year, @nietras.

I think that the decompressed package now weighs in at 578 MB with 2 symbol packs and the included macOS dylib.

As @hanabi1224 explains here, the size is an issue for redistribution to users rather than for developers. I make simplification packages for end-user tools, so debugging isn't something my end users should need to worry about. They build the tool and use it. If or when they find a bug, I fix it and re-ship the package. I would like them to be able to build from a package that could be around a fifth of the size.

I appreciate that other users have different needs, so I suggested the symbol packages even though my research suggests they may be a bit cumbersome to use.

@nietras, what overhead do you encounter?

@nietras
Contributor

nietras commented Jan 7, 2020

Happy new year to you too, @YanYas.

I definitely understand, and I think disk footprint is important, but I do not think the pdb files are the main culprit here. Having pdb files available on a symbol server is an option, but this often becomes a problem if the server does not stay available indefinitely.

The main problem is that OnnxRuntime employs a "one size fits all" approach to nuget packaging, made worse by there being "one big" nuget package for different execution providers. This makes using OnnxRuntime from a class library problematic, since the library then ends up defining the execution provider for all of its users.

In my humble opinion, the packaging and project structure needs to be overhauled to be modular and pluggable; this would match the true scope and benefit of the runtime. Below I try to give an example of how OnnxRuntime could be packaged as a small underlying set of packages that can then be referenced from meta-packages if need be. Each execution provider should have its own set of packages.

Prefix the names below with Microsoft.ML. to match current naming. The structure below is mainly for C#, but the same principles can be applied for C/C++ users.

OnnxRuntime.Interop (basic and general interface for OnnxRuntime from C#, does not refer to any runtimes or similar)
OnnxRuntime.runtimes.win-x64 (native dlls only, onnxruntime.dll)
OnnxRuntime.runtimes.linux-x64 
OnnxRuntime.runtimes.macOS-x64
OnnxRuntime.MKLML.Interop (this defines an extension method to allow adding the execution provider from C#)
OnnxRuntime.MKLML.runtimes.win-x64 (native dlls only, onnxruntime-mklml.dll and mklml.dll etc.)
OnnxRuntime.MKLML.runtimes.linux-x64
OnnxRuntime.MKLML.runtimes.macOS-x64

The names are just examples :) You can replace MKLML with whatever execution provider there is, e.g. CUDA, TensorRT etc. You can pick and choose which runtime you want by picking packages. Class libraries can refer to just OnnxRuntime.Interop, and in the final "composition root", e.g. the application, you can then add whatever execution runtimes you want for whatever platforms you need. This would reduce disk footprint a lot for our use case, allow publishing all supported execution providers without making a huge one-size-fits-all package, and, not least, avoid the "build from source" solution there is right now for any execution providers not available as nuget packages.

You can still have a single meta-package OnnxRuntime that encompasses the default components for ease of use, but this allows any real world use to be much more flexible.
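
To make the proposed split concrete, here is a sketch of how the project references might look under it. Every package name comes from the naming example above and is hypothetical (none of these packages is known to exist), and `$(OnnxRuntimeVersion)` is an assumed property that each project would have to define. The two fragments belong to two separate project files.

```xml
<!-- ClassLibrary.csproj (fragment): depends only on the general interop package.
     Package names are hypothetical, taken from the naming example above. -->
<ItemGroup>
  <PackageReference Include="Microsoft.ML.OnnxRuntime.Interop" Version="$(OnnxRuntimeVersion)" />
</ItemGroup>

<!-- App.csproj (fragment): the composition root picks providers and platforms. -->
<ItemGroup>
  <ProjectReference Include="..\ClassLibrary\ClassLibrary.csproj" />
  <PackageReference Include="Microsoft.ML.OnnxRuntime.runtimes.win-x64" Version="$(OnnxRuntimeVersion)" />
  <PackageReference Include="Microsoft.ML.OnnxRuntime.MKLML.Interop" Version="$(OnnxRuntimeVersion)" />
  <PackageReference Include="Microsoft.ML.OnnxRuntime.MKLML.runtimes.win-x64" Version="$(OnnxRuntimeVersion)" />
</ItemGroup>
```

In this arrangement the class library never pulls in native binaries, so only the application pays the disk cost, and only for the platforms and providers it actually lists.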

If there is any interest in this I might make a more formal issue proposing the restructuring to give us the above.


@YanYas
Author

YanYas commented Jan 7, 2020

I now understand and appreciate your point about only being able to debug if the symbol server is available, which isn't always the case. I think that's a really good proposal. I'd like to hear what others think about this one, but you'd have my vote.

@jignparm
Contributor

jignparm commented Jan 8, 2020

> I would like them to be able to build from a package that could be around a fifth of the size

Has a user explicitly complained about the size on disk? The primary disadvantage would be download time -- the loading or running time should not be impacted (only relevant dlls are loaded).

> Each execution provider should have its own set of packages.

Fragmenting a single package into multiple packages is not without drawbacks (e.g. one for native assets, another for managed, a third for PDB files). Besides the user confusion of which package to install, there is also a namespace explosion, which increases maintenance and versioning friction.

@YanYas, can you simply delete the PDB and other unneeded files after the package is installed, if disk space is an issue? This way your application can keep disk utilization to a bare minimum for the end users.

@hanabi1224

I think the package should at least provide a way for people to opt out of the large PDB file in a CI environment instead of doing it manually. Personally, I use the custom build targets below in Directory.Build.targets to achieve it.

<!-- After build, overwrite onnxruntime.pdb in the output directory with a tiny placeholder file. -->
<Target Name="RemoveOnnxRuntimePdb" AfterTargets="AfterBuild">
  <WriteLinesToFile
    File="$(OutDir)onnxruntime.pdb"
    Lines="dummy"
    Overwrite="true"
    Encoding="Unicode" />
</Target>

<!-- On publish, swap every native .pdb that comes from a NuGet package for dummy.pdb,
     a small placeholder file that must exist next to this targets file. -->
<Target Name="RemoveRuntimePdbFromNuget" AfterTargets="ComputeFilesToPublish">
  <ItemGroup>
    <RuntimePdbFromNugetToRemove
      Include="@(ResolvedFileToPublish)"
      Condition=" '%(ResolvedFileToPublish.PackageName)' != ''
          and '%(ResolvedFileToPublish.AssetType)' == 'native'
          and '%(Extension)' == '.pdb'
          " >
      <Dummy>$(MSBuildThisFileDirectory)dummy.pdb</Dummy>
    </RuntimePdbFromNugetToRemove>
    <ResolvedFileToPublish Remove="@(RuntimePdbFromNugetToRemove)" />
    <!-- Re-add the placeholder with the metadata of the file it replaces. -->
    <RuntimeDummyPdbToAdd Include="%(RuntimePdbFromNugetToRemove.Dummy)">
      <AssetType>%(RuntimePdbFromNugetToRemove.AssetType)</AssetType>
      <CopyToPublishDirectory>%(RuntimePdbFromNugetToRemove.CopyToPublishDirectory)</CopyToPublishDirectory>
      <DestinationSubPath>%(RuntimePdbFromNugetToRemove.DestinationSubPath)</DestinationSubPath>
      <PackageName>%(RuntimePdbFromNugetToRemove.PackageName)</PackageName>
      <PackageVersion>%(RuntimePdbFromNugetToRemove.PackageVersion)</PackageVersion>
      <RelativePath>%(RuntimePdbFromNugetToRemove.RelativePath)</RelativePath>
    </RuntimeDummyPdbToAdd>
    <ResolvedFileToPublish Include="@(RuntimeDummyPdbToAdd)" />
  </ItemGroup>
  <!-- Tasks do not run inside an ItemGroup, so the message goes after it. -->
  <Message Text="Removed native pdb files from nuget: @(RuntimePdbFromNugetToRemove)" Importance="High" Condition=" '@(RuntimePdbFromNugetToRemove)' != '' " />
</Target>

@n8allan

n8allan commented Feb 22, 2020

Nuget specifically has separate symbol packages for a reason. It is not good practice to include PDB files in your package, especially when they are 130MB. :-( It is also not reasonable to ask people to manually remove files from a nuget package they've downloaded. All this is to say... yes please on this issue.

@pranavsharma
Contributor

@nietras Thanks for writing up the proposal. In the upcoming 1.2 release (early March), we plan on separating the managed assembly into a separate package called Microsoft.ML.OnnxRuntime.Managed. Each execution provider will be delivered in its own separate package without the managed assembly. This aligns partially with your proposal.

At this time we do not plan to separate the pdb files and x86/linux/mac binaries into separate packages for the sake of simplicity of our first party users.

xref: #2184
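
For a class library, the split described here might be used roughly as follows. This is a sketch only: `Microsoft.ML.OnnxRuntime.Managed` is the package named in the comment above, while `$(OnnxRuntimeVersion)` is an assumed property that the project would have to define, and how the native packages relate to the managed one is not stated in this thread.

```xml
<!-- Class library (fragment): compile against the managed API only.
     $(OnnxRuntimeVersion) is an assumed, project-defined property. -->
<ItemGroup>
  <PackageReference Include="Microsoft.ML.OnnxRuntime.Managed" Version="$(OnnxRuntimeVersion)" />
</ItemGroup>
```

The consuming application would then add whichever package carries the native binaries for the execution provider it wants.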

@faxu closed this as completed on Mar 10, 2020
@faxu added the wontfix label on Mar 10, 2020
@LittleLittleCloud

I'm having the same issue. Including the pdb files vastly increases the package size. Could we put the pdb files into a separate folder (at least not the runtime folder), or put them in a separate symbol package, like ML.NET does?

@net2cn

net2cn commented Sep 7, 2021

I encountered this issue just now. With the PDB files included, the release size goes beyond what is acceptable for a small inference project. Glad to hear that this is going to be fixed in the near future.

@snnn
Member

snnn commented Sep 7, 2021

The change has been merged. The issue will be fixed in the onnxruntime 1.9 release.

@snnn
Member

snnn commented Sep 23, 2021

@YanYas Now I know: nuget.org doesn't accept symbol packages for native C/C++ code.

@MaximKalininMS
Contributor

@snnn So symbols are unavailable now? Is there an open issue for this?

@snnn
Member

snnn commented Oct 12, 2021

You can get it from https://github.com/microsoft/onnxruntime/releases/ .
We have also published the same files to an internal location that every Microsoft employee can access. Send me an email if you would like the details.

And in the future releases, we plan to publish them to https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/microsoft-public-symbols

@nietras
Contributor

nietras commented Apr 1, 2025

> You can get it from https://github.com/microsoft/onnxruntime/releases/ .

@snnn unfortunately the releases page does not include all releases. In particular, it does not include CUDA 11 builds, which means we cannot find pdbs for packages such as https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/onnxruntime-cuda-11/NuGet/Microsoft.ML.OnnxRuntime.Gpu/versions/1.21.0-dev-20241028-0514-dd28f09ce2

> And in the future releases, we plan to publish them to https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/microsoft-public-symbols

Source Link and a proper symbols upload to a symbol server would be greatly appreciated. It has now been over 3 years; what's the progress on that?
