Skip to content

[API Proposal]: "*App*" wildcard for [UnsafeAccessor] assembly name #106216

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Sergio0694 opened this issue Aug 9, 2024 · 36 comments
Closed

[API Proposal]: "*App*" wildcard for [UnsafeAccessor] assembly name #106216

Sergio0694 opened this issue Aug 9, 2024 · 36 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Runtime.CompilerServices
Milestone

Comments

@Sergio0694
Copy link
Contributor

Sergio0694 commented Aug 9, 2024

Background

As part of CsWinRT 2.1.1, we introduced AOT support in WinRT scenarios. In order to make this happen, CsWinRT now also generates a fair amount (read: a metric ton) of code to make things such as WinRT generics work. This support comes with several drawbacks though. The first one is that it is viral: we need every single library to reference CsWinRT in order to make types AOT compatible when used for WinRT interop. This doesn't just affect libraries that are specific to Windows, but any library which contains types that will be used in WinRT scenarios. For instance, I've had to add the Windows TFM to the MVVM Toolkit, so that the types in the MVVM Toolkit could get the necessary supporting code generated. This isn't really fixable, it's just the state of things now that WinRT support is lifted.

On top of this however, we have two main problems that we'd like to solve:

  • Versioning: right now CsWinRT generates all this code in all of those libraries, which won't get any new updates no matter what CsWinRT version the final application is using, until every single dependency goes and also updates CsWinRT to regenerate that code, and ships an update. This is completely not scalable. It might take months for a large project to get every single dependency to publish updated versions to gain the codegen improvements.
  • Binary size: there is a ton of duplicate code across every single library you reference, because each library will generate supporting code for every possible generic type instantiation it might need, and there's no way to share this code.

For instance, here's just some of these supporting types generated in the MVVM Toolkit:

image

It's very easy for a large app to have even hundreds of these spread across all your dependencies and projects. And it's not uncommon that a lot of these will just be copies and copies of the same combinations of type arguments (eg. string).

We'd like improve things and move as much code generation as possible down to the published projects.

API proposal

The proposal is not an API, but an extension on top of #90081. We'd like to add a new "*App*" wildcard that can be used in place of assembly names for [UnsafeAccessor] methods. The exact semantics would be controlled via a new feature switch that can be configured via the new UnsafeAccessorOutputAssemblyResolutionType property. There would be two modes:

  • "Static": in this configuration, "*App*" means:

"When using NAOT or self-contained publishing, it refers to the project being published. Otherwise, it refers to the same assembly that Assembly.GetEntryAssembly would return. If it would return null, NotSupportedException is thrown."

  • "Dynamic": in this configuration, "*App*" means:

"Always invoke AssemblyLoadContext.Resolving and use the returned assembly"

Example use

Consider the current CsWinRT code generation for some enumerable type, such as ObservableGroup<TKey, TElement>:

Generated CCW vtable (click to expand):
internal sealed class ObservableGroupedCollection_TKey__TElement_WinRTTypeDetails : IWinRTExposedTypeDetails
{
	public ComWrappers.ComInterfaceEntry[] GetExposedInterfaces()
	{
		_ = IReadOnlyList_System_Collections_IList.Initialized;
		_ = IReadOnlyList_System_ComponentModel_INotifyPropertyChanged.Initialized;
		_ = IReadOnlyList_System_Collections_Specialized_INotifyCollectionChanged.Initialized;
		_ = IReadOnlyList_System_Collections_IEnumerable.Initialized;
		_ = IReadOnlyList_object.Initialized;
		_ = IEnumerable_System_Collections_IList.Initialized;
		_ = IEnumerable_System_ComponentModel_INotifyPropertyChanged.Initialized;
		_ = IEnumerable_System_Collections_Specialized_INotifyCollectionChanged.Initialized;
		_ = IEnumerable_System_Collections_IEnumerable.Initialized;
		_ = IEnumerable_object.Initialized;
		return new ComWrappers.ComInterfaceEntry[14]
		{
			new ComWrappers.ComInterfaceEntry
			{
				IID = IReadOnlyListMethods<IList>.IID,
				Vtable = IReadOnlyListMethods<IList>.AbiToProjectionVftablePtr
			},
			new ComWrappers.ComInterfaceEntry
			{
				IID = IReadOnlyListMethods<INotifyPropertyChanged>.IID,
				Vtable = IReadOnlyListMethods<INotifyPropertyChanged>.AbiToProjectionVftablePtr
			},
			new ComWrappers.ComInterfaceEntry
			{
				IID = IReadOnlyListMethods<INotifyCollectionChanged>.IID,
				Vtable = IReadOnlyListMethods<INotifyCollectionChanged>.AbiToProjectionVftablePtr
			},
			new ComWrappers.ComInterfaceEntry
			{
				IID = IReadOnlyListMethods<IEnumerable>.IID,
				Vtable = IReadOnlyListMethods<IEnumerable>.AbiToProjectionVftablePtr
			},
			new ComWrappers.ComInterfaceEntry
			{
				IID = IReadOnlyListMethods<object>.IID,
				Vtable = IReadOnlyListMethods<object>.AbiToProjectionVftablePtr
			},
			new ComWrappers.ComInterfaceEntry
			{
				IID = IEnumerableMethods<IList>.IID,
				Vtable = IEnumerableMethods<IList>.AbiToProjectionVftablePtr
			},
			new ComWrappers.ComInterfaceEntry
			{
				IID = IEnumerableMethods<INotifyPropertyChanged>.IID,
				Vtable = IEnumerableMethods<INotifyPropertyChanged>.AbiToProjectionVftablePtr
			},
			new ComWrappers.ComInterfaceEntry
			{
				IID = IEnumerableMethods<INotifyCollectionChanged>.IID,
				Vtable = IEnumerableMethods<INotifyCollectionChanged>.AbiToProjectionVftablePtr
			},
			new ComWrappers.ComInterfaceEntry
			{
				IID = IEnumerableMethods<IEnumerable>.IID,
				Vtable = IEnumerableMethods<IEnumerable>.AbiToProjectionVftablePtr
			},
			new ComWrappers.ComInterfaceEntry
			{
				IID = IEnumerableMethods<object>.IID,
				Vtable = IEnumerableMethods<object>.AbiToProjectionVftablePtr
			},
			new ComWrappers.ComInterfaceEntry
			{
				IID = IListMethods.IID,
				Vtable = IListMethods.AbiToProjectionVftablePtr
			},
			new ComWrappers.ComInterfaceEntry
			{
				IID = INotifyCollectionChangedMethods.IID,
				Vtable = INotifyCollectionChangedMethods.AbiToProjectionVftablePtr
			},
			new ComWrappers.ComInterfaceEntry
			{
				IID = INotifyPropertyChangedMethods.IID,
				Vtable = INotifyPropertyChangedMethods.AbiToProjectionVftablePtr
			},
			new ComWrappers.ComInterfaceEntry
			{
				IID = IEnumerableMethods.IID,
				Vtable = IEnumerableMethods.AbiToProjectionVftablePtr
			}
		};
	}
}

This, along with all of those IReadOnlyList_System_Collections_IList etc. implementations (each of those is and lot of code as well). With this proposed feature, we'd be able to not generate any of that, and simply do this instead:

[assembly: WinRTGenericTypeInstantiation(typeof(IReadOnlyList<IList>))]
[assembly: WinRTGenericTypeInstantiation(typeof(IReadOnlyList<INotifyPropertyChanged>))]
// etc.

For any generic type instantiation we need. Then, replace those Initialized accesses with:

[UnsafeAccessor(UnsafeAccessorKind.StaticMethod, Name = "get_Initialized")]
[UnsafeAccessorType("WinRT.GenericHelpers.IReadOnlyList_System_Collections_IList, *App*")]
static extern bool IReadOnlyList_System_Collections_IList();

And just call those, which would call the shared generated generic type instantiation in the final assembly. This support could technically us to go even further and potentially just have the entire vtable implementation call to some well known generated method in the final assembly, if we found that method to be even better for CsWinRT (either would work).

Essentially, this allows us to:

  • Share all these instantiations
  • Minimize the generated code that has to be updated in all dependencies
  • Keep everything fully trimmable when in self-contained mode
  • Still work in ALC scenarios
  • Be predictable (the behavior depends on the feature switch, so it's consistent)

Additional information

This design includes suggestions from @jkoritzinsky, who pointed out that in several scenarios around ALC, there is no "entry assembly", or you might have a given dependency library being loaded in one ALC and then referenced by some other code in another ALC. In those cases, the "Dynamic" mode would allow eg. CsWinRT to implement an appropriate resolver callback.

cc. @AaronRobinsonMSFT @jkotas @MichalStrehovsky @agocke

@Sergio0694 Sergio0694 added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Aug 9, 2024
@ghost ghost added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Aug 9, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Aug 9, 2024
@AaronRobinsonMSFT AaronRobinsonMSFT added this to the Future milestone Aug 9, 2024
@AaronRobinsonMSFT AaronRobinsonMSFT removed untriaged New issue has not been triaged by the area owner needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Aug 9, 2024
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-runtime-compilerservices
See info in area-owners.md if you want to be subscribed.

@AaronRobinsonMSFT
Copy link
Member

AaronRobinsonMSFT commented Aug 10, 2024

The #90081 propsal is written in a very narrow way. The input string would simply be passed to Type.GetType(string typeName) and load the target type. This proposal would increase that complexity substantially. Instead of expecting simple Type strings that can easily be computed in the runtime, we would be creating a type of Domain Specific Language (DSL) that we'd need to parse and then subsequently reconstruct with a new name.

The Static version seems okay, but does come with serious costs around manipulating the supplied typeName, which I would prefer to avoid. I could see a case where UnsafeAccessorTypeAttribute has a seperate AssemblyName property/enum that could be set, but that again means we are expected to construct a type name out of parts.

The Dynamic version is ill-advised because calling the same function would seem to change the content of the method. This would be a redesign of the entire feature. Once an UnsafeAccessor is generated, the code is set. This proposal would mean we would need to look up the stack determine what we are calling to know what to generate alternatively we could bake that look up in the generated stub, but that is quite expensive too. I'd be very much against this as it would impose substantial complexity.

I will need to read over the issues here, but the Dynamic version is a non-starter in my opinion. The Static one seems feasible.

@jkoritzinsky
Copy link
Member

In my initial thinking, the Dynamic mode would call the resolution callback once per UnsafeAccessor method (when the IL is generated and the target is resolved). It wouldn't call it every time the method is called.

@jkotas
Copy link
Member

jkotas commented Aug 10, 2024

Why is it not an option to use a fixed assembly name to generate the app specific WinRT interop?

@AaronRobinsonMSFT
Copy link
Member

In my initial thinking, the Dynamic mode would call the resolution callback once per UnsafeAccessor method (when the IL is generated and the target is resolved). It wouldn't call it every time the method is called.

Okay, so that is the latter case. For me the concern comes down to I can call the function with the same inputs, but its behavior changes if Assembly A calls it vs if Assembly B calls it. That is how I read that feature and I'm not a fan. It creates a very complex scenario. Everytime an UnsafeAccessor is JIT, we need to determine details about the caller, that seems less than ideal.

@jkoritzinsky
Copy link
Member

In my initial thinking, the Dynamic mode would call the resolution callback once per UnsafeAccessor method (when the IL is generated and the target is resolved). It wouldn't call it every time the method is called.

Okay, so that is the latter case. For me the concern comes down to I can call the function with the same inputs, but its behavior changes if Assembly A calls it vs if Assembly B calls it. That is how I read that feature and I'm not a fan. It creates a very complex scenario. Everytime an UnsafeAccessor is JIT, we need to determine details about the caller, that seems less than ideal.

It wouldn't change based on caller of the UnsafeAccessor method. Assembly A and B would both have their own UnsafeAccessor methods that happen to be described the same but declared in different assemblies.

In any case, @jkotas's idea would be significantly more straightforward if it would solve the scenario.

@AaronRobinsonMSFT
Copy link
Member

It wouldn't change based on caller of the UnsafeAccessor method. Assembly A and B would both have their own UnsafeAccessor methods that happen to be described the same but declared in different assemblies.

I don't think I understand this. So let's say assembly A and assembly B both have the following public type:

public class Accessor
{
    [UnsafeAccessor(UnsafeAccessorKind.StaticMethod, Name = "get_Initialized")]
    [UnsafeAccessorType("WinRT.GenericHelpers.IReadOnlyList_System_Collections_IList, *App*")]
    public static extern bool IReadOnlyList_System_Collections_IList();
}

If I call A.Accessor.IReadOnlyList_System_Collections_IList() vs B.Accessor.IReadOnlyList_System_Collections_IList(), will they give me different things or the same thing? From what you're saying it sounds like the different because they are using the ALC they are loaded in to resolve, right?

@AaronRobinsonMSFT
Copy link
Member

would be significantly more straightforward if it would solve the scenario.

Why won't it? I thought I had proposed this offline a while ago as well.

@Sergio0694
Copy link
Contributor Author

"Why is it not an option to use a fixed assembly name to generate the app specific WinRT interop?"

Mmh could you elaborate on how would this work? Suppose I have some project B (either an app, or perhaps a WinRT component), where I reference library A.dll, which has that [UnsafeAccessor] for some well known interop .dll it expects. I now build/deploy/publish project B. Who produces that interop .dll, and how? 🤔

@jkoritzinsky
Copy link
Member

Project B would have targets that run after CoreCompile that would generate and compile a sidecar assembly (likely with a temp csproj like how WPF XAML works today) that is only referenced by UnsafeAccessor methods. It would be copied out to the build and publish directories, added to the deps.json, and passed to crossgen2 and ILC as a reference.

@Sergio0694
Copy link
Contributor Author

Mmh I see, that's interesting. Just so I understand, any particular reason why this would have to be after CoreCompile and then manually copied and added to deps.json, etc.? Could it not be compiled before CoreCompile and then the assembly added as <Reference> to the actual application project being built, so that everything else would "just work"?

@jkoritzinsky
Copy link
Member

You'd need to do it after CoreCompile if you want to analyze the "app" project easily and have it's generic helpers in this shared sidecar assembly as well.

@Sergio0694
Copy link
Contributor Author

Sergio0694 commented Aug 10, 2024

I see. Ok this idea is growing on me, and I could see this giving us a way to introduce all kinds of global program view optimizations for CsWinRT in the future. Is there some documented way to do this that we could take a dependency on in CsWinRT? That is, this part slightly worries me:

"It would be copied out to the build and publish directories, added to the deps.json, and passed to crossgen2 and ILC as a reference."

Are there public docs on what properties we'd need to set, where exactly we'd need to copy that .dll to, etc.?
Related: would it make sense to add some built-in SDK/MSBuild support for this type of thing to do it automatically?

@jkoritzinsky
Copy link
Member

This would basically be done with MSBuild targets like how WPF does it.

You'd want to hook after CoreCompile or before GenerateBuildDependencyFile and generate a csproj file in a subdirectory of the obj folder. Then build it and add the resulting assembly as a ReferencePath (you might need Private=true as well, not sure). Then that should flow through into the deps file, get copied into the right places, and passed to R2R and ILC.

There shouldn't be any .NET SDK work necessary here.

@AaronRobinsonMSFT
Copy link
Member

@Sergio0694 Based on the conversation about it doesn't look like this addition to the API is going to be considered. Feel free to re-open if needed, but I'm closing for now.

@Sergio0694
Copy link
Contributor Author

Sergio0694 commented Aug 27, 2024

Yup, that works for me! We'll be working on an implementation using an extension .dll as suggested 🙂
Thank you everyone for the help!

@github-actions github-actions bot locked and limited conversation to collaborators Sep 28, 2024
@sharwell sharwell reopened this Apr 18, 2025
@dotnet dotnet unlocked this conversation Apr 18, 2025
@Sergio0694
Copy link
Contributor Author

@jkotas @AaronRobinsonMSFT I've discussed possible approaches with Roslyn folks (@sharwell, @jaredpar, @jasonmalinowski, and others), and it seems like there's basically no good way to implement the sidecar .dll approach that would've been the replacement for this proposal. The main issues are basically that you end up choosing between two (bad) options:

  • You create some build task that creates the sidecar project and copies it to the output. This is extremely difficult to get right, very brittle (Sam pointed out how all the exact incremental build steps are poorly documented and also have subtle differences eg. between Visual Studio and Visual Studio code, and other IDEs too), and would also not really work with Edit and Continue. It would also not benefit from any of the incrementality features of source generators, so it'd further slow builds down.
  • You can ask developers to create an actual project in their solution that will produce the sidecar .dll, have them reference it from their app, and then setup the generator so that it will produce things in that project. This also means that the generator in that project won't be able to read info from the app project (as it'd be upstream of that), so you'd also have to ask developers to keep some kind of .txt file as an additional file in that project, to instruct the generator on what types to produce. So you'd end up with both an additional very weird project in the solution, and this .txt file with a (massive) list of types to generate. People are always very confused today with CsWinRT 2.x as to why they even need to make types partial. This would be a whole other level of confusion and complexity on top of that, and a very considerable regression in developer UX.

In light of this, could we perhaps reconsider this proposal? Was there any technical limitation that made this not feasible? Note that if it helped make things simpler, it'd also be acceptable for us to have some MSBuild property set in the .csproj to specifically tell the runtime "what .dll is the entry .dll", and we could just set that from the .csproj (or from our .targets in CsWinRT and/or the .NET SDK).

Thank you! 🙂

@AaronRobinsonMSFT
Copy link
Member

For now, I would explore option (1) since it is technically feasible. It does have complexity, but that is true for any number of MSBuild apporaches that we employ - this isn't changing the existing complexity in the system by all that much.

It would also not benefit from any of the incrementality features of source generators, so it'd further slow builds down.

By how much? Is this an existing pain point? Are there other wins in this area for CsWinRT 3.0 that may compensate for this impact?

@jkotas
Copy link
Member

jkotas commented Apr 19, 2025

You may want to take a look at how similar problems are solved in Android and iOS interop. Opinionated interop systems often want to generate some additional code based on the global application view as an optimization. I agree that the msbuild integration is not simple and it has problems, but it works, and it is very flexible. If we were to invent features to enable generation of code based on the global application view, I would like to see Android and iOS to take advantage of them too.

BTW: I think it is unlikely that we will get to implementing #114179 in .NET 10. (@AaronRobinsonMSFT please correct me if you think otherwise.)

@AaronRobinsonMSFT
Copy link
Member

BTW: I think it is unlikely that we will get to implementing #114179 in .NET 10. (@AaronRobinsonMSFT please correct me if you think otherwise.)

I'm also dubious we will get all of those check boxes done. It is going to depend on how/if priorities shift and resourcing.

I'm currently investigating the cost for UnsafeAccessorTypeAttribute. I've been able to implement it on CoreCLR, but there is a corner case that I'll need to think through. The native AOT solution is likely straight forward, but the mono implementation concerns me due to lack of knowledge in the space. Since this is part of the existing UnsafeAccessorAttribute feature, which is supported in mono, I think we need that to work as well.

@Sergio0694
Copy link
Contributor Author

Sergio0694 commented Apr 19, 2025

"By how much? Is this an existing pain point? Are there other wins in this area for CsWinRT 3.0 that may compensate for this impact?"

It's more something that we're worried about in advance, since Roslyn folks have also mentioned this as something to consider. We'd like to keep builds fast enough, if possible. But yes, there definitely are wins here that would compensate with a (hopefully minimal) regression in build times. The whole new architecture for 3.0 would allow every single CCW vtable and every single ComInterfaceEntry array in the entire app domain to be completely pre-initialized by ILC, including for all generics. It should basically bring all that part of the WinRT interop stack to be on par with what .NET Native could do. It would also speed up a lot of marshalling stubs due to us being able to just have direct calls to the right code in the sidecar, rather than having to use some kind of lazy loaded lookup caching strategy like we do in CsWinRT 2.0.

"You may want to take a look at how similar problems are solved in Android and iOS interop. Opinionated interop systems often want to generate some additional code based on the global application view as an optimization. I agree that the msbuild integration is not simple and it has problems, but it works, and it is very flexible."

Right, I've been doing that too and talking with both Roslyn folks, and folks from Android/iOS, to figure out the best approach. The idea is that given that all this stuff in CsWinRT 3.0 would be new, I'd really like to make sure that we get on a properly supported path from the start and don't end up getting ourselves stuck in a corner with an improper setup. It seems that:

  • Android is using Cecil to emit the .dll as just IL (here). This can work, and this approach also has the advantage that it completely sidesteps all the issues from trying to use Roslyn to produce the .dll, however the obvious cost is the complexity of having to generate IL directly instead of producing C# source files.
  • iOS is invoking <Csc> directly from a .targets file (here), which @jaredpar said is basically unsupported and shouldn't be done. So given the stuff in CsWinRT is new, we wouldn't want to use this approach.
  • @jkoritzinsky suggested a possible alternative being to create a .csproj with the source files we need (somewhere in the \obj folder), add some dummy Directory.Build.props and .targets to disable importing things, and disabling all implicit SDK features like automatic framework references and whatnot. Then we pass all the .dll references manually, add the sources we generate, and invoke <MSBuild> on that. I suppose we could try doing this at least as a starting point.
  • Using this *App* thing seemed like a good alternatively to avoid all those issues the Roslyn folks mentioned with respect to how to correctly wiring up MSBuild etc., but Jeremy pointed out that it's not trivial to make things work in hosted scenarios with multiple ALCs being used, and other configurations like that.

It seems at least for now our plan might be:

  • Try the "blank .csproj" approach and see if it works
  • Fallback to using Cecil like Android does, if the first approach doesn't work

"If we were to invent features to enable generation of code based on the global application view, I would like to see Android and iOS to take advantage of them too."

Yeah, that's exactly the kind of thing I had in mind, just like the whole type map was designed to support all the various interop stacks (Android, iOS, WinRT, etc.), I was hoping there could be a system to handle this problem too that could be shared, so that we'd avoid having every single team doing this, needing to reinvent the wheel, and ending up with a standalone system that also means, having to likely stumble upon the same bugs as the other ones (with things such as incremental builds).

"BTW: I think it is unlikely that we will get to implementing #114179 in .NET 10."

Yeah to clarify, we don't expect all those boxes to be completed in .NET 10. As long as the ones that would be blocking for us are implemented (eg. the type map), it would be fine for us. Some of the items there are nice to have but not blocking.

"I'm currently investigating the cost for #90081. [...]"

Oh that's awesome to hear!! I assumed there'd be no chance for that in .NET 10 at all 😄

For context: having [UnsafeAccessorType] would greatly simplify things in CsWinRT, because we do need a system for projections and user code to refer to interop code that will be produced in the sidecar .dll generated during the final build. @jkoritzinsky suggested a workaround we can use if that doesn't make it, which is to define our own "polyfill" attribute, and ship an IL rewriter build task that manually processes assemblies and rewrites those unsafe accessor methods to point to the sidecar .dll (and adds the assembly ref in the metadata table, which is another thing that wouldn't be needed if we could just use [UnsafeAccessorType] instead). However, that would add significant complexity to our architecture, especially given it'd just be needed as a temporary workaround until we got everyone on .NET 11 (or, on whatever version would have [UnsafeAccessorType]). If we could just rely on that from the start, it'd remove a non trivial amount of complexity (and chances of breaking things) on our side 😅

@jkotas
Copy link
Member

jkotas commented Apr 19, 2025

create a .csproj with the source files we need (somewhere in the \obj folder), add some dummy Directory.Build.props and .targets to disable importing things, and disabling all implicit SDK features like automatic framework references and whatnot. Then we pass all the .dll references manually, add the sources we generate, and invoke on that.

Do you think that this solution is less fragile than invoking Csc directly?

Fallback to using Cecil like Android does, if the first approach doesn't work

If you just need to generate a bit of simple IL to stitch things together, I would avoid Cecil dependency for that. You can use System.Reflection.Metadata directly. Of if you want something a bit higher level, use MetadataLoadContext and PersistedAssemblyBuilder.

@Sergio0694
Copy link
Contributor Author

"Do you think that this solution is less fragile than invoking Csc directly?"

Maybe? The main argument would be that it would at least be a supported scenario, whereas manually calling <Csc> is explicitly not supported, and the Roslyn folks have specifically told us not to do that 😅

"If you just need to generate a bit of simple IL to stitch things together, I would avoid Cecil dependency for that. You can use System.Reflection.Metadata directly. Of if you want something a bit higher level, use MetadataLoadContext and PersistedAssemblyBuilder."

The sidecar .dll would need a whole bunch of code being generated (all marshalling code for generic instantiation). But yeah for sure we could look into those other APIs too. I think @jkoritzinsky suggested Cecil mostly because it's what everyone also uses for this kind of thing, so it might simplify some things if we used the same tooling as other similar projects 🤔

@jkotas
Copy link
Member

jkotas commented Apr 20, 2025

manually calling is explicitly not supported

Csc task is documented at https://learn.microsoft.com/en-us/visualstudio/msbuild/csc-task . I do not see anything in that documentation that says that it is unsupported.

It may be a good idea to drill into the concerns with calling Csc task directly. I believe that these concerns are centered around computing list of dependencies from nothing. It is close to impossible to do it correctly without using the SDK/NuGet integration. Once you are using the integration, you are not going to call Csc directly.

I do not think that these concerns apply here. You do have list of dependencies to pass to the compiler, computed using the sanctioned method earlier. You just want to compile one more file using those same dependencies.

Cecil mostly because it's what everyone also uses for this kind of thing

Many older projects use Cecil to emit IL since there was no other viable option years ago when they started. It is not the case anymore.

@Sergio0694
Copy link
Contributor Author

"I do not see anything in that documentation that says that it is unsupported."

I think it's probably better for @jaredpar to chime in on that. I do agree with you that if the task is publicly documented (I actually didn't know that), and given that we're simply forwarding all dependencies and properties from MSBuild for the current project, and we're not actually resolving them ourselves, using Csc directly would make sense. I mean that's pretty much exactly what we're trying to do: "please build an additional .dll with these additional source files in the same way the current .dll for this app/library has been built, ie. with all the same exact .NET SDK, build properties, referenced .dll-s, etc.". Basically those same source files should've built correctly had they been in the same app project too, except we need them to be in this sidecar .dll instead, but the rest is the same.

If we could actually confirm that in this specific scenario, this use of Csc is actually fine, we'll happily take it. Building the sidecar from source would be infinitely more convenient for us rather than trying to emit IL directly, plus this would put us on the same path as the iOS tooling, meaning we could likely share efforts going forwards to address any common issues or concerns.

If we can confirm that this approach is fine and a supported scenario, I'm more than happy to close this issue again and remove it from the epic then. Especially if there's a chance we can make [UnsafeAccessorType] happen, meaning we'd just need to implement and maintain the sidecar .dll logic (but no IL rewriter to polyfill the unsafe accessors), that wouldseem completely reasonable.

Also, I do appreciate you all taking the time to provide advice and help us figure out the right path forwards 🙂

@jaredpar
Copy link
Member

Csc task is documented at https://learn.microsoft.com/en-us/visualstudio/msbuild/csc-task . I do not see anything in that documentation that says that it is unsupported.

I will work with @BillWagner on documentation updates here. It's likely won't be put into fully unsupported but needs to be in the strongly discouraged category. Having an invocation of the <Csc> task is a maintenance burden for the compiler team and .NET ecosystem.

It may be a good idea to drill into the concerns with calling Csc task directly. I believe that these concerns are centered around computing list of dependencies from nothing.

That is certainly part of the problem. The bigger problem though is that the set of arguments that are effectively required to run the <Csc> task correctly are not constant. These can, and do, expand with regularity as the .NET ecosystem evolves. Invocations of <Csc> that the .NET Team does not control end up getting defaults for these arguments and those are not always sensible in the ecosystem. Here are some specific examples that have come up over time:

  • Analyzers: the WPF usage of <Csc> for temp assemblies did not pass along this argument and there is no way for the task to infer the correct ones. This means that builds using analyzers with build suppressors and warn as error encountered broken builds when using WPF.
  • SkipAnalyzers: invocations which pass Anaylzersbut notSkipAnalyzers` subvert the expectations of users on how code should build.
  • CompilerType: this is new argument being added in .NET 10 which controls which .NET compiler is used. This is meant to allow
  • Deterministic: the ecosystem cannot have deterministic builds until every invocation is updated to respect this flag (compiler cannot be deterministic by default unfortunately).

Essentially, there are many user and ecosystem properties which change how the compiler is supposed to be invoked. Custom invocations of <Csc> are rarely up to date on these and hence subvert changes to the .NET ecosystem. It's a constant catch up job by the .NET team to track down these invocations and keep them up to date.

@jkotas
Copy link
Member

jkotas commented Apr 21, 2025

@jaredpar The alternative proposed above is:

create a .csproj with the source files we need (somewhere in the \obj folder), add some dummy Directory.Build.props and .targets to disable importing things, and disabling all implicit SDK features like automatic framework references and whatnot.

Do you believe that the auto generated .csproj file is less fragile compared to direct invocation of Csc? I think there is very similar catch-up job in both cases. If there is a new thing introduced, there is like 50:50 chance that this auto-generated .csproj project will want the new thing disabled or configured in some special way.

The difference between auto-generated .csproj file and direct Csc invocation is in the ease of diagnosing and fixing eventual issues. Direct Csc invocation is straightforward to see and fix. Auto-generated .csproj is behind a layer of indirection. Diagnosing and fixing the issues means debugging and changing the .csproj file auto-generator.

@Sergio0694
Copy link
Contributor Author

@jkotas adding a few more questions here since we were already on the same topic 😄

"If you just need to generate a bit of simple IL to stitch things together, I would avoid Cecil dependency for that. You can use System.Reflection.Metadata directly. Of if you want something a bit higher level, use MetadataLoadContext and PersistedAssemblyBuilder."

PersistedAssemblyBuilder seems nice, but I see it's only available for .NET 9, which my understanding is we can't use, because people might be building with MSBuild Desktop (eg. when building from VS, or from a VS Developer Command prompt). I assume that means we can just use System.Reflection.Metadata, since that's the only one available on .NET Framework?

Additionally, I'm starting to think perhaps we should just bite the bullet and emit IL directly as well. It would allow us to completely bypass all the concerns raised about either using <Csc> directly, or MSBuild to compile some temporary .csproj, and it would also allow us to never worry about type accessibility. For instance, Roslyn folks strongly recommended to do all expensive analysis from a post-build tool that just inspects all upstream .dll-s. But that means that we can only add new code to the sidecar .dll, not to the .dll being built. Which means that we need to find a way to handle internal types too. We could try to see if it'd be possible to add a new constructor for the proxy type map that takes an assembly qualified type name, so we can refer to internal types from the sidecar .dll as well, but my concerns are that even then:

  1. It's a new API to propose, get approval for, plus the work to implement it
  2. We might very well still eventually hit issues around inaccessible types in other scenarios. If we were emitting IL, that wouldn't matter. If we were emitting C# sources instead, we risk getting stuck, which is also not great.

If we went this route, meaning we'd be generating much more IL, would your recommendation still be to just use System.Reflection.Metadata? Or would Cecil make more sense there given it's a higher-level library that you can also use from .NET Framework? I'm going over some code using it and it seems fairly intuitive to use, so perhaps it's not too bad? 🤔

(sidenote, I find it kinda funny how we're basically ending up rewriting MCG from scratch at this point ahah)

@jkotas
Copy link
Member

jkotas commented Apr 22, 2025

If we went this route, meaning we'd be generating much more IL, would your recommendation still be to just use System.Reflection.Metadata?

Observation: Projects that generate a lot of complicated IL get tired of it eventually and switch to generating C#. If you want to generate IL, it needs to be super simple just to stitch things together to survive the test of time.

Or would Cecil make more sense there given it's a higher-level library

The main advantage of Cecil is that it is a mutable object model that enables IL rewriting. If you just want to emit a bunch of IL from scratch, Cecil has less advantage compared to System.Reflection.Metadata,

Note Cecil is a community library. We have internal Cecil fork for IL linker to give us full control over servicing for our purpose.

@Sergio0694
Copy link
Contributor Author

"Observation: Projects that generate a lot of complicated IL get tired of it eventually and switch to generating C#. If you want to generate IL, it needs to be super simple just to stitch things together to survive the test of time."

That is also fair. I suppose it's just not entirely clear to me what the best compromise is between all of these concerns. Emitting IL would avoid the issues around invoking <Csc>, and around type accessibility, but at the cost of being more complicated. Emitting C# would be simpler, but we'd have the concerns around how to compile it, we would at the very least need a new constructor overload for TypeMapAssociationAttribute to be able to specify the source type as an assembly qualified type name, and it's not clear how we'd be able to address other cases (eg. other attributes) where we'd want to refer to types that migth not be accessible 🤔

"The main advantage of Cecil is that it is a mutable object model that enables IL rewriting. If you just want to emit a bunch of IL from scratch, Cecil has less advantage compared to System.Reflection.Metadata"

Ah, I see. That helps, I guess we could just use System.Reflection.Metadata for both reading .dll-s and emitting the sidecar then.

@Sergio0694
Copy link
Contributor Author

Leaving an update here for reference and to resolve this. It seems the approach we're going to go with is:

  • Do all the work as a post-build step that takes all reference .dll-s as inputs and produces the sidecar .dll
  • We can inspect all input types with MetadataLoadContext APIs and similar
    • For inspecting things such as casts/boxing/etc., Roslyn folks strongly recommended to just analyze IL directly
  • We can emit the sidecar .dll using PersistedAssemblyBuilder
    • Much higher level than just System.Reflection.Metadata (using that directly doesn't seem viable)
    • Emitting IL would allow us to completely ignore all accessibility checks, and to target unspeakable types too
      • Meaning we can start supporting synthesized types again too (like .NET Native can)
      • No need to special case internal types with other logic, we can treat all of them exactly the same
      • No need for new constructor for [TypeMapAssociationAttribute<>], we can always reference the source type
      • No risk of getting "stuck" if we hit a case where we'd need to refer to a type that's not accessible in the sidecar
    • @jkoritzinsky suggested we can just publish our build tool with NAOT, so we can use PersistedAssemblyBuilder
  • We will not be needing this proposed *App* wildcard then
  • We resolve the concerns mentioned above about using <Csc> or MSBuild on a .csproj directly

Thank you again everyone for all the help and the advice! 🙂

@Sergio0694 Sergio0694 closed this as not planned Won't fix, can't repro, duplicate, stale Apr 22, 2025
@Sergio0694
Copy link
Contributor Author

@jkotas small follow up question: would you have any recommendation for how to analyze IL? MethodBody only returns the raw IL array, and I'm not seeing any built-in APIs to get a "nice" sequence of opcodes to analyze (as in, as a sequence of opcodes, with arguments you can easily read as eg. string, or a Type/type name, etc.). Would Cecil make sense for this, or perhaps ILSpy, or something else? For context, we mostly need to detect things such as casts and boxing, and analyze the source/target types.

@jkotas
Copy link
Member

jkotas commented Apr 22, 2025

analyze the source/target types.

Do you mean that you need to keep track of the object type that's get passed into the cast? You need to build IL importer for that. kind of like what's in IL verify or the JIT. I do not think Cecil has one. I do not know about ILSpy.

@Sergio0694
Copy link
Contributor Author

Roslyn folks shared a link to their IL visualizer (here), and it seems not too difficult to reuse the logic here to implement an importer that just tracks the evaluation stack. The idea would be to load all import .dll-s with MetadataLoadContext and then plug that into this. I'm working on a prototype locally and it seems fairly doable, except that @reflectronic just pointed out than when actually using MetadataLoadContext to load assemblies, none of the ResolveXXX methods actually work (#87657) 😭

That seems unlikely to make it to .NET 10 I suppose. Perhaps we should explore some other libs to analyze the .dll-s 🤔

@jkotas
Copy link
Member

jkotas commented Apr 23, 2025

reuse the logic here to implement an importer that just tracks the evaluation stack

The code you have linked does not track the evaluation stack. Tracking the state of the evaluation stack is more complicated.

@Sergio0694
Copy link
Contributor Author

Yeah no, of course. I didn't mean that the code was doing everything we needed. I was just saying that I was going to port that (since it already has the basic setup to decode IL blobs into actual opcodes and operands), and then build the evaluation stack analyzer on top of it, by traversing the graph of each method body and tracking what actually happens. And then use the ResolveXXX methods to easily resolve Type-s, signatures, etc. from the operands of opcodes we care about.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Runtime.CompilerServices
Projects
None yet
Development

No branches or pull requests

6 participants