|
| 1 | +# Investigation on making interpreter work with ReadyToRun |
| 2 | + |
| 3 | +## Status |
| 4 | + |
| 5 | +This document is preliminary: it covers only the most basic case and does not even cover very common cases such as virtual method calls. |
| 6 | + |
| 7 | +Treat it as an overnight hackathon attempt to get something working, not yet a design for the long term. |
| 8 | + |
| 9 | +## Goals |
| 10 | + |
| 11 | +- Figure out how the relevant parts of ReadyToRun work. |
| 12 | +- Figure out how to hack it so that we can get into the CoreCLR interpreter. |
| 13 | + |
| 14 | +## Non Goals |
| 15 | + |
| 16 | +- Deliver a working prototype (I just don't have the time - and the CoreCLR interpreter is not the right target) |
| 17 | +- Come up with an optimal design (Same, I just don't have the time) |
| 18 | + |
| 19 | +## High-level observations |
| 20 | + |
| 21 | +We already have a mechanism to call an arbitrary managed method from the native runtime, and this mechanism can be used to call a ReadyToRun-compiled method. So in general, interpreter -> ReadyToRun is not an issue. |
| 22 | + |
| 23 | +The key challenge is to get ReadyToRun code to call into the interpreter. |
| 24 | + |
| 25 | +## Understanding what happens when ReadyToRun code makes an outgoing call |
| 26 | + |
| 27 | +When ReadyToRun code makes a call to a static function, it |
| 28 | + |
| 29 | +- places the arguments in registers or on the stack as per the calling convention |
| 30 | +- calls into a redirection cell |
| 31 | +- gets into the runtime. |
| 32 | + |
| 33 | +Inside the runtime, I will eventually get to `ExternalMethodFixupWorker` defined in `prestub.cpp`. |
| 34 | + |
| 35 | +At this point, I have: |
| 36 | +- `transitionBlock` - I have not investigated what it is |
| 37 | +- `pIndirection` - the address for storing the callee address |
| 38 | +- `sectionIndex` - a number, pushed by the thunk, and |
| 39 | +- `pModule` - a pointer to the module containing the call instruction |
| 40 | + |
| 41 | +Since the call comes from a ReadyToRun image, `pModule` must have a ReadyToRun image. |
| 42 | + |
| 43 | +We can easily calculate the RVA of `pIndirection`. |
| 44 | + |
| 45 | +If the call provided the `sectionIndex`, we just use it; otherwise we can still calculate the section index from the RVA. |
| 46 | + |
| 47 | +The calculation simply scans the import sections sequentially; each section describes its own address range, so we can check whether the RVA falls inside it. |
| 48 | + |
| 49 | +The import section has an array of signatures. Using the offset of the cell (the RVA minus the beginning RVA of the section), we can index into the signature array to find the signature. |
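The lookup described above can be sketched in C++ as follows. This is only an illustration under stated assumptions: `ImportSection`, `FixupLocation`, and `FindSignature` are hypothetical names with a simplified layout, not the runtime's actual structures, and the sketch assumes the cell index is the byte offset divided by a fixed cell size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified stand-in for an import section (not the real layout).
struct ImportSection {
    uint32_t beginRva;                    // RVA of the first indirection cell
    uint32_t size;                        // size of the cell range, in bytes
    uint32_t entrySize;                   // size of one cell, in bytes (assumed fixed)
    std::vector<uint32_t> signatureRvas;  // one signature RVA per cell
};

struct FixupLocation {
    size_t sectionIndex;    // which import section owns the cell
    uint32_t signatureRva;  // where the signature for that cell lives
};

// Each section describes its own address range, so containment is a range check.
inline bool Contains(const ImportSection& s, uint32_t rva) {
    return rva >= s.beginRva && rva - s.beginRva < s.size;
}

// Use the section index if the caller supplied one; otherwise scan sequentially.
std::optional<FixupLocation> FindSignature(
    const std::vector<ImportSection>& sections,
    uint32_t cellRva,
    std::optional<size_t> sectionIndexHint = std::nullopt) {
    size_t index = sections.size();
    if (sectionIndexHint && *sectionIndexHint < sections.size()) {
        index = *sectionIndexHint;
    } else {
        for (size_t i = 0; i < sections.size(); ++i) {
            if (Contains(sections[i], cellRva)) { index = i; break; }
        }
    }
    if (index == sections.size() || !Contains(sections[index], cellRva)) {
        return std::nullopt;
    }
    const ImportSection& s = sections[index];
    assert(s.entrySize != 0);
    // Offset from the start of the section, converted to a cell index.
    size_t cell = (cellRva - s.beginRva) / s.entrySize;
    if (cell >= s.signatureRvas.size()) return std::nullopt;
    return FixupLocation{index, s.signatureRvas[cell]};
}
```

The returned signature RVA is what would then be parsed into a `MethodDesc`; that parsing step is not modeled here.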
| 50 | + |
| 51 | +The signature is then parsed into a `MethodDesc`, from which method preparation continues as usual. |
| 52 | + |
| 53 | +Finally, `pIndirection` is patched with that entry point, and the call proceeds using the arguments already on the stack and in the restored registers. |
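The patch-on-first-call behavior can be modeled with a small C++ sketch. It is a toy analogy only: `g_cell`, `FixupThunk`, `RealCallee`, and `CallThroughCell` are hypothetical names, the cell is an ordinary function pointer rather than a slot in the image, and the real thunk preserves argument registers instead of receiving them as C++ parameters.

```cpp
#include <atomic>

using Callee = int (*)(int);

int FixupThunk(int x);  // forward declaration: the cell initially points here

// Stand-in for pIndirection: the slot the call site jumps through.
std::atomic<Callee> g_cell{&FixupThunk};

// Counts how many times the slow path ran, to show the patch takes effect.
int g_resolveCount = 0;

// Stand-in for the real callee's entry point.
int RealCallee(int x) { return x + 1; }

// Slow path: resolve the target, patch the cell, then complete the original call
// with the arguments that were already supplied.
int FixupThunk(int x) {
    ++g_resolveCount;
    Callee resolved = &RealCallee;  // stands in for signature -> MethodDesc -> entry point
    g_cell.store(resolved, std::memory_order_release);
    return resolved(x);
}

// Stand-in for the call site in ReadyToRun code: always calls through the cell.
int CallThroughCell(int x) {
    return g_cell.load(std::memory_order_acquire)(x);
}
```

After the first `CallThroughCell`, later calls go straight to `RealCallee` without entering the fixup path again.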
| 54 | + |
| 55 | +## What the potential hack looks like |
| 56 | + |
| 57 | +We keep everything the same up to the method preparation part. |
| 58 | + |
| 59 | +We know it is possible to produce an `InterpreterMethodInfo` from a `MethodDesc` when the system is ready to JIT, so we should be able to produce the `InterpreterMethodInfo` there. |
| 60 | + |
| 61 | +The arguments are already in registers, but we can't dynamically generate the `InterpreterStub`, so the only reasonable option is to pre-generate the stubs in the ReadyToRun image itself. |
| 62 | + |
| 63 | +> A stub per signature is necessary because each signature needs a different way to populate the arguments (and the interpreter method info). On the other hand, a stub per signature is sufficient because if we know how to prepare the registers to begin with, we must know exactly what steps are needed to put them into a format that `InterpretMethodBody` likes. As people have pointed out, this is going to be a large volume of stubs, and it is by no means optimal. |
| 64 | + |
| 65 | +The stub generation code can be mostly the same as `GenerateInterpreterStub`, with two twists: |
| 66 | + |
| 67 | +- We need to use indirection to get to the `InterpreterMethodInfo` object. That involves having a slot that the `InterpreterMethodInfo` construction process needs to patch. |
| 68 | +- We need to handle call signatures that involve a struct of unknown size (e.g. a method in A.dll takes a struct defined in B.dll, where B.dll is not considered part of the same version bubble). |
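The first twist, reading the `InterpreterMethodInfo` through a patchable slot, might look roughly like the C++ analogy below. Everything here is an assumption made for illustration: the struct contents, `InterpretMethodBodySketch`, `g_methodInfoSlot`, `StubForTwoInt64`, and `PatchMethodInfoSlot` are hypothetical, and a real pre-generated stub would be machine code that spills argument registers rather than a C++ function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the runtime's InterpreterMethodInfo.
struct InterpreterMethodInfo {
    int64_t bias;  // toy payload so the "interpreter" below has something to read
};

// Hypothetical stand-in for the interpreter entry: it receives the method info
// and the arguments packed into a flat array. Here it simply sums them.
int64_t InterpretMethodBodySketch(const InterpreterMethodInfo* info,
                                  const int64_t* args, size_t argCount) {
    int64_t result = info->bias;
    for (size_t i = 0; i < argCount; ++i) result += args[i];
    return result;
}

// The slot the stub reads. It is empty in the image and patched at run time,
// because the InterpreterMethodInfo does not exist at compile time.
InterpreterMethodInfo* g_methodInfoSlot = nullptr;

// Pre-generated stub for the signature (int64, int64) -> int64. It knows how
// this signature's arguments arrive and repacks them for the interpreter.
int64_t StubForTwoInt64(int64_t a, int64_t b) {
    assert(g_methodInfoSlot != nullptr);  // the slot must be patched before the first call
    const int64_t args[] = {a, b};
    return InterpretMethodBodySketch(g_methodInfoSlot, args, 2);
}

// Runtime side: once the InterpreterMethodInfo is built, publish it in the slot.
void PatchMethodInfoSlot(InterpreterMethodInfo* info) {
    g_methodInfoSlot = info;
}
```

The sketch does not address the second twist; a struct of unknown size would need its size looked up at run time before the arguments can be repacked.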
| 69 | + |
| 70 | +Next, we need a data structure that gets us to the address of the stub as well as the address of the cell storing the `InterpreterMethodInfo`. What we have is `pIndirection` and therefore the `MethodDesc`. |
| 71 | + |
| 72 | +To do that, we might want to mimic how the runtime locates ReadyToRun code. |
| 73 | + |
| 74 | +Here is a call stack showing what ReadyToRun code discovery looks like: |
| 75 | + |
| 76 | +``` |
| 77 | +coreclr!ReadyToRunInfo::GetEntryPoint+0x238 [C:\dev\runtime\src\coreclr\vm\readytoruninfo.cpp @ 1148] |
| 78 | +coreclr!MethodDesc::GetPrecompiledR2RCode+0x24e [C:\dev\runtime\src\coreclr\vm\prestub.cpp @ 507] |
| 79 | +coreclr!MethodDesc::GetPrecompiledCode+0x30 [C:\dev\runtime\src\coreclr\vm\prestub.cpp @ 443] |
| 80 | +coreclr!MethodDesc::PrepareILBasedCode+0x5e6 [C:\dev\runtime\src\coreclr\vm\prestub.cpp @ 412] |
| 81 | +coreclr!MethodDesc::PrepareCode+0x20f [C:\dev\runtime\src\coreclr\vm\prestub.cpp @ 319] |
| 82 | +coreclr!CodeVersionManager::PublishVersionableCodeIfNecessary+0x5a1 [C:\dev\runtime\src\coreclr\vm\codeversion.cpp @ 1739] |
| 83 | +coreclr!MethodDesc::DoPrestub+0x72d [C:\dev\runtime\src\coreclr\vm\prestub.cpp @ 2869] |
| 84 | +coreclr!PreStubWorker+0x46d [C:\dev\runtime\src\coreclr\vm\prestub.cpp @ 2698] |
| 85 | +coreclr!ThePreStub+0x55 [C:\dev\runtime\src\coreclr\vm\amd64\ThePreStubAMD64.asm @ 21] |
| 86 | +coreclr!CallDescrWorkerInternal+0x83 [C:\dev\runtime\src\coreclr\vm\amd64\CallDescrWorkerAMD64.asm @ 74] |
| 87 | +coreclr!CallDescrWorkerWithHandler+0x12b [C:\dev\runtime\src\coreclr\vm\callhelpers.cpp @ 66] |
| 88 | +coreclr!MethodDescCallSite::CallTargetWorker+0xb79 [C:\dev\runtime\src\coreclr\vm\callhelpers.cpp @ 595] |
| 89 | +coreclr!MethodDescCallSite::Call+0x24 [C:\dev\runtime\src\coreclr\vm\callhelpers.h @ 465] |
| 90 | +``` |
| 91 | + |
| 92 | +The interesting part, of course, is how `GetEntryPoint` works. It turns out to be just a `NativeHashtable` lookup given a `VersionResilientMethodHashCode`, so we should be able to encode the same kind of hash table for the stubs as well. |
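A stub table with the same shape could be sketched as below. This is a hedged illustration, not the `NativeHashtable` format: `StubEntry`, `StubTable`, and `HashMethodKey` are hypothetical, the string key stands in for whatever identity the version-resilient hash is computed from, and FNV-1a is used only as a placeholder hash.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical entry: where the stub lives and where its InterpreterMethodInfo slot lives.
struct StubEntry {
    std::string methodKey;       // stand-in for the method identity being hashed
    uint32_t stubRva;            // RVA of the pre-generated stub
    uint32_t methodInfoSlotRva;  // RVA of the slot to patch
};

// Placeholder hash (FNV-1a), standing in for a version-resilient method hash.
uint32_t HashMethodKey(const std::string& key) {
    uint32_t h = 2166136261u;
    for (unsigned char c : key) {
        h ^= c;
        h *= 16777619u;
    }
    return h;
}

class StubTable {
public:
    void Add(StubEntry entry) {
        uint32_t h = HashMethodKey(entry.methodKey);
        table_.emplace(h, std::move(entry));
    }

    // Different methods can share a hash, so every candidate with a matching
    // hash is compared against the requested key before it is accepted.
    std::optional<StubEntry> Find(const std::string& key) const {
        auto range = table_.equal_range(HashMethodKey(key));
        for (auto it = range.first; it != range.second; ++it) {
            if (it->second.methodKey == key) return it->second;
        }
        return std::nullopt;
    }

private:
    std::unordered_multimap<uint32_t, StubEntry> table_;
};
```

In the image, the table would be emitted by the compiler and only read by the runtime; the `Add` method here exists just to make the sketch usable on its own.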
| 93 | + |
| 94 | +Note that `GetEntryPoint` has a fixup concept; maybe we can use the same concept to patch the slot for `InterpreterMethodInfo`. |
| 95 | + |
| 96 | +## How to implement the potential hack |
| 97 | + |
| 98 | +From the compiler side: |
| 99 | + |
| 100 | +### When do we need to generate the stubs? |
| 101 | +When the ReadyToRun compiler generates a call, the JIT calls back into crossgen2 to create a slot for it. At that point, we should be able to make sure a stub is available for it by working with the dependency tracking engine. |
| 102 | + |
| 103 | +### Actually generate the stubs |
| 104 | + |
| 105 | +The stub generation should mostly work the same as `GenerateInterpreterStub` does today, with a couple of twists: |
| 106 | +- We don't need to generate the `InterpreterMethodInfo`; that work is left until runtime. |
| 107 | +- If the stub involves types of unknown size, we need to generate the right stub code for them (e.g. A.dll calls a function that involves a struct defined in `B.dll`, where the two are not in the same version bubble). |
| 108 | +- The stub needs an instance of `InterpreterMethodInfo`. It cannot be hardcoded, so the pointer to it must be read from somewhere else. |
| 109 | +- Whenever we generate a stub, we need to store it somewhere so that we can follow the same logic as in `MethodEntryPointTableNode`. |
| 110 | + |
| 111 | +From the runtime side: |
| 112 | + |
| 113 | +### Locating the stub |
| 114 | +- When we reach `ExternalMethodFixupWorker`, we need to use the table to get back to the generated stubs. |
| 115 | + |
| 116 | +### Preparing the data |
| 117 | +- We need to create the `InterpreterMethodInfo` and make sure the stub code will be able to read it. |
| 118 | + |
| 119 | +## Alternative designs |
| 120 | +Following the thinking behind the earlier prototype for tagged pointers, we could envision a solution that ditches all those stubs, e.g. |
| 121 | + |
| 122 | +1. Changing the calling convention for every method so that it is the same as what the interpreter method likes. |
| 123 | + |
| 124 | + Pros: |
| 125 | + - Consistent, easy to understand |
| 126 | + - No need for stubs, efficient for interpreter calls |
| 127 | + |
| 128 | + Cons: |
| 129 | + - Lots of work to have a different calling convention |
| 130 | + - Inefficient for non-interpreter calls |
| 131 | + |
| 132 | +2. Changing the call site so that it detects tagged pointers and calls differently |
| 133 | + |
| 134 | + Pros: |
| 135 | + - Similar to what we have in the tagged pointer prototype |
| 136 | + - No need for stubs, efficient for interpreter calls |
| 137 | + |
| 138 | + Cons: |
| 139 | + - Every call involves dual call code |
| 140 | + |
| 141 | +3. The approach described in this document (i.e. using stubs) |
| 142 | + |
| 143 | + Pros: |
| 144 | + - Probably cheapest to implement |
| 145 | + |
| 146 | + Cons: |
| 147 | + - Lots of stubs |
| 148 | + - Inefficient for interpreter calls (involves stack rewriting) |
| 149 | + - Unclear how it could work with virtual or interface calls |
| 150 | + |
| 151 | +I haven't put more thought into these alternative solutions, but I am aware they exist. |