"Any" code model with link-time instruction selection? #322
Comments
It would indeed be possible to construct such a universe. But unless the linker is also an optimizing compiler, you'd lose out on some important optimization opportunities. Splitting addressing sequences into independently schedulable instructions gives the compiler more code-motion opportunities. For example, if a global variable is referenced in a loop body, the

If, instead of emitting the best-case sequence and expanding it in the linker, you emitted the worst-case sequence and contracted it, the situation would improve. (My previous example could then be optimized as we'd hope.) But it would still give the compiler less accurate information in some cases, misguiding the optimizer. The effect would be most pronounced for a fully general large code model for RV64, where addressing sequences have several instructions, so that scheduling them materially affects register-allocation decisions.
I don't think it prevents compilers from emitting optimized code. Here is an example: imagine a function that accesses thread-local variables A and B. If the compiler knows that both A and B will end up in the same ELF module (e.g. they are hidden symbols), it can compute the TLS base address of the current module and just add offsets to it to access A and B. But such an optimization is orthogonal to the access model; the code sequence emitted for this kind of optimized access is not affected by the current code model. If the compiler knows, for example, that the upper 42 bits are the same for variables A and B, it can emit code that materializes the upper 42 bits in the code-model-less model and then reuse that value in the following code. We don't have to live only with the hypothetical
I also think that the division into "medlow" and "medany" seems quite unnecessary. There need only be, de facto, 2 "code models":
Also, considering things like riscvarchive/riscv-code-size-reduction/issues/158
The linker can only remove instructions during the relaxation process; it can't insert new ones. More specifically, we can't grow the code size, since that might keep linker relaxation from converging and would make relaxation more complicated. Inserting new instructions might push a branch or jump out of range, and once a jump is out of range we need an indirect jump to resolve it, which is almost impossible at the link stage because it requires an extra temporary register. The RISC-V ISA has relatively short branch and jump ranges compared to other architectures, so this will happen more frequently than we expect.
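To make the convergence argument concrete, here is a minimal sketch of a shrink-only relaxation loop. It is a toy model, not how any real linker is written: the item layout, the 8-byte long form (`auipc` + `jalr`), the 4-byte short form (`jal`), and the names `relax` and `JAL_REACH` are all assumptions for illustration, and alignment is ignored.

```python
# Toy model of shrink-only linker relaxation (hypothetical data layout).
# Each item is {"size": n} for plain code, or {"size": 8, "target": index}
# for a jump emitted in its long form (auipc + jalr, 8 bytes).
JAL_REACH = 1 << 20  # jal takes a signed 21-bit offset, roughly +/-1 MiB


def relax(items):
    """Shrink long jumps to 4-byte jal where the target is in range.

    Returns (sizes, passes). Sizes only ever decrease, so the loop runs
    at most len(items) + 1 passes before nothing changes.
    """
    sizes = [item["size"] for item in items]
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        # Recompute the start address of every item for this pass.
        addrs = [0]
        for size in sizes:
            addrs.append(addrs[-1] + size)
        for i, item in enumerate(items):
            if "target" in item and sizes[i] == 8:
                offset = addrs[item["target"]] - addrs[i]
                if -JAL_REACH <= offset < JAL_REACH:
                    sizes[i] = 4  # long form -> short form
                    changed = True
    return sizes, passes


# Jump A is 4 bytes too far away until jump B shrinks in the first pass.
filler = JAL_REACH - 16
program = [
    {"size": 8, "target": 3},  # A: jumps to the last item
    {"size": filler},          # plain code
    {"size": 8, "target": 3},  # B: short hop to the last item
    {"size": 4},               # jump destination
]
sizes, passes = relax(program)
```

Because every change makes the program smaller, a jump that has been shortened (ignoring alignment padding) never needs to be lengthened again. If the linker could also grow code, a later insertion could invalidate an earlier decision, which is the convergence worry raised above.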
Ideally we should select LUI rather than AUIPC where possible, but current toolchain implementations (true for both LLVM and GCC) separate the compilation and linking phases, so we can't tell whether to use LUI or AUIPC: symbol addresses aren't known at the compilation stage. That's why we still need the -mcmodel option, which lets the user decide.
I think all the points you mentioned are technical problems we can just solve. The linkers do not currently expect sections to grow in size, but that's not a fundamental limitation. After all, someone has to select instructions. We do it in the compiler at the moment with an algorithm that's guaranteed to converge. We can run the same or a similar algorithm in the linker. As to the issue that a branch can go out of range if we add bytes between the instruction and its destination, there are many solutions. One obvious way to solve it is to reserve a register for the linker-synthesized long branch instructions. Another would be making the compiler emit short jump instructions only with some "safety margin" (say, if the destination is <95% of the instruction's reach) to allow the linker to optimize within that safety margin. There might be more solutions to this problem.
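The "safety margin" idea can be sketched as a simple compiler-side check. This is only an illustration: the function name `short_form_is_safe` and the default 95% margin are assumptions taken from the example figure above, and the reach constants are the nominal RISC-V ranges (`jal` about ±1 MiB, conditional branches about ±4 KiB).

```python
# Nominal reach of the short forms, in bytes (one direction).
JAL_REACH = 1 << 20     # jal: signed 21-bit offset
BRANCH_REACH = 1 << 12  # beq/bne/...: signed 13-bit offset


def short_form_is_safe(offset, reach, margin=0.95):
    """Return True if a short jump may be emitted for this offset.

    The compiler only uses the short form when the estimated offset is
    within `margin` of the instruction's reach, which leaves the rest
    of the reach as slack for bytes the linker might insert later.
    """
    return abs(offset) < reach * margin


near = short_form_is_safe(900_000, JAL_REACH)      # well inside the margin
edge = short_form_is_safe(1_000_000, JAL_REACH)    # in range, but no slack
branch = short_form_is_safe(-4000, BRANCH_REACH)   # backward branch, no slack
```

With a 95% margin, a `jal` keeps roughly 51 KiB of slack in each direction. Whether that is enough depends on how much code the linker may insert between a jump and its target, which this sketch does not model.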
I don't like the idea of reserving a register for the linker, but I admit it is one possible solution, and it would make this a new ABI rather than just a new code model. But I am very happy to discuss this further; this is a kind of brainstorming. It might not be applicable to the UABI/Unix ABI, but it would be worth considering for an embedded ABI.
and here is another one:
not able to use
But can't the linker optimize auipc into lui, as it's already doing a lot of similar optimization steps?
You can mix medlow and medany code just fine. The only problem is if you try to link medlow code at an address that isn't in (-2 GiB, +2 GiB), which is true irrespective of whether you're mixing medany code with it (which can support that just fine). I don't understand this note; it only makes sense if they have a linker script that sets the base address on RV64 to be outside that range, but that's impossible on RV32. The whole point of compiling library code as medany is to make it usable in more cases.
So that turns out to be just a minor missing-opt in library code on RV32, if those
I've been wondering since I started working on a linker for RISC-V why its psABI needed the notion of "code model" in the first place.
I understand that a code model is required if a compiler emits a sequence of machine code and the linker can only mutate it without changing its size. For example, if a compiler for x86-64 emits an immediate load instruction to set a symbol address to a register, we can't change it to a GOT-indirect load because we can't replace a single instruction with two instructions in-place. There's simply not enough space to rewrite instructions.
However, it's not an issue for the RISC-V psABI since it allows the linker to add or remove instructions in the middle of a section. Then, why did we want a compiler to emit a specialized instruction sequence instead of a generic one?
Let me show you an example. Imagine that we have a hypothetical relocation `R_RISCV_SYMBOL_ADDRESS` that materializes a symbol address in a register. A compiler emits code like this:

where `addi` is a placeholder instruction (I chose this instruction arbitrarily). The linker is free to expand it to

if `foo` is defined locally, or

if `foo` needs to be accessed via GOT, or even

if the linker knows `foo`'s address is very small.

We can do the same thing for `call`. The point is, we can let the linker choose the instructions that work best for the final binary form.

I'm not sure if this was discussed before, but I don't think the above scheme has a fundamental flaw. Am I missing something?
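The linker-side choice described above could look roughly like the following sketch. Everything here is an assumption for illustration: the function `select_address_sequence`, the fixed register `a0`, the non-PIC preference order (single `addi`, then `lui`, then `auipc`), and the printed pseudo-assembly are not taken from any existing linker or from the original post.

```python
def fits_signed(value, bits):
    """True if value is representable as a signed integer of `bits` bits."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def split_hi_lo(value):
    """Split value into (hi20, lo12) with value == (hi20 << 12) + lo12.

    lo12 is a signed 12-bit number, so hi20 is rounded to compensate.
    """
    hi = (value + 0x800) >> 12
    lo = value - (hi << 12)
    return hi, lo


def select_address_sequence(sym_addr, pc, got_entry_addr=None):
    """Pick an instruction sequence that loads a symbol address into a0.

    Hypothetical policy for a non-PIC link; raises if nothing fits.
    """
    if got_entry_addr is not None:
        # Symbol must be accessed through the GOT: pc-relative load.
        hi, lo = split_hi_lo(got_entry_addr - pc)
        if not fits_signed(hi, 20):
            raise ValueError("GOT entry out of auipc range")
        return [f"auipc a0, {hi & 0xFFFFF:#x}", f"ld a0, {lo}(a0)"]
    if fits_signed(sym_addr, 12):
        # Very small absolute address: one instruction is enough.
        return [f"addi a0, zero, {sym_addr}"]
    hi, lo = split_hi_lo(sym_addr)
    if fits_signed(hi, 20):
        # Absolute address reachable with lui + addi.
        return [f"lui a0, {hi & 0xFFFFF:#x}", f"addi a0, a0, {lo}"]
    hi, lo = split_hi_lo(sym_addr - pc)
    if fits_signed(hi, 20):
        # Fall back to a pc-relative pair.
        return [f"auipc a0, {hi & 0xFFFFF:#x}", f"addi a0, a0, {lo}"]
    raise ValueError("symbol not reachable with a two-instruction sequence")


tiny = select_address_sequence(0x10, pc=0x10000)
absolute = select_address_sequence(0x12345678, pc=0x10000)
pcrel = select_address_sequence(0x1_0000_1000, pc=0x1_0000_0000)
via_got = select_address_sequence(0x5000_0000, pc=0x10000, got_entry_addr=0x20008)
```

The sequences have different lengths (one or two instructions here), which is why this scheme depends on the linker being able to change section sizes rather than only patching immediates in place.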