-
Notifications
You must be signed in to change notification settings - Fork 420
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Zephyr - new FPU context switch not working in VexRiscV-smp CPU #297
Comments
Hi, I wasn't aware of that issue. So, there is a few things we can do to have a better diagnostic :
val fs = Reg(Bits(2 bits)) init(1)
Would become
val fs = U"11"
One possibility is that it come from a "recently" added privileged feature missing in VexRiscv, or just something buggy in the core. Thanks ^^ |
The latest Zephyr code relies on two things from the spec to work properly:
- any modification to the FPU register file must change the mstatus_fs bits to "dirty" (0b11);
- any access to the FPU (including fcsr, frm and fflags) when mstatus_fs is set to "off" (0b00) must raise an illegal instruction fault instead, preferably with the actual instruction value stored in MTVAL as well.
And, per the spec, mstatus_fs being "off" doesn't mean the FPU is powered off. It must retain its state. This is only about accessing it.
Accommodations are possible in software if the above is not possible, but overall performance would be degraded.
|
Hi Charles, ad 1) I've changed the FpuPlugin as instructed, but I think the scala compiler didn't execute (note that the project is a LiteX project) - or I didn't notice it. How can I double-check if the change really arrived? How can I force to rebuild from the scala code in LiteX? The change in Zephyr is recent - with the previous version it has been working. I don't think this has something to do with privileged execution; See here I'm really a rookie on this level of programming - so no surprise that I have to bother with that basics! |
Hi Charles,
This won't compile unfortunately:
Reverting the change and building results in a working bitstream. So I assume, my build-system to be OK. bye, pottendo |
...one more observation: I ported my test-program (mandelbrot set) to Linux, booted on the same machine. *) Maybe this helps to reproduce on other VexRiscV-smp systems? bye, pottendo *) managed to get this project running on my FPGA board. |
Ahhh that's the missing feature ! |
@Dolu1990, I see your dev branch is incorporating a few more changes - shall I apply the full change or just the Fpu related one? |
Just that specific change should be good. |
Hi Charles, However, not yet perfect though as the zephyr test-suite still results in some fails:
You find the tests here. More serious is that Linux is not booting anymore - the patch may break other Linux-systems as well!:
I double checked booting the previous version (without the FPU plugin patch) succefully booting the same Linux Image. All the best and thanks for this great project - all efforts here are much appreciated! |
Hi @pottendo
This one is fine, basicaly, VexRiscv dirty tracking is very pessimitic and assume that any interraction with the FPU or CSR write to the FPU will make it dirty.
Same
Same, the test assume that will not make the FPU dirty, but in VexRiscv it will.
That one is actualy VexRiscv missspec, CSR read write should now also trap when FS==0 with that additional patch :
That's more worrying I have to check how linux handle the MMU :) Let's me know how it goes with the cbc8909 patch :D Best regards |
"Implementations may choose to track the dirtiness of the floating-point register state imprecisely by reporting the state to be dirty even when it has not been modified."
How hard would it be to set the dirty state on writes only as opposed to set it on any access?
I'll adjust the Zephyrtest in the mean time.
|
The amended Zephyr test is in branch "rvfputestdirty" of
https://github.com/npitre/zephyr/.
@pottendo: Could you give it a try?
|
@npitre Shouldn't be that hard, i will give a try as soon as the other issues in Vex are fixed (to avoid mixing fixes) ^^ |
hi, |
Got the VexRiscv implementation for more a much more accurate dirty bit : |
Hi all,
This is the result after applying all proposed patches - so, Charles, one of the last two patches obviously made the system booting again - also Zephyr. Great job guys! |
Nice ^^
So, do you mean it is still broken for linux ? or that you need test ? Thanks :) |
pottendo wrote:
"Also @npitre's patch on the test-program is applied here."
Looks like it was unnecessary. The imprecise dirty tracking was not
detected according to the test output.
|
@npitre, I've reverted your change and can confirm that all tests still pass.
I didn't change my buildroot and Linux kernel (yet) - I don't know if and how this would be needed. |
On Wed, 8 Feb 2023, pottendo wrote:
But maybe it's something to be deeper investigated in the Linux kernel, how FPU management is done there.
According to the crash dump, something attempted a "fsd fs0,56(a0)" instruction while the FPU was disabled. The CPU raised an illegal instruction exception as it should.
Do you have CONFIG_FPU=y in your kernel config?
|
@npitre, For me this looks promising - good to commit to the respective main branches! |
@npitre omg nice catch XDXD Thanks :D |
hi,
upfront, I hope this isn't too off-topic for this project. Let me know and in case direct me to the correct place, e.g. the LiteX repo... thanx.
Summary: Zephyr 3.3-RC1 introduced a new (optimized) handling of FPU register saving in mutli-threaded applications (context switch).
My Litex-project uses a VexRiscV-smp SoftCPU core.
I got basic Zephyr running on this project by just providing a matching devicetree - find the links in the issue report.
Find all details to this issue in the Zephyr repository.
The Zephyr author @npitre suggests:
So as I haven't managed to get it going on my own, I ask you experts here!
As a fallback I could revert to 3.2.99 version of Zephyr's FPU code; but I'd like to understand if the problem is either in my Zephyr port, my RiscV Litex project or even something which would be required to be fixed/added in the VexRiscV-smp CPU itself.
If it's the very last reason, other's may encounter this problem as well.
Thanks for any feedback, pottendo
The text was updated successfully, but these errors were encountered: