Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug report] fence.i may cause the CPU to hang under some conditions #128

Closed
flaviens opened this issue Oct 30, 2023 · 5 comments
Closed

Comments

@flaviens
Copy link
Contributor

flaviens commented Oct 30, 2023

Hi there!

I think I found a bug (using the brand new Cascade fuzzer) in EL2.

Looking at the reduced ELFs, it looks like it is due to a fence.i placed after branches.

In this zip, you'll find reduced test cases and their dumps for convenience.
bug_diff.zip

In the image, left is smallest buggy, right is largest non-buggy.
image

I'm using the AHB config. There's also a possibility that my parameters are wrong, I'm absolutely not familiar with this core at the moment. Can you please try to reproduce?

Thanks!

@algrobman
Copy link

any more details?

@algrobman
Copy link

algrobman commented Nov 19, 2023

I was able to simulate the buggy code:

core went to address 0 after fence.i : (that shouldn't happen! Need to see what's wrong in RTL )

       272 :                               s7=93d9de2b                ; nbL
       275 :      #48 0 800000bc 4d5481ef  gp=800000c0                ; jal     gp,0x80048d90
       280 :      #49 0 80048d90 b0a35f63                             ; bge     t1,a0,0x800480ae
       286 :      #50 0 80048d94 783190e3                             ; bne     gp,gp,0x80049d14
       292 :      #51 0 80048d98 00019b0f                             ; fence.i
       302 :      #52 0 00000000 00000000                             ; .short  0
       309 :      #53 0 00000000 00000000                             ; .short  0
       316 :      #54 0 00000000 00000000                             ; .short  0

since this is invalid instruction at address 0 , CPU takes exception. mtvec is not programed and has reset value of 0 causing CPU to go to address 0 again and thus the core is looping at address 0.

@algrobman
Copy link

algrobman commented Nov 19, 2023

More update:

00019b0f is NOT fence.i, but unimplemented opcode for EL2 - so the CPU takes exception on it.
fence.i opcode is 0000100f.

dasm in TB does not decode all the bits in opcode :
it takes in account only bits 6:0 and 12, assuming that only valid opcodes are in the tests binaries,

I would suggest TB to print that the core took exception/interrupt in the exec.log to make easier to debug such problems ...
Core trace port provides indications what interrupt or exception was taken ....

@mkurc-ant
Copy link
Collaborator

So actually opcode 0x00019b0f is fence.i but with non-zero supplementary arguments. RISC-V spec allows both fence and fence.i to have arguments and those are neither decoded nor utilized by VeeR.

The instruction decoder assumes that all those arguments are zero. With the PR #138 I updated it so that it allows them to have non-zero values. With that VeeR defaults to the most generic behavior of the instructions by ignoring the arguments and does not execute any exception. This is sanctioned by the spec.

@tmichalak
Copy link
Collaborator

Fixed with #138. Closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants