Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Apply new EVM opcode updates #2602

Open
wants to merge 3 commits into
base: next
Choose a base branch
from

Conversation

andelf
Copy link

@andelf andelf commented Jan 12, 2025

Your checklist for this pull request

  • I've documented or updated the documentation of every API function and struct this PR changes.

  • I've added tests that prove my fix is effective or that my feature works (if possible)

  • No API change

  • If it's required to add test cased to new opcodes, I'm glad to do so.

Detailed description

  • Add new opcodes
  • Rename EVM_INS_SUICIDE, EVM_INS_CALLBLACKBOX
  • Revise gas usage definitions of opcodes

Test plan

...

Closing issues

...

Copy link
Collaborator

@Rot127 Rot127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Please add some test cases though in tests/details/evm.yaml. (They are desperately needed in general.)

@andelf andelf force-pushed the enhance/update-evm branch from 49a1352 to 0a7e356 Compare January 12, 2025 16:59
@andelf
Copy link
Author

andelf commented Jan 12, 2025

@Rot127 Thanks. I've added simple test cases. PTAL.

Copy link
Collaborator

@Rot127 Rot127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not all instructions which push and pop are covered yet I think.
E.g. EVMDisassembler.c::289 should also include PUSH0?

Also, I think instructions like ChainID also push onto the stack (feel free to correct me if I am wrong, never used EVM).

Sorry, you are the first contributor for a while looking at the EVM code. Now I see all the missing cases (and as you see, nothing was tested before). It would be nice if you could add a test case for each new instruction. So we get step by step to a better module.

@andelf
Copy link
Author

andelf commented Jan 13, 2025

EVMDisassembler.c::289 should also include PUSH0?

PUSH0 is to replace PUSH 0x00 for space efficiency and has no payload. EVMDisassembler.c::289 is used to handle PUSH1 to PUSH32, which has 1 to 32 payload data.

Not all instructions which push and pop are covered yet I think.

I just followed the old test style. I'll add test cases for all introduced new opcodes.

Also, I think instructions like ChainID also push onto the stack

Yes. Handled in EVMMappingInsn.inc. If push or pop count is not zero, the groups will be added in EVMDisassembler.c::305

@andelf
Copy link
Author

andelf commented Jan 13, 2025

Another question: For minor EVM forks (e.g., TRON added some opcodes, no conflicts, in 0xd0 to 0xdf range), what's the recommended implementation strategy in Capstone?

@Rot127
Copy link
Collaborator

Rot127 commented Jan 13, 2025

Another question: For minor EVM forks (e.g., TRON added some opcodes, no conflicts, in 0xd0 to 0xdf range), what's the recommended implementation strategy in Capstone?

Feature flags (or modes) is the way to go. They are defined in enum cs_mode. You can check the usage of CS_MODE_PS to see how it is done. Bit 0 is the little endian flag. But bit 1:30 can be used by you.

I just followed the old test style. I'll add test cases for all introduced new opcodes.

Thank you! It doesn't even need to be all opcodes (shr, add are pretty much the same as it looks like). It is enough if it covers all paths you touched.

The old test coverage was barely there (for details at least). And only done for bug fixes. So we have to catch up quite a lot unfortunately. With all archs, not just EVM.

@andelf andelf force-pushed the enhance/update-evm branch from 0a7e356 to bb63a03 Compare January 13, 2025 16:55
Copy link
Author

@andelf andelf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test cases added.

tests/details/evm.yaml Show resolved Hide resolved
tests/details/evm.yaml Show resolved Hide resolved
@andelf
Copy link
Author

andelf commented Jan 13, 2025

Feature flags (or modes) is the way to go. They are defined in enum cs_mode. You can check the usage of CS_MODE_PS to see how it is done. Bit 0 is the little endian flag. But bit 1:30 can be used by you.

Thanks, this is helpful. I'll implement the alternative architecture support in a separate PR.

@Rot127
Copy link
Collaborator

Rot127 commented Jan 13, 2025

2025-01-13T17:50:22.1165390Z INFO  - Run test: TestInput { arch: evm, options: ['CS_OPT_DETAIL'], addr: 0, bytes: [ 0x5b,0x5f,0x1c,0x47,0x46,0x15,0x90,0x20,0x54,0x42,0x1a,0x6d,0x15,0xc1,0xfd,0xdc,0xd5,0x55,0x64,0x73,0x6f,0x6c,0x63,0x43,0x00,0x08,0x00,0x33,0xff ] }
2025-01-13T17:50:22.1167170Z WARNING - CS_ARCH: Capstone doesn't have the attribute 'evm'
2025-01-13T17:50:22.1168350Z ERROR - Compare expected failed with: Information irrelevant for 'data' instruction in SKIPDATA mode (CS_ERR_SKIPDATA)
2025-01-13T17:50:22.1169400Z INFO  - TestResult.ERROR
2025-01-13T17:50:22.1169660Z 
[...]
2025-01-13T17:50:22.1170680Z 
2025-01-13T17:50:22.1179970Z Traceback (most recent call last):
2025-01-13T17:50:22.1191010Z   File "/private/var/folders/vr/pjd30pz953q8qrl27dlgxb280000gn/T/cibw-run-kbjh4noj/cp38-macosx_x86_64/venv-test-x86_64/lib/python3.8/site-packages/cstest_py/cstest.py", line 305, in test
2025-01-13T17:50:22.1193020Z     return self.expected.compare(insns, self.input.arch_bits)
2025-01-13T17:50:22.1194860Z   File "/private/var/folders/vr/pjd30pz953q8qrl27dlgxb280000gn/T/cibw-run-kbjh4noj/cp38-macosx_x86_64/venv-test-x86_64/lib/python3.8/site-packages/cstest_py/cstest.py", line 259, in compare
2025-01-13T17:50:22.1196610Z     if not compare_details(a_insn, e_insn.get("details")):
2025-01-13T17:50:22.1198480Z   File "/private/var/folders/vr/pjd30pz953q8qrl27dlgxb280000gn/T/cibw-run-kbjh4noj/cp38-macosx_x86_64/venv-test-x86_64/lib/python3.8/site-packages/cstest_py/details.py", line 219, in compare_details
2025-01-13T17:50:22.1200440Z     if not compare_uint32(len(actual.groups), len(expected["groups"]), "group"):
2025-01-13T17:50:22.1202360Z   File "/private/var/folders/vr/pjd30pz953q8qrl27dlgxb280000gn/T/cibw-run-kbjh4noj/cp38-macosx_x86_64/venv-test-x86_64/lib/python3.8/site-packages/capstone/__init__.py", line 885, in groups
2025-01-13T17:50:22.1204260Z     raise CsError(CS_ERR_SKIPDATA)
2025-01-13T17:50:22.1205100Z capstone.CsError: Information irrelevant for 'data' instruction in SKIPDATA mode (CS_ERR_SKIPDATA)

@Rot127
Copy link
Collaborator

Rot127 commented Jan 13, 2025

Error likely happens because the first value in the instruction enum EVM_INS_STOP = 0, is a valid instruction. But Capstone implicitly assumes everywhere (also in the Python code), that 0 == INVALID. Can you please fix the enum and check.

@kabeor Please review this as well. It is basically done.

@andelf
Copy link
Author

andelf commented Jan 14, 2025

Please correct me if I misunderstood:

  • The original EVM implementation is buggy because it uses enum position 0 for a legal opcode, whereas using 0 as invalid is a strict requirement(at least for the python binding).
  • The test case fails because I added a test case for the STOP opcode.
  • I need to change the enum value of STOP to another empty slot and add parsing for it.
  • I need to change const definitions of Java/OCaml binding as well.

@Rot127
Copy link
Collaborator

Rot127 commented Jan 14, 2025

In essence, yes. This is correct. But I just recognized that the enum of EVM instructions are probably bound to their definition in the RFCs. So changing STOP to a non-0 value messes up the logic.

I think alternatively, we could add a method to the Python bindings which checks the id if it is invalid.

Something along the line:

(in bindings/python/capstone/__init__.py)

class CsInsn:
  def is_invalid_insn(self) -> bool:
     arch = self._cs.arch
     if arch == CS_ARCH_EVM:
       return self.id() == EVM_INS_INVALID
     else:
       # Probably better to write it out for all architectures though.
       return self.id() == 0

Then replace all checks of if self._raw.id == 0 and self._raw.id != 0 with it.

@andelf
Copy link
Author

andelf commented Jan 14, 2025

Agreed. The python side approach is better.

@Rot127
Copy link
Collaborator

Rot127 commented Jan 15, 2025

Exceptions are gone! One case is still left:

bindings/python/capstone/__init__.py:789:        if self._cs._detail and self._raw.id != 0:

And the group seems incorrect?

2025-01-15T15:01:07.0318255Z INFO  - Run test: TestInput { arch: evm, options: ['CS_OPT_DETAIL'], addr: 0, bytes: [ 0x5b,0x5f,0x1c,0x47,0x46,0x15,0x90,0x20,0x54,0x42,0x1a,0x6d,0x15,0xc1,0xfd,0xdc,0xd5,0x55,0x64,0x73,0x6f,0x6c,0x63,0x43,0x00,0x08,0x00,0x33,0xff ] }
2025-01-15T15:01:07.0319230Z WARNING - CS_ARCH: Capstone doesn't have the attribute 'evm'
2025-01-15T15:01:07.0326051Z ERROR - group: 16 != 1
2025-01-15T15:01:07.0326569Z INFO  - TestResult.FAILED

@Rot127
Copy link
Collaborator

Rot127 commented Jan 16, 2025

Please rebase onto the latest next branch to run the labeler again.

@andelf
Copy link
Author

andelf commented Jan 17, 2025

@Rot127 Now all checks have passed. PTAL

Copy link
Collaborator

@Rot127 Rot127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent! Thanks a lot, also for the repairs :)

I can't merge without the permission from @kabeor. But he said he will find time reviewing and merging all the PRs before Lunar New Year holidays.

@notxvilka
Copy link

@kabeor could you please take a look?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants