Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Value overflow on sbiret? #158

Closed
leokolln opened this issue Aug 8, 2024 · 12 comments · Fixed by #184
Closed

Value overflow on sbiret? #158

leokolln opened this issue Aug 8, 2024 · 12 comments · Fixed by #184

Comments

@leokolln
Copy link

leokolln commented Aug 8, 2024

Binary Encoding chapter defines sbiret.value as long.

Extension DBCN functions Console Write and Console Read first parameter num_bytes is unsigned long.

The number of bytes written/read is returned in sbiret.value.

If a value greater than LONG_MAX is used as argument for num_bytes , what is the expected behavior?

  • Overflow on sbiret.value?
  • Caller should clamp the argument to LONG_MAX?
  • Callee should only operate up to LONG_MAX bytes per call?
  • Caller should consider Binary Encoding's definition of struct sbiret{} only as a general suggestion, but each extension can have an actual different definition of it?
  • Something else?

I know the question is a bit pedantic, but maybe relevant for other/future extensions where the possible amount of data "worked on" and its reporting may be less forgiving.

I did not check the remaining documentation to see if other such cases exists...

@atishp04
Copy link
Collaborator

atishp04 commented Aug 9, 2024

The spec also says
"This is a non-blocking SBI call and it may do partial/no writes if the debug console is not able to accept more bytes."

Thus, the caller is supposed to verify the number of bytes returned and make multiple calls if it doesn't match num_bytes.

That's why #3 is what callee should do.
Cc: @avpatel @jones-drew : We should add this to the kvm-unit-test DBCN test.

I checked the openSBI implementation which allows more than LONG_MAX. So that should be fixed if my understanding above is correct.

@jones-drew
Copy link
Contributor

The spec also says "This is a non-blocking SBI call and it may do partial/no writes if the debug console is not able to accept more bytes."

Thus, the caller is supposed to verify the number of bytes returned and make multiple calls if it doesn't match num_bytes.

That's why #3 is what callee should do. Cc: @avpatel @jones-drew : We should add this to the kvm-unit-test DBCN test.

Yup, I got Cade to implement the basic write buffer test with a loop on num_bytes. Also, I started writing a couple more tests to see what happens when we cross page boundaries and, for 32-bit, what happens when we use addresses > 4G or cross from under 4G address ranges into 4G address ranges (that found a couple opensbi bugs that I still need to patch)

I checked the openSBI implementation which allows more than LONG_MAX. So that should be fixed if my understanding above is correct.

The spec doesn't say anything about limits on num_bytes, which means up to ULONG_MAX being allowed is implied. So SBI implementations should ensure that base_addr + num_bytes isn't greater than HIGHEST_PHYSICAL_ADDRESS and return INVALID_PARAM if it is.

I'm not sure about #3, I feel like using unsigned long for the parameters at the SBI interface level, just as the ecall uses xlen wide registers, is fine. The expected limits on each parameter should be specified though, allowing SBI implementations to know how to validate the input registers and S-mode to implement wrappers which use appropriate types.

@atishp04
Copy link
Collaborator

@jones-drew : I am not sure what's your suggestions here. Unless we modify the calling convention, SBI implementation can't return the correct number of bytes read/written via sbiret if it is > LONG_MAX.

We can provide a clarification about the limitation in the spec but don't we still need to fix the current behavior in the implementation to comply with the spec ?

@SiFiveHolland
Copy link
Contributor

I checked the openSBI implementation which allows more than LONG_MAX. So that should be fixed if my understanding above is correct.

I agree with this interpretation, assuming from the context in your comment that the fix is to write fewer bytes, not return an error.

The spec doesn't say anything about limits on num_bytes, which means up to ULONG_MAX being allowed is implied.

It's allowed for the implementation to accept num_bytes values > LONG_MAX. It is not allowed for the implementation to actually write more bytes than LONG_MAX, because then it would be unable to fulfill the requirement to return number of bytes written. Therefore, if num_bytes is > LONG_MAX, the implementation must write only a portion of the input memory to the console.

@atishp04
Copy link
Collaborator

I checked the kvmtool and qemu-kvm implementation as well. Both doesn't check the size of the num_bytes and just return whatever bytes written/read (> LONG_MAX). It is unlikely that a caller would invoke that large buffer for a debug console. However, we should fix the behaviors for spec compliance.

@jones-drew
Copy link
Contributor

I misunderstood the issue being pointed out at first. Thanks to Samuel's description I now understand it and agree with Atish that we need to fix the implementations. It would probably be good to point this issue out in the spec, perhaps in a note, to help future implementations avoid making the same error.

@atishp04
Copy link
Collaborator

atishp04 commented Nov 1, 2024

Reading the spec again, there are more instances there are more instances of the overflow scenario. E.g. sbi_mpxy_send_message_with/without_response also returns the number the number of bytes written will be returned through sbiret.value.

PMU sbi_pmu_counter_get_info also assumes unsigned long return value but it describes the bitfields explicitly. There shouldn't be any problem with it.

Keeping these in mind and future extensions where similar case arise, @avpatel suggested to allow unsigned long as return value. The calling convention says
"SBI functions must return a pair of values in a0 and a1, with a0 returning an error code" This will remain true. However, we can change the analogous C structure to something like this

struct sbiret {
long error;
union {
long value;
unsigned long uvalue;
};
};

This will allow backward compatibility as well.

@SiFiveHolland
Copy link
Contributor

I think we can go further if we want. From reading through the spec again, I don't see any function that wants to return a negative number in spirit.value, but as you mentioned there are several that want to use all XLEN bits for an unsigned value (or a raw CSR value). So we could skip the union and have just unsigned long value;. What is the likelihood that some future SBI function will want to return a negative value?

@atishp04
Copy link
Collaborator

atishp04 commented Nov 5, 2024

IMO, I do not think there is any need for -ve values to be returned in sbiret.value. -ve values are usually attributed with error and we have sbiret.error for that. However, changing definition of the sbiret.value from long to unsigned long may indicate breaking ABI (even though the ABI says return values are provided in a0 and a1). The language around the "analogous C structure" is bit ambiguous anyways. That's why, the union option allows it to be backward compatible while providing the necessary clarification for the current and future extensions. @avpatel @jones-drew Wdyt ?

@avpatel
Copy link
Contributor

avpatel commented Nov 5, 2024

Some future SBI function returning signed sbiret.value is very unlikely but possible. In addition to backward compatibility, the union option is more future proof but we will certainly need to change DBCN and MPXY to use sbiret.uvalue in-place of sbiret.value.

@jones-drew
Copy link
Contributor

I agree we should continue to allow signed sbiret.value. For example, the FWFT features are given full freedom over their sbiret.value's and, especially when considering platform specific features, it's not possible to know if unsigned values will always be sufficient.

@atishp04
Copy link
Collaborator

atishp04 commented Nov 5, 2024

Sounds good. I will send a patch with the union suggestion.

atishp04 added a commit to atishp04/riscv-sbi-doc that referenced this issue Nov 8, 2024
As per the calling convention a1 will have sbiret.value on ecall
return. However, the current specification describes an anlogous
C structure describing as long. Some of the SBI extension may require
an "unsigned long" data type instead of "long".

Relax the analgous C structure definition to allow that.

Fixes: riscv-non-isa#158

Signed-off-by: Atish Patra <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants