Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] StreamInput integer types are unsigned #88

Closed
dbwiddis opened this issue Nov 20, 2023 · 4 comments · Fixed by #87
Closed

[BUG] StreamInput integer types are unsigned #88

dbwiddis opened this issue Nov 20, 2023 · 4 comments · Fixed by #87
Assignees
Labels
bug Something isn't working

Comments

@dbwiddis
Copy link
Member

What is the bug?

The read_byte(), read_short(), read_int(), and read_long() methods in StreamInput return unsigned integers.

How can one reproduce the bug?

Attempt to deserialize any value with a negative sign bit, for example

input = StreamInput(b"\xff")
self.assertEqual(input.read_byte(), -1)

AssertionError: 255 != -1

input = StreamInput(b"\xff\xff\xff\xff\")
self.assertEqual(input.read_int(), -1)

AssertionError: 4294967295 != -1

What is the expected behavior?

Python int value matches the signed Java value from OpenSearch

Do you have any additional context?

image

@dbwiddis dbwiddis added bug Something isn't working untriaged Issues that require attention from the maintainers. good first issue Good for newcomers help wanted Extra attention is needed and removed untriaged Issues that require attention from the maintainers. labels Nov 20, 2023
@dbwiddis
Copy link
Member Author

I see (at least) 3 possible approaches here, listed roughly in my preference order

  • use int.from_bytes()
  • use a conditional like return b - 256 if b >= 128 else b
  • use numpy dependency

@dblock
Copy link
Member

dblock commented Nov 21, 2023

(1) works for me. Do you have an an XKCD for (2) and (3)?

@dbwiddis
Copy link
Member Author

(1) works for me. Do you have an an XKCD for (2) and (3)?

2:
image

3:
image

@dbwiddis dbwiddis self-assigned this Nov 21, 2023
@dbwiddis dbwiddis removed good first issue Good for newcomers help wanted Extra attention is needed labels Nov 21, 2023
@dbwiddis
Copy link
Member Author

Bonus, the int.from_bytes() approach is so much better than this mess:

def read_int(self) -> int:
    return ((self.read_byte() & 0xFF) << 24) | ((self.read_byte() & 0xFF) << 16) | ((self.read_byte() & 0xFF) << 8) | (self.read_byte() & 0xFF)

def read_long(self) -> int:
    return self.read_int() << 32 | self.read_int() & 0xFFFFFFFF

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants