Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

read_from_stream_buffered produces different results than other readers #57

Open
tjallingt opened this issue Sep 6, 2023 · 5 comments · May be fixed by #66
Open

read_from_stream_buffered produces different results than other readers #57

tjallingt opened this issue Sep 6, 2023 · 5 comments · May be fixed by #66

Comments

@tjallingt
Copy link

tjallingt commented Sep 6, 2023

See the following example:

use speedy::Readable;
use std::io;

#[derive(Debug, Readable)]
struct Test {
    #[speedy(default_on_eof)]
    list: Vec<u8>,
    #[speedy(default_on_eof)]
    value_kind: ValueKind,
}

#[derive(Debug, Default, Readable)]
#[speedy(tag_type = u8)]
enum ValueKind {
    #[default]
    Default = 0,
    Other = 1,
}

fn main() {
    let test_buffer: Vec<u8> = vec![
        0, 32, 0, 0, // list length
        // list data (missing)
        2, // value_kind
    ];

    let test_stream1 = io::Cursor::new(&test_buffer);
    let test_stream2 = io::Cursor::new(&test_buffer);

    let result_buffer = Test::read_from_buffer(&test_buffer);
    dbg!(result_buffer); // Err(Error { kind: InvalidEnumVariant })

    let result_stream_unbuffered = Test::read_from_stream_unbuffered(test_stream1);
    dbg!(result_stream_unbuffered); // Err(Error { kind: InvalidEnumVariant })

    let result_stream_buffered = Test::read_from_stream_buffered(test_stream2);
    dbg!(result_stream_buffered); // Ok(Test { list: [], value_kind: Default })
}

Because read_from_file will switch between read_from_stream_buffered and read_from_buffer_copying depending on the target (Linux x86 or not) this can also cause differences in parsing results between platforms. I happened to run into this when my tests would error on CI but not on my Windows dev machine.

Couple of notes:

  1. All readers return the same error if the list length is smaller, for example vec![0, 31, 0, 0, 2]
  2. The list length is consumed even though default_on_eof was used. I would have expected the reader to reset to the position before it errored.
@koute
Copy link
Owner

koute commented Sep 6, 2023

So this discrepancy only happens when default_on_eof is used?

@tjallingt
Copy link
Author

tjallingt commented Sep 6, 2023

EDIT: used wrong data, updated to show what i used.

Just tested, they all produce different errors if i remove default_on_eof from both fields

[src/main.rs:29] &test_buffer = [
    0,
    31,
    0,
    0,
]
[src/main.rs:32] result_buffer = Err(
    Error {
        kind: InputBufferIsTooSmall {
            actual_size: 4,
            expected_size: 5,
        },
    },
)
[src/main.rs:35] result_stream_unbuffered = Err(
    Error {
        kind: IoError(
            Error {
                kind: UnexpectedEof,
                message: "failed to fill whole buffer",
            },
        ),
    },
)
[src/main.rs:38] result_stream_buffered = Err(
    Error {
        kind: UnexpectedEndOfInput,
    },
)

This is also a bit strange though, both the streaming readers error with assert_eq!(error.is_eof(), true), however read_from_buffer would not...

Changing the data to make the "array length" bigger again changes the error for the "stream buffered" reader.

[src/main.rs:29] &test_buffer = [
    0,
    32,
    0,
    0,
]
[src/main.rs:32] result_buffer = Err(
    Error {
        kind: InputBufferIsTooSmall {
            actual_size: 4,
            expected_size: 5,
        },
    },
)
[src/main.rs:35] result_stream_unbuffered = Err(
    Error {
        kind: IoError(
            Error {
                kind: UnexpectedEof,
                message: "failed to fill whole buffer",
            },
        ),
    },
)
[src/main.rs:38] result_stream_buffered = Err(
    Error {
        kind: IoError(
            Error {
                kind: UnexpectedEof,
                message: "failed to fill whole buffer",
            },
        ),
    },
)

@koute
Copy link
Owner

koute commented Sep 6, 2023

Right, so this needs to be mostly unified, and at the very least they should definitely be consistent whether they return Ok or Err. Thanks for the report.

@tjallingt
Copy link
Author

Wait sorry my test data was wrong (i was changing it to see if i could understand more).
This is the actual output for the data in the issue:

[src/main.rs:28] &test_buffer = [
    0,
    32,
    0,
    0,
    2,
]
[src/main.rs:31] result_buffer = Err(
    Error {
        kind: UnexpectedEndOfInput,
    },
)
[src/main.rs:34] result_stream_unbuffered = Err(
    Error {
        kind: IoError(
            Error {
                kind: UnexpectedEof,
                message: "failed to fill whole buffer",
            },
        ),
    },
)
[src/main.rs:37] result_stream_buffered = Err(
    Error {
        kind: IoError(
            Error {
                kind: UnexpectedEof,
                message: "failed to fill whole buffer",
            },
        ),
    },
)

@tjallingt
Copy link
Author

tjallingt commented Sep 6, 2023

Right, so this needs to be mostly unified, and at the very least they should definitely be consistent whether they return Ok or Err. Thanks for the report.

Yea I agree, if some data causes an error when deserializing it doesn't matter what the error is exactly but at least the parsed output should always be consistent. Maybe you have an idea of that the issue could be? I'll investigate a bit further :)

On another note; I wonder what you think of default_on_eof consuming bytes when the data will be discarded.
I do imagine it could be tricky to solve (resetting the reader to the location where it "entered" the default_on_eof may not be possible?). Also a change in behaviour like that could be breaking probably.

Luckily i think its possible to work around it by implementing Readable myself for the parent struct. I can then just grab the remaining bytes as a vec and try to parse it with the different sub-types. So maybe all this needs is a note in the documentation?

@tjallingt tjallingt linked a pull request Feb 14, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants