Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support enum variants that have aliases #53

Merged
merged 2 commits into from
Dec 17, 2024
Merged

Conversation

dtolnay
Copy link
Contributor

@dtolnay dtolnay commented Oct 28, 2024

Summary

Closes #52. Previously, in the case of an enum with aliases:

#[derive(Deserialize)]
enum Enum {
    A,
    #[serde(alias = "B1")]
    B,
}

serde-reflection would look at the list of expected variant names ("A", "B", "B1") and decide that it should expect to deserialize them using the same number of variant indices 0, 1, 2. This isn't correct because "B" and "B1" actually share the same index 1, and there does not exist any variant with index 2. Thus tracer.trace_type::<Enum> would fail with an error like Failed to deserialize value: "invalid value: integer '2', expected variant index 0 ≤ i < 2".

This PR improves trace_type's deserializer to first try deserializing a variant using each of the supplied names ("A", "B", "B1"), record the enum discriminant of each resulting enum value, then deserializing again using sequential indices (0, 1, …) until it has found a result with the same discriminant as each of the named variants. From this it can see that both "A" and 0 deserialize to the same variant Enum::A, while all of "B" and "B1" and 1 deserialize to Enum::B.

Test Plan

  • cd serde-reflection; cargo test

  • use serde_derive::Deserialize;
    use serde_reflection::{Samples, Tracer, TracerConfig};
    
    #[derive(Deserialize, Debug)]
    enum Enum {
        A,
        #[serde(alias = "B1")]
        B,
    }
    
    fn main() {
        let samples = Samples::new();
        let mut tracer = Tracer::new(TracerConfig::default());
        tracer.trace_type::<Enum>(&samples).unwrap();
        println!("{:#?}", tracer.registry().unwrap());
    }
    {
        "Enum": Enum(
            {
                0: Named {
                    name: "A",
                    value: Unit,
                },
                1: Named {
                    name: "B",
                    value: Unit,
                },
            },
        ),
    }

@ma2bd
Copy link
Contributor

ma2bd commented Oct 28, 2024

@dtolnay Thanks for the PR! I'll try to have comments this week

@juntyr
Copy link

juntyr commented Oct 28, 2024

Thank you @dtolnay for coming up with a fix so quickly!

Copy link
Contributor

@ma2bd ma2bd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's clearly more complicated now but I think I get the idea.

I don't suppose we could infer the index from the discriminant?

let value = seed.deserialize(index.into_deserializer())?;
let value = match self.variant_id {
VariantId::Index(index) => seed.deserialize(U32Deserializer::new(index)),
VariantId::Name(name) => seed.deserialize(BorrowedStrDeserializer::new(name)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting

self.samples,
VariantId::Index(index),
&mut value,
))?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, we removed .expect("variant indexes must be a non-empty range 0..variants.len()")) but the assumption is still there, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. Serde-reflection will try indices 0.. until it has matched each named variant to an index. So if there are variants.len() named variants, then there are at least 1 and at most variants.len() distinct possible discriminants. If each index deserializes to a unique discriminant (which is always the case in derived impls) then this would never exceed variants.len() - 1 by trying indices in the 0.. sequence.

There are other weird things that a handwritten visit_enum implementation could do: discontiguous index ranges, or multiple indices deserializing to the same variant. Neither of these is supported by the previous implementation either.


// If the enum is marked as incomplete, just visit the first index
// because we presume it avoids recursion.
if self.tracer.incomplete_enums.contains_key(enum_name) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps also match variants.len() == 0 while we're at it (to avoid an underflow below)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For performance reasons (in project where there are many enums), it would be nice to have a criteria to quickly skip the analysis when it was completed previously.

I wonder if we could structure self.tracer.discriminants with nested maps and write a stop condition like:

self.tracer.discriminants.get(&enum_type_id).or_default().len() == variants.len() && known_variants.range(provisional_min..).is_empty()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a check for variants.is_empty(). Before this PR, this would have panicked. After this PR it will return a NotSupported error.

Comment on lines +472 to +473
// If there are no provisional entries waiting for an index, just go
// with index 0.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above. It would be nice to not reach that line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every call to deserialize_enum has to either call visit_enum or return an error. There is no existing short-circuit mechanism in serde-reflection (every deserialization error is propagated to the caller of trace_type). One can be added by introducing a kind of error that Tracer knows how to suppress, but that seems orthogonal to this PR.

Before PR 53:

    thread 'main' panicked at /git/tmp/serde-reflection/serde-reflection/src/de.rs:431:18:
    variant indexes must be a non-empty range 0..variants.len()

First draft of PR 53, debug mode (overflow checks):

    thread 'main' panicked at /git/tmp/serde-reflection/serde-reflection/src/de.rs:435:42:
    attempt to subtract with overflow

First draft of PR 53, release mode:

    Failed to deserialize value: "invalid value: integer `0`, expected variant index 0 <= i < 0"

This commit:

    Not supported: deserialize_enum with 0 variants
VariantId::Name(variant_name),
&mut value,
))?;
let discriminant = Discriminant::of(&enum_value);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't suppose we could infer the index from the discriminant?

You can turn a discriminant into an integer value as follows, but that is not necessarily the same index that would deserialize to that variant.

#[derive(Debug, PartialEq)]
enum Discriminant {
    U8(u8),
    U16(u16),
    U32(u32),
    U64(u64),
    U128(u128),
    Usize(usize),
    I8(i8),
    I16(i16),
    I32(i32),
    I64(i64),
    I128(i128),
    Isize(isize),
}

fn get_discriminant<T>(value: &T) -> Discriminant {
    struct GetDiscriminant(Option<Discriminant>);

    impl std::hash::Hasher for GetDiscriminant {
        fn write_u8(&mut self, i: u8) {
            assert!(self.0.replace(Discriminant::U8(i)).is_none());
        }
        fn write_u16(&mut self, i: u16) {
            assert!(self.0.replace(Discriminant::U16(i)).is_none());
        }
        fn write_u32(&mut self, i: u32) {
            assert!(self.0.replace(Discriminant::U32(i)).is_none());
        }
        fn write_u64(&mut self, i: u64) {
            assert!(self.0.replace(Discriminant::U64(i)).is_none());
        }
        fn write_u128(&mut self, i: u128) {
            assert!(self.0.replace(Discriminant::U128(i)).is_none());
        }
        fn write_usize(&mut self, i: usize) {
            assert!(self.0.replace(Discriminant::Usize(i)).is_none());
        }
        fn write_i8(&mut self, i: i8) {
            assert!(self.0.replace(Discriminant::I8(i)).is_none());
        }
        fn write_i16(&mut self, i: i16) {
            assert!(self.0.replace(Discriminant::I16(i)).is_none());
        }
        fn write_i32(&mut self, i: i32) {
            assert!(self.0.replace(Discriminant::I32(i)).is_none());
        }
        fn write_i64(&mut self, i: i64) {
            assert!(self.0.replace(Discriminant::I64(i)).is_none());
        }
        fn write_i128(&mut self, i: i128) {
            assert!(self.0.replace(Discriminant::I128(i)).is_none());
        }
        fn write_isize(&mut self, i: isize) {
            assert!(self.0.replace(Discriminant::Isize(i)).is_none());
        }
        fn write(&mut self, _bytes: &[u8]) {
            unreachable!()
        }
        fn finish(&self) -> u64 {
            unreachable!()
        }
    }

    let mut hasher = GetDiscriminant(None);
    std::hash::Hash::hash(&std::mem::discriminant(value), &mut hasher);
    hasher.0.unwrap()
}

fn main() {
    enum Enum { First, Second }
    println!("{:?}", get_discriminant(&Enum::Second));
}

@jerel
Copy link
Contributor

jerel commented Nov 27, 2024

This PR is very interesting to me, I think that it may also move us closer to advanced functionality like being able to exhaustively trace enums which are nested as discussed in #21? I'd be interested in any thoughts on that angle. After a cursory read through it seems like this approach could maybe be extended to enable a branch to be repeatedly explored until its children are completed instead of the current behavior of panicking with incomplete enums:

pub(crate) enum EnumProgress {
    /// There are children which are incomplete
    ChildTypesRemaining(VariantId),
    /// There are variant names that have not yet been traced.
    NamedVariantsRemaining,
    /// There are variant numbers that have not yet been traced.
    IndexedVariantsRemaining,
}

@dtolnay
Copy link
Contributor Author

dtolnay commented Nov 28, 2024

@jerel, yes your intuition is correct. As I was working on this PR I was surprised to find out the underpowered approach to trace_type that this crate was using (although I did not come across #21). I had heard about the existence of this crate a while ago, and what you're saying matches how I had always assumed this crate already works.

@ma2bd
Copy link
Contributor

ma2bd commented Nov 28, 2024

@jerel @dtolnay I don't quite see how this PR changes anything regarding #21. The most desirable approach (processing all enum variants right away the first time we visit an enum type) is still prevented by the definition of the Deserializer trait, and notably the fact that the given visitors are not clonable (for good reasons).

This is what I responded on #21:

Consider nested enums: enum A { ... SomethingUsingB(.. B ..) } and enum B { .. many variants .. }. To efficiently trace B when A is the top user type, one would have to repeatedly force the same path from A to B until all the variants of B are covered. This path from A to B could be quite complex and would need to be recorded in a Rust datatype so that it can be replayed. (Also, the first path found from A to B might not be the shortest.)

Feel free to continue the discussion on #21 if you disagree.

@dtolnay Not sure if you remember but indeed we discussed the design of this crate at its very beginning where we were both Facebook employees. You suggested the idea of trace_type, so thank you for that! This crate has been used by
at least three new blockchain projects since then.

gregnazario added a commit to gregnazario/serde-reflection that referenced this pull request Dec 13, 2024
@ma2bd ma2bd merged commit 05da6e6 into zefchain:main Dec 17, 2024
2 checks passed
ma2bd added a commit to ma2bd/serde-reflection that referenced this pull request Dec 17, 2024
@ma2bd ma2bd mentioned this pull request Dec 17, 2024
ma2bd added a commit to ma2bd/serde-reflection that referenced this pull request Dec 17, 2024
ma2bd added a commit to ma2bd/serde-reflection that referenced this pull request Dec 17, 2024
ma2bd added a commit to ma2bd/serde-reflection that referenced this pull request Dec 18, 2024
ma2bd added a commit that referenced this pull request Dec 18, 2024
## Summary

Minor code improvements after #53

## Test Plan

```
cargo test -p serde-reflection
```
@ma2bd
Copy link
Contributor

ma2bd commented Dec 18, 2024

Released in serde-reflection v0.5.0

cc @gregnazario @JoshLind @dtolnay @juntyr

@JoshLind
Copy link

Awesome, thanks @ma2bd 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Enum tracing breaks for serde > 1.0.210
5 participants