Add support for Enums #29 #44

atomicptr · 2023-11-21T06:02:40Z

I like to use enums in Python so this seemed important to me, I added a few ways to use enums to the tests.

The only thing missing would be being able to use enums as keys for dicts that seemed a lot more complicated so I didn't implement this. Although right now I think only strings and ints can be used for that either way.

I was waiting in line for a coffee for like 30m so I hacked this together on my phone, I hope the tests actually pass 😅

mthuurne · 2023-11-21T12:29:08Z

Thanks for your contribution!

I'll look into getting CI to run on pull requests as well. You're the first, so this part of the infrastructure hadn't been tested before.

It's good to see elaborate test cases as well. One thing that seems to be missing from the tests though is checking the error handling. For example, does the library produce a useful error message if the TOML value isn't a string, or if it is a string but it doesn't exist as a member of the enum?

Should the mapping from TOML string to Python enum member be done case-sensitive or case-insensitive? From your test case, it seems you expect case-insensitive matching, but I don't see anything in the code that implements that.

tests/test_parsing.py

mthuurne · 2023-11-21T13:01:59Z

Should the mapping from TOML string to Python enum member be done case-sensitive or case-insensitive? From your test case, it seems you expect case-insensitive matching, but I don't see anything in the code that implements that.

Actually, it seems you're trying to match enum members by their value ("red", 1) rather than by their name (RED, ONE).

I tend to think of the names as the interface of the enum and the value as something that is mainly used for debugging. In that perspective, it would be better to match on the names instead of values. But I don't know whether everyone uses enums in Python like that.

When enum.auto() is used, the value does not have any meaning, so we should be matching those on name. However, I don't think there is any way to determine whether auto() was used through introspection of the Enum subtype. Therefore I think we have to support mapping by name in any case.

Then the question remains whether we also want to support mapping by value. I'm leaning towards "no", to keep things simple. But if there is a compelling use case that can't be handled with mapping by name, I can be persuaded otherwise.

atomicptr · 2023-11-21T13:29:48Z

Thanks for the review! I'll look into the requested changes later or tomorrow!

Then the question remains whether we also want to support mapping by value. I'm leaning towards "no", to keep things simple. But if there is a compelling use case that can't be handled with mapping by name, I can be persuaded otherwise.

Mapping by name you mean to always expose the key as a string inside the TOML?

One use case I often face and the primary reason why I wanted enum support in the first place is a toml where you have a pre defined set of values (could probably also be solved with typing.Literal) aka

# this only accepts pre defined values 
# I am only consuming this (or maybe also updating) but this is defined by a different app
state = 3
# same concept
mode = "install"

In this case having dataclass binder use values like

state = "OFF"
mode = "INSTALL"

Is not helping me at least. The structure/values of this toml are pre defined I merely don't want to handle strings or ints on my end.

If I am both consumer and producer of the TOML I am also not sure if using the enum key names is useful because if I'd want that I could just name them this way in my enum values IMHO

mthuurne · 2023-11-21T16:07:39Z

Then the question remains whether we also want to support mapping by value. I'm leaning towards "no", to keep things simple. But if there is a compelling use case that can't be handled with mapping by name, I can be persuaded otherwise.

Mapping by name you mean to always expose the key as a string inside the TOML?

Yes. Let's modify the color enum from the unit test to make the name and value more distinct:

class Color(Enum):
    RED = "#f00"
    GREEN = "#0f0"
    BLUE = "#00f"

In this case, matching by name would mean mapping the TOML string "RED" (or also "red" if we map case-insensitive) to Color.RED. Matching by value would mean mapping the TOML string "#f00" to Color.RED.

One use case I often face and the primary reason why I wanted enum support in the first place is a toml where you have a pre defined set of values (could probably also be solved with typing.Literal) aka

Adding support for typing.Literal could be useful; I've filed that under #45.

# this only accepts pre defined values 
# I am only consuming this (or maybe also updating) but this is defined by a different app
state = 3
# same concept
mode = "install"
In this case having dataclass binder use values like
state = "OFF"
mode = "INSTALL"
Is not helping me at least. The structure/values of this toml are pre defined I merely don't want to handle strings or ints on my end.

Ah, this is a use case I hadn't considered: if the TOML format is fixed and uses integers or strings that are not valid Python identifiers, then indeed matching by value is useful.

Maybe we could use IntEnum, IntFlag and StrEnum to force matching by value? That would be similar to how those enumeration subtypes operate within Python itself.

For example with this code:

from dataclasses import dataclass
from enum import Enum, IntEnum, auto

class Verbosity(Enum):
    QUIET = auto()
    NORMAL = auto()
    DETAILED = auto()

class IntVerbosity(IntEnum):
    QUIET = 0
    NORMAL = 1
    DETAILED = 2

@dataclass
class Settings:
    verbose_name: Verbosity
    verbose_value: IntVerbosity

The corresponding TOML would look like this:

verbose-name = 'quiet'
verbose-value = 2

Note that StrEnum was introduced in Python 3.11, so it won't be available in 3.10. However, if we support case-insensitive name matching, that would probably be sufficient in the majority of cases. It is the integer case that benefits most from value matching and IntEnum and IntFlag have been around since the enum module got added to the standard library.

If I am both consumer and producer of the TOML I am also not sure if using the enum key names is useful because if I'd want that I could just name them this way in my enum values IMHO

There are scenarios in which name matching is useful:

when the enum class already exists and is using auto() for its values
when the enum's values are of a type unsupported by TOML
when the enum's values have a meaning inside the program but shouldn't be considered part of its public interface, like the RED = "#f00" example, where the exact RGB value assigned might change in the future

Since both name matching and value matching are useful, the question becomes how to support both while keeping the behavior of dataclass-binder predictable to the user. Would using IntEnum, IntFlag and StrEnum to select value matching be a workable solution for you?

atomicptr · 2023-11-21T23:28:47Z

That would indeed work for me and I agree having the ability to use keys seems very useful! I'm just not sure yet as to what is more intuitive for the user because I don't think using the specialized sub classes to differentiate between keys/values is something I'd expect, they also have some other implications in their usage that might be unwanted. The values of auto are also highly predictable (counting upwards starting with 1 or in case of string enums just the key in lower case), probably wouldn't want to use this when you actively care about the values (consuming foreign toml).

I think we should decide for one of these two cases to be the default (key vs value) and have the other available through either a different base class or a decorator making this choice deliberate.

For example if values where default (names are just examples)

class Color(Enum, BindEnumKey):
    RED = "#f00"
    #.... 

# or 

@bind_enum_key(case_sensisitive=True) 
class Color(Enum):
    RED = "#f00"
    #....

I do like being able to configure the behavior via decorators.

We could also force users to annotate all enums and reject them if they haven't defined this behavior explicitly

@bind_enum(as="key", case_sensisitive=True) 
class Color(Enum):
    RED = "#f00"
    #....

Downside is that this is special behavior for dataclass binder, although some way of annotating this seems sensible 🤔

mthuurne · 2023-11-22T16:19:10Z

That would indeed work for me and I agree having the ability to use keys seems very useful! I'm just not sure yet as to what is more intuitive for the user because I don't think using the specialized sub classes to differentiate between keys/values is something I'd expect, they also have some other implications in their usage that might be unwanted.

With IntEnum/IntFlag/StrEnum, the user communicates that the values are important. So I think matching by value when these types are used is likely to produce the behavior the user is looking for.

The values of auto are also highly predictable (counting upwards starting with 1 or in case of string enums just the key in lower case), probably wouldn't want to use this when you actively care about the values (consuming foreign toml).

Indeed: while the values are predictable, by using auto() the user expresses that they don't really care about the values and that the values might change in future versions of the software. So in those situations matching by name would make more sense.

I think we should decide for one of these two cases to be the default (key vs value) and have the other available through either a different base class or a decorator making this choice deliberate.
...
Downside is that this is special behavior for dataclass binder, although some way of annotating this seems sensible 🤔

So far I've managed to avoid introducing any new syntax for dataclasses in dataclass-binder. I'm not sure this is feasible forever, but I'd like to keep it to a minimum:

to reduce the amount of new syntax that people who write new dataclasses have to learn
to increase the chance of existing dataclasses working with dataclass-binder with no or minimal changes
to make the dataclasses easier to read by people who don't necessarily know all the details of the binding process
to have the option to support other dataclass-like solutions in the future (such as attrs)

In some cases the behavior comes naturally, such as dataclass fields without defaults being mandatory in TOML. However, in the case of selecting name versus value matching for enumerations, I don't think a very intuitive solution exists. I hope that "IntEnum/IntFlag/StrEnum for value matching" is at least intuitive enough that people will have an easier time remembering it after reading the docs.

I'm not certain that using conventions rather than explicit annotations is the right way to go, but it is consistent with the way the library has been designed so far, so I'd like to give it a try.

…rEnums

atomicptr · 2023-11-23T02:03:49Z

So far I've managed to avoid introducing any new syntax for dataclasses in dataclass-binder. I'm not sure this is feasible forever, but I'd like to keep it to a minimum:
* to reduce the amount of new syntax that people who write new dataclasses have to learn

* to increase the chance of existing dataclasses working with `dataclass-binder` with no or minimal changes

* to make the dataclasses easier to read by people who don't necessarily know all the details of the binding process

* to have the option to support other dataclass-like solutions in the future (such as `attrs`)
In some cases the behavior comes naturally, such as dataclass fields without defaults being mandatory in TOML. However, in the case of selecting name versus value matching for enumerations, I don't think a very intuitive solution exists. I hope that "IntEnum/IntFlag/StrEnum for value matching" is at least intuitive enough that people will have an easier time remembering it after reading the docs.

I'm not certain that using conventions rather than explicit annotations is the right way to go, but it is consistent with the way the library has been designed so far, so I'd like to give it a try.

I actually very much appreciate this.

I pushed the changes, StrEnum, IntEnum and IntFlag all inherit ReprEnum so this seemed like a good candidate

Edit: Also added some more tests which test the case insensitivity working and some error cases

atomicptr · 2023-12-02T02:59:15Z

@mthuurne Any more changes you would like to see?

mthuurne · 2023-12-04T15:41:18Z

Sorry for the delay.

I pushed the changes, StrEnum, IntEnum and IntFlag all inherit ReprEnum so this seemed like a good candidate

ReprEnum was introduced in 3.11, while dataclass-binder supports 3.10 as well, so we need an alternative code path for 3.10. I created one in commit 5c32aa3, maybe you could cherry-pick that?

I couldn't get CI to work on this PR, but it should now be enabled for future PRs. So I ran the tests locally and got one failing test case and two lines missing test coverage. I'll add comments on the specific lines.

src/dataclass_binder/_impl.py

tests/test_formatting.py

atomicptr · 2023-12-18T02:21:49Z

Sorry for the delay, I was a bit busy.

Edit: Fixed all the open changes!

mthuurne

Almost there, but the error handling still needs a bit of work.

tests/test_parsing.py

src/dataclass_binder/_impl.py

mthuurne · 2024-01-25T11:52:41Z

src/dataclass_binder/_impl.py

+                if not isinstance(value, str):
+                    raise TypeError(f"'{value}' is not a valid key for enum '{field_type}', must be of type str")
+                for enum_value in field_type:
+                    if enum_value.name.lower() == value.lower():


Small optimization: you could store value.lower() in a local variable, to avoid converting the same string multiple times.

Since there is no other case where this is used in this function I'm not sure this would help 🤔

src/dataclass_binder/_impl.py

atomicptr · 2024-01-31T10:24:09Z

Hey! Thanks for the review, I have added the requested changes :)

wyleung · 2024-04-22T08:42:42Z

@mthuurne Is there a timeline when this PR can be merged? As far I can see, there will be no blockers curently?

atomicptr added 4 commits November 21, 2023 14:57

Add support for Enums ProtixIT#29

a493d26

Add formatting test and move parsing test into correct file

4c06b31

Fix test class shadowing id

5afa686

Make formatter happy

3da4c2c

mthuurne reviewed Nov 21, 2023

View reviewed changes

tests/test_parsing.py Outdated Show resolved Hide resolved

mthuurne reviewed Nov 21, 2023

View reviewed changes

tests/test_parsing.py Outdated Show resolved Hide resolved

atomicptr added 2 commits November 23, 2023 10:34

adjust unit tests

0da89a5

use enum key to represent Enums, but use value based approach for Rep…

219fff3

…rEnums

add more tests

9dc36a4

mthuurne reviewed Dec 4, 2023

View reviewed changes

src/dataclass_binder/_impl.py Outdated Show resolved Hide resolved

mthuurne reviewed Dec 4, 2023

View reviewed changes

src/dataclass_binder/_impl.py Outdated Show resolved Hide resolved

mthuurne reviewed Dec 4, 2023

View reviewed changes

tests/test_formatting.py Show resolved Hide resolved

atomicptr added 2 commits December 18, 2023 10:56

add ReprEnum workaround for Python 3.10

f3485a9

add missing test case

a4ba84c

fix enum related formatting bug in Python 3.10

56b051d

atomicptr requested a review from mthuurne January 21, 2024 16:49

mthuurne added 2 commits January 25, 2024 12:07

Work around formatting inconsistency between isort and Ruff

1bbdda5

Merge remote-tracking branch 'origin/main' into task/enum-support

82f7569

mthuurne reviewed Jan 25, 2024

View reviewed changes

src/dataclass_binder/_impl.py Outdated Show resolved Hide resolved

fix remarks from code review

a5a70b6

atomicptr requested a review from mthuurne February 8, 2024 08:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for Enums #29 #44

Add support for Enums #29 #44

atomicptr commented Nov 21, 2023

mthuurne commented Nov 21, 2023 •

edited

Loading

mthuurne commented Nov 21, 2023

atomicptr commented Nov 21, 2023

mthuurne commented Nov 21, 2023 •

edited

Loading

atomicptr commented Nov 21, 2023

mthuurne commented Nov 22, 2023

atomicptr commented Nov 23, 2023 •

edited

Loading

atomicptr commented Dec 2, 2023

mthuurne commented Dec 4, 2023

atomicptr commented Dec 18, 2023 •

edited

Loading

mthuurne left a comment

mthuurne Jan 25, 2024

atomicptr Jan 31, 2024

atomicptr commented Jan 31, 2024

wyleung commented Apr 22, 2024

Add support for Enums #29 #44

Are you sure you want to change the base?

Add support for Enums #29 #44

Conversation

atomicptr commented Nov 21, 2023

mthuurne commented Nov 21, 2023 • edited Loading

mthuurne commented Nov 21, 2023

atomicptr commented Nov 21, 2023

mthuurne commented Nov 21, 2023 • edited Loading

atomicptr commented Nov 21, 2023

mthuurne commented Nov 22, 2023

atomicptr commented Nov 23, 2023 • edited Loading

atomicptr commented Dec 2, 2023

mthuurne commented Dec 4, 2023

atomicptr commented Dec 18, 2023 • edited Loading

mthuurne left a comment

Choose a reason for hiding this comment

mthuurne Jan 25, 2024

Choose a reason for hiding this comment

atomicptr Jan 31, 2024

Choose a reason for hiding this comment

atomicptr commented Jan 31, 2024

wyleung commented Apr 22, 2024

mthuurne commented Nov 21, 2023 •

edited

Loading

mthuurne commented Nov 21, 2023 •

edited

Loading

atomicptr commented Nov 23, 2023 •

edited

Loading

atomicptr commented Dec 18, 2023 •

edited

Loading