Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Enums #29 #44

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

atomicptr
Copy link

I like to use enums in Python so this seemed important to me, I added a few ways to use enums to the tests.

The only thing missing would be being able to use enums as keys for dicts that seemed a lot more complicated so I didn't implement this. Although right now I think only strings and ints can be used for that either way.

I was waiting in line for a coffee for like 30m so I hacked this together on my phone, I hope the tests actually pass 😅

@mthuurne
Copy link
Member

mthuurne commented Nov 21, 2023

Thanks for your contribution!

I'll look into getting CI to run on pull requests as well. You're the first, so this part of the infrastructure hadn't been tested before.

It's good to see elaborate test cases as well. One thing that seems to be missing from the tests though is checking the error handling. For example, does the library produce a useful error message if the TOML value isn't a string, or if it is a string but it doesn't exist as a member of the enum?

Should the mapping from TOML string to Python enum member be done case-sensitive or case-insensitive? From your test case, it seems you expect case-insensitive matching, but I don't see anything in the code that implements that.

tests/test_parsing.py Outdated Show resolved Hide resolved
tests/test_parsing.py Outdated Show resolved Hide resolved
@mthuurne
Copy link
Member

Should the mapping from TOML string to Python enum member be done case-sensitive or case-insensitive? From your test case, it seems you expect case-insensitive matching, but I don't see anything in the code that implements that.

Actually, it seems you're trying to match enum members by their value ("red", 1) rather than by their name (RED, ONE).

I tend to think of the names as the interface of the enum and the value as something that is mainly used for debugging. In that perspective, it would be better to match on the names instead of values. But I don't know whether everyone uses enums in Python like that.

When enum.auto() is used, the value does not have any meaning, so we should be matching those on name. However, I don't think there is any way to determine whether auto() was used through introspection of the Enum subtype. Therefore I think we have to support mapping by name in any case.

Then the question remains whether we also want to support mapping by value. I'm leaning towards "no", to keep things simple. But if there is a compelling use case that can't be handled with mapping by name, I can be persuaded otherwise.

@atomicptr
Copy link
Author

Thanks for the review! I'll look into the requested changes later or tomorrow!

Then the question remains whether we also want to support mapping by value. I'm leaning towards "no", to keep things simple. But if there is a compelling use case that can't be handled with mapping by name, I can be persuaded otherwise.

Mapping by name you mean to always expose the key as a string inside the TOML?

One use case I often face and the primary reason why I wanted enum support in the first place is a toml where you have a pre defined set of values (could probably also be solved with typing.Literal) aka

# this only accepts pre defined values 
# I am only consuming this (or maybe also updating) but this is defined by a different app
state = 3
# same concept
mode = "install"

In this case having dataclass binder use values like

state = "OFF"
mode = "INSTALL"

Is not helping me at least. The structure/values of this toml are pre defined I merely don't want to handle strings or ints on my end.

If I am both consumer and producer of the TOML I am also not sure if using the enum key names is useful because if I'd want that I could just name them this way in my enum values IMHO

@mthuurne
Copy link
Member

mthuurne commented Nov 21, 2023

Then the question remains whether we also want to support mapping by value. I'm leaning towards "no", to keep things simple. But if there is a compelling use case that can't be handled with mapping by name, I can be persuaded otherwise.

Mapping by name you mean to always expose the key as a string inside the TOML?

Yes. Let's modify the color enum from the unit test to make the name and value more distinct:

class Color(Enum):
    RED = "#f00"
    GREEN = "#0f0"
    BLUE = "#00f"

In this case, matching by name would mean mapping the TOML string "RED" (or also "red" if we map case-insensitive) to Color.RED. Matching by value would mean mapping the TOML string "#f00" to Color.RED.

One use case I often face and the primary reason why I wanted enum support in the first place is a toml where you have a pre defined set of values (could probably also be solved with typing.Literal) aka

Adding support for typing.Literal could be useful; I've filed that under #45.

# this only accepts pre defined values 
# I am only consuming this (or maybe also updating) but this is defined by a different app
state = 3
# same concept
mode = "install"

In this case having dataclass binder use values like

state = "OFF"
mode = "INSTALL"

Is not helping me at least. The structure/values of this toml are pre defined I merely don't want to handle strings or ints on my end.

Ah, this is a use case I hadn't considered: if the TOML format is fixed and uses integers or strings that are not valid Python identifiers, then indeed matching by value is useful.

Maybe we could use IntEnum, IntFlag and StrEnum to force matching by value? That would be similar to how those enumeration subtypes operate within Python itself.

For example with this code:

from dataclasses import dataclass
from enum import Enum, IntEnum, auto

class Verbosity(Enum):
    QUIET = auto()
    NORMAL = auto()
    DETAILED = auto()

class IntVerbosity(IntEnum):
    QUIET = 0
    NORMAL = 1
    DETAILED = 2

@dataclass
class Settings:
    verbose_name: Verbosity
    verbose_value: IntVerbosity

The corresponding TOML would look like this:

verbose-name = 'quiet'
verbose-value = 2

Note that StrEnum was introduced in Python 3.11, so it won't be available in 3.10. However, if we support case-insensitive name matching, that would probably be sufficient in the majority of cases. It is the integer case that benefits most from value matching and IntEnum and IntFlag have been around since the enum module got added to the standard library.

If I am both consumer and producer of the TOML I am also not sure if using the enum key names is useful because if I'd want that I could just name them this way in my enum values IMHO

There are scenarios in which name matching is useful:

  • when the enum class already exists and is using auto() for its values
  • when the enum's values are of a type unsupported by TOML
  • when the enum's values have a meaning inside the program but shouldn't be considered part of its public interface, like the RED = "#f00" example, where the exact RGB value assigned might change in the future

Since both name matching and value matching are useful, the question becomes how to support both while keeping the behavior of dataclass-binder predictable to the user. Would using IntEnum, IntFlag and StrEnum to select value matching be a workable solution for you?

@atomicptr
Copy link
Author

That would indeed work for me and I agree having the ability to use keys seems very useful! I'm just not sure yet as to what is more intuitive for the user because I don't think using the specialized sub classes to differentiate between keys/values is something I'd expect, they also have some other implications in their usage that might be unwanted. The values of auto are also highly predictable (counting upwards starting with 1 or in case of string enums just the key in lower case), probably wouldn't want to use this when you actively care about the values (consuming foreign toml).

I think we should decide for one of these two cases to be the default (key vs value) and have the other available through either a different base class or a decorator making this choice deliberate.

For example if values where default (names are just examples)

class Color(Enum, BindEnumKey):
    RED = "#f00"
    #.... 

# or 

@bind_enum_key(case_sensisitive=True) 
class Color(Enum):
    RED = "#f00"
    #.... 

I do like being able to configure the behavior via decorators.

We could also force users to annotate all enums and reject them if they haven't defined this behavior explicitly

@bind_enum(as="key", case_sensisitive=True) 
class Color(Enum):
    RED = "#f00"
    #.... 

Downside is that this is special behavior for dataclass binder, although some way of annotating this seems sensible 🤔

@mthuurne
Copy link
Member

That would indeed work for me and I agree having the ability to use keys seems very useful! I'm just not sure yet as to what is more intuitive for the user because I don't think using the specialized sub classes to differentiate between keys/values is something I'd expect, they also have some other implications in their usage that might be unwanted.

With IntEnum/IntFlag/StrEnum, the user communicates that the values are important. So I think matching by value when these types are used is likely to produce the behavior the user is looking for.

The values of auto are also highly predictable (counting upwards starting with 1 or in case of string enums just the key in lower case), probably wouldn't want to use this when you actively care about the values (consuming foreign toml).

Indeed: while the values are predictable, by using auto() the user expresses that they don't really care about the values and that the values might change in future versions of the software. So in those situations matching by name would make more sense.

I think we should decide for one of these two cases to be the default (key vs value) and have the other available through either a different base class or a decorator making this choice deliberate.
...
Downside is that this is special behavior for dataclass binder, although some way of annotating this seems sensible 🤔

So far I've managed to avoid introducing any new syntax for dataclasses in dataclass-binder. I'm not sure this is feasible forever, but I'd like to keep it to a minimum:

  • to reduce the amount of new syntax that people who write new dataclasses have to learn
  • to increase the chance of existing dataclasses working with dataclass-binder with no or minimal changes
  • to make the dataclasses easier to read by people who don't necessarily know all the details of the binding process
  • to have the option to support other dataclass-like solutions in the future (such as attrs)

In some cases the behavior comes naturally, such as dataclass fields without defaults being mandatory in TOML. However, in the case of selecting name versus value matching for enumerations, I don't think a very intuitive solution exists. I hope that "IntEnum/IntFlag/StrEnum for value matching" is at least intuitive enough that people will have an easier time remembering it after reading the docs.

I'm not certain that using conventions rather than explicit annotations is the right way to go, but it is consistent with the way the library has been designed so far, so I'd like to give it a try.

@atomicptr
Copy link
Author

atomicptr commented Nov 23, 2023

So far I've managed to avoid introducing any new syntax for dataclasses in dataclass-binder. I'm not sure this is feasible forever, but I'd like to keep it to a minimum:

* to reduce the amount of new syntax that people who write new dataclasses have to learn

* to increase the chance of existing dataclasses working with `dataclass-binder` with no or minimal changes

* to make the dataclasses easier to read by people who don't necessarily know all the details of the binding process

* to have the option to support other dataclass-like solutions in the future (such as `attrs`)

In some cases the behavior comes naturally, such as dataclass fields without defaults being mandatory in TOML. However, in the case of selecting name versus value matching for enumerations, I don't think a very intuitive solution exists. I hope that "IntEnum/IntFlag/StrEnum for value matching" is at least intuitive enough that people will have an easier time remembering it after reading the docs.

I'm not certain that using conventions rather than explicit annotations is the right way to go, but it is consistent with the way the library has been designed so far, so I'd like to give it a try.

I actually very much appreciate this.

I pushed the changes, StrEnum, IntEnum and IntFlag all inherit ReprEnum so this seemed like a good candidate

Edit: Also added some more tests which test the case insensitivity working and some error cases

@atomicptr
Copy link
Author

@mthuurne Any more changes you would like to see?

@mthuurne
Copy link
Member

mthuurne commented Dec 4, 2023

Sorry for the delay.

I pushed the changes, StrEnum, IntEnum and IntFlag all inherit ReprEnum so this seemed like a good candidate

ReprEnum was introduced in 3.11, while dataclass-binder supports 3.10 as well, so we need an alternative code path for 3.10. I created one in commit 5c32aa3, maybe you could cherry-pick that?

I couldn't get CI to work on this PR, but it should now be enabled for future PRs. So I ran the tests locally and got one failing test case and two lines missing test coverage. I'll add comments on the specific lines.

@atomicptr
Copy link
Author

atomicptr commented Dec 18, 2023

Sorry for the delay, I was a bit busy.

Edit: Fixed all the open changes!

@atomicptr atomicptr requested a review from mthuurne January 21, 2024 16:49
Copy link
Member

@mthuurne mthuurne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there, but the error handling still needs a bit of work.

tests/test_parsing.py Outdated Show resolved Hide resolved
src/dataclass_binder/_impl.py Show resolved Hide resolved
src/dataclass_binder/_impl.py Outdated Show resolved Hide resolved
src/dataclass_binder/_impl.py Outdated Show resolved Hide resolved
if not isinstance(value, str):
raise TypeError(f"'{value}' is not a valid key for enum '{field_type}', must be of type str")
for enum_value in field_type:
if enum_value.name.lower() == value.lower():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small optimization: you could store value.lower() in a local variable, to avoid converting the same string multiple times.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there is no other case where this is used in this function I'm not sure this would help 🤔

@atomicptr
Copy link
Author

Hey! Thanks for the review, I have added the requested changes :)

@atomicptr atomicptr requested a review from mthuurne February 8, 2024 08:50
@wyleung
Copy link
Member

wyleung commented Apr 22, 2024

@mthuurne Is there a timeline when this PR can be merged? As far I can see, there will be no blockers curently?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants