QUESTION: Order of config sources and overriding earlier state #505

Open
danizen opened this issue Dec 17, 2024 · 5 comments
@danizen

danizen commented Dec 17, 2024

In my application, the CLI arguments --env and --region are needed both to correctly parse a configuration file, whose settings should yield to CLI argument overrides, and to connect to an external configuration source, whose settings should override CLI arguments.

How do I accomplish this?

My current configuration of the settings sources is shown below. Both ConfigFileSettingsSource and GlueWorkflowConfigSource are custom - the structure of the configuration file is governed by a pydantic BaseModel so that the YAML is tightly constrained.

class MySettings(BaseSettings):
    env: Literal['dev', 'qa', 'perf', 'prod']
    region: Literal[some AWS regions]

    # ...other stuff...

    @classmethod
    def settings_customise_sources(
            cls,
            settings_cls: Type[BaseSettings],
            init_settings: PydanticBaseSettingsSource,
            env_settings: PydanticBaseSettingsSource,
            dotenv_settings: PydanticBaseSettingsSource,
            file_secret_settings: PydanticBaseSettingsSource,
    ) -> tuple[PydanticBaseSettingsSource, ...]:
        # NOTE: not quite sure how to make the file path configurable, see #259
        return (
            init_settings,
            env_settings,
            CliSettingsSource(settings_cls, cli_parse_args=True, cli_ignore_unknown_args=True),
            ConfigFileSettingsSource(settings_cls, 'appconfig.yaml'),    # CLI takes precedence
            GlueWorkflowConfigSource(settings_cls),                      # Glue Workflow parameters take precedence
        )

Basically, env and region determine which parts of the configuration file take precedence, and the region is then passed to the GlueWorkflow source's boto3.client('glue'), which we don't need in the AWS environment but do need on the command line. Maybe I could have two custom subclasses of CliSettingsSource: one that parses env and region early on, and another, placed after the GlueWorkflow source, that parses the other arguments.
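To make the "parse env and region early, parse the rest later" idea concrete, here is a minimal standard-library sketch using argparse rather than pydantic-settings. The function names, the --job-name flag, and the sample region value are all hypothetical and only illustrate the two-phase split; this is not how CliSettingsSource is implemented.

```python
import argparse


def parse_bootstrap_args(argv):
    """Phase 1: read only --env and --region and hand back everything else."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--env", choices=["dev", "qa", "perf", "prod"])
    parser.add_argument("--region")
    # parse_known_args leaves unrecognised arguments in `remaining`
    known, remaining = parser.parse_known_args(argv)
    return vars(known), remaining


def parse_remaining_args(argv):
    """Phase 2: parse the other options (--job-name is a hypothetical example)."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--job-name", dest="job_name")
    known, _unknown = parser.parse_known_args(argv)
    # Drop options that were not supplied so they cannot mask other sources
    return {key: value for key, value in vars(known).items() if value is not None}


argv = ["--env", "qa", "--job-name", "nightly", "--region", "us-east-1"]
bootstrap, rest = parse_bootstrap_args(argv)
overrides = parse_remaining_args(rest)
```

The bootstrap values can then be used to open the configuration file and the Glue client before any other source is consulted.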

Let me also ask whether there is a better forum, such as Discord, Slack, IRC, mailing lists, or UUNet, to ask questions like this... since pydantic overall says questions are welcome as issues, I've been doing that.

@danizen
Author

danizen commented Dec 17, 2024

So, it looks like I'll be solving this by adding a custom CLI settings source (which is not actually a subclass of CliSettingsSource) that only parses env and region:

class MySettings(BaseSettings):
    env: Literal['dev', 'qa', 'perf', 'prod']
    region: Literal[some AWS regions]
    job_name: str

    # ...other stuff...

    @classmethod
    def settings_customise_sources(
            cls,
            settings_cls: Type[BaseSettings],
            init_settings: PydanticBaseSettingsSource,
            env_settings: PydanticBaseSettingsSource,
            dotenv_settings: PydanticBaseSettingsSource,
            file_secret_settings: PydanticBaseSettingsSource,
    ) -> tuple[PydanticBaseSettingsSource, ...]:
        # NOTE: not quite sure how to make the file path configurable, see #259
        return (
            init_settings,
            env_settings,
            CustomCliSettingsSource(settings_cls),                       # Only parses "env" and "region"
            GlueWorkflowConfigSource(settings_cls),                      # Glue Workflow parameters take precedence over CLI
            CliSettingsSource(settings_cls, cli_parse_args=True, cli_ignore_unknown_args=True),
            ConfigFileSettingsSource(settings_cls, 'appconfig.yaml'),    # CLI takes precedence
        )
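As a rough illustration of what this ordering means, the sketch below merges plain dictionaries so that earlier sources win, which mirrors the rule that the first source returned from settings_customise_sources has the highest priority. The helper merge_sources and all sample values are hypothetical, and the merge is deliberately shallow, so it is a simplification of what pydantic-settings does.

```python
def merge_sources(*sources):
    """Shallow-merge dicts so that earlier sources take precedence over later ones."""
    merged = {}
    # Apply the lowest-priority source first so higher-priority ones overwrite it
    for source in reversed(sources):
        merged.update(source)
    return merged


# Hypothetical values, listed in the same order as the tuple above
bootstrap_cli = {"env": "qa", "region": "us-east-1"}   # only env and region
glue_workflow = {"job_name": "from-glue"}              # beats the full CLI
full_cli = {"job_name": "from-cli", "batch_size": 50}  # beats the config file
config_file = {"batch_size": 10, "timeout": 30}        # lowest priority

settings = merge_sources(bootstrap_cli, glue_workflow, full_cli, config_file)
```

Here job_name comes from the Glue workflow even though the CLI also supplied it, while batch_size from the CLI still overrides the configuration file.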

@danizen
Author

danizen commented Dec 17, 2024

It looks like this could be a PR, because all I need to implement is a LimitedCliSettingsSource which takes a new field cli_include_only: Collection[str] | None. This would change how _sort_arg_fields operates: it would keep a field only if cli_include_only is None or contains that field's name.
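The check described above could be sketched as a small standalone function. This is only an illustration of the filtering rule on (name, info) pairs; filter_arg_fields is a hypothetical helper, not part of pydantic-settings, and the info values here are placeholder strings rather than real FieldInfo objects.

```python
from collections.abc import Collection
from typing import Optional


def filter_arg_fields(sorted_fields, cli_include_only: Optional[Collection[str]] = None):
    """Keep every field when no filter is given, else only the named fields."""
    if cli_include_only is None:
        return list(sorted_fields)
    return [(name, info) for name, info in sorted_fields if name in cli_include_only]


# Placeholder field list standing in for the output of _sort_arg_fields
fields = [("env", "info-env"), ("region", "info-region"), ("job_name", "info-job")]
bootstrap_only = filter_arg_fields(fields, {"env", "region"})
unfiltered = filter_arg_fields(fields)
```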

Once I have implemented this locally, I may attempt to make a PR after hours.

@danizen
Author

danizen commented Dec 17, 2024

I have an implementation good enough for my professional purposes. I think a more general implementation might also need to be able to exclude fields as well as include them. Typing this runs into the curious way that Generic is used in CliSettingsSource. I am not sure I understand why it needs it, since it usually returns a dictionary from str to object (str | None, complex, etc.).

I am also looking for some feedback on the fact that the filtering would be limited to the base fields, and not nested fields. That made sense to me because complex fields, if provided on the command line, would be presumed to be JSON-encoded - this is what prepare_field_values does.

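from typing import Any, Collection

from pydantic import BaseModel
from pydantic.fields import FieldInfo
from pydantic_settings import BaseSettings, CliSettingsSource


# Custom settings source to filter CLI fields
class FilteredCliSettingsSource(CliSettingsSource[dict[str, object]]):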
    """
    A custom CliSettingsSource that will parse only some fields which we need to bootstrap
    the other settings sources.  This is useful when you want some CLI fields to take precedence
    and some to have lower precedence. This works only on fields of the base settings, and not
    on any nested models for the simple reason that nested models would be expected to be encoded
    within the command-line arguments.

    Attributes:
        cli_includes (Collection[str]): A list of field names to include
    """
    def __init__(
            self,
            settings_cls: type[BaseSettings],
            cli_includes: Collection[str],
            **kwargs: Any
    ) -> None:
        self.cli_includes = cli_includes
        super().__init__(settings_cls, **kwargs)

    def _sort_arg_fields(self, model: type[BaseModel]) -> list[tuple[str, FieldInfo]]:
        """
        Wrapper method that filters the sorted fields and keeps only those whose names are
        listed in cli_includes.
        """
        sorted_arg_fields = super()._sort_arg_fields(model)
        sorted_arg_fields = [(name, info) for name, info in sorted_arg_fields if name in self.cli_includes]
        return sorted_arg_fields

@hramezani
Member

Thanks @danizen for this issue.

@kschwab could you please take a look here?

@kschwab
Contributor

kschwab commented Dec 20, 2024

Hi @danizen, this is a nice idea. Currently, the only thing that is similar to the above is CLI_SUPPRESS or CliSuppress field, which just hides fields from the help interface, but will still parse the CLI if present. Your suggestion above is nice in that it would actually skip parsing certain fields, i.e. a more forceful variant.

I wouldn't limit it to just base fields as settings ultimately pull from multiple sources. The CLI also breaks complex fields out individually, so it's not just limited to JSON encoded input. If parsing and _sort_arg_fields were updated to track the current field path, I think it would be pretty straightforward and look similar to what you have above.

Lastly, I think the inverse (e.g., cli_excludes) would also be nice to have. I know we have similar cases internally that use CliSuppress but something like cli_excludes would be preferable. Could probably just make it mutually exclusive with cli_includes.
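One way the include/exclude idea with field paths might look is sketched below. It assumes dotted paths such as "db.host" for nested fields and treats an entry as selecting that field and everything nested under it; make_field_filter and the sample field names are hypothetical, and none of this is existing pydantic-settings behaviour.

```python
def _matches(path, entries):
    """True if path equals an entry or is nested beneath one (dotted paths)."""
    return any(path == entry or path.startswith(entry + ".") for entry in entries)


def make_field_filter(cli_includes=None, cli_excludes=None):
    """Build a predicate deciding whether a field path should be parsed from the CLI."""
    if cli_includes is not None and cli_excludes is not None:
        raise ValueError("cli_includes and cli_excludes are mutually exclusive")

    def keep(path):
        if cli_includes is not None:
            return _matches(path, cli_includes)
        if cli_excludes is not None:
            return not _matches(path, cli_excludes)
        return True  # no filter configured: parse every field

    return keep


keep_bootstrap = make_field_filter(cli_includes=["env", "db"])
skip_secret = make_field_filter(cli_excludes=["db.password"])
```

With these semantics, including "db.host" alone would not match the parent path "db", so a real implementation that walks nested models would also need to decide how to treat the ancestors of an included path.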
