Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Model Sanity Checking #108

Open
2 of 3 tasks
lu-pl opened this issue Oct 22, 2024 · 4 comments
Open
2 of 3 tasks

Model Sanity Checking #108

lu-pl opened this issue Oct 22, 2024 · 4 comments
Assignees
Labels
documentation Improvements or additions to documentation enhancement New feature or request

Comments

@lu-pl
Copy link
Contributor

lu-pl commented Oct 22, 2024

rdfproxy.ModelBindingsMapper should run general sanity checks on models.

E.g. an error should be raised / a warning should be emitted if

  • a model defines group_by but doesn't specify a list field (see #95)

  • a model specifies a list field but doesn't define an applicable group_byconfig

  • a model does not define fields for all SPARQL bindings - this must not be an error condition, but a warning/info signal could be useful.

@lu-pl
Copy link
Contributor Author

lu-pl commented Oct 22, 2024

Note that some sanity checks are already in place, e.g. in rdfproxy.utils.utils._get_group_by, exceptions are raised if

  • a model has a list field but does not specify group_by
  • the group_by value does not reference a (SPARQLBinding/alias-resolved) name

For example removing model_config from the example Author model in the README like so

class Author(BaseModel):

    gnd: str
    surname: Annotated[str, SPARQLBinding("nameLabel")]
    works: list[Work]
    education: Annotated[list[str], SPARQLBinding("educated_atLabel")]

rightfully crashes with rdfproxy.utils._exceptions.MissingModelConfigException: Model config with 'group_by' value required for field-based grouping behavior.

And attempting to group by a name that does not exist, e.g.

class Author(BaseModel):
    model_config = ConfigDict(group_by="dne")

    gnd: str
    surname: Annotated[str, SPARQLBinding("nameLabel")]
    works: list[Work]
    education: Annotated[list[str], SPARQLBinding("educated_atLabel")]

also crashes and lists the applicable (resolved) grouping keys: rdfproxy.utils._exceptions.UnboundGroupingKeyException: Requested grouping key 'dne' not in SPARQL binding projection. Applicable grouping keys: gnd, nameLabel, educated_atLabel, work_name.

@lu-pl
Copy link
Contributor Author

lu-pl commented Oct 22, 2024

a model does not define fields for all SPARQL bindings - this must not be an error condition, but a warning/info signal could be useful.

Or maybe there should be a strict flag?

@lu-pl lu-pl added the enhancement New feature or request label Oct 22, 2024
@lu-pl lu-pl self-assigned this Oct 22, 2024
@lu-pl
Copy link
Contributor Author

lu-pl commented Oct 28, 2024

Probably, model sanity checks should be centralized.

The checks mentioned above run as part of rdfproxy.utils.utils._get_group_by and get triggered in the list code paths of rdfproxy.mapper.ModelBindingsMapper._generate_binding_pairs.

Other model checks may need to run in other places though and generally a single model sanity checking pipeline with pluggable/extensible model checks would be preferable.

@lu-pl lu-pl mentioned this issue Nov 25, 2024
3 tasks
@lu-pl
Copy link
Contributor Author

lu-pl commented Nov 29, 2024

Quick sketch for a model checker:

def check_model(model: BaseModel) -> BaseModel:
    model_group_by_value: str | None = model.model_config.get("group_by")
    model_has_list_field: bool = any(
        _is_list_type(value.annotation) for value in model.model_fields.values()
    )

    match model_group_by_value, model_has_list_field:
        case None, True:
            raise MissingModelConfigException(
                f"Model '{model.__name__}' does not specify 'group_by' in its model_config."
            )

        case str(), False:
            raise MissingGroupingTargetException(
                f"Model '{model.__name__}' does not specify a grouping target (i.e. a list-annotated field)."
            )

        case str(), True:
            if model_group_by_value not in model.model_fields.keys():
                applicable_keys = [
                    k
                    for k, v in model.model_fields.items()
                    if not _is_list_type(v.annotation)
                ]
                raise UnboundGroupingKeyException(
                    f"Requested grouping key '{model_group_by_value}' does not denote a model field.\n"
                    f"Applicable grouping keys: {', '.join(applicable_keys)}."
                )

    return model

Obviously, the exception messages need to be much more informative.

@kevinstadler kevinstadler added the documentation Improvements or additions to documentation label Dec 18, 2024
lu-pl added a commit that referenced this issue Jan 20, 2025
Some model sanity checking was implemented in the old
ModelBindingsMapper, all checking should be done in a dedicated model
sanity checking pipeline, see issue #108.

So these tests - for now - are xfails.
lu-pl added a commit that referenced this issue Jan 20, 2025
Some model sanity checking was implemented in the old
ModelBindingsMapper, all checking should be done in a dedicated model
sanity checking pipeline, see issue #108.

So these tests - for now - are xfails.
lu-pl added a commit that referenced this issue Jan 20, 2025
As indicated in the docstring for ModelBindingsMapper, the class is
somewhat coupled to SPARQLModelAdapter, because ModelBindinsMapper
does not run model sanity by itself - sanity checking should happen in
SPARQLModelAdapter, i.e. as early as possible. See issue #108. Once
sanity checking is implemented, RDFProxy will make a public
ModelBindingsMapper class available which will run model sanity
checking itself.
lu-pl added a commit that referenced this issue Jan 21, 2025
Some model sanity checking was implemented in the old
ModelBindingsMapper, all checking should be done in a dedicated model
sanity checking pipeline, see issue #108.

So these tests - for now - are xfails.
lu-pl added a commit that referenced this issue Jan 21, 2025
As indicated in the docstring for ModelBindingsMapper, the class is
somewhat coupled to SPARQLModelAdapter, because ModelBindinsMapper
does not run model sanity by itself - sanity checking should happen in
SPARQLModelAdapter, i.e. as early as possible. See issue #108. Once
sanity checking is implemented, RDFProxy will make a public
ModelBindingsMapper class available which will run model sanity
checking itself.
lu-pl added a commit that referenced this issue Jan 21, 2025
Some model sanity checking was implemented in the old
ModelBindingsMapper, all checking should be done in a dedicated model
sanity checking pipeline, see issue #108.

So these tests - for now - are xfails.
lu-pl added a commit that referenced this issue Jan 21, 2025
As indicated in the docstring for ModelBindingsMapper, the class is
somewhat coupled to SPARQLModelAdapter, because ModelBindinsMapper
does not run model sanity by itself - sanity checking should happen in
SPARQLModelAdapter, i.e. as early as possible. See issue #108. Once
sanity checking is implemented, RDFProxy will make a public
ModelBindingsMapper class available which will run model sanity
checking itself.
lu-pl added a commit that referenced this issue Jan 21, 2025
Some model sanity checking was implemented in the old
ModelBindingsMapper, all checking should be done in a dedicated model
sanity checking pipeline, see issue #108.

So these tests - for now - are xfails.
lu-pl added a commit that referenced this issue Jan 21, 2025
As indicated in the docstring for ModelBindingsMapper, the class is
somewhat coupled to SPARQLModelAdapter, because ModelBindinsMapper
does not run model sanity by itself - sanity checking should happen in
SPARQLModelAdapter, i.e. as early as possible. See issue #108. Once
sanity checking is implemented, RDFProxy will make a public
ModelBindingsMapper class available which will run model sanity
checking itself.
lu-pl added a commit that referenced this issue Jan 21, 2025
Some model sanity checking was implemented in the old
ModelBindingsMapper, all checking should be done in a dedicated model
sanity checking pipeline, see issue #108.

So these tests - for now - are xfails.
lu-pl added a commit that referenced this issue Jan 21, 2025
As indicated in the docstring for ModelBindingsMapper, the class is
somewhat coupled to SPARQLModelAdapter, because ModelBindinsMapper
does not run model sanity by itself - sanity checking should happen in
SPARQLModelAdapter, i.e. as early as possible. See issue #108. Once
sanity checking is implemented, RDFProxy will make a public
ModelBindingsMapper class available which will run model sanity
checking itself.
lu-pl added a commit that referenced this issue Jan 21, 2025
Some model sanity checking was implemented in the old
ModelBindingsMapper, all checking should be done in a dedicated model
sanity checking pipeline, see issue #108.

So these tests - for now - are xfails.
lu-pl added a commit that referenced this issue Jan 21, 2025
As indicated in the docstring for ModelBindingsMapper, the class is
somewhat coupled to SPARQLModelAdapter, because ModelBindinsMapper
does not run model sanity by itself - sanity checking should happen in
SPARQLModelAdapter, i.e. as early as possible. See issue #108. Once
sanity checking is implemented, RDFProxy will make a public
ModelBindingsMapper class available which will run model sanity
checking itself.
lu-pl added a commit that referenced this issue Jan 22, 2025
Some model sanity checking was implemented in the old
ModelBindingsMapper, all checking should be done in a dedicated model
sanity checking pipeline, see issue #108.

So these tests - for now - are xfails.
lu-pl added a commit that referenced this issue Jan 22, 2025
As indicated in the docstring for ModelBindingsMapper, the class is
somewhat coupled to SPARQLModelAdapter, because ModelBindinsMapper
does not run model sanity by itself - sanity checking should happen in
SPARQLModelAdapter, i.e. as early as possible. See issue #108. Once
sanity checking is implemented, RDFProxy will make a public
ModelBindingsMapper class available which will run model sanity
checking itself.
lu-pl added a commit that referenced this issue Jan 22, 2025
Some model sanity checking was implemented in the old
ModelBindingsMapper, all checking should be done in a dedicated model
sanity checking pipeline, see issue #108.

So these tests - for now - are xfails.
lu-pl added a commit that referenced this issue Jan 22, 2025
As indicated in the docstring for ModelBindingsMapper, the class is
somewhat coupled to SPARQLModelAdapter, because ModelBindinsMapper
does not run model sanity by itself - sanity checking should happen in
SPARQLModelAdapter, i.e. as early as possible. See issue #108. Once
sanity checking is implemented, RDFProxy will make a public
ModelBindingsMapper class available which will run model sanity
checking itself.
lu-pl added a commit that referenced this issue Jan 22, 2025
Some model sanity checking was implemented in the old
ModelBindingsMapper, all checking should be done in a dedicated model
sanity checking pipeline, see issue #108.

So these tests - for now - are xfails.
lu-pl added a commit that referenced this issue Jan 22, 2025
As indicated in the docstring for ModelBindingsMapper, the class is
somewhat coupled to SPARQLModelAdapter, because ModelBindinsMapper
does not run model sanity by itself - sanity checking should happen in
SPARQLModelAdapter, i.e. as early as possible. See issue #108. Once
sanity checking is implemented, RDFProxy will make a public
ModelBindingsMapper class available which will run model sanity
checking itself.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants