Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

otk/utils: introduce HiddenAttr* objects #271

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

schuellerf
Copy link
Contributor

@schuellerf schuellerf commented Oct 7, 2024

The now objects represent a dict and a list
but with hidden attributes to be able to
track the source of the data from the originating
yaml file.

Within this PR there is also a first example of the usage within an error message. (more to come in following PRs)

Copy link
Contributor

@mvo5 mvo5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is very cool, my main use-case for this would be to have better errors, we could be super precise with this, I played a bit and we could have e.g. in test/data/error/03-lonely-keys.err

-otk file contains top-level keys ['targetless'] without a target
+otk file contains top-level keys 'targetless' at test/data/error/03-lonely-keys.yaml:6 without a target

I looked into if we can avoid the wrappers and use a side-store with references to the id() but that idea does not work for us as we manipulate/copy/transform the resulting objects so new python objects are created all the time and the id() mapping breaks down.

I would love a broken out version of this that does not use "explain" yet but just adds the annotations as it would be a great way to improve the error reporting.

@supakeen
Copy link
Member

supakeen commented Oct 8, 2024

The general idea is good and I agree with @mvo5 that the primary usecase for this is user friendliness through good error reporting.

Since this is a pre-draft I'll just say two things about the code since it's likely to change. I'd like it more if this was called 'Annotation'; I also don't think we need the set_attribute, get_attribute it seems to me that a single object implementing the appropriate dunders should be enough to wrap any 'complex' (I think that's the YAML terminology?) type.

@schuellerf
Copy link
Contributor Author

@supakeen I thought the two classes are needed as I wanted them to really act like list and dict (aso for isinstance…)
so the code would need really minor changes only where needed.

@supakeen
Copy link
Member

supakeen commented Oct 8, 2024

Let's not get into this deeply now; it is possible to do this with a single Annotated class but let's first see everything it'd need to support.

@schuellerf
Copy link
Contributor Author

I will definitely refine and remove the # DEBUG - TBD: DELETE ME but it now contains an example error message @mvo5 please check

@schuellerf
Copy link
Contributor Author

@mvo5 @supakeen should I clean this up (remove the --explain but keep the test/data/error/03-lonely-keys.err example to get the class merged and continue in separate PRs for more error message improvements and the real --explain functionality?

@schuellerf
Copy link
Contributor Author

I think keeping the example error message makes sense as a starting point

@schuellerf
Copy link
Contributor Author

schuellerf commented Oct 8, 2024

(btw. I can not reproduce the mypy and pylint errors locally - I'll continue investigation)

Update: now I can :-| … they behave differently when run in an ubuntu container 🤷‍♂️, which has an older python inside by default :-/

@schuellerf schuellerf force-pushed the implement-otk-explain branch 4 times, most recently from c8e2702 to abc2329 Compare October 8, 2024 15:42
@schuellerf schuellerf marked this pull request as ready for review October 8, 2024 15:44
@schuellerf schuellerf changed the title otk/utils: introduce HiddenAddr* objects otk/utils: introduce HiddenAttr* objects Oct 8, 2024
The Objects represent a dictionary and a list
but with hidden attributes to be able to
track the source of the data from the originating
yaml file.
use the new hidden attributes for nicer error messages
showing the source of the problem
Copy link
Member

@supakeen supakeen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is now on non-draft state, some remarks from me before I look into the code more deeply:

  • Let's maybe call .utils -> .annotation.
  • Let's maybe call HiddenAttrDict -> AnnotatedDict.
  • Let's maybe call HiddenAttrList -> AnnotatedList.

I'll think about consolidation and sharing of protocol between those two types in the morning.

Comment on lines +67 to +83
def _add_hidden_attributes(obj, key, key_data):
line_number = key_data.start_mark.line + 1
column = key_data.start_mark.column + 1
filename = os.path.relpath(key_data.start_mark.name, os.path.curdir)
obj.set_attribute(key, "src", f"{filename}:{line_number}")
obj.set_attribute(key, "filename", filename)
obj.set_attribute(key, "linenumber", line_number)
obj.set_attribute(key, "column", column)


def hidden_attr_dict_constructor(loader, node):
data = loader.construct_mapping(node, deep=True)
hidden_data_node = HiddenAttrDict(data)
for key, _ in hidden_data_node.items():
key_data = _find_data(node.value, key)
_add_hidden_attributes(hidden_data_node, key, key_data)
return hidden_data_node
Copy link
Member

@supakeen supakeen Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (_add_hidden_attributes) feels like an @classmethod's for their respective AnnotatedType's.

Comment on lines +88 to +89
hidden_data_node = HiddenAttrList(data)
_add_hidden_attributes(hidden_data_node, "self", node)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, this could be HiddenAttrList.with_hidden_attributes(data, hidden_data_node, "self", node) or such if the above as a classmethod.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was that the HiddenAttr* objects are generic and this is a usecase specific type/usage
That's why it's in transform.py but for sure transform.py looks nicer when this is a classmethod



class HiddenAttrList(list):
def __init__(self, *args, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not have __setitem__/__getitem__ and the slice dunders here to mirror how stuff is done in the dict variant?

Copy link
Contributor

@mvo5 mvo5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is a very interesting feature. I added some suggestins inline, I don't really have proper time so it's all a bit terse. One thing I think we should do is have some more errors converted with tests, I suspec at least:

def resolve_list(ctx: Context, state: State, tree: list[Any]) -> list[Any]:
    """Resolving a list means applying the resolve function to each element in
    the list."""

    log.debug("resolving list %r", tree)

    return [resolve(ctx, state, val) for val in tree]

will not (yet) work because here the list compreshension will mean we loose the information from "tree" which used to be a annotated list

@@ -49,10 +50,11 @@ def ensure(cls, deserialized_data: dict[str, Any]) -> None:
raise ParseVersionError(f"omnifest must contain a key by the name of {NAME_VERSION!r}")

# no toplevel keys without a target or an otk directive
targetless_keys = [key for key in deserialized_data
targetless_keys = [f" * \"{key}\" in {deserialized_data.get_attribute(key, 'src')}" for key in deserialized_data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe split the lines to make this easier to read:

        targetless_keys = [
            f" * \"{key}\" in {deserialized_data.get_attribute(key, 'src')}"
            for key in deserialized_data
            if not key.startswith(PREFIX)]

line_number = key_data.start_mark.line + 1
column = key_data.start_mark.column + 1
filename = os.path.relpath(key_data.start_mark.name, os.path.curdir)
obj.set_attribute(key, "src", f"{filename}:{line_number}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we plan to use more then "src" (and/or expand with column as well)? If not, maybe we just go YAGNI and remove the other ones for now and (re)add them if we need them. Or do we have a concrete use-case already?

Alternaively "src" (or "origin") could become a class/dataclass so all the information is in a single abstraction?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there might be more like "applied operations" or "order or operations", not sure which ones. Will depend on the details of error messages

from typing import TypeVar, Generic


def _deep_convert(data):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both the filename and the name of this function are very generic, maybe either rename to "deep_convert_hidden" or move to "annotations.py" (and then the name is fine as it's in a specific file)

for idx, value in enumerate(self):
self[idx] = _deep_convert(value)

def set_attribute(self, idx, attr_key, value):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder_ if we should go YAGNI here too, our use-case right now is to how the orgin/src of a key, so this could be just "set_origin()" (or "set_src").

With that the usage would look something like this:

        targetless_keys = [
            f" * \"{key}\" in {deserialized_data.get_origin(key)}
            for key in deserialized_data
            if not key.startswith(PREFIX)]

origin could still be a class/dataclass if we want fine grained control over the details. Same for get below.

return self[item]


class HiddenAttrDictJSONEncoder(json.JSONEncoder):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO this should not be part of this PR, AFACIT it is not used or tested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, can be removed, is used in follow up commits which are not yet PRed ;)

assert obj.sub.get_attribute("x", "src") == "hidden_source_for_x"

# Nested Structure
nested_obj = HiddenAttrDict(a="level_1", b={"level_2": {"level_3": "deep_value"}})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's expand the nesting a bit more so that both a list/dict are part of the nesting


def _deep_convert(data):
"""Recursively convert nested dicts and lists into HiddenAttrDict and HiddenAttrList."""
if isinstance(data, (HiddenAttrDict, HiddenAttrList)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is maybe a silly question - but why do we need this? The yaml parser will build the nesting for us AFAICT (i.e. everywhere we need the HiddenAttr* clases we have them via yaml constructions) - if I (naviely) remove this helper it, nothing in the tests break (except the one test that is designed to test this). So what am I missing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was actually ment for places in the code just working with dict and not caring about the types. e.g. initializing the class with a full dict (I thought this will be needed, but I could also stash it somewhere and pull out once really needed (other than tests which I thought make sense to make the class robust)

@schuellerf
Copy link
Contributor Author

schuellerf commented Oct 9, 2024

One thing I think we should do is have some more errors converted with tests

@mvo5 I'm fully for it - but wanted to keep the PR small really just introducing the class and one example. Changing all error messages in later PRs - but if you like it, I'll introduce the class and change all error messages in this PR

@mvo5
Copy link
Contributor

mvo5 commented Oct 9, 2024

One thing I think we should do is have some more errors converted with tests

@mvo5 I'm fully for it - but wanted to keep the PR small really just introducing the class and one example. Changing all error messages in later PRs - but if you like it, I'll introduce the class and change all error messages in this PR

Sorry, I think I was wrong here, we probably just need some tests like:

import os

from otk.context import CommonContext
from otk.transform import process_include
from otk.traversal import State


TEST_YAML = """\
otk.version: 1

otk.target.osbuild:
  list:
    - 1
    - 2
  dict:
   nested_dict:
    foo: bar
"""

def test_annoate(tmp_path):
    test_yaml_path = tmp_path / "test.yaml"
    test_yaml_path.write_text(TEST_YAML)
    
    ctx = CommonContext(target_requested="osbuild")
    state = State()
    tree = process_include(ctx, state, test_yaml_path)

    assert "test.yaml:1" in tree.get_attribute("otk.version", "src")
    assert "test.yaml:4" in tree["otk.target.osbuild"].get_attribute("list", "src")
    assert "test.yaml:7" in tree["otk.target.osbuild"].get_attribute("dict", "src")
    assert "test.yaml:8" in tree["otk.target.osbuild"]["dict"].get_attribute("nested_dict", "src") 
    assert "test.yaml:4" in tree["otk.target.osbuild"]["list"].get_attribute(0, "src")

so validate some of the more complex structure, then we don't need to touch anything in the tests (those are probably something for later anyway because the error.py:OTKError already takes an optional state, if we pass an optional "Src/Oigin" class instead we would have only minimal work and get all errors "nice" almost for free (obviously that would be followups). I hope I am making sense.

@schuellerf
Copy link
Contributor Author

if we pass an optional "Src/Oigin"

I'll rework this PR but we would just change the type, no extra parameter needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants