otk/utils: introduce HiddenAttr* objects #271

schuellerf · 2024-10-07T19:06:17Z

The now objects represent a dict and a list
but with hidden attributes to be able to
track the source of the data from the originating
yaml file.

Within this PR there is also a first example of the usage within an error message. (more to come in following PRs)

mvo5

I think this is very cool, my main use-case for this would be to have better errors, we could be super precise with this, I played a bit and we could have e.g. in test/data/error/03-lonely-keys.err

-otk file contains top-level keys ['targetless'] without a target
+otk file contains top-level keys 'targetless' at test/data/error/03-lonely-keys.yaml:6 without a target

I looked into if we can avoid the wrappers and use a side-store with references to the id() but that idea does not work for us as we manipulate/copy/transform the resulting objects so new python objects are created all the time and the id() mapping breaks down.

I would love a broken out version of this that does not use "explain" yet but just adds the annotations as it would be a great way to improve the error reporting.

supakeen · 2024-10-08T08:21:40Z

The general idea is good and I agree with @mvo5 that the primary usecase for this is user friendliness through good error reporting.

Since this is a pre-draft I'll just say two things about the code since it's likely to change. I'd like it more if this was called 'Annotation'; I also don't think we need the set_attribute, get_attribute it seems to me that a single object implementing the appropriate dunders should be enough to wrap any 'complex' (I think that's the YAML terminology?) type.

schuellerf · 2024-10-08T09:32:24Z

@supakeen I thought the two classes are needed as I wanted them to really act like list and dict (aso for isinstance…)
so the code would need really minor changes only where needed.

supakeen · 2024-10-08T09:53:28Z

Let's not get into this deeply now; it is possible to do this with a single Annotated class but let's first see everything it'd need to support.

schuellerf · 2024-10-08T10:43:17Z

I will definitely refine and remove the # DEBUG - TBD: DELETE ME but it now contains an example error message @mvo5 please check

schuellerf · 2024-10-08T10:49:18Z

@mvo5 @supakeen should I clean this up (remove the --explain but keep the test/data/error/03-lonely-keys.err example to get the class merged and continue in separate PRs for more error message improvements and the real --explain functionality?

schuellerf · 2024-10-08T10:50:11Z

I think keeping the example error message makes sense as a starting point

schuellerf · 2024-10-08T11:11:40Z

(btw. I can not reproduce the mypy and pylint errors locally - I'll continue investigation)

Update: now I can :-| … they behave differently when run in an ubuntu container 🤷‍♂️, which has an older python inside by default :-/

The Objects represent a dictionary and a list but with hidden attributes to be able to track the source of the data from the originating yaml file.

use the new hidden attributes for nicer error messages showing the source of the problem

supakeen

Since this is now on non-draft state, some remarks from me before I look into the code more deeply:

Let's maybe call .utils -> .annotation.
Let's maybe call HiddenAttrDict -> AnnotatedDict.
Let's maybe call HiddenAttrList -> AnnotatedList.

I'll think about consolidation and sharing of protocol between those two types in the morning.

supakeen · 2024-10-08T16:08:22Z

src/otk/transform.py

+def _add_hidden_attributes(obj, key, key_data):
+    line_number = key_data.start_mark.line + 1
+    column = key_data.start_mark.column + 1
+    filename = os.path.relpath(key_data.start_mark.name, os.path.curdir)
+    obj.set_attribute(key, "src", f"{filename}:{line_number}")
+    obj.set_attribute(key, "filename", filename)
+    obj.set_attribute(key, "linenumber", line_number)
+    obj.set_attribute(key, "column", column)
+
+
+def hidden_attr_dict_constructor(loader, node):
+    data = loader.construct_mapping(node, deep=True)
+    hidden_data_node = HiddenAttrDict(data)
+    for key, _ in hidden_data_node.items():
+        key_data = _find_data(node.value, key)
+        _add_hidden_attributes(hidden_data_node, key, key_data)
+    return hidden_data_node


This (_add_hidden_attributes) feels like an @classmethod's for their respective AnnotatedType's.

supakeen · 2024-10-08T16:13:37Z

src/otk/transform.py

+    hidden_data_node = HiddenAttrList(data)
+    _add_hidden_attributes(hidden_data_node, "self", node)


For example, this could be HiddenAttrList.with_hidden_attributes(data, hidden_data_node, "self", node) or such if the above as a classmethod.

The idea was that the HiddenAttr* objects are generic and this is a usecase specific type/usage
That's why it's in transform.py but for sure transform.py looks nicer when this is a classmethod

supakeen · 2024-10-08T16:15:39Z

src/otk/utils.py

+
+
+class HiddenAttrList(list):
+    def __init__(self, *args, **kwargs):


Why not have __setitem__/__getitem__ and the slice dunders here to mirror how stuff is done in the dict variant?

mvo5

Thanks, this is a very interesting feature. I added some suggestins inline, I don't really have proper time so it's all a bit terse. One thing I think we should do is have some more errors converted with tests, I suspec at least:

def resolve_list(ctx: Context, state: State, tree: list[Any]) -> list[Any]:
    """Resolving a list means applying the resolve function to each element in
    the list."""

    log.debug("resolving list %r", tree)

    return [resolve(ctx, state, val) for val in tree]

will not (yet) work because here the list compreshension will mean we loose the information from "tree" which used to be a annotated list

mvo5 · 2024-10-09T09:35:59Z

src/otk/document.py

@@ -49,10 +50,11 @@ def ensure(cls, deserialized_data: dict[str, Any]) -> None:
            raise ParseVersionError(f"omnifest must contain a key by the name of {NAME_VERSION!r}")

        # no toplevel keys without a target or an otk directive
-        targetless_keys = [key for key in deserialized_data
+        targetless_keys = [f" * \"{key}\" in {deserialized_data.get_attribute(key, 'src')}" for key in deserialized_data


Maybe split the lines to make this easier to read:

targetless_keys = [ f" * \"{key}\" in {deserialized_data.get_attribute(key, 'src')}" for key in deserialized_data if not key.startswith(PREFIX)]

mvo5 · 2024-10-09T09:38:54Z

src/otk/transform.py

+    line_number = key_data.start_mark.line + 1
+    column = key_data.start_mark.column + 1
+    filename = os.path.relpath(key_data.start_mark.name, os.path.curdir)
+    obj.set_attribute(key, "src", f"{filename}:{line_number}")


Do we plan to use more then "src" (and/or expand with column as well)? If not, maybe we just go YAGNI and remove the other ones for now and (re)add them if we need them. Or do we have a concrete use-case already?

Alternaively "src" (or "origin") could become a class/dataclass so all the information is in a single abstraction?

I think there might be more like "applied operations" or "order or operations", not sure which ones. Will depend on the details of error messages

mvo5 · 2024-10-09T09:40:44Z

src/otk/utils.py

+from typing import TypeVar, Generic
+
+
+def _deep_convert(data):


Both the filename and the name of this function are very generic, maybe either rename to "deep_convert_hidden" or move to "annotations.py" (and then the name is fine as it's in a specific file)

mvo5 · 2024-10-09T09:47:25Z

src/otk/utils.py

+        for idx, value in enumerate(self):
+            self[idx] = _deep_convert(value)
+
+    def set_attribute(self, idx, attr_key, value):


I wonder_ if we should go YAGNI here too, our use-case right now is to how the orgin/src of a key, so this could be just "set_origin()" (or "set_src").

With that the usage would look something like this:

targetless_keys = [ f" * \"{key}\" in {deserialized_data.get_origin(key)} for key in deserialized_data if not key.startswith(PREFIX)]

origin could still be a class/dataclass if we want fine grained control over the details. Same for get below.

mvo5 · 2024-10-09T09:48:20Z

src/otk/utils.py

+        return self[item]
+
+
+class HiddenAttrDictJSONEncoder(json.JSONEncoder):


IMHO this should not be part of this PR, AFACIT it is not used or tested.

yes, can be removed, is used in follow up commits which are not yet PRed ;)

mvo5 · 2024-10-09T09:50:15Z

test/test_utils.py

+    assert obj.sub.get_attribute("x", "src") == "hidden_source_for_x"
+
+    # Nested Structure
+    nested_obj = HiddenAttrDict(a="level_1", b={"level_2": {"level_3": "deep_value"}})


let's expand the nesting a bit more so that both a list/dict are part of the nesting

mvo5 · 2024-10-09T10:08:05Z

src/otk/utils.py

+
+def _deep_convert(data):
+    """Recursively convert nested dicts and lists into HiddenAttrDict and HiddenAttrList."""
+    if isinstance(data, (HiddenAttrDict, HiddenAttrList)):


This is maybe a silly question - but why do we need this? The yaml parser will build the nesting for us AFAICT (i.e. everywhere we need the HiddenAttr* clases we have them via yaml constructions) - if I (naviely) remove this helper it, nothing in the tests break (except the one test that is designed to test this). So what am I missing?

This was actually ment for places in the code just working with dict and not caring about the types. e.g. initializing the class with a full dict (I thought this will be needed, but I could also stash it somewhere and pull out once really needed (other than tests which I thought make sense to make the class robust)

schuellerf · 2024-10-09T10:20:26Z

One thing I think we should do is have some more errors converted with tests

@mvo5 I'm fully for it - but wanted to keep the PR small really just introducing the class and one example. Changing all error messages in later PRs - but if you like it, I'll introduce the class and change all error messages in this PR

mvo5 · 2024-10-09T11:29:07Z

One thing I think we should do is have some more errors converted with tests

@mvo5 I'm fully for it - but wanted to keep the PR small really just introducing the class and one example. Changing all error messages in later PRs - but if you like it, I'll introduce the class and change all error messages in this PR

Sorry, I think I was wrong here, we probably just need some tests like:

import os

from otk.context import CommonContext
from otk.transform import process_include
from otk.traversal import State


TEST_YAML = """\
otk.version: 1

otk.target.osbuild:
  list:
    - 1
    - 2
  dict:
   nested_dict:
    foo: bar
"""

def test_annoate(tmp_path):
    test_yaml_path = tmp_path / "test.yaml"
    test_yaml_path.write_text(TEST_YAML)
    
    ctx = CommonContext(target_requested="osbuild")
    state = State()
    tree = process_include(ctx, state, test_yaml_path)

    assert "test.yaml:1" in tree.get_attribute("otk.version", "src")
    assert "test.yaml:4" in tree["otk.target.osbuild"].get_attribute("list", "src")
    assert "test.yaml:7" in tree["otk.target.osbuild"].get_attribute("dict", "src")
    assert "test.yaml:8" in tree["otk.target.osbuild"]["dict"].get_attribute("nested_dict", "src") 
    assert "test.yaml:4" in tree["otk.target.osbuild"]["list"].get_attribute(0, "src")

so validate some of the more complex structure, then we don't need to touch anything in the tests (those are probably something for later anyway because the error.py:OTKError already takes an optional state, if we pass an optional "Src/Oigin" class instead we would have only minimal work and get all errors "nice" almost for free (obviously that would be followups). I hope I am making sense.

schuellerf · 2024-10-09T11:51:30Z

if we pass an optional "Src/Oigin"

I'll rework this PR but we would just change the type, no extra parameter needed.

schuellerf force-pushed the implement-otk-explain branch from 9a02aec to e80d13a Compare October 7, 2024 19:06

schuellerf requested review from mvo5 and supakeen October 7, 2024 19:21

mvo5 reviewed Oct 8, 2024

View reviewed changes

schuellerf force-pushed the implement-otk-explain branch from 5b69c07 to 01cc626 Compare October 8, 2024 10:42

schuellerf force-pushed the implement-otk-explain branch from 01cc626 to 3d6103c Compare October 8, 2024 10:46

schuellerf requested a review from mvo5 October 8, 2024 10:49

schuellerf force-pushed the implement-otk-explain branch from 3d6103c to 1979839 Compare October 8, 2024 10:49

schuellerf force-pushed the implement-otk-explain branch 4 times, most recently from c8e2702 to abc2329 Compare October 8, 2024 15:42

schuellerf marked this pull request as ready for review October 8, 2024 15:44

schuellerf changed the title ~~otk/utils: introduce HiddenAddr* objects~~ otk/utils: introduce HiddenAttr* objects Oct 8, 2024

schuellerf added 2 commits October 8, 2024 17:45

otk/utils: introduce HiddenAttr* objects

7406ff3

The Objects represent a dictionary and a list but with hidden attributes to be able to track the source of the data from the originating yaml file.

test: introduce new error message with file reference

7b9023c

use the new hidden attributes for nicer error messages showing the source of the problem

schuellerf force-pushed the implement-otk-explain branch from abc2329 to 7b9023c Compare October 8, 2024 15:45

supakeen reviewed Oct 8, 2024

View reviewed changes

mvo5 reviewed Oct 9, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

otk/utils: introduce HiddenAttr* objects #271

otk/utils: introduce HiddenAttr* objects #271

schuellerf commented Oct 7, 2024 •

edited

Loading

mvo5 left a comment

supakeen commented Oct 8, 2024

schuellerf commented Oct 8, 2024

supakeen commented Oct 8, 2024

schuellerf commented Oct 8, 2024

schuellerf commented Oct 8, 2024

schuellerf commented Oct 8, 2024

schuellerf commented Oct 8, 2024 •

edited

Loading

supakeen left a comment

supakeen Oct 8, 2024 •

edited

Loading

supakeen Oct 8, 2024

schuellerf Oct 8, 2024

supakeen Oct 8, 2024

mvo5 left a comment

mvo5 Oct 9, 2024

mvo5 Oct 9, 2024

schuellerf Oct 9, 2024

mvo5 Oct 9, 2024

mvo5 Oct 9, 2024

mvo5 Oct 9, 2024

schuellerf Oct 9, 2024

mvo5 Oct 9, 2024

mvo5 Oct 9, 2024

schuellerf Oct 9, 2024

schuellerf commented Oct 9, 2024 •

edited

Loading

mvo5 commented Oct 9, 2024

schuellerf commented Oct 9, 2024

		hidden_data_node = HiddenAttrList(data)
		_add_hidden_attributes(hidden_data_node, "self", node)



		class HiddenAttrList(list):
		def __init__(self, args, *kwargs):

		return self[item]


		class HiddenAttrDictJSONEncoder(json.JSONEncoder):

otk/utils: introduce HiddenAttr* objects #271

Are you sure you want to change the base?

otk/utils: introduce HiddenAttr* objects #271

Conversation

schuellerf commented Oct 7, 2024 • edited Loading

mvo5 left a comment

Choose a reason for hiding this comment

supakeen commented Oct 8, 2024

schuellerf commented Oct 8, 2024

supakeen commented Oct 8, 2024

schuellerf commented Oct 8, 2024

schuellerf commented Oct 8, 2024

schuellerf commented Oct 8, 2024

schuellerf commented Oct 8, 2024 • edited Loading

supakeen left a comment

Choose a reason for hiding this comment

supakeen Oct 8, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mvo5 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

schuellerf commented Oct 9, 2024 • edited Loading

mvo5 commented Oct 9, 2024

schuellerf commented Oct 9, 2024

schuellerf commented Oct 7, 2024 •

edited

Loading

schuellerf commented Oct 8, 2024 •

edited

Loading

supakeen Oct 8, 2024 •

edited

Loading

schuellerf commented Oct 9, 2024 •

edited

Loading