-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
otk/utils: introduce HiddenAttr* objects #271
base: main
Are you sure you want to change the base?
Conversation
9a02aec
to
e80d13a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is very cool, my main use-case for this would be to have better errors, we could be super precise with this, I played a bit and we could have e.g. in test/data/error/03-lonely-keys.err
-otk file contains top-level keys ['targetless'] without a target
+otk file contains top-level keys 'targetless' at test/data/error/03-lonely-keys.yaml:6 without a target
I looked into if we can avoid the wrappers and use a side-store with references to the id() but that idea does not work for us as we manipulate/copy/transform the resulting objects so new python objects are created all the time and the id() mapping breaks down.
I would love a broken out version of this that does not use "explain" yet but just adds the annotations as it would be a great way to improve the error reporting.
The general idea is good and I agree with @mvo5 that the primary usecase for this is user friendliness through good error reporting. Since this is a pre-draft I'll just say two things about the code since it's likely to change. I'd like it more if this was called 'Annotation'; I also don't think we need the |
@supakeen I thought the two classes are needed as I wanted them to really act like list and dict (aso for isinstance…) |
Let's not get into this deeply now; it is possible to do this with a single |
5b69c07
to
01cc626
Compare
I will definitely refine and remove the |
01cc626
to
3d6103c
Compare
3d6103c
to
1979839
Compare
I think keeping the example error message makes sense as a starting point |
(btw. I can not reproduce the Update: now I can :-| … they behave differently when run in an ubuntu container 🤷♂️, which has an older python inside by default :-/ |
c8e2702
to
abc2329
Compare
The Objects represent a dictionary and a list but with hidden attributes to be able to track the source of the data from the originating yaml file.
use the new hidden attributes for nicer error messages showing the source of the problem
abc2329
to
7b9023c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is now on non-draft state, some remarks from me before I look into the code more deeply:
- Let's maybe call
.utils
->.annotation
. - Let's maybe call
HiddenAttrDict
->AnnotatedDict
. - Let's maybe call
HiddenAttrList
->AnnotatedList
.
I'll think about consolidation and sharing of protocol between those two types in the morning.
def _add_hidden_attributes(obj, key, key_data): | ||
line_number = key_data.start_mark.line + 1 | ||
column = key_data.start_mark.column + 1 | ||
filename = os.path.relpath(key_data.start_mark.name, os.path.curdir) | ||
obj.set_attribute(key, "src", f"{filename}:{line_number}") | ||
obj.set_attribute(key, "filename", filename) | ||
obj.set_attribute(key, "linenumber", line_number) | ||
obj.set_attribute(key, "column", column) | ||
|
||
|
||
def hidden_attr_dict_constructor(loader, node): | ||
data = loader.construct_mapping(node, deep=True) | ||
hidden_data_node = HiddenAttrDict(data) | ||
for key, _ in hidden_data_node.items(): | ||
key_data = _find_data(node.value, key) | ||
_add_hidden_attributes(hidden_data_node, key, key_data) | ||
return hidden_data_node |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This (_add_hidden_attributes
) feels like an @classmethod
's for their respective AnnotatedType
's.
hidden_data_node = HiddenAttrList(data) | ||
_add_hidden_attributes(hidden_data_node, "self", node) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, this could be HiddenAttrList.with_hidden_attributes(data, hidden_data_node, "self", node)
or such if the above as a classmethod.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea was that the HiddenAttr*
objects are generic and this is a usecase specific type/usage
That's why it's in transform.py
but for sure transform.py
looks nicer when this is a classmethod
|
||
|
||
class HiddenAttrList(list): | ||
def __init__(self, *args, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not have __setitem__
/__getitem__
and the slice dunders here to mirror how stuff is done in the dict variant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this is a very interesting feature. I added some suggestins inline, I don't really have proper time so it's all a bit terse. One thing I think we should do is have some more errors converted with tests, I suspec at least:
def resolve_list(ctx: Context, state: State, tree: list[Any]) -> list[Any]:
"""Resolving a list means applying the resolve function to each element in
the list."""
log.debug("resolving list %r", tree)
return [resolve(ctx, state, val) for val in tree]
will not (yet) work because here the list compreshension will mean we loose the information from "tree" which used to be a annotated list
@@ -49,10 +50,11 @@ def ensure(cls, deserialized_data: dict[str, Any]) -> None: | |||
raise ParseVersionError(f"omnifest must contain a key by the name of {NAME_VERSION!r}") | |||
|
|||
# no toplevel keys without a target or an otk directive | |||
targetless_keys = [key for key in deserialized_data | |||
targetless_keys = [f" * \"{key}\" in {deserialized_data.get_attribute(key, 'src')}" for key in deserialized_data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe split the lines to make this easier to read:
targetless_keys = [
f" * \"{key}\" in {deserialized_data.get_attribute(key, 'src')}"
for key in deserialized_data
if not key.startswith(PREFIX)]
line_number = key_data.start_mark.line + 1 | ||
column = key_data.start_mark.column + 1 | ||
filename = os.path.relpath(key_data.start_mark.name, os.path.curdir) | ||
obj.set_attribute(key, "src", f"{filename}:{line_number}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we plan to use more then "src" (and/or expand with column as well)? If not, maybe we just go YAGNI and remove the other ones for now and (re)add them if we need them. Or do we have a concrete use-case already?
Alternaively "src" (or "origin") could become a class/dataclass so all the information is in a single abstraction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there might be more like "applied operations" or "order or operations", not sure which ones. Will depend on the details of error messages
from typing import TypeVar, Generic | ||
|
||
|
||
def _deep_convert(data): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both the filename and the name of this function are very generic, maybe either rename to "deep_convert_hidden" or move to "annotations.py" (and then the name is fine as it's in a specific file)
for idx, value in enumerate(self): | ||
self[idx] = _deep_convert(value) | ||
|
||
def set_attribute(self, idx, attr_key, value): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder_ if we should go YAGNI here too, our use-case right now is to how the orgin/src of a key, so this could be just "set_origin()" (or "set_src").
With that the usage would look something like this:
targetless_keys = [
f" * \"{key}\" in {deserialized_data.get_origin(key)}
for key in deserialized_data
if not key.startswith(PREFIX)]
origin could still be a class/dataclass if we want fine grained control over the details. Same for get below.
return self[item] | ||
|
||
|
||
class HiddenAttrDictJSONEncoder(json.JSONEncoder): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO this should not be part of this PR, AFACIT it is not used or tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, can be removed, is used in follow up commits which are not yet PRed ;)
assert obj.sub.get_attribute("x", "src") == "hidden_source_for_x" | ||
|
||
# Nested Structure | ||
nested_obj = HiddenAttrDict(a="level_1", b={"level_2": {"level_3": "deep_value"}}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's expand the nesting a bit more so that both a list/dict are part of the nesting
|
||
def _deep_convert(data): | ||
"""Recursively convert nested dicts and lists into HiddenAttrDict and HiddenAttrList.""" | ||
if isinstance(data, (HiddenAttrDict, HiddenAttrList)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is maybe a silly question - but why do we need this? The yaml parser will build the nesting for us AFAICT (i.e. everywhere we need the HiddenAttr* clases we have them via yaml constructions) - if I (naviely) remove this helper it, nothing in the tests break (except the one test that is designed to test this). So what am I missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was actually ment for places in the code just working with dict and not caring about the types. e.g. initializing the class with a full dict (I thought this will be needed, but I could also stash it somewhere and pull out once really needed (other than tests which I thought make sense to make the class robust)
@mvo5 I'm fully for it - but wanted to keep the PR small really just introducing the class and one example. Changing all error messages in later PRs - but if you like it, I'll introduce the class and change all error messages in this PR |
Sorry, I think I was wrong here, we probably just need some tests like: import os
from otk.context import CommonContext
from otk.transform import process_include
from otk.traversal import State
TEST_YAML = """\
otk.version: 1
otk.target.osbuild:
list:
- 1
- 2
dict:
nested_dict:
foo: bar
"""
def test_annoate(tmp_path):
test_yaml_path = tmp_path / "test.yaml"
test_yaml_path.write_text(TEST_YAML)
ctx = CommonContext(target_requested="osbuild")
state = State()
tree = process_include(ctx, state, test_yaml_path)
assert "test.yaml:1" in tree.get_attribute("otk.version", "src")
assert "test.yaml:4" in tree["otk.target.osbuild"].get_attribute("list", "src")
assert "test.yaml:7" in tree["otk.target.osbuild"].get_attribute("dict", "src")
assert "test.yaml:8" in tree["otk.target.osbuild"]["dict"].get_attribute("nested_dict", "src")
assert "test.yaml:4" in tree["otk.target.osbuild"]["list"].get_attribute(0, "src") so validate some of the more complex structure, then we don't need to touch anything in the tests (those are probably something for later anyway because the error.py:OTKError already takes an optional state, if we pass an optional "Src/Oigin" class instead we would have only minimal work and get all errors "nice" almost for free (obviously that would be followups). I hope I am making sense. |
I'll rework this PR but we would just change the type, no extra parameter needed. |
The now objects represent a
dict
and alist
but with hidden attributes to be able to
track the source of the data from the originating
yaml file.
Within this PR there is also a first example of the usage within an error message. (more to come in following PRs)