Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dbx deploy: named_parameters dictionary gets an extraneous "named_parameters" entry when renaming file: paths #855

Open
angus-dawson-idexx opened this issue Nov 9, 2023 · 1 comment

Comments

@angus-dawson-idexx
Copy link

Expected Behavior

A deployment with dbx deploy [...] --write-specs-to-file=spec.json with a named_parameters definition like so:

named_parameters:
  conf-file: "file:fuse://path/to/some-file.yaml"

Should result in a specs file with the following named_parameters object:

"named_parameters": {
  "conf-file": "/dbfs/[...]/artifacts/path/to/some-file.yaml",
}

And only conf-file should be created as a keyword argument parameter in the workflow task on the web GUI.

Current Behavior

Instead, I get this:

"named_parameters": {
  "conf-file": "/dbfs/[...]/artifacts/path/to/some-file.yaml",
  "named_parameters": "/dbfs/[...]/artifacts/path/to/some-file.yaml"
}

And the workflow task on the web GUI ends up with two keyword argument parameters: conf-file and named_parameters.

Steps to Reproduce (for bugs)

Run a dbx deploy with a workflow that has python_wheel_task tasks with named_parameters, at least one of which has a value that starts with file:// or file:fuse://. Then check the keyword argument parameters of the task in the web GUI.

Context

I've traced the problem to this function:

def file_traverse(self, workflows, file_adjuster: FileReferenceAdjuster):
for element, parent, index in self.traverse(workflows):
if isinstance(element, str):
if element.startswith("file://") or element.startswith("file:fuse://"):
file_adjuster.adjust_file_ref(element, parent, index)

And this part in PropertyAdjuster.traverse():

if isinstance(_object, dict):
for key in list(_object.keys()):
item = _object[key]
yield item, _object, key
for _out in self.traverse(item, _object, index_in_parent):
yield _out

After yielding the correct tuple for conf-file:

('file:fuse://path/to/config.yaml', {'conf-file': 'file:fuse://path/to/config.yaml'}, 'conf-file')

It then attempts to traverse item with index_in_parent, which is named_parameters, and since item is a string, traverse jumps here and terminates:

yield _object, parent, index_in_parent

And yields essentially a duplicate tuple except with the wrong index_in_parent:

('file:fuse://path/to/config.yaml', {'conf-file': 'file:fuse://path/to/config.yaml'}, 'named_parameters')

Your Environment

  • dbx version used: 0.8.18
  • Databricks Runtime version: 12.2.x-scala2.12
@angus-dawson-idexx
Copy link
Author

If the call in line 47 is changed to self.traverse(item, _object, key), you still get a duplicate tuple, but at least the semantics of index_in_parent in that generator are correct and you just end up setting the same property twice, which is fine. I'm not sure if that ends up breaking other traversal cases though, or if there's a better fix to avoid visiting that element twice in the first place.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant