-
Notifications
You must be signed in to change notification settings - Fork 6
Conversation
I was creating a new task for Cirrus, so I decided to just expand on the Task class here and get a working version. I've also added a task example in the tests directory that show a basic copy task, now far simpler than having all the previous boilerplate. |
As you know I really like this idea. I wish I had more time to review and test out some of my thoughts, but here's a couple high-level comments:
For abstract classes like |
Good point on having another base class, although I really don't like calling it Lambda though, because this Task would be used for Batch as well. So not sure what to call the base class, but like the idea of having one and then adding a Feeder class (which also does not need to be a Lambda). |
In my mind this was only for use for tasks within a cirrus project, i.e., for things packaged as a lambda. But you're suggesting it could be used for tasks built into docker containers as well? I suppose that makes sense. Then maybe we want to have a I don't know what is best here, I just want to make sure it is super easy for a task author to ensure the task class will actually work with the default lambda definition.yml config. |
Oh I see, I was wondering what the value was with the registration. But yeah, you could use the Task class when creating a new Batch task as well, it provides the CLI for local testing and all the wrapping/fetching/parsing/payload return. We could also change the default generated definition.yml file for new tasks to not use lambda_function.lambda_handler, which as you know I've never cared for. |
I suppose cirrus could import the .py file in the component dir to inspect it to see if it has a subclass of the base (whatever we're calling it), an explicit I guess we also need to know what .py file to import; that could be handled in a similar discovery-with-error process as above. Though we also have to consider with the lambda-as-batch runner, to make sure whatever is set here is compatible with the patterns it supports. If we are actually modifying the module state on class instantiation to set a well-known Also, maybe setting |
src/cirrus/lib/task.py
Outdated
self.original_items = payload.pop('features') | ||
self.items = deepcopy(self.original_items) | ||
|
||
self.process_definition = payload.pop('process') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For compatibility with workflow chaining, I think you want to use payload.process
here, as that will figure out the current process definition in the payload for you. Otherwise you could get a list of process definitions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I say that, but it seems this attribute is used two different ways. In light of that realization, I think maybe you want something like this:
self.process_definition = payload.pop('process') | |
self.process_definition = payload.process | |
self._process_definition = payload.pop('process') |
Then in output_payload
you would use `self._process_definition.
That said, I'd be interested in learning more about the need here and what problem(s) this whole pull-it-apart-and-rebuild-later strategy solves for us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, good points here.
The motivation here was to try and keep things clear for developers as to what should be mutable and what shouldn't.
The input is items (features) which we want to store the original version, but then also a separate output version of that users would modify (they would never modify the original items, it is for reference and adding source links and such). I'm also wondering if it makes sense to copy the input items by default into the output structure.
The process definition on the other hand should generally not be changed by the user (although I've hit a couple use cases where I modify the parameters for future tasks).
So basically the reason for splitting them up is because the user interacts with the Items and the Process definition in different ways. Although it probably makes sense to treat them in a similar way. Make a deepcopy of the original paylod then a copy that can be changed by the user.
I added some tests, although there are still failing tests from test_transfer unrelated to these additions |
This also introduces using VCR for testing, will open new ticket |
Co-authored-by: Pete Gadomski <[email protected]>
…b into features/task-class
Codecov Report
@@ Coverage Diff @@
## main #43 +/- ##
==========================================
- Coverage 69.77% 69.63% -0.15%
==========================================
Files 5 6 +1
Lines 622 764 +142
==========================================
+ Hits 434 532 +98
- Misses 188 232 +44
Continue to review full report at Codecov.
|
I was trying to fix the tests, started changing from mocking to vcr when I updated all the dev dependencies and the tests all work now. So ready for review so we can get this merged into main @jkeifer |
src/cirrus/lib/task.py
Outdated
_name = 'task' | ||
_description = 'A task for doing things' | ||
_version = '0.1.0' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems weird to me to default these things. Shouldn't they be required parameters on any subclasses? That said, validating that they are set on subclasses likely would require using the __new__
method on a custom metaclass. Perhaps we make an issue to follow up on this instead of trying to resolve it now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are static attributes and this was the easiest way to define them, but yes they should be required. Every base class should define them, so open to options on how to better do that.
src/cirrus/lib/task.py
Outdated
|
||
@property | ||
def process_definition(self) -> Dict: | ||
return self._payload['process'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have an incompatibility here with workflow chaining. You should use self._payload.process
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth adding a workflow chaining test case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
Test for workflow chaining test case sounds like a good idea. Not it.
This work has been replaced by new library stac-task |
This draft Task class (see #42) is derived from the Cirrus task template: https://github.com/cirrus-geo/cirrus-task-template