Why is workload_requires needed? #182
Comments
*Updated the URL to the example
Hi @solo-driven, in general, to understand this concept, one should distinguish between local and remote workflows (those that can submit jobs to, e.g., batch systems), which work slightly differently in the way they initiate their branch tasks. For this, it is imperative to differentiate between the […]. Remote workflows have a […].
Usually, before jobs can be submitted, one needs to make sure that certain conditions are met, e.g., that certain software is pre-bundled and provided to the batch system (for those that need that). This is exactly where the […]. Local workflows often don't need these extra dependencies that ensure that branch tasks can be run, since you're already in the correct environment. However, you are free to declare them regardless if it fits your use case. There is even a parameter predefined on all workflows, […].
Side note: have a look at how local workflows trigger their branch tasks. There are two options: declare them as dependencies, or yield them as dynamic dependencies (which is a luigi pattern). That being said, all your example cases are valid, and the actual decision of what you declare as a workflow requirement is a design choice you are free to make.
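To make the distinction above a bit more concrete, here is a minimal toy model in plain Python. It does not use law or luigi, and the class `ToyWorkflow`, its `submit` method, and the requirement names are all hypothetical. It only sketches the idea that workflow-level requirements are resolved once before any branch is started, while branch-level requirements are resolved per branch.

```python
# Toy model only: plain Python, not law's actual classes or scheduling logic.
# ToyWorkflow, submit() and all requirement names are hypothetical.


class ToyWorkflow:
    """Stand-in for a remote workflow that owns a set of branch tasks."""

    def __init__(self, branches):
        self.branches = list(branches)
        self.log = []

    def workflow_requires(self):
        # Requirements of the workflow as a whole, resolved once before any
        # job is submitted (e.g. bundling software for the batch system).
        return ["bundle_software"]

    def requires(self, branch):
        # Requirements of one single branch task.
        return [f"input_for_branch_{branch}"]

    def submit(self):
        # Step 1: workflow-level requirements, exactly once.
        for req in self.workflow_requires():
            self.log.append(("workflow", req))
        # Step 2: per-branch requirements, then the branch payload itself.
        for branch in self.branches:
            for req in self.requires(branch):
                self.log.append((f"branch {branch}", req))
            self.log.append((f"branch {branch}", "run"))
        return self.log


wf = ToyWorkflow(branches=range(3))
log = wf.submit()
for owner, action in log:
    print(owner, "->", action)
```

In this sketch, dropping the entry returned by `workflow_requires` removes one step from the log without changing what the branches do, which mirrors the statement that declaring workflow requirements is a design choice.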
But why did you use workflow_requires for branch manipulation in that example? As you said, it is for controlling the dependencies of the whole workflow, like setting up an environment. (I read your last comment, so it is probably not the best example for it?) Also, I noticed that controlling […]
Yeah, it probably is not a good example. The linked task is the proxy that lives underneath the workflow and that implements the actual run(), requires() and output() methods that take effect in case a task is a workflow (
One last question: when I specify […]
The workflow itself will count as a single yet separate task in the tree whose only "payload" is to trigger its branch tasks (either via static or dynamic requirements). All branch tasks will be distributed across […]
Question
ALL EXAMPLES ARE RUN LOCALLY
In the example for HTCondor (https://github.com/riga/law/blob/master/examples/sequential_htcondor_at_cern/analysis/tasks.py), workflow_requires is used and results in the following graph:
Scheduled 45 tasks of which:
But when I just comment it out, I get the following, clearer graph:
Scheduled 39 tasks of which:
Does this change anything, other than the number of tasks decreasing from 45 to 39 when workflow_requires is not provided?
In addition, it is also possible, by changing requires and run to:
to obtain the following graph:
And finally the result which I was expecting to see:
can be done by changing the CreateFullAlphabet:
I would really appreciate it if you could help me with that. I struggled a lot trying to find the reason for workflow_requires. Thank you for reading.