Dependency handling for electrons #661
Replies: 7 comments 17 replies
-
Just noting here that this will involve changes to the DB schema. We'll need to have a broader discussion about how we're storing electron metadata in the DB and object store.
-
@kessler-frost One quick thing that's missing in this class is the dependency on local import files. This can largely be handled by passing the imported module to cloudpickle so that it is pickled by value (see the cloudpickle README).
-
This design doc is really well written. I just have a few questions.
def new_func(*args, **kwargs):
    # Apply the "before" deps, run the task, then apply the "after" dep.
    deps_object_1.apply()
    deps_object_2.apply()
    result = task(*args, **kwargs)
    deps_object_3.apply()
    return result

(This implementation wouldn't actually work if the deps need to be installed before unpickling the task, but one might imagine introducing "dep" electrons, which would fit into the Transport Graph framework.)

There are a few considerations here: readability, ease of use, and ease of reasoning about the code. The last is especially important when code doesn't run the way one expects, and error handling would be most straightforward if the user issues the setup commands explicitly.
-
Do we expect the Deps classes to be cloud-picklable?
-
For the implementation, we need to distinguish between two types of deps. Some classes of Deps can be applied in a separate task before the actual task; these can be packaged in "Dep" electrons and marked as a dependency of the core electron. Others only have an effect if applied in the same session as the actual task and need special handling by the executor. To illustrate, recall that the SSH and Slurm executors run each task through a series of noninteractive SSH sessions:
Installing pip packages into a venv/condaenv can be done in a separate SSH session before Step 1. Other setup commands, however, only persist for one shell session and must be run in the same session as Step 2. For example, any environment variables must be set at the beginning of Step 2, since they will be unset in the next SSH session. To apply the latter kind of deps, the
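
The session-scoping point above can be reproduced locally with a rough sketch. It assumes that each noninteractive SSH session behaves like a fresh child process, and uses child Python interpreters as stand-ins for SSH sessions. The helper `run_session` and the variable name `DEMO_DEP_VAR` are hypothetical and exist only for this illustration.

```python
import os
import subprocess
import sys

VAR = "DEMO_DEP_VAR"  # hypothetical variable name, for illustration only


def run_session(code: str) -> str:
    """Run `code` in a fresh child interpreter (stand-in for one SSH session)."""
    # Start every "session" from the same baseline environment, without VAR.
    env = {k: v for k, v in os.environ.items() if k != VAR}
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
        env=env,
    )
    return result.stdout.strip()


set_stmt = f"os.environ[{VAR!r}] = '1'"
read_stmt = f"print(os.environ.get({VAR!r}, 'unset'))"

# Two separate sessions: the variable set in the first is gone in the second.
run_session("import os; " + set_stmt)
separate = run_session("import os; " + read_stmt)

# One session: the variable is set and read by the same process.
same = run_session("import os; " + set_stmt + "; " + read_stmt)
```

Here `separate` ends up as `"unset"` while `same` is `"1"`, which is why environment-style deps have to run in the same session as the task itself.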
-
Mutating the state of the executor's environment outside of the user's explicitly defined tasks can introduce some new and potentially subtle failure modes:
We need to figure out how to handle these in a transparent way.
-
Finally closed by #1876
-
Deps Design Document
Terminology
- `task` - electron-decorated function defined by the user
- `backend environment` - the environment where the `task` function is actually executed, e.g. an AWS instance or inside a Slurm cluster
- `call_before` and `call_after` - functions to be called before `task` and after `task`, respectively, in the same `backend environment`
- `partial` - a function wrapped with `functools.partial`, which basically “freezes” the function together with some of its arguments
Problems
Dependency features that are not available right now in a simpler, more direct way:
- `call_before` and `call_after` function executions. It’ll basically mean packing all three types of these functions into one function and calling that instead.
- `pip`, `conda`, `module` package dependency installation on the backend environment.
Proposal
UX - Explicit
UX - Implicit conversion to objects
Deps Class
This will be the parent class for any kind of dependency. To implement any new type of dependency, we’ll have to subclass this and override the `__init__()` and `apply()` methods.
- `__init__()`:
  - `Deps` object with given variables.
  - `Deps` object.
- `apply()`:
  - `self.*` kind of variables, i.e., internal variables local to the object assigned at initialization.
  - `build_graph` time by the electron
  - `Deps` subclasses, or the user wants to have a custom `Deps`, they can easily do it without having to worry about argument management
  - `partial` function runnable without having this `Deps` object available on the backend environment
Initial Deps SubClasses
- `PipDeps`:
  - `pip install` hence even version number can be provided as “numpy==0.23”
  - `requirements.txt` file path can also be given
  - `apply()` method
- `CondaDeps`:
  - `environment.yml` file for creating a new conda environment
- `ModuleDeps`:
  - `module` package manager
  - `module`
- `EnvDeps`:
- `BashDeps`:
- `CallDeps`:
  - `func`
  - `func`
- `call_before` list and `call_after` list of ordered executable `Deps` objects, e.g.: `call_before=[CallDeps(a, args=(1, 2)), CallDeps(a, args=(3, 4))]`
  - `a(1, 2)` function will be run before `a(3, 4)`
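
The ordering guarantee for `call_before` and `call_after` can be sketched as follows. This is only a minimal illustration under stated assumptions, not the actual Covalent implementation: it assumes that `apply()` returns a `functools.partial`, that `CallDeps` takes `func`, `args`, and `kwargs`, and that a hypothetical helper `wrap_task` packs the three kinds of functions into one.

```python
from functools import partial


class Deps:
    """Hypothetical parent class: state is fixed in __init__, used by apply()."""

    def __init__(self, func, args=(), kwargs=None):
        self.func = func
        self.args = tuple(args)
        self.kwargs = dict(kwargs or {})

    def apply(self):
        # Return a plain partial so the backend only needs the wrapped
        # callable and its arguments, not this Deps object.
        return partial(self.func, *self.args, **self.kwargs)


class CallDeps(Deps):
    """Assumed shape: call func(*args, **kwargs) before or after the task."""


def wrap_task(task, call_before=(), call_after=()):
    """Hypothetical helper: pack before/task/after calls into one function."""
    before = [dep.apply() for dep in call_before]
    after = [dep.apply() for dep in call_after]

    def new_func(*args, **kwargs):
        for fn in before:  # run in list order
            fn()
        result = task(*args, **kwargs)
        for fn in after:  # run in list order
            fn()
        return result

    return new_func


calls = []


def a(x, y):
    calls.append(("a", x, y))


def task(n):
    calls.append(("task", n))
    return n * 2


wrapped = wrap_task(
    task,
    call_before=[CallDeps(a, args=(1, 2)), CallDeps(a, args=(3, 4))],
    call_after=[CallDeps(a, args=(5, 6))],
)
result = wrapped(10)
```

After the call, `calls` records `a(1, 2)`, then `a(3, 4)`, then the task, then `a(5, 6)`. Resolving each dep to a partial up front is one way to keep argument management out of the wrapper itself.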
Special Case of `ImportDeps` or any such composite “Deps”:
- `Deps` but just a proxy for UX to handle multiple package installation dependencies together
- `PipDeps` object
- `CondaDeps` object
- `ModuleDeps` object
- `Deps` object, e.g.: `pip` metadata field will be assigned the `import_deps_object.pip` value, etc.
- `__new__()` method of this class
Some things worth mentioning:
- `apply()` methods will be run every time for every electron, so if some dependencies are already present in the environment, we shouldn’t try to install or download them again. The creator of the `Deps` subclass should keep this in mind when writing the `apply()` method.
- `apply()` in a way where the backend environment does not need the `Deps` object to be there, hence no covalent dependency should be there
- `apply()` functions is intentionally avoided but can be implemented if need be, although some thought should be given to why exactly it is needed before deciding that.
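
One way to keep `apply()` cheap when it runs for every electron is to check what is already present before installing anything. The sketch below is an assumption-laden illustration rather than the design's actual mechanism: `missing_packages` and `pip_install_command` are hypothetical helpers, and the check assumes that each package's top-level import name matches its distribution name, which is not true for every package.

```python
import importlib.util
import sys


def missing_packages(import_names):
    """Return the top-level module names that cannot be found locally."""
    return [
        name for name in import_names
        if importlib.util.find_spec(name) is None
    ]


def pip_install_command(import_names):
    """Build a pip command for the missing packages, or None if none are missing."""
    missing = missing_packages(import_names)
    if not missing:
        return None  # nothing to install, so apply() could return early
    return [sys.executable, "-m", "pip", "install", *missing]


# "json" ships with Python; the second name is deliberately nonexistent.
needed = ["json", "surely_not_a_real_module_xyz"]
command = pip_install_command(needed)
```

The command is only built here, not executed; a real `apply()` would still have to decide how to run it and how to handle version pins such as “numpy==0.23”.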