v0.12.0
What's new
Added 🎉
- Step resources:
  - Added a `step_resources` parameter to the `Step` class which should be used to describe the computational resources required to run a step. `Executor` implementations can use this information. For example, if your step needs 2 GPUs, you should set `step_resources=StepResources(gpu_count=2)` (`"step_resources": {"gpu_count": 2}` in the configuration language).
  - Added a `Step.resources()` property method. By default this returns the value specified by the `step_resources` parameter. If your step implementation always requires the same resources, you can just override this method so you don't have to provide the `step_resources` parameter.
- Step execution:
  - Added an `executor` field to the `tango.yml` settings. You can use this to define the executor you want to use by default.
  - Added a Beaker `Executor` to the Beaker integration, registered as an `Executor` with the name "beaker". To use this executor, add these lines to your `tango.yml` file:
    ```yaml
    executor:
      type: beaker
      beaker_workspace: ai2/my-workspace
      clusters:
        - ai2/general-cirrascale
    ```
    See the docs for the `BeakerExecutor` for more information on the input parameters.
- Step class:
  - Added a metadata field to the step class API. This can be set through the class variable `METADATA` or through the constructor argument `step_metadata`.
- Weights & Biases integration:
  - You can now change the artifact kind for step result artifacts by adding a field called "artifact_kind" to a step's metadata. For models, setting "artifact_kind" to "model" will add the corresponding artifact to W&B's new model zoo.
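
The two ways of declaring step resources described under "Step resources" can be sketched roughly as follows. This is a minimal, self-contained illustration: the `StepResources` and `Step` classes below are simplified stand-ins written for this example, not tango's real classes, and `TrainStep`, `TwoGpuStep`, and the empty-resources fallback are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StepResources:
    """Stand-in for tango's StepResources; only gpu_count is modeled here."""

    gpu_count: Optional[int] = None


class Step:
    """Stand-in base class mimicking the behavior described in these notes."""

    def __init__(self, step_resources: Optional[StepResources] = None):
        self._step_resources = step_resources

    @property
    def resources(self) -> StepResources:
        # Default: return the value given through `step_resources`.
        # Falling back to an empty StepResources is an assumption of this sketch.
        return self._step_resources or StepResources()


class TrainStep(Step):
    """Hypothetical step that relies on the `step_resources` parameter."""


class TwoGpuStep(Step):
    """Hypothetical step that always needs 2 GPUs, so it overrides `resources`."""

    @property
    def resources(self) -> StepResources:
        return StepResources(gpu_count=2)


# Option 1: pass the resources when constructing the step.
configured = TrainStep(step_resources=StepResources(gpu_count=2))
# Option 2: the override makes the argument unnecessary.
fixed = TwoGpuStep()
print(configured.resources, fixed.resources)
```

Overriding the property keeps the requirement next to the step implementation, while the parameter leaves it up to whoever configures the step.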
Changed ⚠️
- CLI:
  - The `tango run` command will throw an error if you have uncommitted changes in your repository, unless you use the `--allow-dirty` flag.
  - The `tango run` command will use the lightweight base executor (single process) by default. To use the multi-process executor, set `-j/--parallelism` to 1 or higher, or to -1 to use all available CPU cores.
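
The new `-j/--parallelism` rule for `tango run` can be read as a small decision table. The sketch below is only an illustration of that rule as worded above: `choose_executor` is a hypothetical helper, the labels "base" and "multi-process" are informal names rather than registered executor names, and rejecting other values is an assumption, since these notes do not say how they are handled.

```python
import os
from typing import Optional, Tuple


def choose_executor(parallelism: Optional[int]) -> Tuple[str, int]:
    """Map a -j/--parallelism value to (executor kind, worker count)."""
    if parallelism is None:
        # Flag not given: lightweight single-process base executor.
        return ("base", 1)
    if parallelism == -1:
        # -1 means "use all available CPU cores".
        return ("multi-process", os.cpu_count() or 1)
    if parallelism >= 1:
        return ("multi-process", parallelism)
    # Assumption of this sketch: anything else is treated as invalid.
    raise ValueError(f"unsupported parallelism value: {parallelism}")


print(choose_executor(None))
print(choose_executor(4))
```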
Fixed ✅
- Fixed a bug where `StepInfo` environment and platform metadata could be out-of-date if a step is run again due to failure.
- Fixed a bug where an unfortunate combination of early stopping and decreasing model performance could result in a crash in the torch trainer.
Commits
befb00a Add `workspace_metadata` arg to `Step` class, allow changing artifact kind in W&B workspace (#363)
5ab1c2a Fix undefined behavior with `TorchTrainStep` (#366)
bf3c1a0 Update filelock requirement from <3.8,>=3.4 to >=3.4,<3.9 (#354)
b4e48a7 Update jsonpickle requirement from <2.2.0,>=2.1.0 to >=2.1.0,<2.3.0 (#351)
1c491f0 Update wandb requirement from <0.13,>=0.12 to >=0.12,<0.14 (#350)
93d5eb4 Bump allenai/setup-beaker from 1 to 2 (#359)
dc0f89a Fix #355 - ensure git metadata is up-to-date (#361)
258e880 Raise better error msg from `step_result_for_run()` (#360)
43916d1 Print debugging information about the repo used. (#353)
928aa7a Add `BeakerExecutor` (#340)