Releases: allenai/tango
v0.13.0
What's new
Added 🎉
- You can now reference into a particular index of the result of another step in a config. For example: {type: "ref", ref: "some_previous_step", key: 0}. The key field can be an integer if the result of the referenced step is a list or tuple, or a string if the result of the referenced step is a dictionary.
- Added priority parameter to Beaker executor for setting the default task priority for Beaker jobs.
- Added Workspace.step_result() method for getting a step's result from the latest run.
- tango run will now display a URL to the logs for failed steps when you use the BeakerExecutor.
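To make the key field concrete, here is a minimal sketch of how such a reference could be resolved once the referenced step's result is available. It is not Tango's implementation: resolve_ref and the results dictionary are hypothetical names used only for illustration.

```python
from typing import Any, Dict


def resolve_ref(ref: Dict[str, Any], results: Dict[str, Any]) -> Any:
    """Look up a step result, optionally indexing into it with "key".

    Hypothetical helper for illustration; not part of Tango's API.
    """
    result = results[ref["ref"]]
    if "key" in ref:
        # An integer key indexes a list or tuple; a string key indexes a dict.
        result = result[ref["key"]]
    return result


# Pretend these are the cached results of two earlier steps.
results = {
    "some_previous_step": ["train", "dev", "test"],
    "metrics_step": {"accuracy": 0.9},
}

first_split = resolve_ref({"type": "ref", "ref": "some_previous_step", "key": 0}, results)
accuracy = resolve_ref({"type": "ref", "ref": "metrics_step", "key": "accuracy"}, results)
whole_result = resolve_ref({"type": "ref", "ref": "metrics_step"}, results)
```

Leaving out key returns the whole result, which matches how plain references behaved before this release.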
Changed ⚠️
- The TorchTrainStep now enables monitoring arbitrary model outputs during training. TorchTrainEngine.forward_train now returns a tuple (loss, model_outputs) for each micro batch, and the list of model outputs for all micro batches in a batch is passed to TrainCallback.log_batch and TrainCallback.post_batch.
- Tango will now automatically search Python modules in the current working directory for registered classes so that you don't always need to use the --include-package setting.
- The minimum supported Python version is now 3.8.
- Added support for PyTorch Lightning 1.7.x
- The Beaker Executor will no longer live-stream logs from Beaker jobs, but logs will be viewable on Beaker and more readable.
- Only the Beaker executor requires a clean working directory
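The TorchTrainStep change above can be pictured with a toy training loop. This is a sketch under simplifying assumptions, not Tango's code: forward_train, train_batch, and the two-argument log_batch callback are hypothetical stand-ins, and plain floats replace tensors so that the example runs without PyTorch.

```python
from typing import Any, Callable, Dict, List, Tuple

MicroBatch = Dict[str, float]


def forward_train(micro_batch: MicroBatch) -> Tuple[float, Dict[str, Any]]:
    """Stand-in for an engine's forward pass: returns (loss, model_outputs)."""
    prediction = 2.0 * micro_batch["x"]
    loss = (prediction - micro_batch["y"]) ** 2
    return loss, {"prediction": prediction}


def train_batch(
    micro_batches: List[MicroBatch],
    log_batch: Callable[[float, List[Dict[str, Any]]], None],
) -> float:
    """Run every micro batch, then hand all collected outputs to the callback."""
    batch_loss = 0.0
    batch_outputs: List[Dict[str, Any]] = []
    for micro_batch in micro_batches:
        loss, outputs = forward_train(micro_batch)
        batch_loss += loss / len(micro_batches)
        batch_outputs.append(outputs)
    log_batch(batch_loss, batch_outputs)
    return batch_loss


# A callback that simply records what it was given.
logged: List[Tuple[float, List[Dict[str, Any]]]] = []
train_batch(
    [{"x": 1.0, "y": 2.0}, {"x": 2.0, "y": 3.0}],
    log_batch=lambda loss, outputs: logged.append((loss, outputs)),
)
```

The point of the design is that the callback sees one list entry per micro batch, so it can log or inspect any model output rather than only the loss.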
Fixed ✅
- Fixed a bug that did not allow a wandb artifact's type to be set from a step's metadata dictionary.
- Fixed a bug with how the Beaker executor streams log lines from Beaker which sometimes resulted in messages missing some starting characters, and tqdm lines being duplicated.
- Fixed a bug in the Beaker workspace where the lock dataset wouldn't be removed if the step was found to be in an invalid state.
- Improved cluster choice logic in BeakerExecutor to ensure greater diversity of clusters when submitting many steps at once.
- Fixed bug where sub-processes of the multicore executor would use the wrong executor if executor was defined in a tango.yml file.
Commits
4f89d55 Improve Beaker cluster choice logic (#392)
e1ceae2 Display URL to logs for failed steps (#390)
3dc9591 Bump black from 22.6.0 to 22.8.0 (#380)
c9ce257 Catch when Beaker experiments are stopped (#389)
0fe12e9 Fix issues with WandbWorkspace causing CI crash (#388)
342eb26 Keep parameters in Params objects to make error messages more readable (#375)
92f0354 Simplified beaker logging (#383)
fd9d3cc Only the Beaker executor needs clean working directories (#373)
06f26ae Update wandb artifact type (#378)
f6a6b70 Update base images, get us out of the latest infinite loop of pip madness (#382)
306986b Catch all errors when attempting log record decode (#379)
628caff Allowing indexing into step results in config (#371)
858cef8 Minor improvement to Beaker logging (#377)
7a5619e Add Workspace.step_result() method (#374)
0750d76 Fix bugs with how BeakerExecutor streams logs (#372)
6e8b107 Detailed train outputs (#369)
bcd50d8 Update pytorch-lightning requirement from <1.7,>=1.6 to >=1.6,<1.8 (#349)
8ed0c86 Bump fairscale from 0.4.6 to 0.4.8 (#347)
62f2746 Python minimum version is 3.8 (#368)
45e02fe Auto import local Python modules when searching for registered classes (#367)
v0.12.0
What's new
Added 🎉
- Step resources:
  - Added a step_resources parameter to the Step class which should be used to describe the computational resources required to run a step. Executor implementations can use this information. For example, if your step needs 2 GPUs, you should set step_resources=StepResources(gpu_count=2) ("step_resources": {"gpu_count": 2} in the configuration language).
  - Added a Step.resources() property method. By default this returns the value specified by the step_resources parameter. If your step implementation always requires the same resources, you can just override this method so you don't have to provide the step_resources parameter.
- Step execution:
  - Added an executor field to the tango.yml settings. You can use this to define the executor you want to use by default.
  - Added a Beaker Executor to the Beaker integration, registered as an Executor with the name "beaker". To use this executor, add these lines to your tango.yml file:

        executor:
          type: beaker
          beaker_workspace: ai2/my-workspace
          clusters:
            - ai2/general-cirrascale

    See the docs for the BeakerExecutor for more information on the input parameters.
- Step class:
  - Added a metadata field to the step class API. This can be set through the class variable METADATA or through the constructor argument step_metadata.
- Weights & Biases integration:
  - You can now change the artifact kind for step result artifacts by adding a field called "artifact_kind" to a step's metadata. For models, setting "artifact_kind" to "model" will add the corresponding artifact to W&B's new model zoo.
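The two ways of declaring step resources described above (passing step_resources, or overriding the resources property) can be sketched as follows. The StepResources and Step classes here are simplified stand-ins written for this example so that it runs without Tango installed; only the gpu_count field comes from the notes above, and everything else is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StepResources:
    """Stand-in for Tango's StepResources; only gpu_count is modelled here."""

    gpu_count: int = 0


class Step:
    """Stand-in base class showing the parameter-or-override pattern."""

    def __init__(self, step_resources: Optional[StepResources] = None):
        self._step_resources = step_resources

    @property
    def resources(self) -> StepResources:
        # Default behaviour: return what was passed in, or an empty request.
        return self._step_resources or StepResources()


class TwoGpuStep(Step):
    """A step that always needs two GPUs, so callers never have to say so."""

    @property
    def resources(self) -> StepResources:
        return StepResources(gpu_count=2)


configured = Step(step_resources=StepResources(gpu_count=2))
fixed = TwoGpuStep()
```

Overriding the property suits steps whose needs never vary, while the constructor argument keeps the decision in the experiment config.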
Changed ⚠️
- CLI:
  - The tango run command will throw an error if you have uncommitted changes in your repository, unless you use the --allow-dirty flag.
  - The tango run command will use the lightweight base executor (single process) by default. To use the multi-process executor, set -j/--parallelism to 1 or higher, or to -1 to use all available CPU cores.
Fixed ✅
- Fixed bug where StepInfo environment and platform metadata could be out-of-date if a step is run again due to failure.
- Fixed a bug where an unfortunate combination of early stopping and decreasing model performance could result in a crash in the torch trainer.
Commits
befb00a Add workspace_metadata arg to Step class, allow changing artifact kind in W&B workspace (#363)
5ab1c2a Fix undefined behavior with TorchTrainStep (#366)
bf3c1a0 Update filelock requirement from <3.8,>=3.4 to >=3.4,<3.9 (#354)
b4e48a7 Update jsonpickle requirement from <2.2.0,>=2.1.0 to >=2.1.0,<2.3.0 (#351)
1c491f0 Update wandb requirement from <0.13,>=0.12 to >=0.12,<0.14 (#350)
93d5eb4 Bump allenai/setup-beaker from 1 to 2 (#359)
dc0f89a Fix #355 - ensure git metadata is up-to-date (#361)
258e880 Raise better error msg from step_result_for_run() (#360)
43916d1 Print debugging information about the repo used. (#353)
928aa7a Add BeakerExecutor (#340)
v0.11.0
v0.10.1
What's new
Fixed ✅
- Fixed issue where the StepInfo config argument could be parsed into a Step.
- Restored capability to run tests out-of-tree.
Commits
2498318 Fix issue where StepInfo config could be parsed into a Step (#344)
57096b2 Make tests runnable out-of-tree for help with conda-packaging (#307)
v0.10.0
What's new
Changed ⚠️
- Renamed workspace parameter of BeakerWorkspace class to beaker_workspace.
- Executor class is now a Registrable base class. MulticoreExecutor is registered as "multicore".
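As a rough picture of what a registrable base class buys you, the sketch below implements a tiny name-to-class registry and builds an executor from a config-like dictionary. It is a generic registry pattern written for this example, not Tango's Registrable implementation, and the parallelism argument is only illustrative.

```python
from typing import Callable, Dict, Type


class Executor:
    """Stand-in base class with a tiny name -> subclass registry."""

    _registry: Dict[str, Type["Executor"]] = {}

    @classmethod
    def register(cls, name: str) -> Callable[[Type["Executor"]], Type["Executor"]]:
        def decorator(subclass: Type["Executor"]) -> Type["Executor"]:
            cls._registry[name] = subclass
            return subclass

        return decorator

    @classmethod
    def by_name(cls, name: str) -> Type["Executor"]:
        return cls._registry[name]


@Executor.register("multicore")
class MulticoreExecutor(Executor):
    def __init__(self, parallelism: int = 1):
        self.parallelism = parallelism


# Build an executor from a config-like dictionary, e.g. one parsed from a settings file.
config = {"type": "multicore", "parallelism": 4}
params = dict(config)
executor = Executor.by_name(params.pop("type"))(**params)
```

Because the type is looked up by name, a settings file can select an executor without any Python code changing.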
Removed 👋
- Removed StepExecutionMetadata. Its fields have been absorbed into StepInfo.
Fixed ✅
- Improved Step.ensure_result() such that the step's result doesn't have to be read from the cache.
- Fixed an issue with the output from MulticoreExecutor such that it's now consistent with the default Executor for steps that were found in the cache.
- One of our error messages referred to a configuration file that no longer exists.
- Improved performance of BeakerWorkspace.
Added 🎉
- Added the ability to train straight Model instead of just Lazy[Model]
Commits
4e809f5 Eager models (#319)
361777b Metadata changes, make executor registrable (#331)
a6b0be9 Beaker workspace performance (#328)
f43e5ea Update torch requirement from <1.12,>=1.9 to >=1.9,<1.13 (#330)
8495c64 update dev dependencies (#333)
712d862 Make multicore executor output consistent with default (#325)
903569c Refer to the right config file (#324)
bd9e4be Modernize our issue templates (#323)
v0.9.1
What's new
Fixed ✅
- Fixed non-deterministic behavior in TorchTrainStep.
- Fixed bug in BeakerWorkspace where .step_info(step) would raise a KeyError if the step hasn't been registered as part of a run yet.
- Fixed a bug in BeakerWorkspace where it would send too many requests to the beaker service.
- Fixed a bug where WandbWorkspace.step_finished() or .step_failed() would crash if called from a different process than .step_starting().
- Fixed a bug in WandbWorkspace.step_finished() which led to a RuntimeError sometimes while caching the result of a step.
Commits
c6fc5be Fix bugs with Workspace and WandbWorkspace, specifically (#321)
80c90ca Beaker DOS fix (#315)
8b75591 Log from BeakerStepLock at WARNING level (#316)
4d46d67 fix non-deterministic behavior in TorchTrainStep (#314)
c59b6b3 Bump actions/setup-python from 3 to 4 (#311)
b02cf40 Bump sphinx from 4.5.0 to 5.0.1 (#305)
4501815 Bump furo from 2022.6.4 to 2022.6.4.1 (#309)
da9c29c Fix bug in Beaker workspace (#312)
e8422cb Bump mypy from 0.960 to 0.961 (#308)
8256a74 Bump myst-parser from 0.17.2 to 0.18.0 (#310)
44ae92e Bump furo from 2022.4.7 to 2022.6.4 (#306)
39923ae Update protobuf requirement from <=3.20.0 to <4.22.0 (#301)
e7ef1f5 Registerables first steps eg (#304)
v0.9.0
What's new
Added 🎉
- Added a Beaker integration that comes with BeakerWorkspace, a remote Workspace implementation that uses Beaker Datasets under the hood.
- Added a datasets::dataset_remix step that provides the split remixing functionality of tango.steps.dataset_remix.DatasetRemixStep now for Huggingface DatasetDict.
Changed ⚠️
- If you try to import something from a tango integration that is not fully installed due to missing dependencies, an IntegrationMissingError will be raised instead of ModuleNotFoundError.
- You can now set -j 0 in tango run to disable multicore execution altogether.
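The first change above amounts to translating a low-level import failure into an integration-level error. A minimal sketch of that pattern follows; the IntegrationMissingError class defined here is a stand-in with an assumed constructor and message, and import_integration_dependency is a hypothetical helper, not a Tango function.

```python
import importlib


class IntegrationMissingError(Exception):
    """Stand-in for Tango's error of the same name (constructor is assumed)."""

    def __init__(self, integration: str):
        super().__init__(f"'{integration}' integration is missing dependencies")
        self.integration = integration


def import_integration_dependency(integration: str, module_name: str):
    """Import module_name, translating a missing module into a clearer error."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        raise IntegrationMissingError(integration) from exc


# "json" ships with Python, so this import succeeds.
json_module = import_integration_dependency("demo", "json")

# A module that does not exist triggers the integration-level error instead.
try:
    import_integration_dependency("demo", "surely_not_an_installed_module_xyz")
    caught = None
except IntegrationMissingError as err:
    caught = err
```

Chaining with "from exc" keeps the original ModuleNotFoundError available for debugging while callers only need to handle one error type.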
Fixed ✅
- Improved how steps and workspaces handle race conditions when different processes are competing to execute the same step. This would result in a RuntimeError before with most workspaces, but now it's handled gracefully.
- Fixed bug which caused GradScaler state to not be saved and loaded with checkpoints.
Commits
0ddd2ac Add Beaker integration (#296)
6bdd1dd Updates the Euler example (#297)
bc89470 GradScaler state saving and loading (#293)
b8562db fix old filename in CONTRIBUTING.md (#300)
4aff1bb Dataset remix (#298)
eb1fcd8 Bump mypy from 0.950 to 0.960 (#295)
903741e Update filelock requirement from <3.7,>=3.4 to >=3.4,<3.8 (#284)
b58b823 Handle missing integrations (#292)
v0.8.0
What's new
Added 🎉
- Added a Weights & Biases remote Workspace implementation: WandbWorkspace, registered as "wandb". This can be instantiated from a workspace URL in the form "wandb://entity/project".
- Added a method Workspace.step_result_for_run which gives the result of a step given the run name and step name within that run.
- Added property Workspace.url, which returns a URL for the workspace that can be used to instantiate the exact same workspace using Workspace.from_url(). Subclasses must implement this.
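To show what a URL of the form "wandb://entity/project" carries, the sketch below splits it with the standard library. The helper name describe_workspace_url and the returned dictionary layout are assumptions made for this example; how Tango itself parses and dispatches on the URL may differ.

```python
from typing import Dict
from urllib.parse import urlparse


def describe_workspace_url(url: str) -> Dict[str, str]:
    """Split a workspace URL into the pieces a factory could dispatch on."""
    parsed = urlparse(url)
    return {
        "scheme": parsed.scheme,  # selects the workspace type, e.g. "wandb"
        "entity": parsed.netloc,
        "project": parsed.path.lstrip("/"),
    }


parts = describe_workspace_url("wandb://entity/project")
```

A URL that round-trips through a property like Workspace.url and a constructor like Workspace.from_url() makes it easy to hand the same workspace to another process.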
Changed ⚠️
- StepInfo start and end times will always be in UTC now.
- WandbTrainCallback now logs system metrics from each worker process in distributed training.
- StepCache.__contains__() and StepCache.__getitem__() now accept either a Step or StepInfo as an argument (Union[Step, StepInfo]).
- Refactored tango.step_graph.StepGraph to allow initialization from a Dict[str, Step].
- Executor.execute_step_graph() now attempts to execute all steps and summarizes success/failures.
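The "attempt everything, then summarize" behaviour described for Executor.execute_step_graph() can be sketched like this. The sketch is deliberately simplified: it ignores dependencies between steps, and ExecutionSummary and execute_all are hypothetical names rather than Tango's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ExecutionSummary:
    """Hypothetical summary type: names of steps grouped by outcome."""

    successful: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)


def execute_all(steps: Dict[str, Callable[[], object]]) -> ExecutionSummary:
    """Run every step even if an earlier one raises, then report the outcome."""
    summary = ExecutionSummary()
    for name, run in steps.items():
        try:
            run()
        except Exception:
            summary.failed.append(name)
        else:
            summary.successful.append(name)
    return summary


def broken_step() -> None:
    raise ValueError("simulated failure")


summary = execute_all(
    {"load": lambda: [1, 2, 3], "train": broken_step, "report": lambda: "ok"}
)
```

Compared with stopping at the first exception, this style lets independent steps finish and gives one consolidated report at the end.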
Fixed ✅
- Fixed bug with LocalWorkspace.from_parsed_url() (#278).
- Deprecation warnings will now be logged from tango CLI.
- Fixed the text format in the case of serializing an iterator of strings.
- Added missing default value of None to TangoGlobalSettings.find_or_default().
- Mypy has become incompatible with transformers and datasets, so we have to disable the checks in some places.
- The VERSION member of step arguments that were wrapped in Lazy was not respected. Now it is.
Commits
3069226 Makes sure the VERSION parameter of classes is respected even when we construct them inside of a Lazy object. (#289)
dd71446 Add Weights & Baises remote workspace (#232)
e3f2bd2 Adds a dependency that's missing from transformers (#285)
25919e1 Fixes the text format (#283)
381de74 Add missing default to TangoGlobalSettings.find_or_default() (#282)
9ac708a Update click requirement from <8.1.3,>=7.0 to >=7.0,<8.1.4 (#277)
749357e Bump mypy from 0.942 to 0.950 (#276)
2c59c96 Bump allenai/beaker-run-action from 1.0 to 1.1 (#274)
53ffe80 refactor (#275)
v0.7.0
What's new
Added 🎉
- Added the "-n/--name" option to tango run. This option allows the user to give the run an arbitrary name.
- Added a convenience property .workspace to Step class that can be called from a step's .run() method to get the current Workspace being used.
- Gave FromParams objects (which includes all Registrable objects) the ability to version themselves.
- Added CLI option to run a single step in a config using --step-name or -s.
- Added a MulticoreExecutor that executes steps in parallel.
- Added an ExecutorOutput dataclass that is returned by Executor.execute_step_graph().
- StepGraph now prints itself in a readable way.
now prints itself in a readable way.- Tango now automatically detects when it's running under a debugger, and disables multicore support accordingly. Many debuggers can't properly follow sub-processes, so this is a convenience for people who love debuggers.
- Added more models to the stuff we can import from the transformers library.
- Added new example for finetuning text-to-text models.
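The debugger auto-detection mentioned in the list above can be approximated with a common heuristic: many Python debuggers install a trace function, which sys.gettrace() exposes. This is a sketch of that heuristic only. Whether Tango uses the same check is an assumption, and running_under_debugger and should_use_multicore are hypothetical names.

```python
import sys


def running_under_debugger() -> bool:
    """Heuristic: many Python debuggers install a trace function."""
    return sys.gettrace() is not None


def should_use_multicore(requested: bool) -> bool:
    """Honor the request unless a debugger appears to be attached."""
    return requested and not running_under_debugger()


use_multicore = should_use_multicore(True)
```

The heuristic can also fire under coverage tools that set a trace function, so a real implementation might want an explicit override.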
Changed ⚠️
- Renamed click_logger to cli_logger, and we now use rich's logging Handler as the default handler, which means prettier output, better tracebacks, and you can use rich's markup syntax with the cli_logger to easily add style to text.
- Refactored tango.step_graph.StepGraph to allow initialization from a Dict[str, Step].
- Executor.execute_step_graph() now attempts to execute all steps and summarizes success/failures.
- Upgraded PyTorch version in tango Docker image to latest v1.11.0+cu113.
- RunGeneration now allows model object as input.
Fixed ✅
- Fixed bug that mistakenly disallowed fully-qualified names containing "_" (underscores) in the config.
- Fixed bug where TorchTrainStep working directory would be left in an unrecoverable state if training failed after saving the final model weights.
- Fixed bug in FromParams where **kwargs might be passed down to the constructors of arguments.
- Fixed bug in the way dependencies are tracked between steps.
- Fixed bug that caused MulticoreExecutor to hang in case of a failing step that was required recursively (not directly) downstream.
- Compatibility with PyTorch Lightning 1.6
Commits
1083049 Finetuning (#255)
42b1dba Bug fix with failing steps (#257)
7bd251a Bump myst-parser from 0.17.0 to 0.17.2 (#273)
cc9a1dd Bump actions/upload-artifact from 2 to 3 (#262)
66777d9 Bump actions/download-artifact from 2 to 3 (#261)
14d4adb use new beaker-action for building test image (#265)
af47287 Update pytorch-lightning requirement from <1.6,>=1.5 to >=1.5,<1.7 (#248)
b1df9a4 use beaker-run action for GPU Tests (#263)
0a7468e fix release job (#260)
c1b16b2 Bump furo from 2022.3.4 to 2022.4.7 (#259)
b55aaf2 use beaker-py to submit GPU tests (#258)
b2a93a9 Logging part 2: denoising run logging and making Dirk happy (#252)
ff6be8d Update click requirement from <=8.0.4,>=7.0 to >=7.0,<8.1.3 (#254)
83d78cc Bump mypy from 0.941 to 0.942 (#243)
3769327 Bump sphinx from 4.4.0 to 4.5.0 (#245)
81fc5c5 Bump black from 21.12b0 to 22.3.0 (#246)
e46059b Update tqdm requirement from <4.64,>=4.62 to >=4.62,<4.65 (#256)
bbdeb6f Revert "Set $TEMP (#241)"
b9fd9e9 Fix tracking dependencies between steps (#249)
53502e1 Pretty-print a step graph (#250)
d5328c9 Fix dissimilar objects hashing to the same thing (#240)
ccc37ce Autodetect debugger and turn off multicore (#251)
5c39f61 Pin click
5bb0fad Logging improvements (#233)
037e4a0 fix bug with FromParams (#242)
e142530 Bump actions/cache from 2 to 3 (#236)
878402d Set $TEMP (#241)
2d9fa0c fix bug w/ TorchTrainStep working dir (#238)
410faeb Multicore Parallelism (#204)
9e8e99f Update datasets requirement from <2,>=1.12 to >=1.12,<3 (#234)
40e0a1a Bump mypy from 0.940 to 0.941 (#230)
ede7428 add name to changelog workflow
4bb659b Bump actions/setup-python from 2 to 3 (#229)
5db1a6a Bump actions/checkout from 1 to 3 (#228)
8049104 Update torch version where it's hard-coded, add an automatic remind to do this stuff in the future (#227)
fe05449 add back intersphinx inventory links for HF libraries (#222)
9927749 Bump mypy from 0.931 to 0.940 (#226)
29ab68b Update torch requirement from <1.11,>=1.9 to >=1.9,<1.12 (#225)
a3fc83b Bump furo from 2022.2.23 to 2022.3.4 (#218)
28e839e Bump fairscale from 0.4.5 to 0.4.6 (#224)
f18d393 Update tqdm requirement from <4.63,>=4.62 to >=4.62,<4.64 (#213)
54c4a8d automatically keep copyright up-to-date (#221)
06adb07 Allow setting the run name as a command-line option (#212)
71e0639 Update cached-path requirement from <1.1,>=1.0 to >=1.0,<1.2 (#217)
5d4660a Temporarily remove intersphinx links to HF docs (#220)
13c7f3f Merge pull request #216 from allenai/VersionForFromParams
0027cb2 Merge pull request #215 from allenai/fix-fully-qualified-name-recognition
76f9922 Add "Step.workspace" property (#210)
v0.6.0
What's new
Added 🎉
- New example that finetunes a pre-trained ResNet model on the Cats & Dogs dataset.
- Added a '@requires_gpus' decorator for marking tests as needing GPUs. Tests marked with this will be run in the "GPU Tests" workflow on dual k80 GPUs via Beaker.
- Added the "-w/--workspace" option to tango run and tango server commands. This option takes a path or URL, and instantiates the workspace from the URL using the newly added Workspace.from_url() method.
- Added the "workspace" field to TangoGlobalSettings.
- Added the "environment" field to TangoGlobalSettings for setting environment variables each time tango is run.
- Added a utility function to get a StepGraph directly from a file.
- Added tango.settings module and tango settings group of commands.
- A format for storing sequences as SqliteSparseSequence
- A way to massage kwargs before they determine the unique ID of a Step
Changed ⚠️
- local_workspace.ExecutorMetadata renamed to StepExecutionMetadata and now saved as execution-metadata.json.
- tango run without the option "-w/--workspace" or "-d/--workspace-dir" will now use a MemoryWorkspace instead of a LocalWorkspace in a temp directory, unless you've specified a default workspace in a TangoGlobalSettings file.
- Moved tango.workspace.MemoryWorkspace and tango.local_workspace.LocalWorkspace to tango.workspaces.*.
- Moved tango.step_cache.MemoryStepCache and tango.step_cache.LocalStepCache to tango.step_caches.*.
- Deprecated the -d/--workspace-dir command-line option. Please use -w/--workspace instead.
Fixed ✅
- Fixed a small bug where LocalWorkspace would fail to capture the conda environment in our Docker image.
- Fixed activation of FILE_FRIENDLY_LOGGING when set from the corresponding environment variable.
- Fixed setting log level via the environment variable TANGO_LOG_LEVEL.
- Use relative paths within the work_dir for symbolic links to the latest and the best checkpoints in TorchTrainStep.
- Fixed some scenarios where Tango can hang after finishing all steps.
- distributed_port and log_every parameters won't factor into TorchTrainStep's unique ID.
- MappedSequence now works with slicing.
- MappedSequence now works with Huggingface Dataset.
- Uncacheable steps are now visible in Tango UI.
- Fixed bug in Registrable.list_available() where an error might be raised if the default implementation hadn't been explicitly imported.
- Fixed issue where having a default argument to the run() method wasn't getting applied to the step's unique ID.
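The last fix in the list above concerns default arguments and unique IDs. One way to get that property is to bind the caller's arguments against the function signature and fill in defaults before hashing, as sketched below. This illustrates the idea only: unique_id is a hypothetical helper, and the hashing scheme (SHA-256 over sorted JSON) is an assumption, not the one Tango uses.

```python
import hashlib
import inspect
import json
from typing import Any, Callable


def unique_id(run: Callable[..., Any], **kwargs: Any) -> str:
    """Hash the function name plus its fully resolved arguments."""
    bound = inspect.signature(run).bind(**kwargs)
    bound.apply_defaults()  # fill in defaults the caller did not pass
    payload = json.dumps(bound.arguments, sort_keys=True)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"{run.__name__}-{digest[:12]}"


def tokenize(text: str, lowercase: bool = True) -> list:
    return (text.lower() if lowercase else text).split()


implicit = unique_id(tokenize, text="Hello World")
explicit = unique_id(tokenize, text="Hello World", lowercase=True)
different = unique_id(tokenize, text="Hello World", lowercase=False)
```

With defaults applied first, omitting an argument and passing its default explicitly produce the same ID, so both hit the same cache entry.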
Commits
f9da0af Merge pull request #211 from allenai/Massage
e78dcbe Allow setting environment variables in tango settings, fix bug with TANGO_LOG_LEVEL env var (#209)
82404b6 Re-create LICENSE so GitHub will show it (#208)
0fadecf Bump furo from 2022.2.14.1 to 2022.2.23 (#207)
787b6e6 Merge pull request #206 from allenai/settings
c3401f2 Merge pull request #205 from allenai/RobustnessFixes
7ceda9c Merge pull request #201 from allenai/workspace-prep
6dd7d86 Merge pull request #200 from allenai/uncacheable-steps-in-server
5ad3f44 Bump furo from 2022.1.2 to 2022.2.14.1 (#199)
3528230 Update filelock requirement from <3.5,>=3.4 to >=3.4,<3.7 (#202)
21d6d40 Merge pull request #193 from allenai/StepGraphFromFile
258a7d2 skip 'distributed_port' and 'log_every' in unique ID (#197)
dd4c47f Merge pull request #192 from allenai/CloseSqliteHarder
cc94e1c Merge pull request #156 from allenai/DocumentationRefresh
5cc86b8 Rename "ExecutorMetadata" -> "StepExecutionMetadata" (#195)
6aecab7 Bump myst-parser from 0.16.1 to 0.17.0 (#191)
6478293 make pushing test image to Beaker more robust (#190)
7c1ac5b Finetune resnet Example for Tango (#150)
5187b01 update docs for integration tests and gpu tests timeout
95b78b5 Add new manually triggered workflow for integration tests, other bug fixes (#188)
19f7b31 Merge pull request #189 from allenai/fix-checkpoint-path-link
a438b26 Workflow quickfix
671a6dc verify exit code of beaker job (#187)
7ccad94 Merge pull request #186 from allenai/add-tests
bf6ecd0 Run GPU tests on Beaker (#183)