Releases: determined-ai/determined
0.12.11
Changelog
5993e2e chore: bump version: 0.12.11rc2 -> 0.12.11
ba62016 chore: bump version: 0.12.11rc1 -> 0.12.11rc2
b03e21c fix: update examples link (#845)
cd2e4dd chore: add response headers to bust cache for elm and react index.html (#847)
b309be9 chore: bump version: 0.12.11rc0 -> 0.12.11rc1
1746c44 chore: link react trial logs for improved rendering performance [DET-3530] (#834)
a7b4c25 feat: react trial logs [DET-3128] (#830)
e1171b6 chore: bump version: 0.12.11.dev0 -> 0.12.11rc0
dad64cc ci: update webui e2e tests to kill experiment instead of cancel (#835)
6920dbd feat: add allgather_metrics to EstimatorContext (#826)
8f45512 test: add nightly test for pytorch flexible primitive example [DET-3534] (#827)
e501ea0 feat: experiment list filter [DET-2999, DET-3000] (#796)
a1b494e feat: model versions endpoints [DET-3478] (#822)
21fb956 fix: fix an issue with cluster resource computation [DET-3509] (#832)
1c5d151 feat: added cli logging to native (#833)
6843c8f feat: add unets tf.keras example [DET-3397] (#825)
a9d7007 feat: clean up swagger spec (#823)
afc6e3f revert: added cli logging to native. (#821)
3d42608 docs: add example for Pytorch flexible primitives [DET-3202] (#778)
0c99a0c feat: added cli logging to native [DET-3316] (#788)
9918ae1 refactor: dissolve experiment table and task table (#791)
f40b75e docs: improve docs for graceful trial termination (#809)
e9721d0 fix: correct the active task counter on dashboard [DET-3510] (#804)
3f599df docs: add warning for max_slots [DET-3145] (#814)
d9ed73f feat: add preview search to new API (#813)
523b6b8 style: update master logs [DET-3471] (#793)
be5e99f feat: add experiments details page and endpoint [DET-3003] (#795)
6c59f30 Revert "feat: add preview search to new API (#777)" (#812)
d7d9176 fix: update the comment reference. (#802)
f44b527 feat: add preview search to new API (#777)
5fbb4a3 feat: support follow flag in trial logs (#810)
0c1dcd3 chore: bump version: 0.12.10.dev0 -> 0.12.11.dev0
d44261e fix: don't set Segment key to quotes (#803)
ee08c46 docs: update docs for estimator callbacks [DET-3461] (#800)
1836016 feat: support Pytorch multiple optimizers and LR schedulers [DET-3194, DET-3195, DET-3196, DET-3197, DET-3198] (#807)
ef1406c ci: ensure all release jobs have the proper filters (#805)
fad2ffd revert: support Pytorch multiple optimizers and LR schedulers (#806)
b860646 feat: support Pytorch multiple optimizers and LR schedulers [DET-3194, DET-3195, DET-3196, DET-3197, DET-3198] (#707)
2ae78e1 docs: release notes for 0.12.10 (#786)
7c51a47 docs: improve shared fs checkpoint exporting documentation. [DET-3392] (#797)
ad61ccd fix: retry if upload fails with requests.exceptions.ConnectionError [DET-3358] (#792)
7aea74c chore: log failed trial's trial logs when experiment succeeds [DET-3501]
6131fc8 fix: check for analytics library (#794)
a6f4114 feat: model registry create CLI (#787)
2e10c64 feat: task list batch [DET-3224] (#780)
d3f27c3 chore: refactor master to send batches in RUN_STEP [DET-3253] (#704)
5e48ae8 ci: remove cypress logs (#763)
498044c chore: point cluster and master logs routes to react (#757)
14396a1 fix: fix broken docs examples link [DET-3462] (#785)
e18fc6a feat: model registry describe and list CLI (#781)
89e8fb0 feat: task list search [DET-3222] (#768)
fcbeec4 fix: add missing sort-fix eslint plugin (#775)
ce31e56 style: update task table styles (#773)
d965527 build: swap wget for curl and add it as a dependency (#784)
e629043 build: add a missing dependency step (#783)
06d0850 feat: generate and use swagger typescript client [DET-3249 DET-3324 DET-3355] (#691)
Docker images
docker pull determinedai/determined-master:0.12.11
docker pull determinedai/determined-master:5993e2e
docker pull determinedai/determined-master:5993e2e0b866d8b4123bc8361d29fd5baa212756
docker pull determinedai/determined-dev:determined-master-5993e2e
docker pull determinedai/determined-dev:determined-master-5993e2e0b866d8b4123bc8361d29fd5baa212756
0.12.10
Changelog
ba5f7fb chore: bump version: 0.12.10rc3 -> 0.12.10
67e55da chore: bump version: 0.12.10rc2 -> 0.12.10rc3
6200680 fix: fix broken docs examples link [DET-3462] (#785)
7c7a0ca docs: release notes for 0.12.10 (#786)
9fb197c chore: bump version: 0.12.10rc1 -> 0.12.10rc2
aef27f5 chore: bump version: 0.12.10rc0 -> 0.12.10rc1
f0c8f6e chore: bump version: 0.12.10.dev0 -> 0.12.10rc0
ddeeddd feat: paginated CLI trial logs [DET-3442] (#779)
c481350 feat: add asha searcher (#735)
b751ebe fix: send Terminate response after on_trial_close callback [DET-3433] (#772)
d91c51f fix: fix relative asset paths for swagger-ui [DET-3437] (#764)
3477ac8 fix: avoid unnecessary re-rendering on each agent poll [DET-3427] (#760)
a93c794 fix: don't terminate container gang immediately when one container exits [DET-3435] (#774)
9e80a86 feat: add filter on task list page [DET-3223] (#756)
2263178 fix: ignore stale termination timeouts in trial (#769)
06538e6 feat: model python class (#767)
4d5d185 feat: add experiments page and table [DET-2998 DET-3015] (#742)
2d660c9 fix: hide master logs on elm (#770)
92aff8d chore: bump version: 0.12.9.dev0 -> 0.12.10.dev0
d829644 chore: bump version: 0.12.8.dev0 -> 0.12.9.dev0
f321a2b docs: update the path to example configs (#765)
2242d7b feat: add trial logs to the new API [DET-3308] (#766)
ea4f4f1 refactor: abstract task filters to be reusable (#748)
4c78789 feat: list models endpoint [DET-3278] (#762)
af948b5 feat: registry patch (#759)
f5e4e49 fix: dev sidebar (#754)
588aead chore: clean up GET agents endpoint (#758)
8e5f03a fix: checkpoint workload fails if upload fails (#752)
4a80514 style: antd style adjustments (#732)
5e93770 refactor: separate reusable task table columns (#741)
20af9b1 fix: import typo (#753)
e5ef920 fix: set a fallback array for computing available resources [DET-3411] (#751)
0362b9d feat: model registry get and post (#743)
5d6cf83 chore: don't filter metrics with "/" (#749)
153a9d4 chore: raise Eslint check level for sort and unused variable rules. (#750)
4aa9229 fix: shared_fs checkpoint validation (#746)
4bce303 feat: add GET experiments endpoints to new api (#717)
699e685 feat: show dev pages in development (#744)
c51e3eb feat: add no action state to task action dropdown [DET-3381] [DET-3393] (#725)
d9f443b style: fix lint issues (#740)
fee5bbf feat: add GBT TF Estimator example (#727)
ee4ec9f fix: add coalesce for checkpoints with null metadata [DET-3400] (#747)
54b6f6c feat: add top level resource provider [DET-3179, DET-3180] (#684)
cc7c68a refactor: abstract reading and writing to clipboard [DET-3396] (#736)
b0a13fa style: add linting rule to require await for async functions (#739)
7a6bc0a feat: make master & db deployable via helm [DET-3294] (#728)
d2a4c2b refactor: extract icon filter buttons from dashboard to be a reusable component (#734)
edab561 chore: clean up Pytorch LR Scheduler helper [DET-3270] (#715)
402915f refactor: separate task types [DET-3395] (#737)
2eeb4c4 style: add linting to prevent multi-spaces (#738)
bdf0036 style: fix task card, menu and dropdown styles to be uniform [DET-3286] (#723)
112ed39 feat: logs component and master logs [DET-2997 DET-3041] (#626)
e85c8bc refactor: rename asha to sha (#733)
c75e693 feat: model registry database migration [DET-3277] (#724)
e59a5a1 feat: increase GLOO timeout [DET-3309] (#729)
0acab3e fix: "det-deploy local agent-up" works for remote master [DET-3386] (#730)
657ef14 feat: link to swagger-ui from WebUI (#726)
b562a35 fix: correctly set steps for eval in EstimatorTrial (#731)
cb4090e chore: simplify logic in patchUsername (#702)
85a8501 feat: add tasks table component and task list page [DET-3221] (#652)
1c22922 feat: show experiments in increments [DET-3320] (#703)
9b2b8a9 docs: various improvements for checkpoint documentation (#718)
05b5eda feat: retry ConnectionError and ProtocolError types for GCS upload (#722)
ae6e5cd chore: remove is_chief calculation for non-horovod distributed training [DET-3338] (#705)
6d8c07c fix: use custom TLS cert only for Determined API requests [DET-3360] (#716)
f01c17c feat: add webui version mismatch notification on elm (#697)
dadde23 fix: change cache busting mechanism on react to query string (#696)
67db1b8 docs: adjust Keras documentation to indicate support for model.stop_training (#714)
e346c95 docs: add info to topic guide for graceful trial termination [DET-3361] (#713)
4bb60d5 feat: add user endpoints to new api (#689)
60ced66 fix: learning rate scheduler fix for bert squad example [DET-2897] (#711)
ee4ba43 feat: add a timeout to trial termination [DET-3246] (#690)
091bd09 chore: update webui test dependencies (#706)
4f71eb4 fix: handle auth check cancelation (#710)
26d10d5 feat: add context decorators and fix task cards [DET-2982] (#682)
63803a9 chore: better logging for websocket failures (#709)
e41fa59 fix: upgrade scheme when using websockets (#708)
559b504 fix: add missing directory in Swagger config path [DET-3312] (#680)
8f7b68e fix: correctly use mixed precision with multi-GPU in PyTorchTrial [DET-3285] (#699)
5aa0eb3 chore: bump version: 0.12.7.dev0 -> 0.12.8.dev0
b3d40e3 chore: bump version: 0.12.6.dev0 -> 0.12.7.dev0
ffb5de0 ci: ensure npm ci does not dirty package-lock.json via npm-force-resolution (#694)
4a349a3 docs: minor fixes for TensorBoard docs (#700)
28ef4b4 feat: add cluster page with donut charts [DET-2985] (#618)
aa48768 feat: store test cluster logs and improve test readme [DET-3269] (#657)
5b68448 docs: release notes for 0.12.7 (#701)
4b26ce7 fix: fix nightly tests file locations (#698)
39d7f18 docs: checkpoint metadata [DET-3211] (#671)
3b12c87 feat: add det user change-username to CLI [DET-3322] (#692)
a331567 fix: data caching by rank for distributed setting [DET-2897] (#693)
c33df86 chore: bump task environments version (#695)
983546d fix: fix broken examples tests [DET-3321] (#688)
b43ffe1 docs: add explanation of det-nobody user (#686)
4cd1d6e refactor: restructure examples [DET-3126] (#673)
eca9e21 fix: Fix typo in terraform files for max_agent_starting_period (#685)
2f8f2c6 docs: document on_trial_close estimator hook (#683)
5ada60a chore: add User-Facing API Change label reminder (#676)
5a7e2e7 fix: apply same model compilation args to trial and native mode in TfKerasTrial [DET-3314] (#681)
Docker images
docker pull determinedai/determined-master:0.12.10
docker pull determinedai/determined-master:ba5f7fb
docker pull determinedai/determined-master:ba5f7fb0b580a300bb888e10c52d4b098a111e7f
docker pull determinedai/determined-dev:determined-master-ba5f7fb
docker pull determinedai/determined-dev:determined-master-ba5f7fb0b580a300bb888e10c52d4b098a111e7f
0.12.8
Changelog
c8497c6 chore: bump version: 0.12.8rc0 -> 0.12.8
cd5a66e chore: bump version: 0.12.8.dev0 -> 0.12.8rc0
60cc187 chore: bump version: 0.12.7 -> 0.12.8.dev0
5909230 chore: bump task environments version (#695)
01e56a5 docs: add explanation of det-nobody user (#686)
c7533a0 fix: Fix typo in terraform files for max_agent_starting_period (#685)
97a26a2 docs: document on_trial_close estimator hook (#683)
Docker images
docker pull determinedai/determined-master:0.12.8
docker pull determinedai/determined-master:c8497c6
docker pull determinedai/determined-master:c8497c6bde3bdc7121d3a2071e88814153a61555
docker pull determinedai/determined-dev:determined-master-c8497c6
docker pull determinedai/determined-dev:determined-master-c8497c6bde3bdc7121d3a2071e88814153a61555
0.12.7
Changelog
d770579 chore: bump version: 0.12.7rc0 -> 0.12.7
19bf22e chore: bump version: 0.12.7.dev0 -> 0.12.7rc0
1ca3a87 docs: release notes for 0.12.7 (#701)
55a81f4 chore: bump version: 0.12.6.dev0 -> 0.12.7.dev0
31c0edc docs: add RPM package install documentation (#674)
e3757af feat: support IndexedSlices for multi-GPU TF2 training [DET-3186] (#608)
052da17 build: build storybooks as part of CI [DET-3248] (#622)
7d9a115 feat: enable sign in button when last username is recalled (#679)
44103f9 fix: update /info to not require auth (#677)
642b851 feat: checkpoint export from database fields (#664)
9b78b54 chore: remove unused variables (#675)
1d03c75 fix: update state labels to be more user-friendly (#672)
481d72a fix: eagerly update experiments on successful write actions [DET-3263] (#642)
3b07678 feat: add task list page route and placeholder [DET-3220] (#636)
4c2d0a6 feat: remember last logged in username [DET-3274] (#660)
18c8125 refactor: set up experiments context [DET-3255] (#640)
5e5b188 chore: add license to pip metadata (#669)
05aa3d2 feat: support TF Keras EarlyStopping callbacks [DET-3240] (#666)
4056146 docs: add to FAQ how to port a TF core graph model (#650)
c8bb942 feat: support Estimator early stopping hooks [DET-3239] (#661)
3ab90a6 test: temporarily disable AMP test since it causes NaNs (#670)
629f106 feat: treat NaN metrics as an error (#667)
db76932 fix: set auth cookie path to apply site wide (#668)
6588f77 feat: decouple agent information from workloads starting tasks [DET-3178] (#631)
f604a28 feat: read cookies in the new API auth module (#665)
9da1063 fix: space out WebUI plot x-axis ticks a bit more (#658)
b9d9324 feat: support early stopping callbacks on a validation step (#662)
cfb3f51 feat: add user auth to new api (#649)
414bfdf fix: set authentication failure reason synchronously. (#659)
ed94d86 feat: decouple agents from transmitting container status changes [DET-3174] (#646)
f27146a fix: address minor login issues (#611)
d014500 revert: "revert: "feat: support stopping training in trial code [DET-3238] (#648)" (#654)" (#656)
44a398a feat: ensure WebUI version is up to date with platform version (#632)
5baea6a revert: "feat: support stopping training in trial code [DET-3238] (#648)" (#654)
ee1314f feat: support stopping training in trial code [DET-3238] (#648)
fa09a74 ci: download protoc install to /tmp (#653)
9759ce7 docs: release notes for 0.12.5 (#595) (#651)
5f476df chore: remove yarn mentions from tests (#635)
8662fda fix: correct filename in Elm Makefile (#647)
0e7ca0a feat: add checkpoint metadata to cli describe commands (#645)
84e875a test: fix nightly nas and iris tf keras tests [DET-3264] (#644)
4ff9fa0 feat: checkpoint metadata api (#619)
cbbe117 chore: move proto files to determined namespace (#639)
fafd686 feat: add template endpoints to new api (#638)
4bad652 feat: support USER_CANCELLED exited reason (#637)
d1146d3 refactor: update link to support secure blank targets (#612)
f71d64e feat: add page component [DET-3232] (#614)
25e725e feat: support gradient clipping in PyTorchTrial via callbacks (#615)
80e39d0 feat: add antd breadcrumb stories [DET-3002] (#582)
5c9afa2 feat: add activate, pause, and cancel actions to task cards [DET-2934] (#585)
a3e121a feat: add end of training callback to EstimatorTrial (#621)
8056055 feat: make agent starting period configurable [DET-3219] (#624)
8fdc371 chore: upgrade proto libraries (#630)
bdfd980 fix: correct logic for checking if a validation is the best one seen (#601)
f590fc3 chore: remove container recovery (#629)
a8c1bb2 feat: add master endpoint to new api (#627)
678d53d chore: ignore pkg dir in proto sub project (#628)
65b5c17 chore: bump version: 0.12.5.dev0 -> 0.12.6.dev0 (#625)
13c0db2 chore: move proto to separate top level package (#620)
897f2f6 revert: make agent starting period configurable [DET-3219] (#623)
7f83e97 feat: make agent starting period configurable [DET-3219] (#610)
b01b560 fix: read docker config file from HOME directory (#587)
e0d0447 feat: make GCP operation tracker timeout configuration [DET-3182] (#598)
0011218 feat: add agent endpoints for new api (#613)
b08657e test: set seed for fashion mnist nightly convergence test (#616)
92ecfc0 fix: pass checkpoint gc metadata as a file (#606)
52b006e feat: decouple container logs from agents (#604)
3379eca test: remove WebUI e2e-tests dependency on det-deploy [DET-3072 DET-2652] (#575)
c5c5eaf fix: simplify login and logout (#553)
976617e refactor: remove additional determined routing [DET-3216] (#609)
6dcfde7 refactor: separate API configs (#584)
4050020 feat: initial grpc support (#552)
fd34fec chore: resolve new node security vulnerabilities (#607)
3bb6b83 feat: support early trial termination (#586)
88c3fbb test: create a test suite for examples (#597)
5474ac4 ci: fix upload-try-now-template (#599)
2c457aa ci: fix changelog generation (#603)
eb057e2 docs: various fixes for Native API docs
9e17699 fix: synchronize before gradient clipping in PyTorch (#602)
fa97c3b fix: properly stringify optional public message (#590)
159c41f docs: update sphinx theme version
924f1d5 feat: BERT on SQuAD Dataset (#574)
1e7025b docs: fix checkpoint load default path
31bcf57 fix: avoid saving pytorch model architecture (#594)
445f5cd docs: clarify documentation for agent startup script
21b1832 docs: fixes for PyTorch API docs.
beba5f8 feat: use str instead of pathlib.Path in checkpoint callbacks
173e48f fix: enable logging with --local --test mode (#589)
d415758 fix: use "Agent ID" instead of "Agent Name" in CLI.
Docker images
docker pull determinedai/determined-master:0.12.7
docker pull determinedai/determined-master:d770579
docker pull determinedai/determined-master:d770579b5ab09c662fa5325b535a8d4e202d7564
docker pull determinedai/determined-dev:determined-master-d770579
docker pull determinedai/determined-dev:determined-master-d770579b5ab09c662fa5325b535a8d4e202d7564
0.12.5
Changelog
d9a2cdc fix: avoid saving pytorch model architecture (#594)
071b3eb docs: release notes for 0.12.5 (#595)
be835fd feat: use str instead of pathlib.Path in checkpoint callbacks
4b1c8ca fix: enable logging with --local --test mode (#589)
0978df0 docs: add documentation for EsatimatorTrial callbacks
2e1eb39 feat: add callbacks to EstimatorTrial
52d50c9 fix: fix off-by-one error in master logs
fc27be0 feat: auto focus the username field on page load (#580)
9475664 docs: tweak advice on mounting file systems with cloud deployments
49ae381 docs: use "distributed training" to mean any kind of multi-GPU training
9e5506d feat: support hooks in EvalSpec when using TF Estimators
f975690 docs: add documentation for pytorch callbacks
9ab8984 feat: add timeout to tensorboard startup
354c744 fix: fix startup-hook.sh for tensorboard-entrypoint.sh
69999cd docs: evaluation functions should return JSON-serializable metrics
e0ec9f1 feat: support PyTorch on_validation_end callbacks with multi-GPU
bd569b6 feat: update to nccl 2.6.4 and fix multi-machine dtrain (#564)
1b79ec1 feat: add meta-learning example using protonets (#527)
b003bc7 docs: re-organize model definitions docs
7b82073 docs: package-based install documentation improvements
b0f075d feat: bump YogaDL verion to 0.1.1
ae8e192 feat: make TF 2.2 the default TF2 version
253fd74 feat: support TF 2.2
4f99bb9 feat: add multiple lr schedulers example
085114e fix: pass s3 endpoint url to tensorboard process
2cf584c fix: set default http method name to GET (#516)
a7a3221 feat: add on_checkpoint_end PyTorch callback
5fa952e feat: add user with password and user without password to tests
5c8bf4c fix: enable OSX interactive session during det user create
9c08cc6 feat: support arbitrary login redirect routes (#522)
47ec6cc feat: checkpoint load uses trial to retrieve checkpoints
f14f769 feat: add model code to checkpoints
8623fea feat: add ability to load a trial class locally
f1b30a4 feat: support --csv, --json to slot list and agent list in CLI
9a707ab docs: add CONTRIBUTING guidelines
bdf83f8 fix: don't support AMP w/ aggregation_frequency > 1
13078b7 docs: update Users docs for auto-login removal [DET-2992] (#532)
25ffffe fix: re-sync package-lock.json to package.json for react
87dc544 docs: various fixes
208ee11 feat: add imagenet NAS architecture (#378)
6c185a2 feat: add container count to "det agent list"
ebf3628 feat: make Link component to be based on HTML anchor tag
2300788 docs: release notes for v0.12.4 (#520)
7668874 fix: update login docs link
74b0fb4 feat: add support for presenting the authentication token
b659f72 docs: update reference docs to include PyTorchTrialContext
4c55a90 feat: add a PyTorchContext with ability to access model, optimizer, lr scheduler
f2726c6 feat: add generic callback support to PyTorch trials
Docker images
docker pull determinedai/determined-master:0.12.5
docker pull determinedai/determined-master:35e75b5
docker pull determinedai/determined-master:35e75b5c2fa2241f2ecdccbdc58634d107234377
v0.12.4
Changelog
8b1d50b build: place a 'v' in front of version for goreleaser
581ce06 chore: bump version: 0.12.4rc2 -> 0.12.4
e0089fd docs: release notes for v0.12.4 (#520)
9c68737 chore: bump version: 0.12.4rc1 -> 0.12.4rc2
abe61c9 ci: use bigger clusters for release parties (#508)
2d4700f ci: don't require approval for deploying release-.* branches (#504)
b9bb9bc build: tag git commit when using bumpversion (#502)
2688d5d revert "add ability to load a trial class locally"
cae7881 revert: "add model code to checkpoints"
350cb4f revert: checkpoint load uses trial to retrieve keras h5 checkpoints"
efc60a6 chore: bump version: 0.12.4rc0 -> 0.12.4rc1
fc6bcbd chore: bump version: 0.12.4.dev0 -> 0.12.4rc0
935e5f1 test: install pandas to fix iris test
90e5302 chore: rename fixture-up to cluster-up
1edd6c7 fix: use master_host in default agent config
734b242 build: remove commit message check (#499)
8f81497 chore: update linter and templates for squash-and-merge compatibility (#496)
f1b9a1d fix: checkpoint load uses trial to retrieve keras h5 checkpoints
d1ac971 feat: add model code to checkpoints
64a9218 feat: add ability to load a trial class locally
3640fb9 docs: updates for recent changes
161c1da fix: add missing request headers
5c272c5 ci: test packaging
f4924f7 chore: various minor cleanups to the provisioner
31ac008 chore: minor cleanup for filterable view code in scheduler
a0db589 fix: include LaunchTime in implementation of equality for Instance
01c7196 fix: fix logic for updating instance snapshot in provisioner
5d4b0d6 docs: fix custom env tf.2 snippet
b643c39 fix: fix opening shell
a58d865 fix: fix shell host address parsing
f358d83 fix: pass a string for visible devices, as TF expects
62feea0 docs: add additional Docker install instructions
8bdda00 chore: remove patch_saver_restore for tf_keras
8e74b74 refactor: remove dependency on GPUtil
928e689 fix: set the ApiBuilder errorType based on the response code
3a50670 chore: remove hasura
b9ba90a test: adjust confidence threshold for a successful convergence
2825753 chore: rename best checkpoint to imply its a function
3de5ef2 chore: remove graphql dependency from commons
fd32687 chore: remove GraphQL from React SPA
1187d5d chore: add experiment summary query to replace hasura in react
54e0cec fix: add missing "f" for format string
0003db5 build: update flake8 to >=3.8.0
48ef6af build: separate symlinks from tools run server step
add3db8 build: fixes for flake8 update
1ee3ee6 build: remove parallel support for make all
aed76c7 docs: edit guide on using trained checkpoints
01c8b82 ci: add AWS CLI install to docs publishing
76e2d1e ci: publish the try-now template separately
3f833aa fix: resize the Ace editor when an error pops up
18e0f37 docs: update CONTRIBUTING.md for recent changes
91015ac chore: remove hasura from cli
74f0e23 chore: remove hasura from checkpoint list
f6b8847 build: add temp db support to run-server
7baae45 docs: revise data access tutorial
beb82b8 refactor: simplify implementation of updateTasks and updateAgents in scheduler
a7140a9 refactor: avoid dependending on pointer equality in scheduler test
c86899b chore: remove hasura from integration tests
2099060 build: rely on goreleaser LDFLAGS
3a9d984 ci: fix CircleCI release
5b467ab ci: cleanup e2e pytest calls
222bd16 build: clean tools dir on root level clean
68b9aa1 fix: format agent tools file
be4943c fix: enable go modules for root get deps
c065695 build: support parallel test target
6036321 build: support parallel fmt target
186fb38 build: support parallel check target
64eb8ee build: support parallel clean target
e240cc5 build: set root all target to build
ea297fa build: remove unused VERSION variable in root Makefile
49a6b6e build: support parallel builds
4d2d8c1 ci: use 3.x-slim-buster images for CLI tests
1594a52 ci: cleanup branch filters
72f9709 chore: remove agent install target
68d036a build: reduce deploy build target output
2a0fc77 ci: fix Go cache
ab270e9 ci: e2e tests using Go binaries
fb6a49e build: remove master install target
026865c build: remove python-get-deps target
b691005 build: remove go-get-deps target
42e89f6 build: clean up master build dir
607e71d docs: minor fixes for dtrain documentation
d478f81 feat: support different master host between agent and container
875df52 chore: ignore build dir in tools dir
2d1adc9 chore: suppress output of tools run target
6a5f039 fix: symlink subproject build dirs to local run
1fdf383 build: add local cluster mode for developement
24b23a0 build: add agent build process to root Makefile
dc8bc08 build: add local builds for the master
572722e fix: clean up agent build artifacts properly
cf34eea chore: remove agent test coverage report
f48a1a4 build: add build target for agent
0827ae2 docs: make Sphinx default to not highlighting code blocks
8e1c0dd docs: make miscellaneous edits
1e1d696 docs: remove non-ASCII quotes
6c9c551 docs: use Sphinx's ability to show only the base of a code name
111554b build: clean up master goreleaser file
8d17998 ci: separate tf1 and tf2 tests
ab02af1 feat: add support for "det master config" in CLI
43dd6f3 feat: support "det m" as an alias for "det master" in the cli
2f41d8a build: remove master builds from root Makefile
87abdb7 build: clean up docs deploy Makefile
50f6e35 chore: use an available Go constant over 301 status code
d75f958 perf: remove an unnecessary redirect from the webui root
32246c1 build: clean up elm make file
2d2e118 ci: clean up Python venv creation
e75c84b build: clean up react Makefile
bdbd58a chore: remove unused env vars from webui tests
82890c3 build: clean docs build in CircleCI
07b854d build: clean up docs Makefile
f2694c5 build: add examples Makefile
8b90a2a ci: fix node cache install logic
ca2c4e2 ci: fix e2e_tests splitting paths
ba12073 chore: move master buildtools to tools.go
905c88d chore: remove component env var from master Makefile
9ec3fec chore: clean up master ldflags config
a0b567c chore: remove dependency on internal goimports
44d04d9 chore: remove gotest dependency
3894766 chore: remove ldflag requirements for master go test
7005aac ci: bump wait_for_master time to five minutes
43a3518 chore: isolate agent build system
fe65f27 docs: improvements to AWS, GCP install docs
53952ae ci: update docker executors and use determinedai/cimg-base
9d35de5 fix: pin docutils version to avoid dependency version conflict
a42c55c fix: add ssh extra docker dependency to det-deploy
20bd785 chore: simplify commons Makefile
c28bff5 ci: update login/out to wait on requests and cookie set/get
de4bb24 chore: simplify det-deploy Makefile
386b1da chore: simplify harness Makefile
2b9d131 chore: remove version dep from python packages
a4a2204 chore: simplify cli Makefile
7c1ee8d chore: remove unused script
638f2ce ci: use python:3.6.9 for deploy step
89c5427 ci: upgrade to new vm image that has Docker 19.03
22bb3f1 ci: use remote docker save/load for local tests
60258fe fix: avoid crash in PyTorch trial when debug mode is on
5b0c0fb chore: simplify agent build tools
df2b90e test: add a test for tf.dataset with tf.keras native api
0e12cd2 fix: add missing dep in build-docker target
40d1579 refactor: fix links to new dynamic agents docs
9745349 refactor: reorganize the doc structure for dynamic agents
62a77c2 docs: add elastic infrastructure topic guide
7fb21ab refactor: remove the old dynamic agents overview
3bed89e docs: describe usage of nvidia-container-toolkit on Docker 19.03
136b07a feat: make det-deploy work with nvidia-container-toolkit
1d95731 feat: make the provisioner work with nvidia-container-toolkit
d746cab feat: make the agent work with nvidia-container-toolkit
416ff5a test: intercept and wait for requests in login before moving on
cdf6568 chore: avoid hardcoding the test username in test units
4df0cd8 chore: remove old mailmap file
064c90b chore: bump circleci cache
e58bea7 chore: combine root requirements files
cd431f5 chore: remove references to release docs
e4f6f18 build: remove release-requirements.txt
26e7220 build: move bump2version to be a dev-dependency
76b963b build: move docs-requirements.txt to docs/requirements.txt
2bff674 build: remove unused python configs
5400fa8 chore: remove unused dockerignore file
80184a6 chore: move phony targets to respective targets
d39b74b chore: remove unused root build target
db749a6 chore: move commit linter to check target
9465a59 ci: store docs artifacts
e0dd3e5 chore: merge build docker targets in root makefile
109b8d0 chore: remove unused PHONY targets
d3b8465 chore: removed unused makefile variables
95aa9e5 chore: fix references to Apple's OS, which is currently called macOS
b45fe4c ci: allow manual approvals to be provided immediately
4aba004 docs: reorganize a few docs
a259c0b test: move tests under their respective component directories
0247760 chore: move commit message linter to new step
a98cf2d fix: exit out of non-chief containers gracefully
0ba3386 chore: remove unused PHONY target
c088405 chore: clean webui root make targets
29a1a31 ci: pull cypress docker image in dependency set up step
319b6dd fix: salt and hash password before posting
5614bb6 docs: add agent config reference
530fd3b chore: remove unused agent build target
a8463e7 test: add a convergence test for cifar10_cnn_pytorch
8695b6e fix: have cifar10_cnn_pytorch use training set
e6745e1 docs: fix subtle math bug in cifar10_cnn_pytorch example
659ea23 chore: add a minimum validation period to cifar10_cnn_pytorch
201291b test: add a mnist_pytorch_multi_output capability test to nightly
3df2892 refactor: remove graphql from elm
a25c962 chore: remove publish phase
96d0e09 chore: remove bump version target
567ed07 build: remove unused Python dependencies
54a183d build: si...
v0.12.3
Changelog
65b1e51 chore: bump version: 0.12.3rc3 -> 0.12.3
4ddfab4 fix: add logic in agent to invoke docker credentials helpers
dcd4d86 docs: add 0.12.3 release notes
c6ca487 fix: fix loading logic for tensorpacktrial when backbone is set
f2e5702 chore: bump version: 0.12.3rc2 -> 0.12.3rc3
17818fc fix: remove cd from startup-hook.sh in bert_glue_pytorch
d6f429e fix: fix make -C agent clean
ef74363 chore: bump version: 0.12.3rc1 -> 0.12.3rc2
ffc4544 feat: remove container_path from shared_fs configuration documentation
dab6de7 fix: wrong postgres query in restart
22e7be9 chore: better logs --tail default
87bcc70 fix: fix FasterRCNN example
4aeeca1 fix: install determined before using it
f502935 fix: update imports in nas
9d5df9d fix: add specificity to css rules to avoid override sub styles
85c6c1e fix: add missing order for steps in det t describe
a8ee76e chore: fix determined-deploy publish
52723b0 feat: reintroduce trial endpoints
6deafa9 fix: add script to run inside agent's container
a352a21 docs: reorganize main install page
2743056 chore: bump version: 0.12.3rc0 -> 0.12.3rc1
50f452c docs: add a topic guide for effective distributed training
2e6c826 docs: cleanup docstrings
e94e865 chore: bump version: 0.12.3.dev0 -> 0.12.3rc0
e3de26c docs: improve docs around context_dir argument
42b3223 feat: improve experiment configuration handling for LOCAL mode
6e29449 feat: add --local support to cli
5873205 chore: update company name in license files
66da4e9 docs: fix typo in quickstart
7aea9c8 docs: document Docker daemon socket bind mount requirement for agents
fc591c1 fix: display progress bar for image pulls
04edcca docs: add HP and dtrain to quick-start
154e3af chore: change mode of trial entrypoint to 744
a900609 chore: make PyTorch checkpoint code path open to all
97d127e fix: properly restore stopping experiments
defe9ec fix: incorrect shared_fs path for tensorboard if storage_path is set
cbfe85c docs: do some miscellaneous copy editing
ff6f7aa docs: use better code block markup in CLI reference page
1f67d8f docs: fix link to docs on Chinese AWS site
67c6668 docs: edit doc page on users
df2b3b8 docs: remove outdated reference to single-file model definitions
4175ea9 docs: fix unintentionally separated lists
edad1a7 docs: fix unintentional definition lists
23c444f docs: add attributions for master and agent
2203869 feat: add restart policy for fault-tolerance
673007a feat: re-revert add Determined object for consolidating authentication
daf6244 feat: usability messages for det-deploy aws
e7ece2a fix: correctly initialize LRSchedulers for multi-gpu training
c35cb1b fix: replace get_lr() with get_last_lr() in LRScheduler
62beacf fix: correctly call step() for LRScheduler when using epochs
c3cadc3 fix: revert add Determined object for consolidating authentication
b8ddc3d feat: add Determined object for consolidating authentication
1d5e3fe feat: make pending commands killable
c82925b feat: exit log tailing based on state and time rather than polls
18ae0e5 docs: add experimental warnings to native tutorial
bbd15ad chore: delegate auth failure handling to error handler
1daa6e6 docs: release notes for version 0.12.2
b953623 fix: clean up logic around commands
2f97381 docs: add a Native API tutorial
d84de8d chore: use pypi to install yogadl
54a3f48 chore: bump version: 0.12.2 -> 0.12.3.dev0
f8a81c0 feat: introduce REST API builder and refactor existing APIs
f0b996c fix: update modal to support flexible height and auto scrolling
e530ad7 chore: add PyTorch object detection example
89c530f feat: sort user selection option to keep the authenticated user on top
2979949 docs: update docs to move native apis to experimental
e9ee930 chore: update native examples to use experimental namespace
df3e0ea chore: fix experiment config comment typos in examples
6f99e07 chore: fix broken and outdated reference links in TF Keras example
bc9ce53 feat: recover notebook state on agent failure
52fb275 chore: shift naming of test and submit to local and cluster
b38bdef fix: grab back control after loading native implementation
69af009 chore: support .dev and rc tags in determined version
1c4be08 chore: bump version: 0.12.1 -> 0.12.2
be3b295 chore: disambiguate logs in workload_sequencer
961798c chore: standardize official example experiment configurations
dfd54ce docs: update README.md for recent docs reorg
515afe4 chore: bump environment images
6667a87 fix: respect shared_fs storage_path configuration in tensorboard
65c3b2e docs: minor fixes for quick start
acabe33 docs: minor fixes for tutorials
bbab37d docs: revert quick-start to use tarball
8a42748 feat: expose container failure reason in trial logs
d5792ad docs: tweak tutorial text around downloading model code
26a1e0b fix: fix data config for MNIST PyTorch distributed example
0c02269 chore: update environments
1eee210 fix: add missing arguments to GraphQL schema update command
d12ceb6 fix: don't automatically import tf with determined
8798035 chore: update generated GraphQL files for function rename
6ecb84e fix: move a new SQL migration after all previous ones numerically
1d014b8 fix: use per-slot batch size in tf keras Native
514a2c6 chore: move calculating batch sizes to EnvContext
c1c5f0b fix: use a default session when initializing TFKerasTrialController
64c9a32 chore: add more logging to harness
9a70a74 feat: add experiment class for querying top checkpoints
2af7f2a feat: add experiment level best checkpoint by metric
4435d7d chore: mount master config file rather etc root
8fb768d fix: add det command to PATH
dcea476 fix: update old references to "mnist_tf_keras"
b44abe6 docs: update CONTRIBUTING.md
dbefe18 chore: add JetBrains IDE config folder to .gitignore
6573bac chore: bump and limit Cypress version to 4.3.x
d1c3e2b fix: fixture-down before fixture-up
4e52149 docs: update local deploy with new commands
60c108e fix: use correct hvd size function
775f24e docs: add documentation for experimental contexts
5c26ca7 docs: add examples using data layer
65c09ff feat: support data layer in harness
9fcefee feat: bind mount data layer paths
cc1a5eb feat: add data layer config
c21aab8 docs: add command to speed up experiments
17ccd19 docs: fix tutorials typos
v0.12.1
Changelog
d9a7b8a chore: bump version: 0.12.0 -> 0.12.1
efed66b feat: add inbox logging to actor system
31ae5cc docs: revise landing page
cb37821 docs: update tf-mnist tutorial to not use adapt_data
e4b9873 feat: support native input formats for tf.keras
18758c6 fix: unify container_defaults/trial_runner configs
40b3b1a chore: lower max_connections of postgres instances
3353325 chore: lock down database max_connection on the client side
a4b1822 chore: specify application_name in postgres cnxs
7bbf8f8 chore: det-deploy gcp formatting for docs and outputs
6f090bb docs: quick start pytorch
419f25c docs: document distributed context
b629b9d chore: get private ip address in CI/integrations/get_address.py
25280cf chore: always delete nightly tests clusters
0b828a0 chore: wait for master for nightly tests
65b501d test: fix cluster log manager for remote clusters
a6ab9dd chore: fix environment variable for nightly tests
9b8b544 docs: add data-access, name tweaks on landing page
5f4e232 chore: update rolling cluster to use correct name
4913a44 test: update test_gc_checkpoint to wait for gc_task to start
ad99a83 docs: update landing page
2514f05 fix: add missing pandas install to iris experiment
0ea369a chore: directly install the determined wheels in entrypoints
b6f6e17 test: don't skip tf2 keras gradient aggregation tests
a5c42b8 feat: update environments with Horovod TF2 Keras gradient aggregation support
66b6db2 docs: add clarity around configuration
18e0bca feat: add master, agent subcommands for det-deploy local
976e814 docs: update pytorch tutorial
9911872 docs: fix tutorials.index indents
1c40b25 feat: add min-cpu-platform to gcp deployment
9c077e6 docs: data-access tutorial
c609e55 refactor: update tf.keras mnist
3f002f6 fix: change _ to - for gcp down variable placeholder names
72a8fb9 docs: break installation background info into a new page
2fe556a docs: add more info on selecting an installation method
efe9370 docs: remove notes about system administrators
8bc7b2e docs: move around installation-related pages
c84342b fix: update gcp arch diagram from pedl to det
5e14162 feat: support tf.data.Dataset for all TF version
1f0552c chore: revert "chore: sync aws det-deploy"
218c3ea chore: standardize gcp within det-deploy
cde9d54 chore: sync aws det-deploy
20b929b feat: add a central error handler with notification support
095c4e6 test: use a no op experiment for WebUI test
8f61d49 chore: include test in webui module dependencies
fcf9972 ci: always cleanup before running the WebUI tests
4ff3043 docs: revise tf.keras MNIST tutorial
006363b chore: minor cleanup for tf.keras MNIST example
4ee1323 docs: add docs for wrap_model in TFKerasTrialContext
a5d7de2 feat: cloudwatch logs for master and agents on aws
de84c10 docs: improve docs for distributed training
f5f5b35 feat: persist master db by default
f7cd8ed fix: increase GCP Cloud SQL default max connections
690a8c2 refactor: remove use_tf_dataset from cifar tf keras tests
5d59327 test: add parallel test for mnist_tf_keras
7a340f5 feat: simplify mnist_tf_keras example
94abc2f chore: fix typo in log message
dc8b17e chore: tweak master log output for telemetry
7416856 docs: update docs for revised telemetry reporting
2688e6a fix: revise telemetry reporting in the master
9aab97f fix: update golang-migrate URL
4bb2878 fix: package default master.yaml with Docker image
c2d0e00 test: update tests for multi array-like data
c73d2ff docs: update examples for multi inputs and outputs in keras
11781ef docs: update docs for multi inputs and outputs in keras
56fad61 chore: remove tf data adapter
459c872 feat: support multi inputs and outputs for keras
0569460 feat: enrich error types
8251da9 chore: add Determined AI logo to README.md
80d891a fix: access to public images
8ad3362 refactor: add a new global trialLogger actor
f792ab6 test: add unit tests for det.create_trial_instance
b5b32ad chore: clean-up use of self.context in tf.keras fixture
3521065 feat: Support determined.create_trial_instance for local development
dcd0b90 docs: improve python reference documentation
e9510b6 docs: add documentation to native-related apis
2683e6e fix: catch all keyboard exits when tailing trial logs
87e0854 chore: fix comments in tf.keras MNIST example
8a18051 fix: perform cluster scheduling every 500ms
d6097aa fix: add default aws region to constants
001b9bd style: clean up minor doc issues
22d666b ci: add cypress cleanup to post_e2e_tests
506a2b1 docs: fixes for package names and example pip commands
98f53f3 feat: add try now button to README
790c801 fix: fix the user select label logic considering authUser and cache
32d4b27 fix: delete init.py in examples
5554adb docs: consolidate HP tuning topic guides
a51542a docs: tweak main index page
0b8103d docs: fix typo in installation docs
aed56fc docs: fix list formatting in main index page
ded6db0 fix: update product name in Elm webui
1cb6ab5 docs: various work on TF and PyTorch tutorials
da6e493 test: skip notebook log modal test
3c41d3f fix: set task type filter to start as all disabled
f67c121 fix: login test flake
ae8d09c feat: add gcp to det-deploy
8a70f41 ci: use a new set of Segment keys for CI and no keys by default
c07cb6a docs: fix references to old name of terms doc
f175124 docs: update terminology doc
5edb860 chore: fix various typos
4ed4f05 fix: use unique download paths for tf keras mnist example
35d96b1 fix: apply the computed user filter for recent tasks
ed0732f chore: bump version: 0.11.2 -> 0.12.0
c771f8d chore: update bumpversion commit message
39650e2 feat: add community code of conduct
24459ab feat: update mnist estimator example to create saved_model checkpoints
1aa722e feat: add checkpoint.load for tensorflow saved_models
f499f2c feat: add metadata.json file to tensorflow checkpoints
v0.12.0
Changelog
f489216 chore: bump version: 0.11.2 -> 0.12.0
e20ebe9 chore: update agent ami
4e276f0 test: update test cases according to Native examples
1e2859b chore: use more reasonable default configuration for native API
4aaf154 docs: update examples with searcher configuration
016e598 fix: order rendezvous ports deterministically
d6ebdf1 feat: dynamically pull telemetry key from /info endpoint
5f899ad docs: fix wording in tutorials
73dd04d docs: add reference docs for TrialContext objects
88301e1 build: clean before copying static files
3822a87 style: clean up dashboard to make interactive items stand out
e97696a feat: isolate user preferences in localStorage from each other
4288b39 fix: avoid showing 'No Agents' before agents' request is resolved
7481613 feat: refactor agents into using React hook pattern
6d1fcda chore: revert "chore: fix "Sign in/out" cypress test with waiting"
37a0f20 docs: update tutorials
2225b8e build: allow master's default Segment keys to be set at build time
004faf0 docs: add ace to webui attributions
9342062 docs: clean-up reference docs under reference/api
3557af6 docs: update the master configuration doc
ed3f0d2 fix: validate args to det trial download
d96ebfc fix: use random directory do download data in cifar pytorch example
398485d fix: set visible devices for tf2 keras eager trials
c0e8451 ci: pull environments images for integration tests
2851181 chore: update default environments
c0a8e90 chore: fix "Sign in/out" cypress test with waiting
eca6d24 feat: support natively selecting GPU devices
33f0681 test: stream Cypress browser logs to terminal and CI
9cad03e docs: rearrange structure of installation docs
4ca980a docs: bring master.yaml up to date
f87cf8d feat: add distributed namespace to context
8173c27 ci: shorten default shared_fs dir name
da5f3d5 refactor: change default storage dir permissions
aa3245e fix: restore non-root container support
da5e64a test: add integration test for non-root containers
35bd5f9 chore: delete unused Scanner-Valuer interface
5488651 fix: update examples directory in docs build
4bff89a feat: migrate to determined repository