Merge branch 'main' into fix-typo
dberenbaum authored Oct 3, 2023
2 parents e8e7bf8 + 504c285 commit 746db9b
Showing 12 changed files with 385 additions and 34 deletions.
8 changes: 7 additions & 1 deletion content/basic-concepts/pipeline.md
@@ -1,6 +1,12 @@
---
name: Pipeline
match: [pipeline, pipelines, 'data pipeline', 'data pipelines', 'dvc pipelines']
match:
- pipeline
- pipelines
- 'data pipeline'
- 'data pipelines'
- 'dvc pipelines'
- 'dvc pipeline'
tooltip: >-
DVC pipelines describe data processing workflows in a standard declarative
YAML format ([`dvc.yaml`](/doc/user-guide/project-structure/dvcyaml-files)).
106 changes: 106 additions & 0 deletions content/docs/api-reference/artifacts_show.md
@@ -0,0 +1,106 @@
# dvc.api.artifacts_show()

Get the path and Git revision for an <abbr>artifact</abbr> tracked in a
<abbr>DVC repository</abbr>.

```py
def artifacts_show(
    name: str,
    version: Optional[str] = None,
    stage: Optional[str] = None,
    repo: Optional[str] = None,
) -> Dict[str, str]:
```

## Usage:

```py
import dvc.api

artifact = dvc.api.artifacts_show(
    'text-classification',
    repo='https://github.com/iterative/example-get-started.git',
)
```

## Description

Returns a path and Git revision for a named artifact which can then be used in
other Python API calls.

The returned dictionary will be of the form:

```py
{
    'path': 'model.pkl',
    'rev': 'c7c6ae0',
}
```

where `path` contains the relative path to the artifact in the DVC repository,
and `rev` contains the Git revision for the specified artifact version or stage.

When neither `version` nor `stage` is provided, the Git revision for the latest
version of the artifact is returned.
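
Because `version` and `stage` are mutually exclusive, a caller may want to reject
the combination before contacting the repository. The following is a minimal
sketch, not part of `dvc.api`: `show_artifact` and its `show` parameter are
hypothetical names introduced here, and how DVC itself reacts when both
arguments are passed is not covered by this page.

```py
from typing import Callable, Dict, Optional


def show_artifact(
    name: str,
    version: Optional[str] = None,
    stage: Optional[str] = None,
    repo: Optional[str] = None,
    show: Optional[Callable[..., Dict[str, str]]] = None,
) -> Dict[str, str]:
    """Hypothetical wrapper: reject `version` + `stage` before calling DVC."""
    if version is not None and stage is not None:
        raise ValueError("pass either `version` or `stage`, not both")
    if show is None:
        # Imported lazily so the wrapper can be exercised without DVC installed.
        import dvc.api

        show = dvc.api.artifacts_show
    # With neither argument set, the latest version is resolved.
    return show(name, version=version, stage=stage, repo=repo)
```

Passing a stand-in for `show` keeps the check testable offline; in normal use
the argument is left out and `dvc.api.artifacts_show()` is called.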

## Parameters

- `name` (required) - name of the artifact. By default DVC will search for
artifacts declared in a `dvc.yaml` file located at the root of the DVC
repository. Artifacts declared in other `dvc.yaml` files should be addressed
in the form `path/to/dvc.yaml:artifact_name` or `path/to:artifact_name` (where
`dvc.yaml` is omitted).

- `version` - version of the artifact (mutually exclusive with `stage`).

- `stage` - stage of the artifact (mutually exclusive with `version`).

- `repo` - the location of the DVC project. It can be a URL or a file system
path. Both HTTP and SSH protocols are supported for online Git repos (e.g.
`[user@]server:project.git`). _Default_: The current project (found by walking
up from the current working directory tree).
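
The addressing forms accepted by `name` can be illustrated with a small parser.
This is only a sketch of the naming convention described above:
`split_artifact_name` is a hypothetical helper, not a DVC function, and it does
not claim to reflect how DVC parses names internally.

```py
from typing import Optional, Tuple


def split_artifact_name(name: str) -> Tuple[Optional[str], str]:
    """Split an artifact address into (dvc.yaml location, artifact name).

    Hypothetical helper for illustration only; it is not part of dvc.api.
    """
    if ":" not in name:
        # Bare name: the artifact is declared in the root dvc.yaml.
        return None, name
    location, _, artifact = name.rpartition(":")
    if not location.endswith("dvc.yaml"):
        # Short form `path/to:artifact_name`, where dvc.yaml is omitted.
        location = location.rstrip("/") + "/dvc.yaml"
    return location, artifact
```

Under these assumptions, `results/dvc.yaml:best` and `results:best` both point
at the `best` artifact declared in `results/dvc.yaml`.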

## Example: Read the contents of an artifact

```py
import pickle
import dvc.api

artifact = dvc.api.artifacts_show(
    'text-classification',
    version='v1.0.0',
    repo='https://github.com/iterative/example-get-started.git',
)
data = dvc.api.read(
    artifact['path'],
    rev=artifact['rev'],
    repo='https://github.com/iterative/example-get-started.git',
    mode='rb',
)
model = pickle.loads(data)
```

This example uses the returned path and Git revision in conjunction with
`dvc.api.read()` to read the file content for the artifact.

## Example: Download an artifact

```py
import os
import dvc.api

artifact = dvc.api.artifacts_show(
    'text-classification',
    stage='prod',
    repo='https://github.com/iterative/example-get-started.git',
)
fs = dvc.api.DVCFileSystem(
    'https://github.com/iterative/example-get-started.git',
    rev=artifact['rev'],
)
fs.get_file(artifact['path'], os.path.basename(artifact['path']))
```

This example uses the returned path and Git revision in conjunction with
`dvc.api.DVCFileSystem` to download the artifact to the current working
directory.
134 changes: 134 additions & 0 deletions content/docs/command-reference/artifacts/get.md
@@ -0,0 +1,134 @@
# artifacts get

Download an <abbr>artifact</abbr> tracked in a DVC project into the current
working directory.

## Synopsis

```usage
usage: dvc artifacts get [-h] [-q | -v]
                         [--rev [<version>]] [--stage [<stage>]]
                         [-o [<path>]] [-j <number>] [-f]
                         [--config CONFIG]
                         [--remote REMOTE] [--remote-config [REMOTE_CONFIG ...]]
                         url name

positional arguments:
  url          Location of DVC repository to download from
  name         Name of artifact in the repository
```

## Description

Provides a way to download artifacts tracked in a DVC project. Unlike `dvc get`,
`dvc artifacts get` supports downloading an artifact by name, rather than by
path. Likewise, `dvc artifacts get` supports downloading a registered artifact
version or stage, instead of requiring a specified Git revision.

`dvc artifacts get` also supports downloading artifacts both from the
<abbr>model registry</abbr> and from DVC remotes.

<admon type="tip">

Downloading an artifact from the <abbr>model registry</abbr> only requires a
valid Studio
[access token](/doc/studio/user-guide/account-management#studio-access-token).
It does not require the client to have DVC remote credentials.

</admon>

The `url` argument specifies the address of the DVC or Git repository containing
the artifact. Both HTTP and SSH protocols are supported (e.g.
`[user@]server:project.git`). `url` can also be a local file system path
(including the current project, e.g. `.`).

The `name` argument specifies the name of the artifact to download. By default
DVC will search for artifacts declared in a `dvc.yaml` file located at the root
of the DVC repository. Artifacts declared in other `dvc.yaml` files should be
addressed in the form `path/to/dvc.yaml:artifact_name` or
`path/to:artifact_name` (where `dvc.yaml` is omitted).

<admon type="info">

`dvc artifacts get` will first try to download artifacts via the <abbr>model
registry</abbr>. If you do not have a valid Studio token, or the artifact is not
tracked in the model registry, DVC will fall back to downloading the artifact
from the project's default DVC remote.

</admon>
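
The fallback rule above can be summarized as a small decision function. This is
a rough sketch under stated assumptions: `likely_download_source` is a
hypothetical name, a non-empty `DVC_STUDIO_TOKEN` is treated as a valid token
(validity cannot be checked locally), and whether the artifact is tracked in the
model registry is supplied by the caller rather than looked up.

```py
import os
from typing import Mapping, Optional


def likely_download_source(
    env: Optional[Mapping[str, str]] = None,
    in_model_registry: bool = True,
) -> str:
    """Guess where `dvc artifacts get` would fetch an artifact from.

    Illustration only: DVC makes this decision itself at download time.
    """
    env = os.environ if env is None else env
    if env.get("DVC_STUDIO_TOKEN") and in_model_registry:
        # A token is present and the artifact is tracked: registry comes first.
        return "model registry"
    # No token, or the artifact is not in the registry: fall back to the remote.
    return "DVC remote"
```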

## Options

- `--rev <version>` - Version of the artifact to download. The latest version of
the artifact is used by default when neither `--rev` nor `--stage` is specified.

- `--stage <stage>` - Stage of the artifact to download. The latest version of
the artifact is used by default when neither `--rev` nor `--stage` is specified.

- `-o <path>`, `--out <path>` - specify a `path` to the desired location in the
workspace to place the downloaded file or directory (instead of using the
current working directory). Directories specified in the path will be created
by this command.

- `-j <number>`, `--jobs <number>` - parallelism level for DVC to download data
from the remote. The default value is `4 * cpu_count()`. Using more jobs may
speed up the operation. Note that the default value can be set in the source
repo using the `jobs` config option of `dvc remote modify`.

- `-f`, `--force` - when using `--out` to specify a local target file or
directory, the operation will fail if those paths already exist. This flag
forces the operation, causing local files/dirs to be overwritten by the
command.

- `--config <path>` - path to a [config file](/doc/command-reference/config)
that will be merged with the config in the target repository.

- `--remote <name>` - name of the `dvc remote` to set as a default in the target
repository. Only applicable when downloading artifacts from a DVC remote.

- `--remote-config [<name>=<value> ...]` - `dvc remote` config options to merge
with a remote's config (default or one specified by `--remote`) in the target
repository. Only applicable when downloading artifacts from a DVC remote.

- `-h`, `--help` - prints the usage/help message and exits.

- `-q`, `--quiet` - do not write anything to standard output. Exit with 0 if no
problems arise, otherwise 1.

- `-v`, `--verbose` - displays detailed tracing information.
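
When the command is driven from a script, the options above map onto an argument
list. The sketch below assumes a hypothetical helper,
`build_artifacts_get_cmd`, which is not part of DVC; it uses only flags listed
in this section and leaves execution to the caller.

```py
from typing import List, Optional


def build_artifacts_get_cmd(
    url: str,
    name: str,
    rev: Optional[str] = None,
    stage: Optional[str] = None,
    out: Optional[str] = None,
    jobs: Optional[int] = None,
    force: bool = False,
) -> List[str]:
    """Assemble a `dvc artifacts get` invocation (illustrative helper)."""
    cmd = ["dvc", "artifacts", "get", url, name]
    if rev is not None:
        cmd += ["--rev", rev]
    if stage is not None:
        cmd += ["--stage", stage]
    if out is not None:
        cmd += ["--out", out]
    if jobs is not None:
        # Omitting --jobs leaves DVC's default parallelism in place.
        cmd += ["--jobs", str(jobs)]
    if force:
        cmd.append("--force")
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which
avoids shell quoting issues with URLs and paths.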

## Example: Download an artifact from a DVC remote

```cli
$ dvc artifacts get https://github.com/iterative/example-get-started.git text-classification --rev=v1.0.0
Downloaded 1 file(s) to 'model.pkl'
```

In this example, we download version `v1.0.0` of the artifact. Since we have no
Studio credentials set in our environment, `dvc artifacts get` will download the
artifact from the default DVC remote defined in the repository.

## Example: Download an artifact using a Studio token

```cli
$ DVC_STUDIO_TOKEN=mytoken dvc artifacts get https://github.com/iterative/example-get-started.git text-classification --stage=prod
Downloaded 1 file(s) to 'model.pkl'
```

In this example, we download stage `prod` of the artifact. Since we have set our
Studio access token in the `DVC_STUDIO_TOKEN` environment variable,
`dvc artifacts get` will download the artifact via the <abbr>model
registry</abbr> rather than from a DVC remote.

## Example: Download an artifact defined in a specific `dvc.yaml` file

```cli
$ dvc artifacts get https://github.com/iterative/lstm_seq2seq.git results/dvc.yaml:best
Downloaded 1 file(s) to 'epoch=0-step=16.ckpt'
```

In this example, we download the latest version of the `best` artifact. In this
case, the artifact is defined in `results/dvc.yaml` so we must include the path
to the `dvc.yaml` file when addressing the artifact. Since we do not specify
`--rev` or `--stage`, `dvc artifacts get` will download the latest version of
the artifact by default.
27 changes: 27 additions & 0 deletions content/docs/command-reference/artifacts/index.md
@@ -0,0 +1,27 @@
# artifacts

Commands for working with DVC <abbr>artifacts</abbr> and the <abbr>model
registry</abbr>.

## Synopsis

```usage
usage: dvc artifacts [-h] [-q | -v] {get} ...
positional arguments:
COMMAND
get Download an artifact from a DVC project.
```

## Description

`dvc artifacts` subcommands provide a command line client for working with
<abbr>model registry</abbr> artifacts.

## Options

- `-h`, `--help` - prints the usage/help message and exits.

- `-q`, `--quiet` - do not write anything to standard output.

- `-v`, `--verbose` - displays detailed tracing information.
88 changes: 67 additions & 21 deletions content/docs/dvclive/how-it-works.md
@@ -109,22 +109,81 @@ with Git, in which case you can use

## Setup to Run with DVC

You can create or modify the `dvc.yaml` file at the base of your repository (or
elsewhere) to define a [pipeline](#setup-to-run-with-dvc) to run experiments
with DVC or
[customize plots](/doc/user-guide/experiment-management/visualizing-plots#defining-plots).
A pipeline stage for model training might look like:
Running experiments with DVC provides a structured and reproducible
<abbr>pipeline</abbr> for end-to-end model training. To run experiments with
DVC, define a pipeline using `dvc stage add` or by editing `dvc.yaml`. A
pipeline stage for model training might look like:

<toggle>
<tab title="CLI">

```cli
$ dvc stage add --name train \
    --deps data_dir --deps train.py \
    --outs model.pt --outs dvclive \
    python train.py
```

</tab>
<tab title="YAML">

```yaml
stages:
  train:
    cmd: python train.py
    deps:
      - train.py
      - data_dir
    outs:
      - model.pt
      - dvclive
```

</tab>
</toggle>

Adding the DVCLive [directory] to the [outputs] will add it to the DVC [cache]
(if you previously tracked the directory in Git, you must first stop tracking it
there). If you want to keep it in Git, you can disable the cache. You can also
choose to cache only some paths, like keeping lightweight metrics in Git but
adding more heavyweight plots data to the cache:

<toggle>
<tab title="CLI">

```cli
$ dvc stage add --name train \
    --deps data_dir --deps train.py \
    --outs model.pt --outs-no-cache dvclive/metrics.json \
    --outs dvclive/plots \
    python train.py
```

</tab>
<tab title="YAML">

```yaml
stages:
  train:
    cmd: python train.py
    deps:
      - train.py
      - data_dir
    outs:
      - model.pt
      - dvclive/metrics.json:
          cache: false
      - dvclive/plots
```

</tab>
</toggle>

Now you can run an experiment using `dvc exp run`. Instead of DVCLive handling
caching and saving experiments, DVC will do this at the end of each run. See
examples of how to [add DVCLive to a pipeline] or [add a pipeline to DVCLive
code], including how to parametrize your code to iterate on experiments.
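
The CLI and YAML forms of a stage shown above carry the same information, and
the correspondence can be sketched in a few lines. `stage_add_args` is a
hypothetical helper written for this illustration, not a DVC API; it handles
only `cmd`, `deps`, and `outs` (with the optional `cache: false` setting) and
ignores every other stage field.

```py
import shlex
from typing import Any, Dict, List


def stage_add_args(name: str, stage: Dict[str, Any]) -> List[str]:
    """Translate a dvc.yaml-style stage mapping into `dvc stage add` arguments."""
    args = ["dvc", "stage", "add", "--name", name]
    for dep in stage.get("deps", []):
        args += ["--deps", dep]
    for out in stage.get("outs", []):
        if isinstance(out, str):
            # Plain path: cached by DVC by default.
            args += ["--outs", out]
            continue
        # Mapping form, e.g. {"dvclive/metrics.json": {"cache": False}}.
        ((path, opts),) = out.items()
        flag = "--outs" if opts.get("cache", True) else "--outs-no-cache"
        args += [flag, path]
    # The stage command goes last, split into separate arguments.
    args += shlex.split(stage["cmd"])
    return args
```

An output declared with `cache: false` becomes `--outs-no-cache`, which is how
lightweight metrics can stay in Git while heavier plot data goes to the cache.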

<admon type="tip">

You may have previously tracked [outputs] with `Live.log_artifact()` that
@@ -135,24 +194,11 @@ pipeline. You can optionally drop `Live.log_artifact()` from your code.

</admon>

Optionally add any subpaths of the DVCLive [directory] to the [outputs]. DVC
will [cache] them by default, and you can use those paths as [dependencies]
downstream in your pipeline. For example, to cache all DVCLive plots:

```diff
stages:
  train:
    cmd: python train.py
    deps:
      - train.py
    outs:
      - model.pt
+     - dvclive/plots
```

[directory]: /doc/dvclive/how-it-works#directory-structure
[cache]: /doc/start/data-management/data-versioning
[outputs]: /doc/user-guide/pipelines/defining-pipelines#outputs
[dependencies]: /doc/user-guide/pipelines/defining-pipelines#simple-dependencies
[pipelines]: /doc/start/experiments/experiment-pipelines
[pipeline]: /doc/start/experiments/experiment-pipelines
[generates]: /doc/dvclive/live/make_dvcyaml
[add DVCLive to a pipeline]: /doc/start/data-management/metrics-parameters-plots
[add a pipeline to DVCLive code]: /doc/start/experiments/experiment-pipelines