exp run: Copy Only pull pipeline data as needed example from dvc repro (#4831)
daavoo authored Sep 5, 2023
1 parent 5a0c53a commit 9d9f229
Showing 1 changed file with 58 additions and 0 deletions.
58 changes: 58 additions & 0 deletions content/docs/command-reference/exp/run.md
@@ -345,6 +345,64 @@ $ dvc queue start
[grid search]:
https://en.wikipedia.org/wiki/Hyperparameter_optimization#Grid_search

## Example: Only pull pipeline data as needed

You can combine the `--pull` and `--allow-missing` flags to reproduce a pipeline
while only pulling the data that is actually needed to run the changed stages.

Given the pipeline used in
[example-get-started-experiments](https://github.com/iterative/example-get-started-experiments):

```cli
$ dvc dag
    +--------------------+
    | data/pool_data.dvc |
    +--------------------+
               *
               *
               *
        +------------+
        | data_split |
        +------------+
         **        **
       **            **
      *                **
+-------+                *
| train |              **
+-------+            **
         **        **
           **    **
             *  *
        +----------+
        | evaluate |
        +----------+
```

If we are on a machine where all the data is missing:

```cli
$ dvc status
Not in cache:
(use "dvc fetch <file>..." to download files)
models/model.pkl
data/pool_data/
data/test_data/
data/train_data/
```

We can modify the `evaluate` stage, and DVC will pull only the data needed to
run that stage (`models/model.pkl` and `data/test_data/`) while skipping the
rest of the stages:

```cli
$ dvc exp run --pull --allow-missing
'data/pool_data.dvc' didn't change, skipping
Stage 'data_split' didn't change, skipping
Stage 'train' didn't change, skipping
Running stage 'evaluate':
...
```
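To see why only those two paths are pulled, here is a minimal Python sketch of
the selection logic. It is a toy model, not DVC's implementation: the stage
names and paths come from the example above, but the `STAGES` mapping (in
particular, which outputs `data_split` and `train` produce) and the helper
functions `stages_to_run` and `paths_to_pull` are assumptions made for
illustration.

```python
# Toy model of the example pipeline. NOT DVC's implementation.
# Assumption: data_split produces the train/test data and train produces the
# model; the source only states what the `evaluate` stage needs.
STAGES = {
    "data_split": {
        "deps": ["data/pool_data/"],
        "outs": ["data/train_data/", "data/test_data/"],
    },
    "train": {
        "deps": ["data/train_data/"],
        "outs": ["models/model.pkl"],
    },
    "evaluate": {
        "deps": ["models/model.pkl", "data/test_data/"],
        "outs": [],
    },
}


def stages_to_run(changed):
    """Return the changed stages plus every stage downstream of them."""
    producers = {out: name for name, s in STAGES.items() for out in s["outs"]}
    run = set(changed)
    grew = True
    while grew:  # repeat until no new downstream stage is found
        grew = False
        for name, stage in STAGES.items():
            if name in run:
                continue
            if any(producers.get(dep) in run for dep in stage["deps"]):
                run.add(name)
                grew = True
    return run


def paths_to_pull(changed):
    """Return dependencies of the stages to run that no rerun stage regenerates."""
    run = stages_to_run(changed)
    needed = {dep for name in run for dep in STAGES[name]["deps"]}
    produced = {out for name in run for out in STAGES[name]["outs"]}
    return sorted(needed - produced)


print(paths_to_pull({"evaluate"}))
# ['data/test_data/', 'models/model.pkl']
```

In this model, changing `train` instead would require `data/train_data/` and
`data/test_data/`, because `models/model.pkl` would be regenerated rather than
pulled.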

## Example: Include untracked or ignored paths

If your code relies on some paths that are intentionally untracked or ignored by