example-dvc-experiments: Improvements #85

Closed
2 tasks done
iesahin opened this issue Sep 3, 2021 · 7 comments
Labels
A: example-get-started-experiments (DVC Experiment, DVCLive examples)
enhancement (New feature or request)
epic
priority-p1 (Immediate pool of tickets to take and work as part of the next sprint)
question (Further information is requested)

Comments

@iesahin
Contributor

iesahin commented Sep 3, 2021

What do we need to make https://github.com/iterative/example-dvc-experiments useful for Studio?

Originally posted by @shcheklein in #79 (comment)

@iesahin self-assigned this Sep 3, 2021
@iesahin added the enhancement, epic, priority-p1, and question labels Sep 3, 2021
@dberenbaum

Why do metrics need to be cached for Studio?

@shcheklein
Member

Metrics and plots don't need to be cached. I would expect model files to be cached and not gitignored. Intermediate artifacts: it depends. It's probably fine to skip caching the 70K images and ignore them, but we should really fix DVC's performance for this case. Otherwise the repo becomes suboptimal and artificial because of the limitations we have.
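A minimal sketch of the caching policy described above, in Python, using a hand-written list that mirrors the `outs` of a `dvc.yaml` stage. Only `data/images` comes from this thread; `models/model.h5`, `metrics.json`, and `plots` are hypothetical paths. In `dvc.yaml`, outputs are cached by default, and `cache: false` keeps an output out of the DVC cache, so `dvc push`/`dvc pull` never transfer it and it is left to Git (or to `.gitignore`).

```python
# Hypothetical stage outputs mirroring the policy discussed above.
# Paths other than data/images are illustrative assumptions.
STAGE_OUTS = [
    {"path": "data/images", "cache": False},     # 70K extracted files: skip the DVC cache
    {"path": "models/model.h5", "cache": True},  # model file: DVC-tracked
    {"path": "metrics.json", "cache": False},    # small metrics file: commit to Git
    {"path": "plots", "cache": False},           # plot data: commit to Git
]


def split_by_cache(outs):
    """Return (cached, uncached) path lists; `cache` defaults to True as in dvc.yaml."""
    cached, uncached = [], []
    for out in outs:
        (cached if out.get("cache", True) else uncached).append(out["path"])
    return cached, uncached


cached, uncached = split_by_cache(STAGE_OUTS)
print("In the DVC cache (pushed/pulled):", cached)
print("Not cached (Git or .gitignore):", uncached)
```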

@shcheklein
Member

I would also rename the ticket: it's not about Studio. In any scenario I would expect model files to be DVC-tracked (and maybe some important intermediate artifacts too).

@iesahin changed the title from "example-dvc-experiments: Improvements for Studio" to "example-dvc-experiments: Improvements" Sep 6, 2021
@shcheklein
Member

I see some commented-out (dead) code:

    # print(f"Training Dataset Shape: {training_images.shape}")
    # print(f"Testing Dataset Shape: {testing_images.shape}")
    # print(f"Training Labels: {training_labels}")
    # print(f"Testing Labels: {testing_labels}")

We should not keep dead code around. Please also run the linters and the other usual tools to keep the code clean.
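For illustration, a rough Python sketch of how a checker can spot commented-out code like the lines above: it tokenizes the source, tries to parse each comment's text as Python, and flags comments that parse into calls, assignments, or imports. The function name `commented_out_code` and the chosen node types are illustrative assumptions; in practice a maintained tool such as eradicate (available as a flake8 plugin and as Ruff's `ERA` rules) does this more carefully.

```python
import ast
import io
import tokenize

# Node types that suggest a comment holds real code rather than prose (illustrative choice).
CODE_NODES = (ast.Call, ast.Assign, ast.AugAssign, ast.Import, ast.ImportFrom)


def commented_out_code(source):
    """Yield (line_number, comment) for comments whose text parses as Python code."""
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.COMMENT:
            continue
        body = tok.string.lstrip("#").strip()
        try:
            tree = ast.parse(body)
        except SyntaxError:
            continue  # ordinary prose such as "# load the data"
        if any(isinstance(node, CODE_NODES) for node in ast.walk(tree)):
            yield tok.start[0], tok.string


sample = '''def load(training_images):
    # print(f"Training Dataset Shape: {training_images.shape}")
    # load the data
    # TODO
    return training_images
'''

for line_no, comment in commented_out_code(sample):
    print(f"line {line_no}: {comment}")
```

A bare word such as `# TODO` parses as a name, so the node filter is what keeps it from being reported.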

@dberenbaum

Is there a reason that data/images has cache: false in dvc.yaml? Was caching the output causing an issue?

@iesahin
Contributor Author

iesahin commented Nov 2, 2021

data/images is extracted from data/images.tar.gz and contains 70K small files. dvc push/pull takes considerable time when we cache it. (We might need "cache but don't send to remote" setting for files and dirs, but that's a separate discussion.)
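A minimal sketch of the extraction step in Python, assuming a stage whose dependency is `data/images.tar.gz` and whose output is `data/images`. If the archive itself is DVC-tracked (an assumption), `dvc push`/`dvc pull` move a single object, and the 70K extracted files can be regenerated locally with `dvc repro` instead of being transferred one by one. The function name `extract_images` is hypothetical, not the repository's actual script, and the archive is assumed to store members relative to the image root.

```python
import shutil
import tarfile
from pathlib import Path


def extract_images(archive="data/images.tar.gz", out_dir="data/images"):
    """Extract the image archive into out_dir and return the number of extracted files."""
    out = Path(out_dir)
    # Start from a clean directory so stale files from earlier runs don't linger.
    if out.exists():
        shutil.rmtree(out)
    out.mkdir(parents=True)
    with tarfile.open(archive, "r:gz") as tar:
        # Use the safe "data" filter where this Python version supports it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(out, filter="data")
        else:
            tar.extractall(out)
    return sum(1 for p in out.rglob("*") if p.is_file())
```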

@dberenbaum

> We might need "cache but don't send to remote" setting for files and dirs, but that's a separate discussion.

That's tracked in iterative/dvc#2095, and it's probably pretty easy now that we have iterative/dvc#6486. We just need a way to specify "none".
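A rough Python sketch of the "cache but don't send to remote" idea: each output may name a remote, outputs without one go to the default remote, and a sentinel value `"none"` means "keep in the local cache, never push". The per-output `remote` key, the `"none"` sentinel, the remote name `storage`, and the paths are assumptions modeled on this discussion, not a confirmed DVC API.

```python
from collections import defaultdict

NO_PUSH = "none"  # hypothetical sentinel: keep in the local cache, never push


def plan_push(outs, default_remote):
    """Group cached outputs by the remote they would be pushed to."""
    plan = defaultdict(list)
    for out in outs:
        if not out.get("cache", True):
            continue  # not in the DVC cache at all, so there is nothing to push
        remote = out.get("remote", default_remote)
        if remote == NO_PUSH:
            continue  # cached locally only
        plan[remote].append(out["path"])
    return dict(plan)


outs = [
    {"path": "data/images", "remote": NO_PUSH},  # 70K files: cache, don't upload
    {"path": "models/model.h5"},                 # goes to the default remote
    {"path": "metrics.json", "cache": False},    # tracked by Git instead
]
print(plan_push(outs, default_remote="storage"))  # {'storage': ['models/model.h5']}
```

With this split, `data/images` still gets a local cache (fast checkouts, no re-extraction) while the remote only ever receives the model.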

@shcheklein added the A: example-get-started-experiments label May 11, 2022
@iesahin removed their assignment Jun 21, 2022
3 participants