-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow for logging to Studio when not inside a repo #646
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #646 +/- ##
==========================================
- Coverage 95.55% 95.51% -0.05%
==========================================
Files 55 55
Lines 3532 3542 +10
Branches 314 317 +3
==========================================
+ Hits 3375 3383 +8
- Misses 110 111 +1
- Partials 47 48 +1 ☔ View full report in Codecov by Sentry. |
Need to look into windows tests not passing. Somehow dvclive expects to pass posix paths even on windows, but not sure where that conversion happens. |
This is because Studio expected the paths in POSIX format. It was added here: Lines 28 to 30 in 36e8c4c
|
The cause was because the conversion only happens within a repo, which brought up that we need to decide where paths should be relative to if there is no repo root. I went with cwd for now. |
I see. Probably no need to do with The |
This depends on:
|
Updated to remove that. |
@dberenbaum do you have a sense on what is the mid-term plan for this? If we are not attached to Git - what is the benefit of using the DVC ecosystem? (sorry, I know it's a non-trivial question, definitely not a blocker of this PR or effort). Data management? something else? Is there a way to clone the repo in those system under the hood / seamlessly for end-users? Do we want to cover scenarios like when people don't use Git at all - they don't have a repo at all even? (we pretty much then become two products - Git based and separately MlFlow/W&B, right? |
It's not in my plans at this point. This is mainly to cover scenarios where you have a project in a repo but want to run an experiment in an environment where it's painful/impossible to have a full copy of the Git repo. In my experience, it's not uncommon to have a notebook that gets checked into Git but that runs independently in databricks, colab, etc. It's also helpful for the SageMaker scenario where you can kick off a job on a remote machine, but that machine only has access to a single .py file, so this way you can at least see metrics populated in Studio. |
Okay, I see. What is the plan of connecting it back? Those experiments will stay "studio"-only, is it correct? And why can't we try to do a git clone? any specific obstacles for that? |
@mattseddon i think you should be aware by this change as well (baseline is optional) |
In the end, baseline is still required to be sent to dvc-studio-client in this PR and defaults to |
@@ -131,9 +132,14 @@ def _update_entries(old, new, key): | |||
def get_exp_name(name, scm, baseline_rev) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated get_exp_name
to always be able to return an experiment name instead of only when inside a repo.
scm = self._dvc_repo.scm if self._dvc_repo else None | ||
if isinstance(scm, NoSCM): | ||
scm = None | ||
if scm: | ||
self._baseline_rev = scm.get_rev() | ||
self._exp_name = get_exp_name(self._exp_name, scm, self._baseline_rev) | ||
logger.info(f"Logging to experiment '{self._exp_name}'") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once we have tried to find a repo, we can generate all exp info we need instead of scattering it throughout the code
assert live._baseline_rev is not None | ||
assert live._exp_name is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These now get set always
@@ -60,31 +57,6 @@ def test_exp_save_skip_on_env_vars(tmp_dir, monkeypatch, mocker): | |||
assert live._inside_dvc_pipeline | |||
|
|||
|
|||
def test_exp_save_run_on_dvc_repro(tmp_dir, mocker): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test was actually outdated and not working. It conflicts with test_dvc_repro
below, so I consolidated the meaningful assertions into that test.
monkeypatch.setenv(DVC_EXP_BASELINE_REV, "foo") | ||
monkeypatch.setenv(DVC_EXP_NAME, "bar") | ||
monkeypatch.setenv(DVC_ROOT, tmp_dir) | ||
|
||
mocker.patch("dvclive.live.get_dvc_repo", return_value=None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not needed
@@ -166,7 +138,7 @@ def test_errors_on_git_add_are_catched(tmp_dir, mocked_dvc_repo, monkeypatch): | |||
mocked_dvc_repo.scm.untracked_files.return_value = ["dvclive/metrics.json"] | |||
mocked_dvc_repo.scm.add.side_effect = DvcException("foo") | |||
|
|||
with Live(dvcyaml=False) as live: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not needed
assert live._baseline_rev is not None | ||
assert live._exp_name is not None | ||
assert not live._studio_events_to_skip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consolidated from the deleted test_exp_save_run_on_dvc_repro
above. On repro
, everything will still work except actually saving an exp ref.
def test_post_to_studio_requires_exp(tmp_dir, mocked_dvc_repo, mocked_studio_post): | ||
assert Live(save_dvc_exp=False)._studio_events_to_skip == {"start", "data", "done"} | ||
assert not Live()._studio_events_to_skip | ||
def test_post_to_studio_without_exp(tmp_dir, mocked_dvc_repo, mocked_studio_post): | ||
assert not Live(save_dvc_exp=False)._studio_events_to_skip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Live updates to Studio will happen regardless of save_dvc_exp
value.
src/dvclive/live.py
Outdated
@@ -571,7 +558,8 @@ def make_report(self): | |||
|
|||
@catch_and_warn(DvcException, logger) | |||
def make_dvcyaml(self): | |||
make_dvcyaml(self) | |||
if self.dvc_file: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the case for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean when can it be None
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the only (?) unresolved question? Approved, since it doesn't look anything major ...
good stuff, Dave. thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, missed this one. Good catch. Maybe this came from an earlier iteration. I dropped this condition in the latest commit.
Partially addresses #638. This does not solve how to copy metrics back to the repo or make the experience completely smooth everywhere, but the goal for this PR is to achieve parity with other trackers where a token and project are all that's needed.
This allows for posting to Studio if the following environment variables are set:
DVC_STUDIO_TOKEN
DVC_STUDIO_REPO_URL
DVC_EXP_BASELINE_REV
DVC_EXP_NAME
TODO:
DVC_EXP_BASELINE_REV
and/orDVC_EXP_NAME
optional