Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle non-standard experiment pipeline configurations #4264

Merged
merged 15 commits into from
Jul 16, 2023

Conversation

mattseddon
Copy link
Member

@mattseddon mattseddon commented Jul 12, 2023

Closes #4222

This PR separates out checking of stages from experiments. Separating out the logic makes it easier for us to target different pipelines when running experiments.

The pipeline selection logic is as follows:

  • Each dvc.yaml is counted as a potential pipeline within a project.
  • In order to be counted as a pipeline the stage list must either fail (return undefined) or be non-empty
  • When there are no pipelines show the Add Stage button in the experiments table and block running/queuing experiments.
  • If there is only one pipeline then use it as the cwd for experiment commands
  • If there are multiple pipelines then automatically select the root pipeline first
  • If there are multiple pipelines and none at the root then offer the user the choice each time they try to run a command.

I have added comments inline to help with the size of the change.

Demo

Add pipeline from Run Experiment

Screen.Recording.2023-07-14.at.11.14.55.am.mov

Invalid vs empty pipeline

Screen.Recording.2023-07-14.at.11.15.56.am.mov

Non-root dvc.yaml

Screen.Recording.2023-07-14.at.11.17.51.am.mov

Choice of pipelines

Screen.Recording.2023-07-14.at.11.19.05.am.mov

Demo project

Screen.Recording.2023-07-14.at.11.21.12.am.mov

@@ -218,19 +256,17 @@ export const buildExperimentsData = (

export const stubWorkspaceExperimentsGetters = (
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[F] This function needs a rewrite now (should use buildExperiments) but doing that in this PR will just add even more noise. I'll branch off this and make the change.

default:
return ''
}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[F] moved to Pipeline.

}

private async addPipeline(cwd: string) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[F] Moved to Pipeline as well.

this.pipelines.create(this.getRoots())
])
this.experiments.create(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[F] Now that an Experiments needs a Pipeline we have to move it out of the initial promise list.

@@ -20,7 +20,7 @@ import {
} from '.'
import { dvcDemoPath } from '../test/util'
import { DOT_DVC } from '../cli/dvc/constants'
import { scriptCommand } from '../experiments/workspace'
import { ScriptCommand } from '../pipeline'
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[F] Gave the enum the proper casing.

@@ -173,8 +173,7 @@ export const findOrCreateDvcYamlFile = (
? relative(cwd, trainingScript)
: format(parse(trainingScript))

const pipeline = `
# Type dvc-help in this file and hit enter to get more information on how the extension can help to setup pipelines
const pipeline = `# Type dvc-help in this file and hit enter to get more information on how the extension can help to setup pipelines
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[F] Dropped the initial newline

return this.notifyChanged(dag)
const [dag, fileList] = await Promise.all([
this.internalCommands.executeCommand(AvailableCommands.DAG, this.dvcRoot),
this.findDvcYamls()
Copy link
Member Author

@mattseddon mattseddon Jul 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[F] Unfortunately dvc stage list -R . does not fail for broken dvc.yamls (that fail validation). We need that information because we've updated the logic to count invalid dvc.yaml(s) towards there being stages/pipelines. This is because we don't want to show the Add Stage button under the circumstances that there is a pipeline that has failed validation. There should already be enough information in the experiments table to work that out and the user has already passed the stage of needing a basic introduction to pipelines/stages.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the logic is now

  1. find all dvc.yamls for the project.
  2. run dvc stage list against each directory
  3. return an object of the format { [dir:string]: stdout } to the model for processing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately dvc stage list -R . does not fail for broken dvc.yamls (that fail validation).

We can use dvc stage list -R . --fail to detect if at least one dvc.yaml failing validation, but we would probably need more information per dvc.yaml so we'd have to call on each directory anyway.

super()
this.dvcRoot = dvcRoot
this.data = this.dispose.track(new PipelineData(dvcRoot, internalCommands))
this.data = this.dispose.track(
data || new PipelineData(dvcRoot, internalCommands)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[F] Have to pass data in for testing purposes.

return this.model.hasPipeline()
}

public async getCwd() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[F] Logic for deciding which pipeline to run a command against.

return
}

const dataUpdated = new Promise(resolve => {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[F] This promise is added so that if a user tries to run an experiment and has to add a stage the experiment will start running after the new stage has been added.

@@ -15,7 +15,6 @@ const tableDataFixture: TableData = {
hasConfig: true,
hasMoreCommits: { main: true },
hasRunningWorkspaceExperiment: true,
hasValidDvcYaml: true,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[F] We no long show Add Stage (disabled or not) if there is an invalid dvc.yaml.

@@ -574,8 +573,6 @@ suite('Experiments Tree Test Suite', () => {
})

it('should be able to queue an experiment from an existing one with dvc.views.experiments.queueExperiment', async () => {
stub(DvcReader.prototype, 'listStages').resolves('train')
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[F] This stub is now wrapped in buildDependencies 👍🏻

this.deferred.resolve()
this.dispose.untrack(waitForInitialData)
waitForInitialData.dispose()
this.dispose.track(
this.pipeline.onDidUpdate(() =>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[F] This replaces changeHasConfig

@mattseddon mattseddon changed the title Split pipelines data (stages) out from experiments Handle non-standard experiment pipelines configuration Jul 14, 2023
@mattseddon mattseddon added product PR that affects product and removed 🏠 housekeeping labels Jul 14, 2023
@mattseddon mattseddon marked this pull request as ready for review July 14, 2023 01:47
@mattseddon mattseddon changed the title Handle non-standard experiment pipelines configuration Handle non-standard experiment pipeline configurations Jul 14, 2023
Copy link
Contributor

@julieg18 julieg18 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

return
}
const repository = this.getRepository(cwd)
const repository = this.getRepository(project)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor, should we move this logic (get project then repository) into its own function?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take a look once I've merged down

return this.notifyChanged(dag)
const [dag, fileList] = await Promise.all([
this.internalCommands.executeCommand(AvailableCommands.DAG, this.dvcRoot),
this.findDvcYamls()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately dvc stage list -R . does not fail for broken dvc.yamls (that fail validation).

We can use dvc stage list -R . --fail to detect if at least one dvc.yaml failing validation, but we would probably need more information per dvc.yaml so we'd have to call on each directory anyway.

@mattseddon mattseddon enabled auto-merge (squash) July 16, 2023 21:41
@codeclimate
Copy link

codeclimate bot commented Jul 16, 2023

Code Climate has analyzed commit e321dc5 and detected 0 issues on this pull request.

The test coverage on the diff in this pull request is 89.5% (85% is the threshold).

This pull request will bring the total coverage in the repository to 95.2% (0.1% change).

View more on Code Climate.

@mattseddon mattseddon merged commit ad00496 into main Jul 16, 2023
3 checks passed
@mattseddon mattseddon deleted the add-pipeline-model branch July 16, 2023 21:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
product PR that affects product
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add targets to "run experiment"
2 participants