Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do all notebook filtering in the business, not in the CI job #362

Merged
merged 3 commits into from
Jul 23, 2024

Conversation

fajpunk
Copy link
Member

@fajpunk fajpunk commented Jul 19, 2024

  • This is nice because it is easier to understand and less error prone
  • This stinks because a job that didn't find any notebooks to run will look just like a job that successfully ran notebooks

I'd like to mark the check run as neutral in this case, but I think it can wait until another PR.

There are a few ways to do it:

  • Inspect the log output in the SolitaryResult. The more I think about this, given this notebook runner is the only case of this third type of result from a solitary execution (aside from 'success' or 'failure'), the less opposed I am to do it...
  • Factor the filtering logic out and use it in both the CI worker and in the NotebookRunner. I don't love this, because:
    • it requires looking at the contents of all of the files in both places, which isn't horrible I guess
    • We'd have to remember to factor out any future logic that gets added to NotebookRunner outside of the filtering that causes a wacky situation like this
  • Thread a result type through the solitary execution path and have the NotebookRunner return a SolitaryNeutralResult. This is probably the right way to do it, but it would add a lot of changes to this already-big-ish PR.

Another related thing would be putting execution logs in the github check details. I haven't done this so far because I'm worried that this would make it easy to accidentally leak some kind of sensitive info.

@fajpunk fajpunk requested a review from rra July 19, 2024 21:54
Copy link
Member

@rra rra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with doing all of the notebook filtering in the business.

I'm trying to decide if it would make sense to merge the paths between the mobu configuration and the repo configuration instead. I think that may be best? I don't like the idea that an exclusion directory in the notebook repository can cause a notebook to be included that's explicitly excluded in the Phalanx configuration.

If the configuration in the repository is invalid, I think this throws an uncaught exception that may not have all the metadata that we'd prefer. It feels like there should be a custom exception for this case that incorporates the information from the Pydantic ValidationError but also adds the normal mobu exception metadata for a better Slack exception report.

I share your worry about leaking information into the logs, like PostgreSQL passwords, but it would also be very nice to be able to see the logs. I'm anticipating the tutorial notebook case where the folks working on the notebook are going to ask us what failed. Maybe something to bring up during stand-up?

src/mobu/services/business/notebookrunner.py Outdated Show resolved Hide resolved
@rra
Copy link
Member

rra commented Jul 20, 2024

If I'm understanding the neutral result discussion correctly, yes, I think having a solitary return a different result when it found no notebooks to execute is probably the right way to do it.

@fajpunk
Copy link
Member Author

fajpunk commented Jul 22, 2024

I'm trying to decide if it would make sense to merge the paths between the mobu configuration and the repo configuration instead. I think that may be best? I don't like the idea that an exclusion directory in the notebook repository can cause a notebook to be included that's explicitly excluded in the Phalanx configuration.

@rra, I think we should ditch the autostart exclude config entirely 😄 With that config in both places, any way we reconcile them is potentially confusing:

  • If we override autostart with in-repo, then folks responsible for the autostart config (mostly we SQuaRE's now?) could be confused when flocks behave differently than is described in the autostart config.
  • If we override in-repo with autostart, then it's confusing to folks using the GitHub CI integration when real mobu behaves differently than their CI checks did, and they might not have the knowledge or permissions to look at the autostart config.
  • I think that same confusion applies if we merge the configs.

If the goal with the mobu roadmap is to promote mobu and enable more folks to use mobu to test their stuff, it seems like we should be moving stuff out of the autostart config (which only mobu admins can change) and into the payload repo. Assuming we can do it securely.

@rra
Copy link
Member

rra commented Jul 22, 2024

Ah, yes, that works. Let's do it that way. It's not a meaningful security barrier since people can just put notebooks elsewhere in the repository anyway, and the problem stems from having the same information in more than one place. Eliminating either of those places will fix the problem.

@fajpunk fajpunk force-pushed the tickets/DM-45079/centralize-notebook-filtering branch from 386f218 to b7cf734 Compare July 23, 2024 14:15
@fajpunk fajpunk requested a review from rra July 23, 2024 16:04
Copy link
Member

@rra rra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of minor nits, otherwise looks great.

src/mobu/exceptions.py Outdated Show resolved Hide resolved
src/mobu/services/business/notebookrunner.py Outdated Show resolved Hide resolved
* This is nice because it is easier to understand and less error prone
* This stinks because a job that didn't find any notebooks to run will look just like a job that successfully ran notebooks

I'd like to mark the check run as neutral in this case, but I think it
can wait until another PR.
Which means it can no longer be set in the autostart config.
`exclude_dirs must be set in an in-repo `mobu.yaml` config file now.
@fajpunk fajpunk force-pushed the tickets/DM-45079/centralize-notebook-filtering branch from c201ec9 to 1327bb0 Compare July 23, 2024 18:59
@fajpunk fajpunk enabled auto-merge July 23, 2024 18:59
@fajpunk fajpunk merged commit e938a01 into main Jul 23, 2024
4 checks passed
@fajpunk fajpunk deleted the tickets/DM-45079/centralize-notebook-filtering branch July 23, 2024 19:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants