-
Notifications
You must be signed in to change notification settings - Fork 152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Start separating normalisation from validation logic #282
Conversation
We change the validation logic and separate the normalisation from the validation step. We make sure that if a notebook is normalized, it emits a warning. In the future we will turn the warning in to an Error. We add test for the current and an xfail test for the future behavior
@ewjoachim, you expressed interest in Jupyter and security. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey :) Thanks for the invitation to review :)
It's my first time contributing to this project, I've tried to make relevant comments, but I know quite a few comments in here are on code that you didn't introduce and merely moved (and also, typos)
Also, I may have more time later for a higher-level review :)
number_of_cells = len(nbdict.get("cells", 0)) | ||
for cell_idx in range(number_of_cells): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
number_of_cells = len(nbdict.get("cells", 0)) | |
for cell_idx in range(number_of_cells): | |
for cell, cell_error in zip(nbdict.get("cells", []), error_tree["cells"]): |
(and then replace all the nbdict["cells"][cell_idx]
below with cell
and the error_tree["cells"][cell_idx]
with cell_error
That said, I'm saying this because I imagine that error_tree["cells"]
is a list. If it's a dict with no assurance of being correctly ordered, we can do:
number_of_cells = len(nbdict.get("cells", 0)) | |
for cell_idx in range(number_of_cells): | |
for cell_idx, cell in enumerate(nbdict.get("cells", [])): | |
cell_error = error_tree["cells"][cell_idx] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, all that was indented/dedented as is to be extracted into its own function. I'll mark that as refactor later. Here as well I think we want to generate a copy that is fixed instead of mutating in place.
Co-authored-by: Joachim Jablon <[email protected]>
for more information, see https://pre-commit.ci
Thanks, I was not expecting an in-depth review, but as you pointed out your interest in security and this is problematic because of the signature, validate but mutate, save with mutated signature flow, I thought that would be of interest. |
It is :) As discussed, I agree with you regarding this aspect of "security boundaries", that we expect some functions to do checks (especially security-related checks, including hashes of content to ensure non-mutability), others to make changes, and it can quickly become problematic when we try to do both at once. (And I enjoyed every minute of doing this review, don't worry :) ) |
Todo: #282 (comment) need to be refactor later. The xfail test, and |
Ok, let's get that in and see what we break. |
Re-issue of #236. #244 made things even worse with respect to some problem with notebook trust,
where a notebook is trusted, saved, reopened but is not trusted.
The reason being that the signature is computed before validate, and validate mutate the notebook, leading to the signature not matching.
really
validate()
should not mutate the notebook ever. Validate is in part used for security. It should not return a value of give results on a modified object.Just try to imagine if a password comparison function, said
compare('HUNTER@', 'hunter2') == True
because obviously the user had caps lock pressed on their keyboard ? This is the same.We obviously can't change brutally, but we really need to stop having validation and normalisation be the same step.