Start separating normalisation from validation logic #282

Carreau · 2022-06-08T07:46:20Z

Re-issue of #236. #244 made things even worse with respect to a notebook trust problem,
where a notebook is trusted, saved and reopened, but is then no longer trusted.

The reason is that the signature is computed before validate, and validate mutates the notebook, so the signature no longer matches.
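To make the failure mode concrete, here is a minimal sketch. It does not use nbformat's real signing code: `sign` and `mutating_validate` are hypothetical stand-ins (a plain SHA-256 digest of the serialized notebook, and a validator that adds missing cell ids in place).

```python
import hashlib
import json


def sign(nb: dict) -> str:
    """Hypothetical stand-in for the trust signature: a digest of the serialized notebook."""
    payload = json.dumps(nb, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def mutating_validate(nb: dict) -> None:
    """Hypothetical validator that 'repairs' the notebook in place by adding missing cell ids."""
    for index, cell in enumerate(nb.get("cells", [])):
        cell.setdefault("id", f"cell-{index}")


notebook = {"cells": [{"cell_type": "code", "source": "1 + 1"}]}

signature_at_trust_time = sign(notebook)  # the signature is computed first
mutating_validate(notebook)               # then validation changes the notebook
signature_after_save = sign(notebook)     # the saved content now hashes differently

signatures_match = signature_at_trust_time == signature_after_save
print(signatures_match)  # False
```

The stored signature describes a notebook that no longer exists on disk, which is why the reopened notebook is not trusted.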

Really, validate() should never mutate the notebook. Validation is in part used for security. It should not return a value or give results based on a modified object.

Just imagine a password comparison function where compare('HUNTER@', 'hunter2') == True because the user obviously had caps lock pressed on their keyboard. This is the same.

We obviously can't change this abruptly, but we really need to stop having validation and normalisation be the same step.

We change the validation logic and separate normalisation from the validation step. We make sure that a warning is emitted if a notebook gets normalised. In the future we will turn the warning into an error. We add tests for the current behavior and an xfail test for the future behavior.
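A rough sketch of that split, under assumptions: the `normalize` and `validate` functions below are illustrative stand-ins, not the functions in nbformat/validator.py, and the "add a missing id" rule and the `FutureWarning` category are chosen only for the example.

```python
from __future__ import annotations

import copy
import warnings


def normalize(nb: dict) -> tuple[int, dict]:
    """Return (number_of_changes, fixed_copy) without touching the input."""
    fixed = copy.deepcopy(nb)
    changes = 0
    for index, cell in enumerate(fixed.get("cells", [])):
        if "id" not in cell:
            cell["id"] = f"cell-{index}"  # illustrative repair only
            changes += 1
    return changes, fixed


def validate(nb: dict) -> None:
    """Check only: warn when normalisation would have changed something."""
    changes, _ = normalize(nb)
    if changes:
        warnings.warn(
            f"notebook is not normalised ({changes} change(s) needed); "
            "this will become an error in the future",
            FutureWarning,
            stacklevel=2,
        )


nb = {"cells": [{"cell_type": "code", "source": ""}]}
original = copy.deepcopy(nb)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    validate(nb)

changes, fixed = normalize(nb)
```

Here `validate` leaves `nb` equal to `original` and only reports, while `normalize` hands back a repaired copy that the caller can choose to sign and save.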

Carreau · 2022-06-22T09:00:22Z

@ewjoachim, you expressed interest in Jupyter and security.

ewjoachim

Hey :) Thanks for the invitation to review :)
It's my first time contributing to this project, I've tried to make relevant comments, but I know quite a few comments in here are on code that you didn't introduce and merely moved (and also, typos)

Also, I may have more time later for a higher-level review :)

ewjoachim · 2022-06-22T12:35:17Z

nbformat/validator.py

+            number_of_cells = len(nbdict.get("cells", 0))
+            for cell_idx in range(number_of_cells):


Suggested change

-            number_of_cells = len(nbdict.get("cells", 0))
-            for cell_idx in range(number_of_cells):
+            for cell, cell_error in zip(nbdict.get("cells", []), error_tree["cells"]):

(and then replace every nbdict["cells"][cell_idx] below with cell, and every error_tree["cells"][cell_idx] with cell_error)

That said, I'm saying this because I imagine that error_tree["cells"] is a list. If it's a dict with no assurance of being correctly ordered, we can do:

Suggested change

-            number_of_cells = len(nbdict.get("cells", 0))
-            for cell_idx in range(number_of_cells):
+            for cell_idx, cell in enumerate(nbdict.get("cells", [])):
+                cell_error = error_tree["cells"][cell_idx]
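For comparison, a small self-contained sketch of the three iteration styles. The `error_tree` here is a plain dict holding a list, which is an assumption made for the example; the real object may not be an ordered sequence, which is exactly the caveat above. It also shows that a default of `0` makes `len()` raise `TypeError` when "cells" is missing, whereas a default of `[]` does not.

```python
nbdict = {"cells": [{"source": "a"}, {"source": "b"}]}
# Hypothetical stand-in: assumed to be list-like and ordered like the cells.
error_tree = {"cells": ["error for a", "error for b"]}

# Index-based loop, as in the diff (with a safe [] default).
by_index = [
    (nbdict["cells"][cell_idx], error_tree["cells"][cell_idx])
    for cell_idx in range(len(nbdict.get("cells", [])))
]

# First suggestion: zip, valid only if error_tree["cells"] is an ordered sequence.
by_zip = list(zip(nbdict.get("cells", []), error_tree["cells"]))

# Second suggestion: enumerate, which only needs index lookup on error_tree["cells"].
by_enumerate = [
    (cell, error_tree["cells"][cell_idx])
    for cell_idx, cell in enumerate(nbdict.get("cells", []))
]

# With a default of 0, a notebook without "cells" fails on len().
try:
    len({}.get("cells", 0))
    missing_cells_error = None
except TypeError as exc:
    missing_cells_error = exc
```

All three forms pair the same cells with the same errors under the stated assumption, so the choice is mostly about readability and about what `error_tree["cells"]` guarantees.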

Yeah, all of that was only indented/dedented as-is so that it can be extracted into its own function. I'll mark that as "refactor later". Here as well, I think we want to generate a fixed copy instead of mutating in place.

Carreau · 2022-06-22T12:58:22Z

> Hey :) Thanks for the invitation to review :)

Thanks, I was not expecting an in-depth review. But since you mentioned your interest in security, and this is problematic because of the "sign, validate but mutate, save with a mismatched signature" flow, I thought that would be of interest.

ewjoachim · 2022-06-22T13:14:17Z

> I thought that would be of interest.

It is :) As discussed, I agree with you regarding this aspect of "security boundaries", that we expect some functions to do checks (especially security-related checks, including hashes of content to ensure non-mutability), others to make changes, and it can quickly become problematic when we try to do both at once.

(And I enjoyed every minute of doing this review, don't worry :) )

Carreau · 2022-08-03T08:21:02Z

Todo: #282 (comment) needs to be refactored later.

The xfail test and isvalid() should be updated so that the test passes, and isvalid() should not try to fix anything.
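A sketch of what such a non-mutation test could look like. The `isvalid` below is a hypothetical check-only function written for this example (its only rule is "every cell has an id"), not nbformat's implementation.

```python
import copy


def isvalid(nb: dict) -> bool:
    """Hypothetical check-only validator: every cell must carry an id."""
    return all("id" in cell for cell in nb.get("cells", []))


def test_isvalid_does_not_mutate_or_autofix() -> None:
    nb = {"cells": [{"cell_type": "code", "source": ""}]}  # invalid: no id
    snapshot = copy.deepcopy(nb)

    assert isvalid(nb) is False  # the problem is reported...
    assert nb == snapshot        # ...and the notebook is left untouched


test_isvalid_does_not_mutate_or_autofix()
```

Comparing against a deep copy taken before the call catches any in-place repair, including changes nested inside cells.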

Carreau · 2022-08-16T08:17:43Z

Ok, let's get that in and see what we break.

Carreau force-pushed the nbv2 branch from dc0fbf7 to 7b9b215 Compare June 8, 2022 07:58

Carreau added 3 commits June 8, 2022 10:26

please precommit

e967f44

more warnings

9a67bde

no warn none

8b55731

Carreau mentioned this pull request Jun 8, 2022

Notebook validation and security concerns jupyter/security#39

Open

Carreau added 4 commits June 8, 2022 15:16

try to extract logic

fee6c3c

typing

fbc8538

pass test

c106df8

type annotations

3656974

Carreau force-pushed the nbv2 branch from 7d40855 to 3656974 Compare June 9, 2022 06:24

Carreau added 4 commits June 22, 2022 10:26

add deprecation

f15413a

Fix the tests

cdf5800

remove useless local imports

ab5f945

update changelog

7842c8b

Carreau mentioned this pull request Jun 22, 2022

nbformat's validate(strip_invalid_metadata=...) and validate(repair_duplicate_cell_ids=...) will be deprecated. noteable-io/origami#4

Open

Carreau added 3 commits June 22, 2022 14:08

update deprecations

16ff6ed

write strict by default

4697e7f

rely on isvalid

d404679

ewjoachim reviewed Jun 22, 2022

Carreau and others added 2 commits June 22, 2022 05:56

Apply suggestions from code review

630530c

Co-authored-by: Joachim Jablon <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

7c86058

for more information, see https://pre-commit.ci

Carreau added 2 commits June 22, 2022 15:02

isvalid does not return errors

733941c

Merge remote-tracking branch 'github/nbv2' into nbv2

e0065da

take reviews into account

4445c69

docs

449b650

Carreau added 3 commits August 3, 2022 10:28

cleanup documentation with veln

1c625fd

note on deprecation

898cd15

test that in_valid does note mutate or autofix

fc45fed

Carreau merged commit 1b5c839 into jupyter:main Aug 16, 2022

Carreau added this to the 5.5.0 milestone Aug 16, 2022

martinRenou mentioned this pull request Oct 3, 2022

Update nbconvert pinning voila-dashboards/voila#1161

Merged

This was referenced Jan 12, 2023

Undeprecate validate(nb, relax_add_props=True) #343

Merged

always pass relax_add_props=True when validating jupyter/nbconvert#1936

Merged

krassowski mentioned this pull request Apr 8, 2023

Ambiguous warning about missing cell IDs #359

Open
