-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Finalization of Dask Batch Import Notebook #429
Conversation
The error comes from |
@@ -57,7 +66,7 @@ | |||
}, | |||
"outputs": [], | |||
"source": [ | |||
"#!pip install dask pandas==2.0.0 'sqlalchemy[crate]'" | |||
"!pip install dask 'pandas==2.0.0' 'crate[sqlalchemy]' 'cratedb-toolkit==0.0.10' 'pueblo>=0.0.7' kaggle" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To stay consistent when it comes to library versions maybe it makes sense to always install those specified in requirements.txt
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why pandas==2.0.0
and not a higher pandas==2.*
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using requirements.txt
, as exercised with the other notebooks, will be much better, because Dependabot does not bump inside Notebooks. In turn, this will quickly get out of sync, increase the risk for havoc, or may even influence the dependencies the other notebooks are validated with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. As opposed to the other notebooks here, which need pandas<2
and sqlalchemy<2
, this one needs more recent versions of both? Is that correct?
If so, bringing this in is actually blocked by one of those?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for contributing this improvement. I've added two comments, one above, and one below. Thanks.
2853f6d
to
acfbcc7
Compare
"## Important Note\n", | ||
"If you are running this notebook on a (free) Google Colab environment, you \n", | ||
"might not see the parallelized execution of Dask operations due to constrained\n", | ||
"CPU availability.\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's good to know. Thanks for finding out, and for conveying that information to readers and users of the relevant notebook.
acfbcc7
to
de5cfa9
Compare
Regardless of the suggestions contributed at #429 (review), this patch passes CI, so I am merging it. |
I hope you agree with that. Otherwise, please advise about anything which might have been missed. Thanks. |
Summary of the changes / Why this is an improvement
Checklist