Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Finalization of Dask Batch Import Notebook #429

Merged
merged 2 commits into from
May 6, 2024
Merged

Conversation

ckurze
Copy link
Contributor

@ckurze ckurze commented Apr 19, 2024

Summary of the changes / Why this is an improvement

  • Finalization in wording
  • Finalization in dependencies, tested on Google Colab.

Checklist

  • Link to issue this PR refers to (if applicable):
  • CLA is signed

@ckurze ckurze marked this pull request as ready for review April 19, 2024 07:12
@marijaselakovic
Copy link
Contributor

The error comes from timeseries-anomaly-detection.ipynb notebook, reported in #426.

@@ -57,7 +66,7 @@
},
"outputs": [],
"source": [
"#!pip install dask pandas==2.0.0 'sqlalchemy[crate]'"
"!pip install dask 'pandas==2.0.0' 'crate[sqlalchemy]' 'cratedb-toolkit==0.0.10' 'pueblo>=0.0.7' kaggle"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To stay consistent when it comes to library versions maybe it makes sense to always install those specified in requirements.txt.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why pandas==2.0.0 and not a higher pandas==2.*?

Copy link
Member

@amotl amotl Apr 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using requirements.txt, as exercised with the other notebooks, will be much better, because Dependabot does not bump inside Notebooks. In turn, this will quickly get out of sync, increase the risk for havoc, or may even influence the dependencies the other notebooks are validated with.

Copy link
Member

@amotl amotl May 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. As opposed to the other notebooks here, which need pandas<2 and sqlalchemy<2, this one needs more recent versions of both? Is that correct?

If so, bringing this in is actually blocked by one of those?

Copy link
Member

@amotl amotl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for contributing this improvement. I've added two comments, one above, and one below. Thanks.

topic/timeseries/dask-weather-data-import.ipynb Outdated Show resolved Hide resolved
Comment on lines +14 to +17
"## Important Note\n",
"If you are running this notebook on a (free) Google Colab environment, you \n",
"might not see the parallelized execution of Dask operations due to constrained\n",
"CPU availability.\n",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's good to know. Thanks for finding out, and for conveying that information to readers and users of the relevant notebook.

@amotl
Copy link
Member

amotl commented May 6, 2024

Regardless of the suggestions contributed at #429 (review), this patch passes CI, so I am merging it.

@amotl amotl merged commit 3513b7c into main May 6, 2024
2 checks passed
@amotl amotl deleted the ckurze/dask-finalize branch May 6, 2024 20:57
@amotl
Copy link
Member

amotl commented May 8, 2024

I hope you agree with that. Otherwise, please advise about anything which might have been missed. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants