Fix Spark-DL notebooks for CI/CD and update to latest dependencies #439
Conversation
Signed-off-by: Rishi Chandra <[email protected]>
@eordentlich @leewyang requesting a review - thanks! |
Really nice. Here is a preliminary set of comments. Will do another round after revisions.
Resolved review threads (outdated) on:
- examples/ML+DL-Examples/Spark-DL/dl_inference/pytorch/image_classification.ipynb (2)
- examples/ML+DL-Examples/Spark-DL/dl_inference/huggingface/conditional_generation_tf.ipynb (5)
Minor nits, otherwise LGTM. Good work!
2024-10-03 00:58:33.914094: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-10-03 00:58:33.919757: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-10-03 00:58:34.259847: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Try to resolve (E)rrors and (W)arnings (here and in other notebooks), if possible. Otherwise, maybe add a quick comment why this is expected/OK.
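One common way to quiet the C++-level TensorFlow log noise in these notebooks, if the messages turn out to be expected in a CPU-only / no-TensorRT CI environment, is to raise `TF_CPP_MIN_LOG_LEVEL` before TensorFlow is imported (a sketch, not necessarily what this PR ended up doing):

```python
import os

# 0 = all messages, 1 = filter INFO, 2 = filter INFO + WARNING, 3 = errors only.
# Must be set before `import tensorflow` for the C++ logger to pick it up.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"
```

Note this only suppresses the messages; genuinely fixable issues (like the duplicate cuBLAS factory registration) are better resolved or explained in a notebook comment, as suggested above.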
Looks like a ton of work. Nice!
A few remaining minor comments/questions/suggestions.
Resolved review thread on:
- examples/ML+DL-Examples/Spark-DL/dl_inference/huggingface/conditional_generation_tf.ipynb
spark.conf.set("spark.sql.execution.arrow.maxRecordsPerBatch", "512")
# This line will fail if the vectorized reader runs out of memory
What is the 'vectorized reader' and in general what is this line for? @leewyang
Resolved review threads on:
- examples/ML+DL-Examples/Spark-DL/dl_inference/huggingface/conditional_generation_tf.ipynb (3)
- examples/ML+DL-Examples/Spark-DL/dl_inference/pytorch/image_classification.ipynb (2)
- examples/ML+DL-Examples/Spark-DL/dl_inference/pytorch/regression.ipynb (2)
- examples/ML+DL-Examples/Spark-DL/dl_inference/tensorflow/keras-metadata.ipynb (1)
spark.conf.set("spark.sql.execution.arrow.maxRecordsPerBatch", "512")
# This line will fail if the vectorized reader runs out of memory
if int(spark.conf.get("spark.sql.execution.arrow.maxRecordsPerBatch")) < 512:
    print("Increasing `spark.sql.execution.arrow.maxRecordsPerBatch` to ensure the vectorized reader won't run out of memory")
The default is 10000, so this is decreasing it.
Right, decreasing. I guess this is meant to ensure the batches fit in the internal Arrow buffer, since each record is fairly large?
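Since the conf is set and immediately read back, the guard as quoted can never fire; the intent (cap the Arrow batch size, and only warn when it was above the target) might be closer to something like this sketch, where a plain dict stands in for `spark.conf` so the logic can be shown without a running SparkSession:

```python
def ensure_max_records_per_batch(get_conf, set_conf, target=512):
    """Cap spark.sql.execution.arrow.maxRecordsPerBatch at `target`.

    Large per-record payloads (e.g. images, long token sequences) can
    overflow the Arrow buffer at the default batch size of 10000, so we
    lower it rather than raise it. Returns True if the conf was changed.
    """
    key = "spark.sql.execution.arrow.maxRecordsPerBatch"
    current = int(get_conf(key))
    if current > target:
        set_conf(key, str(target))
        return True
    return False

# Demo with a dict standing in for spark.conf (hypothetical harness):
conf = {"spark.sql.execution.arrow.maxRecordsPerBatch": "10000"}
ensure_max_records_per_batch(conf.get, conf.__setitem__)
```

With a real session this would be called as `ensure_max_records_per_batch(spark.conf.get, spark.conf.set)`; the helper name and structure are illustrative, not from the PR.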
Resolved review threads (outdated) on:
- examples/ML+DL-Examples/Spark-DL/dl_inference/huggingface/conditional_generation_tf.ipynb (3)
👍
Changes:
- Version updates
- Environment separation
- CI/CD: `jupyter nbconvert`
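For reference, a typical way to execute notebooks headlessly with `jupyter nbconvert` in a CI job looks like the following; the exact flags and paths used by this PR's pipeline are not shown in the conversation, so treat this as a generic sketch:

```shell
# Execute the notebook top-to-bottom and write the output in place;
# any cell raising an exception fails the command (and thus the CI job).
jupyter nbconvert --to notebook --execute --inplace \
  examples/ML+DL-Examples/Spark-DL/dl_inference/pytorch/regression.ipynb
```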