Update Databricks Tools Notebooks for Tools v24.06.0 #381

Merged
merged 4 commits into NVIDIA:branch-24.06 from spark-rapids-examples-380
Jun 13, 2024

Conversation

parthosa
Collaborator

@parthosa parthosa commented May 2, 2024

Fixes #380. This PR fixes the Databricks notebooks for compatibility with Tools v24.06.0. I recommend giving feedback by testing the changes end to end (E2E).

Changes

Qualification User Tools Notebook Template:

  1. Fix virtual environment setup for the user tools notebook.
    • Use %pip install spark_rapids_user_tools directly. Libraries installed this way are automatically notebook-scoped, so an additional virtual environment is not needed.
  2. Add a Tools Version argument so that a custom tools version can be used.
  3. Show the console output as well, since it contains the Top Candidates view.
  4. Fix the summary section: it currently shows rapids_4_spark_qualification_output.csv, but for user tools it should show qualification_summary.csv.
  5. Include a Download button that downloads a zip file with the output files and logs.
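
As a rough illustration of item 5, the sketch below bundles an output directory and a log directory into a single zip file using only the Python standard library. It is not the notebook's actual implementation: the function name zip_outputs, the "output" and "logs" folder names inside the archive, and the example paths are all hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def zip_outputs(output_dir: str, log_dir: str, archive_base: str) -> str:
    """Bundle output files and logs into <archive_base>.zip and return its path.

    The "output" and "logs" folder names inside the archive are illustrative
    choices, not names taken from the notebooks.
    """
    with tempfile.TemporaryDirectory() as staging:
        staging_path = Path(staging)
        # Copy both directories under one staging root so they share one archive.
        shutil.copytree(output_dir, staging_path / "output")
        shutil.copytree(log_dir, staging_path / "logs")
        # make_archive appends ".zip" to archive_base and returns the file path.
        return shutil.make_archive(archive_base, "zip", root_dir=staging)


# Hypothetical usage (paths are placeholders, not taken from the PR):
# zip_path = zip_outputs("/tmp/qual_output", "/tmp/qual_logs", "/tmp/qual_results")
```

Staging both directories under a temporary root keeps the originals untouched and gives the archive a predictable layout.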

Future Maintenance:

  1. Added a line in the README to record the latest version of tools used to test the notebooks.
  2. In the future, if we want to make a newer version of the tools the default:
    • Test the notebooks with the newer version.
    • If everything looks good, update the version number in this line.
    • Update the TOOLS_VER variable in each notebook with the new version number.
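
To make the TOOLS_VER step concrete, here is a minimal sketch of turning a version string into a pinned pip requirement for spark_rapids_user_tools. The helper pip_requirement, the default value shown, and the three-part version check are assumptions made for illustration; the notebooks may define and use TOOLS_VER differently.

```python
import re

# Hypothetical default; each notebook defines its own TOOLS_VER value.
TOOLS_VER = "24.06.0"


def pip_requirement(tools_ver: str, package: str = "spark_rapids_user_tools") -> str:
    """Return a pinned requirement string such as 'spark_rapids_user_tools==24.06.0'."""
    tools_ver = tools_ver.strip()
    # Assumption: versions have three dot-separated numeric parts, e.g. 24.06.0.
    if not re.fullmatch(r"\d+\.\d+\.\d+", tools_ver):
        raise ValueError(f"unexpected tools version: {tools_ver!r}")
    return f"{package}=={tools_ver}"


print(pip_requirement(TOOLS_VER))
```

The resulting string is what would follow %pip install in a notebook cell. Validating the version up front gives a clearer error than a failed install.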

Evaluation

  1. Tested for Databricks AWS and Azure platforms.
  2. Tested for the following file locations:
    • /dbfs/<eventlog>
    • Cannot support s3 and abfs locations because aws and az CLI are not available.
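
The location limitation above could be checked up front along the following lines. This is only a sketch under stated assumptions: the helper names are hypothetical, and treating "supported" as "starts with /dbfs/" is inferred from the tested location listed here rather than taken from the notebooks.

```python
import shutil


def is_supported_eventlog_path(path: str) -> bool:
    """Return True only for non-empty paths under /dbfs/.

    Assumption: other locations (for example s3:// or abfs:// URIs) are
    treated as unsupported because the aws and az CLIs are unavailable.
    """
    prefix = "/dbfs/"
    return path.startswith(prefix) and len(path) > len(prefix)


def missing_clis(names=("aws", "az")) -> list:
    """Return the CLI names from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]
```

A notebook cell could call is_supported_eventlog_path on the user-supplied event log location and show a clear message before running the tool.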

cc: @amahussein

@parthosa parthosa added the "bug" (Something isn't working) label May 2, 2024
@parthosa parthosa self-assigned this May 2, 2024
@parthosa parthosa changed the title from "Update Databricks Tools Notebooks for 24.02.4" to "Update Databricks Tools Notebooks for Tools v24.02.4" May 2, 2024
@parthosa parthosa marked this pull request as draft May 3, 2024 17:15
@parthosa parthosa marked this pull request as ready for review May 3, 2024 17:45
@viadea viadea requested a review from kuhushukla May 9, 2024 17:00
@SurajAralihalli
Collaborator

Cannot support s3 and abfs locations because aws and az CLI are not available.
Note User Tools Notebook might still fail since aws and az CLI cmds are not available.

We can include these limitations in the readme.

@cindyyuanjiang
Contributor

We should consider moving output info to cells after running the qualification tool. It would be nice if the first cell contains a concise summary of what the qual tool does.

@parthosa
Collaborator Author

Addressed the review feedback and included a Download button that downloads the output results as a zip file along with the log files.

@parthosa parthosa changed the title from "Update Databricks Tools Notebooks for Tools v24.02.4" to "Update Databricks Tools Notebooks for Tools v24.04.0" May 23, 2024
@nvliyuan
Collaborator

nvliyuan commented May 24, 2024

Hi @parthosa , can we target this pr to branch-24.06? This repo follows the same release strategy as the plugin, thx.

@parthosa parthosa changed the base branch from main to branch-24.06 May 24, 2024 01:46
@parthosa parthosa changed the base branch from branch-24.06 to main May 24, 2024 01:52
@parthosa parthosa changed the base branch from main to branch-24.06 May 24, 2024 01:53
@parthosa
Collaborator Author

parthosa commented May 24, 2024

@nvliyuan, I updated the PR's target branch to branch-24.06. Thank you.

@parthosa parthosa force-pushed the spark-rapids-examples-380 branch 2 times, most recently from 3e4f2be to df895a3 May 24, 2024 05:09
Collaborator

@SurajAralihalli SurajAralihalli left a comment

LGTM, tested the notebooks on Azure Databricks.

@cindyyuanjiang
Contributor

LGTM, tested both notebooks. Thanks @parthosa!

@parthosa parthosa dismissed SurajAralihalli’s stale review May 24, 2024 21:41

Thank you for the reviews. Holding this off until the next minor release of the tools, which handles the error raised when CSP CLIs are not present (in this case databricks and aws): NVIDIA/spark-rapids-tools#1035

@parthosa parthosa requested a review from kuhushukla June 5, 2024 00:27
@parthosa
Collaborator Author

parthosa commented Jun 5, 2024

Based on offline discussion with @viadea:

  • Added a note with the current limitation of using single event logs for the Qualification notebook.
  • We can merge this PR for now.
  • Plan to submit a subsequent PR to update the notebooks to use a newer version of the tools.

Result

image

@kuhushukla
Collaborator

@parthosa which related issue regarding single logs (CSP CLI creds) needs to be fixed first for that limitation to be resolved? I see that NVIDIA/spark-rapids-tools#1035 is merged.

@parthosa
Collaborator Author

parthosa commented Jun 5, 2024

Yes @kuhushukla, it is merged and will be available to users from next release.

@kuhushukla
Collaborator

kuhushukla commented Jun 5, 2024

Yes @kuhushukla, it is merged and will be available to users from next release.

In that case, the note about using this for multiple logs would already be outdated as part of this change, right?

@parthosa parthosa changed the title from "Update Databricks Tools Notebooks for Tools v24.04.0" to "Update Databricks Tools Notebooks for Tools v24.06.0" Jun 13, 2024
Collaborator

@kuhushukla kuhushukla left a comment

lgtm

@parthosa parthosa merged commit 2e8f028 into NVIDIA:branch-24.06 Jun 13, 2024
2 checks passed
@parthosa parthosa deleted the spark-rapids-examples-380 branch June 13, 2024 16:59
Labels
bug Something isn't working
Development

Successfully merging this pull request may close these issues.

Update Databricks Notebook for Tools 24.02
5 participants