[BUG] ERROR rapids.tools.qualification: Failed to execute the prediction model #1393

estebanmodmed · 2024-10-24T14:22:25Z

Describe the bug
I'm using the qualification tool over an eventlog generated by the execution of a Databricks Workflow Job.

I'm getting the following errors when using the qualification tool:

Processing...⣟2024-10-24 10:48:43,531 ERROR rapids.tools.qualification: Failed to execute the prediction model. Using default speed up of 1.0 for all apps. Reason - KeyError:'startTime'
ERROR: Could not find elements [('rd-fleet.8xlarge',)]
2024-10-24 10:48:43,542 ERROR rapids.tools.cluster_inference: Error while inferring cluster: Instance type rd-fleet.8xlarge is not found in catalog.
Processing...⡿2024-10-24 10:48:43,609 ERROR rapids.tools.AdditionalHeuristics: Cannot apply heuristics for qualification. Reason - FileNotFoundError:[Errno 2] No such file or directory: '/Users/username/repos/nvidia-rapids/qual_20241024134808_8B440b4b/rapids_4_spark_qualification_output/raw_metrics/app-20241022192347-0000/stage_level_aggregated_task_metrics.csv'

After the error is thrown, the tool generates the report but indicates there are no compatible apps.

Steps/Code to reproduce bug

Execute qualification tool with the following parameters:

spark_rapids qualification --platform databricks-aws --eventlogs logs/cluster_id/eventlog/cluster_id_10_69_238_61/some_id/eventlog

Expected behavior
No errors and a recommendation about the cluster shape I should use to improve performance.

Environment details (please complete the following information)

Environment location: It was executed in my local environment, using logs generated by Databricks AWS, which I previously downloaded to my machine.

Additional context
No additional context.


parthosa · 2024-10-25T20:10:29Z

Hi @estebanmodmed,

It seems the path you provided may be incomplete, which causes the tool to read only part of the event logs. Databricks stores event logs in a rolling manner, for example:

ls -l logs/<cluster-id>/eventlog/<cluster-id>_<some-id>/<some-id>
eventlog
eventlog-2024-02-20--04-50.gz
eventlog-2024-02-20--05-00.gz
eventlog-2024-02-20--05-10.gz
eventlog-2024-02-20--05-20.gz

To fix this, I recommend passing the parent directory instead of pointing directly at a single eventlog file.

Recommended CMD:

spark_rapids qualification --platform databricks-aws --eventlogs logs/cluster_id/eventlog/cluster_id_10_69_238_61/some_id

The application seems to have run on Databricks Fleet instances ('rd-fleet.8xlarge'). Currently, we don't support fleet instances, but we will update our catalog to include them. However, this is mostly a log message and is not related to the tool’s failure.

With the recommended CMD and path, you should be able to run the tool and get a speedup estimate and a recommendation for the cluster shape.
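If it helps, here is a minimal sketch for checking a path before passing it to `--eventlogs`. It is not part of the tool: the helper names `resolve_eventlog_dir` and `list_rolling_logs` are hypothetical, and it assumes the rolling files all start with `eventlog`, as in the listing above.

```python
from pathlib import Path


def resolve_eventlog_dir(path_str):
    """Return the directory that should hold the rolling event logs.

    If the path points at a single file (e.g. .../eventlog), return its
    parent directory; otherwise return the path unchanged.
    """
    path = Path(path_str)
    return path.parent if path.is_file() else path


def list_rolling_logs(log_dir):
    """Return the sorted names of files in log_dir that start with 'eventlog'.

    Assumption: rolled files are named like 'eventlog-2024-02-20--04-50.gz'
    and sit next to the active 'eventlog' file.
    """
    return sorted(
        p.name
        for p in Path(log_dir).iterdir()
        if p.is_file() and p.name.startswith("eventlog")
    )
```

Calling `resolve_eventlog_dir` on the path from the original command should return the `some_id` directory, and `list_rolling_logs` on that directory shows whether there are rolled `.gz` files that a single-file path would have skipped.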

amahussein · 2025-05-24T03:56:33Z

Thanks @parthosa
@estebanmodmed please feel free to reopen if you have any further questions.

estebanmodmed added the "? - Needs Triage" and "bug" (Something isn't working) labels Oct 24, 2024

estebanmodmed changed the title ~~[BUG]~~ [BUG] ERROR rapids.tools.qualification: Failed to execute the prediction model Oct 24, 2024

amahussein assigned parthosa Oct 25, 2024

amahussein added the "user_tools" (Scope the wrapper module running CSP, QualX, and reports (python)) and "question" (Further information is requested) labels and removed the "? - Needs Triage" and "bug" (Something isn't working) labels Oct 25, 2024

amahussein mentioned this issue Nov 14, 2024

[FEA] Support Databricks fleet clusters #1422

Open

amahussein closed this as completed May 24, 2025
