[BUG] ERROR rapids.tools.qualification: Failed to execute the prediction model #1393
Labels
question: Further information is requested
user_tools: Scope the wrapper module running CSP, QualX, and reports (python)
Describe the bug
I'm running the qualification tool on an event log generated by a Databricks Workflow Job.
The tool logs the following errors:
2024-10-24 10:48:43,531 ERROR rapids.tools.qualification: Failed to execute the prediction model. Using default speed up of 1.0 for all apps. Reason - KeyError:'startTime'
ERROR: Could not find elements [('rd-fleet.8xlarge',)]
2024-10-24 10:48:43,542 ERROR rapids.tools.cluster_inference: Error while inferring cluster: Instance type rd-fleet.8xlarge is not found in catalog.
2024-10-24 10:48:43,609 ERROR rapids.tools.AdditionalHeuristics: Cannot apply heuristics for qualification. Reason - FileNotFoundError:[Errno 2] No such file or directory: '/Users/username/repos/nvidia-rapids/qual_20241024134808_8B440b4b/rapids_4_spark_qualification_output/raw_metrics/app-20241022192347-0000/stage_level_aggregated_task_metrics.csv'
After the error is thrown, the tool generates the report but indicates there are no compatible apps.
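For triage, it may help to separate the console output into its individual log records, since three different loggers report failures here. The following is a minimal sketch, assuming each record starts with the `timestamp LEVEL logger: ` prefix seen in the output above; `split_records` is a hypothetical helper written for this issue, not part of the RAPIDS tools.

```python
import re

# Prefix of one record, e.g. "2024-10-24 10:48:43,531 ERROR rapids.tools.qualification: ".
# The set of level names is an assumption based on standard Python logging levels.
RECORD_START = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(DEBUG|INFO|WARNING|ERROR|CRITICAL) ([\w.]+): "
)

# Progress text such as "Processing..." followed by a Braille spinner glyph,
# which the console may interleave between records.
SPINNER_TAIL = re.compile(r"Processing\.\.\.[\u2800-\u28ff]*\s*$")


def split_records(raw: str):
    """Return a list of (timestamp, level, logger, message) tuples."""
    matches = list(RECORD_START.finditer(raw))
    records = []
    for i, m in enumerate(matches):
        # A record's message runs until the next record prefix (or end of input).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(raw)
        message = SPINNER_TAIL.sub("", raw[m.end():end]).strip()
        records.append((m.group(1), m.group(2), m.group(3), message))
    return records
```

Applied to the output above, this yields one record each for `rapids.tools.qualification`, `rapids.tools.cluster_inference`, and `rapids.tools.AdditionalHeuristics`. The un-timestamped `ERROR: Could not find elements ...` text stays attached to the first record's message, because it has no prefix of its own.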
Steps/Code to reproduce bug
Expected behavior
No errors and a recommendation about the cluster shape I should use to improve performance.
Environment details (please complete the following information)
Additional context
No additional context.