[BUG] Cannot auto-tune Dataproc on GKE using spark_rapids profiling --cluster #1433
Labels: bug (Something isn't working), user_tools (Scope the wrapper module running CSP, QualX, and reports (python))
Describe the bug
Following the official examples here, I cannot profile event logs from a Dataproc on GKE cluster due to the following error:
Steps/Code to reproduce bug
spark_rapids profiling --cluster <cluster> -p dataproc -v --eventlogs gs://<logs>
This problem persists whether I pass the cluster name or the YAML from gcloud dataproc clusters describe.
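For anyone reproducing this, the invocations differ only in what is passed to --cluster. Below is a minimal sketch of a helper that assembles the command line for each variant. The function name build_profiling_cmd and its parameters are hypothetical and not part of the tool; only the flags shown in the reproduction command above are used.

```python
import shlex
from typing import List, Optional


def build_profiling_cmd(
    eventlogs: str,
    platform: str = "dataproc",
    cluster: Optional[str] = None,
    verbose: bool = True,
) -> List[str]:
    """Assemble a spark_rapids profiling command line (hypothetical helper).

    `cluster` may be a cluster name or a path to the YAML saved from
    `gcloud dataproc clusters describe`; pass None to leave --cluster out.
    """
    cmd = ["spark_rapids", "profiling"]
    if cluster is not None:
        cmd += ["--cluster", cluster]
    cmd += ["-p", platform]
    if verbose:
        cmd.append("-v")
    cmd += ["--eventlogs", eventlogs]
    return cmd


if __name__ == "__main__":
    # Placeholders are kept as in the report; substitute real values before running.
    for cluster_arg in ("<cluster>", "<cluster-describe>.yaml", None):
        print(shlex.join(build_profiling_cmd("gs://<logs>", cluster=cluster_arg)))
```

The returned list can be handed to subprocess.run as-is, which avoids shell quoting problems with gs:// paths.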
I was able to run the tool successfully by omitting the --cluster portion. However, this does not seem to give any cluster recommendations, even though the profiling output contains a large amount of cluster config info.
Expected behavior
I expected the tool to work with Dataproc on GKE.
Environment details (please complete the following information)
Spark is being run on Dataproc on GKE. Tool is being run locally.
Additional context
I tried a couple of versions of the tool (built from source): v24.10.0, v24.10.1, v24.08.2. I also tried a simple pip install.
I also tried manually defaulting the master_nodes_from_conf[0] variable, but it simply uncovered a string of other issues revolving around not detecting the cluster config.
I believe the issue is partially due to the fact that, with Dataproc on GKE, the nodes can be/are expected to be ephemeral.
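To illustrate the kind of defaulting I mean, here is a minimal sketch. It assumes the failure comes from indexing an empty list of master nodes; that is my guess, not something confirmed from the tool's source. Apart from master_nodes_from_conf, all names are hypothetical.

```python
from typing import List, Optional


def first_master_node(master_nodes_from_conf: List[str]) -> Optional[str]:
    """Return the first master node, or None when the list is empty.

    Hypothetical guard: avoids an IndexError on master_nodes_from_conf[0]
    when the cluster description lists no master nodes.
    """
    return master_nodes_from_conf[0] if master_nodes_from_conf else None


def describe_tuning_support(master_nodes_from_conf: List[str]) -> str:
    """Report whether cluster recommendations could proceed (illustrative only)."""
    master = first_master_node(master_nodes_from_conf)
    if master is None:
        return "no master node found; skipping cluster recommendations"
    return f"using master node {master}"


if __name__ == "__main__":
    print(describe_tuning_support([]))
    print(describe_tuning_support(["cluster-m"]))
```

A guard like this only moves the failure further along, which matches what I saw: the later steps still expect a detected cluster config.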
As a quick "remedy", we can document the lack of "Dataproc on GKE" support for the auto-tuning part of the tool.
Possibly relevant issues: