Exception when running kmeans.ipynb #3

Closed
adamtracymartin opened this issue Jun 12, 2021 · 1 comment

Comments

@adamtracymartin

When running kmeans.ipynb, on the step:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('demo').master("local").getOrCreate()

socialDF = spark.read.format("org.apache.spark.sql.cassandra").options(table="socialmedia", keyspace="accelerate").load()

print ("Table Row Count: ")
print (socialDF.count())

I receive this exception:
Py4JJavaError: An error occurred while calling o38.load.
: java.lang.NoClassDefFoundError: scala/Product$class
at com.datastax.spark.connector.util.ConfigParameter.&lt;init&gt;(ConfigParameter.scala:7)
at com.datastax.spark.connector.rdd.ReadConf$.&lt;init&gt;(ReadConf.scala:33)
at com.datastax.spark.connector.rdd.ReadConf$.&lt;clinit&gt;(ReadConf.scala)
at org.apache.spark.sql.cassandra.DefaultSource$.&lt;init&gt;(DefaultSource.scala:134)
at org.apache.spark.sql.cassandra.DefaultSource$.&lt;clinit&gt;(DefaultSource.scala)
at org.apache.spark.sql.cassandra.DefaultSource.createRelation(DefaultSource.scala:55)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:355)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:325)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:307)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:307)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:225)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.ClassNotFoundException: scala.Product$class
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:471)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
... 23 more

After some reading, I have found that this error means a spark-cassandra-connector built for Scala 2.11 is being loaded into a Spark distribution built on Scala 2.12 (scala.Product$class no longer exists in 2.12), but I cannot figure out how to fix this.
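
A quick way to check which Scala version a Spark build targets, from a notebook cell (a sketch; _jvm is an internal py4j handle, a convenience check rather than a stable API):

  # Spark version of the running session, e.g. 3.1.2
  print(spark.version)
  # Ask the JVM which Scala version it is running, e.g. "version 2.12.10".
  # The connector artifact suffix (_2.11 / _2.12) must match this.
  print(spark.sparkContext._jvm.scala.util.Properties.versionString())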

@adamtracymartin
Author

I figured it out. I looked at the Spark jars in /usr/local/spark/jars, and they were built for Scala 2.12 (Spark 3.1.2). So I updated docker-compose.yml with the latest spark-cassandra-connector build for Scala 2.12 that I could find:

  PYSPARK_SUBMIT_ARGS: '--packages com.datastax.spark:spark-cassandra-connector_2.12:3.0.1 --conf spark.cassandra.connection.host=dse pyspark-shell'
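
In case it helps, this is roughly where that line sits in docker-compose.yml (a sketch; the service name below is an assumption, not necessarily what the repo uses):

  jupyter:  # hypothetical name of the service that runs the notebook
    environment:
      # The _2.12 artifact suffix must match the Scala version of the Spark jars.
      PYSPARK_SUBMIT_ARGS: '--packages com.datastax.spark:spark-cassandra-connector_2.12:3.0.1 --conf spark.cassandra.connection.host=dse pyspark-shell'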
