
H3 functions missing methods on Databricks: java.lang.NoSuchMethodError: com.uber.h3core.H3Core... #1137

Open
remibaar opened this issue Dec 1, 2023 · 9 comments

Comments


remibaar commented Dec 1, 2023

Expected behavior

I followed the install instructions for using Sedona on Databricks and used ST_H3CellIDs.

I expect to get the H3 indices of the given polygon.

As an example, I run this SQL query:

SELECT ST_H3CellIDs(ST_GeomFromText('POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))'), 12, FALSE)

Actual behavior

I get this error:

NoSuchMethodException: java.lang.NoSuchMethodError: com.uber.h3core.H3Core.polygonToCells(Ljava/util/List;Ljava/util/List;I)Ljava/util/List;

Steps to reproduce the problem

I used both the pip installation route, and the pure SQL on Databricks.

Both result in the same error.

Settings

Environment Azure Databricks
Databricks runtime: 13.3 LTS

Operating System: Ubuntu 22.04.2 LTS
Java: Zulu 8.70.0.23-CA-linux64
Scala: 2.12.15
Python: 3.10.12
R: 4.2.2
Delta Lake: 2.4.0

Thoughts

I thought H3 might not be included in the shaded version, so I also tried adding h3-4.1.1.jar via the init script, but that didn't solve the issue either.

I finally used these scripts:

Download jars

# Create JAR directory for Sedona
mkdir -p /dbfs/FileStore/sedona/jars

# Remove contents of directory
rm -f /dbfs/FileStore/sedona/jars/*.jar

# Download the dependencies from Maven into DBFS
curl -o /dbfs/FileStore/sedona/jars/geotools-wrapper-1.5.0-28.2.jar "https://repo1.maven.org/maven2/org/datasyslab/geotools-wrapper/1.5.0-28.2/geotools-wrapper-1.5.0-28.2.jar"

curl -o /dbfs/FileStore/sedona/jars/sedona-spark-shaded-3.4_2.12-1.5.0.jar "https://repo1.maven.org/maven2/org/apache/sedona/sedona-spark-shaded-3.4_2.12/1.5.0/sedona-spark-shaded-3.4_2.12-1.5.0.jar"

curl -o /dbfs/FileStore/sedona/jars/h3-4.1.1.jar "https://repo1.maven.org/maven2/com/uber/h3/4.1.1/h3-4.1.1.jar"

Create init script

# Create init script (make sure the scripts directory exists first)
mkdir -p /dbfs/FileStore/sedona/scripts
cat > /dbfs/FileStore/sedona/scripts/sedona-init.sh <<'EOF'
#!/bin/bash
#
# File: sedona-init.sh
# 
# On cluster startup, this script will copy the Sedona jars to the cluster's default jar directory.
# In order to activate Sedona functions, remember to add to your spark configuration the Sedona extensions: "spark.sql.extensions org.apache.sedona.viz.sql.SedonaVizExtensions,org.apache.sedona.sql.SedonaSqlExtensions"

cp /dbfs/FileStore/sedona/jars/*.jar /databricks/jars

EOF

All the other functions of Sedona do work. So Sedona is installed properly, I am only unable to use the H3 functions.

Did I miss a step in the set-up? I checked the documentation multiple times, but couldn't find any clue. I hope someone can help me out.

@remibaar remibaar changed the title H3 functions missing methods: java.lang.NoSuchMethodError: com.uber.h3core.H3Core H3 functions missing methods: java.lang.NoSuchMethodError: com.uber.h3core.H3Core... Dec 1, 2023

remibaar commented Dec 1, 2023

After some further investigation I see the Databricks runtime also contains H3 functionality. For this it uses com.uber h3 version 3.7.0. Could this be conflicting with the version 4.1.1 which is being used by Sedona? It would explain it as polygonToCells is not available in version 3.x of H3.
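To confirm which H3 jars the runtime actually puts on the classpath, a quick check like the following can help (a sketch: /databricks/jars is the default jar directory on Databricks runtimes, and the grep is just a filename heuristic):

```shell
# Sketch: list any H3 jars in the cluster's default jar directory.
JAR_DIR=/databricks/jars
ls "$JAR_DIR" 2>/dev/null | grep -i 'h3' || echo "no H3 jars found in $JAR_DIR"
```

If a 3.7.0 jar shows up alongside Sedona's shaded jar, the 3.x classes can win class loading and produce exactly this NoSuchMethodError, since polygonToCells only exists in H3 4.x.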

@remibaar remibaar changed the title H3 functions missing methods: java.lang.NoSuchMethodError: com.uber.h3core.H3Core... H3 functions missing methods on Databricks: java.lang.NoSuchMethodError: com.uber.h3core.H3Core... Dec 1, 2023

remibaar commented Dec 1, 2023

I managed to solve the issue! Indeed it was related to the version of H3 that was being installed in the Databricks runtime.

By adjusting the init script, I remove the older H3 jar from the Databricks jars. This solves the issue.
This is the code for my new init script:

%sh

# Create init script (make sure the scripts directory exists first)
mkdir -p /dbfs/FileStore/sedona/scripts
cat > /dbfs/FileStore/sedona/scripts/sedona-init.sh <<'EOF'
#!/bin/bash
#
# File: sedona-init.sh
# 
# On cluster startup, this script will copy the Sedona jars to the cluster's default jar directory.
# In order to activate Sedona functions, remember to add to your spark configuration the Sedona extensions: "spark.sql.extensions org.apache.sedona.viz.sql.SedonaVizExtensions,org.apache.sedona.sql.SedonaSqlExtensions"

# Remove Databricks' default H3 jar (3.7.0), as it is not compatible with Sedona 1.5.0+
rm -f /databricks/jars/*com.uber*h3*.jar

# Copy jars
cp /dbfs/FileStore/sedona/jars/*.jar /databricks/jars

EOF
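To sanity-check that the rm pattern in this init script removes the runtime's H3 jar but leaves Sedona's jars alone, here is a small local demo (the Databricks runtime jar filename below is an assumption for illustration; actual names differ per runtime):

```shell
# Demo in a temp dir: apply the init script's glob and see what survives.
demo=$(mktemp -d)
touch "$demo/----workspace--maven-trees--com.uber--h3--com.uber__h3__3.7.0.jar"  # assumed runtime jar name
touch "$demo/sedona-spark-shaded-3.4_2.12-1.5.0.jar"
rm -f "$demo"/*com.uber*h3*.jar   # same pattern as the init script
ls "$demo"                        # only the Sedona jar should remain
```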

Note: This will break the built-in H3 functionality of Databricks; those built-in H3 functions will now throw a NoClassDefFoundError. But I believe the H3 functions of Sedona supersede the built-in H3 of Databricks.

I will keep this issue open, because I am going to create a PR for a change in the docs.
https://github.com/apache/sedona/blob/master/docs/setup/databricks.md


jiayuasu commented Dec 2, 2023

The main reason is that we shade the uber-h3 jar into sedona-spark-shaded, which leads to conflicts. An alternative fix is to use the sedona-spark jar, which does not shade anything, and manually download all of Sedona's dependency jars: https://github.com/apache/sedona/blob/master/pom.xml#L139
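A sketch of that unshaded route (the coordinates below are only the jars already mentioned in this thread, not the full dependency list; see the linked pom.xml for the complete set, and the curl line is left commented out so the sketch runs without network access):

```shell
# mvn_url builds a Maven Central URL from group/artifact/version coordinates.
mvn_url() {
  echo "https://repo1.maven.org/maven2/$(echo "$1" | tr . /)/$2/$3/$2-$3.jar"
}

for gav in \
  "org.apache.sedona sedona-spark-3.4_2.12 1.5.0" \
  "com.uber h3 4.1.1" \
  "org.datasyslab geotools-wrapper 1.5.0-28.2"
do
  set -- $gav
  echo "would fetch: $(mvn_url "$1" "$2" "$3")"
  # curl -fL -o "/dbfs/FileStore/sedona/jars/$2-$3.jar" "$(mvn_url "$1" "$2" "$3")"
done
```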


remibaar commented Dec 2, 2023

An alternative fix is to use the sedona-spark jar, which does not shade anything, and manually download all of Sedona's dependency jars

Please correct me if I am wrong, but with this method you still cannot use both the H3 of Sedona and the H3 of Databricks, because they use different, incompatible major versions (Sedona uses 4.1.1, Databricks uses 3.7.0).

My personal recommendation would be to remove the H3 3.7.0 jar from the Databricks runtime. This disables the H3 functions of Databricks, but allows the use of the H3 functions of Sedona.
In my opinion, the H3 functions of Sedona are more feature-complete.

For example, one of the features I need is the fullCover option of ST_H3CellIDs, which is not available in the Databricks implementation but is in Sedona's.


jiayuasu commented Dec 3, 2023

@remibaar Makes sense to me. Would you please update the doc of Sedona website and create a PR? I am happy to accept it!

@jacob-talroo

On Databricks, I too am getting NoSuchMethodError: 'java.util.List com.uber.h3core.H3Core.polygonToCells(java.util.List, java.util.List, int)'. This is odd, since I am on a Databricks cluster that should NOT support H3: it is neither a SQL warehouse nor Photon-enabled.

The main reason is that we shade the uber-h3 jar into sedona-spark-shaded, which leads to conflicts. An alternative fix is to use the sedona-spark jar, which does not shade anything, and manually download all of Sedona's dependency jars: https://github.com/apache/sedona/blob/master/pom.xml#L139

I think the issue is that h3 is not actually shaded currently: Sedona still uses the com.uber package. If it were shaded, wouldn't it use a different package name?

I think the current "shaded" JAR might just be an uber jar (not to be confused with the company behind H3). To truly shade, I think we need some relocations.


jiayuasu commented May 10, 2024

@jacob-talroo if you are not planning to use Databricks' H3 functions, maybe you can delete the H3 jars from the Databricks jars folder, as described above (rm -f /databricks/jars/*com.uber*h3*.jar), and please use sedona-spark-shaded.

Sedona's H3 has been used extensively on AWS EMR and Glue. Relocations might solve the Databricks problem, but they could cause problems on other platforms.

@jacob-talroo

Thank you - that workaround is working for now.

Does the comment about the shaded jar actually being an Uber JAR make sense?

@oliverangelil

@jiayuasu @remibaar have the docs been updated to mention this workaround (i.e. adding rm -f /databricks/jars/*com.uber*h3*.jar to the init file)? Could you link the page here?
