Getting Exception in thread "main" org.apache.spark.SparkClassNotFoundException: [DATA_SOURCE_NOT_FOUND] Failed to find the data source: deltaSharing. Please find packages at `https://spark.apache.org/third-party-projects.html`. while trying to read table as dataframe from a share. #428

mohika-knoldus · 2023-10-25T08:09:23Z

import io.delta.sharing.client
import org.apache.spark.sql.SparkSession

object ReadSharedData extends App {

  val spark = SparkSession.builder()
    .master("local[1]")
    .appName("Read Shared Data")
    .getOrCreate()

  val profilePath = "/home/knoldus/Desktop/Delta Open Sharing/resources/config.share"
  val sharedFiles = client.DeltaSharingRestClient(profilePath).listAllTables()
  sharedFiles.foreach(println) // This works fine and lists all the tables in the share provided by the data provider.

  // This is the call that fails with DATA_SOURCE_NOT_FOUND.
  val popular_products_df = spark.read.format("deltaSharing").load("/home/knoldus/Desktop/Delta Open Sharing/resources/config.share#checkout_data_products.data_products.popular_products_data")
  popular_products_df.show()
}

oliverangelil · 2024-02-22T19:26:34Z

@mohika-knoldus did you resolve this? I'm having the same issue.

mohika-knoldus · 2024-03-14T16:25:45Z

No, @oliverangelil.

oliverangelil · 2024-03-14T17:17:22Z

@mohika-knoldus

The solution was to install Apache Hadoop.
If you add the following config to your Spark session, Spark will download it automatically, along with the delta-sharing-spark connector:

from pyspark.sql import SparkSession

# spark.jars.packages tells Spark to fetch the listed Maven artifacts at startup.
spark = (
    SparkSession.builder
    .config('spark.jars.packages', 'org.apache.hadoop:hadoop-azure:3.3.1,io.delta:delta-core_2.12:2.2.0,io.delta:delta-sharing-spark_2.12:0.6.2')
    .config('spark.sql.extensions', 'io.delta.sql.DeltaSparkSessionExtension')
    .config('spark.sql.catalog.spark_catalog', 'org.apache.spark.sql.delta.catalog.DeltaCatalog')
    .getOrCreate()
)

Or you can download it from the website.

Then you can read the table in like this:
delta_sharing.load_as_spark(table_url).show()
or like this:
spark.read.format("deltasharing").load(table_url).limit(100)

Alternatively, you can read the table without Hadoop by using delta_sharing.load_as_pandas(table_url, limit=10).
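For reference, table_url in the calls above is the profile file path, a `#`, and then the share, schema, and table names joined with dots, as in the load() call from the original post. Here is a minimal sketch of building it. The helper name build_table_url is hypothetical (it is not part of delta_sharing), and the non-empty check is only an illustrative assumption:

```python
def build_table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build '<profile-path>#<share>.<schema>.<table>' (hypothetical helper)."""
    for part in (profile_path, share, schema, table):
        if not part:
            # Assumption for this sketch: every component must be non-empty.
            raise ValueError("profile path, share, schema and table must all be non-empty")
    return f"{profile_path}#{share}.{schema}.{table}"


# Values taken from the original post.
table_url = build_table_url(
    "/home/knoldus/Desktop/Delta Open Sharing/resources/config.share",
    "checkout_data_products",
    "data_products",
    "popular_products_data",
)
print(table_url)
```

The resulting string can then be passed to delta_sharing.load_as_pandas, delta_sharing.load_as_spark, or spark.read.format("deltaSharing").load.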

mohika-knoldus · 2024-04-03T14:35:13Z

So in the end there is a dependency on either the Python library or Apache Hadoop?

mohika-knoldus · 2024-04-03T14:35:40Z

Thank you for the solution, @oliverangelil.
