using spark sftp with databricks #70

Open
jch231 opened this issue Nov 11, 2019 · 1 comment

jch231 commented Nov 11, 2019

I was wondering if you could help me with using spark-sftp with Databricks. Firstly, I am struggling to import the library in Databricks: the documentation has only a few examples of loading a dataframe, but nothing on how to import the library into the notebook itself. Secondly, is there a Python API for spark-sftp, or is the functionality only available in Scala? (I develop in PySpark, but I can work around this by loading the dataframe in Scala and creating a temp view to access it from Python.) Thanks!
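
For concreteness, the Python side of that workaround looks like this (a minimal sketch; "sftp_df" is a hypothetical view name registered in a prior Scala cell):

# Assumes a prior Scala cell loaded the data with spark-sftp and ran
# df.createOrReplaceTempView("sftp_df"); the view name is hypothetical.
df = spark.table("sftp_df")
df.show()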

FurcyPin commented Dec 9, 2019

Hello,

All you have to do is make sure this project's jar is added to your Spark environment. I don't use Databricks, but I think you could start here: https://docs.databricks.com/libraries.html
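
If you build the SparkSession yourself rather than attaching the library through the Databricks UI, one way is to pull the package in by its Maven coordinates. This is only a sketch: the Scala 2.11 / 1.1.3 coordinates below are an assumption, so use whatever matches your Spark and Scala versions.

from pyspark.sql import SparkSession

# Sketch: fetch spark-sftp by Maven coordinates when the session starts.
# The version/Scala suffix below is an assumption; match it to your cluster.
spark = (SparkSession.builder
    .config("spark.jars.packages", "com.springml:spark-sftp_2.11:1.1.3")
    .getOrCreate())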

As for the pyspark API, it is exactly the same. You can write

df = spark.read.format("com.springml.spark.sftp").options(...).load()

and it should work.
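
For a fuller call, something like the following should work. The option names are taken from this project's README; the host, credentials, and path are placeholders:

df = (spark.read.format("com.springml.spark.sftp")
    .option("host", "sftp.example.com")   # placeholder host
    .option("username", "user")           # placeholder credentials
    .option("password", "****")
    .option("fileType", "csv")
    .option("inferSchema", "true")
    .load("/data/sample.csv"))            # placeholder remote path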

And if you see this error:

java.lang.ClassNotFoundException: com.springml.spark.sftp.DefaultSource

It means that the jar was not added correctly and your Spark installation can't find it.
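
To sanity-check what your session was started with, you can inspect the relevant conf keys from PySpark (a quick check, not a definitive test):

conf = spark.sparkContext.getConf()
print(conf.get("spark.jars", ""))           # jars added explicitly
print(conf.get("spark.jars.packages", ""))  # packages pulled by Maven coordinates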
