[FEATURE] Ensure that we can support DBR 14.3LTS #33

Open
3 tasks done
dannymeijer opened this issue May 30, 2024 · 10 comments · Fixed by #63

@dannymeijer
Member

dannymeijer commented May 30, 2024

Is your feature request related to a problem? Please describe.

N/A

Describe the solution you'd like

We should add support for Databricks Runtime (DBR) 14.3 LTS.

This means we need compatibility with:

  • Apache Spark: 3.5.0
  • Python: 3.10.12 (already covered)
  • Delta Lake: 3.1.0

Additionally, we need to look at how Spark Connect changes things for us.
We should investigate every place where we reference the JVM directly.
According to the docs, only Shared cluster mode is affected.

Describe alternatives you've considered

N/A

Additional context

N/A

@dannymeijer dannymeijer added the enhancement New feature or request label May 30, 2024
@mikita-sakalouski
Contributor

The whole idea was to introduce an internal Koheesio Spark session so that we can easily switch between remote and local modes.

Also, if I'm not mistaken, pydantic checks the SparkSession type based on its full import path, and a remote Spark session is imported from a different path. At least that was the case some time ago.

@pariksheet
Contributor

pariksheet commented Jul 29, 2024

Check the affected/referenced code within Koheesio.

Not available on Databricks Connect for Databricks Runtime 13.3 LTS and below:

  • Streaming foreachBatch ==> @riccamini -- N/A, works for 14.3 and above. Please confirm this assumption.

Not available:

  • Databricks Utilities: credentials, library, notebook workflow, widgets -- not sure what this means (brickflow is affected) ==> @pariksheet with @asingamaneni

  • SparkContext -- no JVM operations ==> @mikita-sakalouski @dannymeijer

  • Changing the log4j log level through SparkContext -- need to check the code ==> Nathan

@pariksheet
Contributor

Run the unit tests locally against a remote Spark Connect instance.

@pariksheet
Contributor

Check how to manage SparkSession and DatabricksSession.

@riccamini
Contributor

I have added details related to the foreachBatch function here: #56

If you prefer to collect everything here, I will copy-paste the comment and close that issue.

One additional point that I do not see in the list is DataFrame.rdd, which is used in some tests.

@pariksheet
Contributor

pariksheet commented Sep 9, 2024

  • Delta merge through the Delta API is not possible (only supported from Spark 4.0 and Delta 4.0).
  • Delta merge through Spark SQL still works.
  • How do we support both 3.x and 4.x Spark/Delta versions?
  • Support for Snowflake DDLs through _jvm
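
As a rough illustration of the Spark SQL route mentioned in the list above, the merge can be expressed as a plain `MERGE INTO` statement and passed to `spark.sql`. This is only a sketch: `build_merge_sql` and its parameters are hypothetical names, not part of Koheesio, and it assumes the source data is already available as a table or temp view.

```python
from typing import Sequence


def build_merge_sql(target: str, source: str, keys: Sequence[str]) -> str:
    """Build a Delta `MERGE INTO` statement that upserts `source` into `target`.

    Hypothetical helper for illustration only. The identifiers are inserted
    as-is (no quoting or escaping), so they must come from trusted code.
    """
    if not keys:
        raise ValueError("at least one key column is required")
    # Join condition over every key column, e.g. "t.id = s.id AND t.day = s.day"
    on_clause = " AND ".join(f"t.{k} = s.{k}" for k in keys)
    return (
        f"MERGE INTO {target} AS t "
        f"USING {source} AS s "
        f"ON {on_clause} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )


# Assumed usage, with `updates` registered as a temp view beforehand:
#   spark.sql(build_merge_sql("catalog.schema.events", "updates", ["id"]))
```

Because the statement is only a string, it never touches the Delta Python API, which is why it should behave the same on native and remote sessions.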

@pariksheet
Contributor

There is a way to check whether a Spark session is remote or native.

We should introduce an API/function that returns a Spark session flag, check it before calling specific APIs (e.g. Delta merge, Snowflake), and raise an exception when the API is not supported.
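
One possible shape for such a check, as a minimal sketch: Spark Connect session classes live under the `pyspark.sql.connect` package, so the module path of the session's class can serve as the flag. The function names below (`is_remote_session`, `require_native_session`) are hypothetical, and whether every remote session type (for example one created through Databricks Connect) reports this module path is an assumption that should be verified.

```python
def is_remote_session(spark: object) -> bool:
    """Return True if `spark` looks like a Spark Connect (remote) session.

    Hypothetical helper: it inspects only the module path of the session's
    class, so it works without importing pyspark.
    """
    module = type(spark).__module__ or ""
    return module.startswith("pyspark.sql.connect")


def require_native_session(spark: object, feature: str) -> None:
    """Raise if `feature` is requested on a remote session."""
    if is_remote_session(spark):
        raise RuntimeError(
            f"{feature} is not supported on a remote (Spark Connect) session"
        )


# Assumed usage inside a step that needs JVM access:
#   require_native_session(spark, "Delta merge through the Delta API")
```

Checking the module path rather than using `isinstance` avoids importing the Connect classes, which may not be installed in every environment.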

@mikita-sakalouski mikita-sakalouski linked a pull request Sep 10, 2024 that will close this issue
9 tasks
@pariksheet
Contributor

Use snowflake-connector-python instead of spark._jvm.
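
A minimal sketch of what that could look like for DDL statements, assuming the connector's standard `snowflake.connector.connect(...)`, `cursor()`, `execute()`, and `close()` calls. The helper name `run_snowflake_ddl` and the injectable `connect` parameter are hypothetical and are there only to make the function testable without a live account.

```python
from typing import Any, Callable, Iterable, Optional


def run_snowflake_ddl(
    statements: Iterable[str],
    connect: Optional[Callable[..., Any]] = None,
    **options: Any,
) -> int:
    """Execute DDL statements over a Python connection; return how many ran.

    Hypothetical helper: `options` are forwarded to the connect function
    (e.g. account, user, password), and `connect` can be swapped in tests.
    """
    if connect is None:
        # Imported lazily so this module loads without the connector installed.
        import snowflake.connector

        connect = snowflake.connector.connect
    conn = connect(**options)
    executed = 0
    try:
        cursor = conn.cursor()
        try:
            for statement in statements:
                cursor.execute(statement)
                executed += 1
        finally:
            cursor.close()
    finally:
        # Always release the connection, even if a statement fails.
        conn.close()
    return executed
```

Since nothing here goes through the Spark session, this path does not depend on whether the session is native or remote.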

@dannymeijer dannymeijer added this to the 0.9.0 milestone Oct 2, 2024
@dannymeijer
Member Author

All of these should be addressed as part of release 0.9.0 (currently in pre-release). Please verify your use cases accordingly so we can proceed with the release.

@dannymeijer dannymeijer moved this from Todo to In progress in Koheesio Nov 8, 2024