Is your feature request related to a problem? Please describe.
Pydantic enforces strict types. In the current implementation, all Spark-related logic (readers, writers, transforms, integrations) expects pyspark.sql.DataFrame as input or output. However, in Spark Connect, and consequently on Serverless compute, the DataFrame class is pyspark.sql.connect.dataframe.DataFrame, which causes pydantic model validation errors.
Describe the solution you'd like
The model should accept both native and Connect DataFrames as valid input / output.
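One minimal way to relax the validation is a duck-typed validator that checks the class's module instead of importing either DataFrame class directly, so neither a classic Spark session nor Spark Connect is required at import time. This is a sketch assuming pydantic v2; the `DataFrame` alias and `ReaderOutput` model names are illustrative, not taken from the codebase:

```python
from typing import Annotated, Any

from pydantic import BaseModel, BeforeValidator


def _validate_dataframe(value: Any) -> Any:
    # Accept any object whose class is defined under the pyspark.sql
    # namespace; this covers both pyspark.sql.DataFrame and
    # pyspark.sql.connect.dataframe.DataFrame.
    module = type(value).__module__
    if not module.startswith("pyspark.sql"):
        raise ValueError(f"Expected a Spark DataFrame, got {type(value).__name__}")
    return value


# Hypothetical alias that fields could use in place of pyspark.sql.DataFrame
DataFrame = Annotated[Any, BeforeValidator(_validate_dataframe)]


class ReaderOutput(BaseModel):  # hypothetical model for illustration
    df: DataFrame
```

An alternative is a `Union` of the two concrete classes (with `arbitrary_types_allowed=True` and a guarded import of the Connect class, which only exists in PySpark >= 3.4); the module check above just avoids the conditional import.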
Describe alternatives you've considered
...
Additional context
...
@mikita-sakalouski to be honest, I would prefer to treat this as a separate issue and merge it to main as part of #59.
The reason is that this small change only addresses the strict validation of the DataFrame types, nothing else. It unlocks the next steps for the serverless compute PoC.
The feature you are referring to is much bigger and will probably require more testing and time.