Execution slows down after 3 modifications of complex structure. #16

Open
chethan1212 opened this issue Jan 4, 2023 · 2 comments

chethan1212 commented Jan 4, 2023

I am trying to modify multiple values in a complex DF.
I have copied the relevant parts below.

df.printSchema
root
..|--eventData
........|--stringVals
............|--idNumber
............|--firstName
............|--secondName
............|--email
............|--phone
............|--age
............|--dob
..|--idNumber
..|--firstName
..|--secondName
..|--email
..|--phone

var tempDF = df
val listOfFields = List("idNumber", "firstName", "secondName", "email", "phone")

listOfFields.foreach { eachField =>
  val lens = Lens("eventData.stringVals." + eachField)(tempDF.schema)
  // df contains a root-level column of the same name with the values to replace
  val tempFunc = lens.setDF(col(eachField))
  tempDF = tempFunc(tempDF)
}

With 3 fields in listOfFields the code executes quickly, but when I add 5 fields it slows down. I am trying to replace around 25 values in a complex DF which contains hundreds of columns at multiple levels.

Please review and suggest a better option.

Thank you,
Che

@alfonsorr
Member

Hi @chethan1212, what version of Spark are you using? In newer versions, there's an official alternative to sparkOptics: https://towardsdatascience.com/spark-3-nested-fields-not-so-nested-anymore-9b8d34b00b95
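
For readers on Spark 3.1 or later, the built-in route mentioned here could look roughly like the sketch below. It uses `Column.withField`, which is part of Spark itself and not of sparkOptics. The object name `WithFieldSketch`, the helper `replaceNested`, and the sample data are hypothetical; only the column names come from the schema in the issue. The nested `stringVals` struct is extracted first and all fields are replaced on it before it is written back, which is the pattern the Spark documentation recommends when several nested fields change at once.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, struct}

object WithFieldSketch {
  // Hypothetical helper: replaces eventData.stringVals.<f> with the root-level column <f>.
  // Requires Spark 3.1+ because it relies on Column.withField.
  def replaceNested(df: DataFrame, fields: Seq[String]): DataFrame = {
    // Apply every replacement to the inner struct first...
    val newStringVals = fields.foldLeft(col("eventData.stringVals")) { (acc, f) =>
      acc.withField(f, col(f))
    }
    // ...then write the inner struct back into eventData in a single step.
    df.withColumn("eventData", col("eventData").withField("stringVals", newStringVals))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("with-field-sketch").getOrCreate()
    import spark.implicits._

    // Made-up sample row shaped like a small slice of the schema in the issue.
    val flat = Seq(("new-id", "new-name", "old-id", "old-name", "42"))
      .toDF("idNumber", "firstName", "oldId", "oldName", "age")
    val df = flat.select(
      struct(
        struct(col("oldId").as("idNumber"), col("oldName").as("firstName"), col("age")).as("stringVals")
      ).as("eventData"),
      col("idNumber"),
      col("firstName")
    )

    replaceNested(df, Seq("idNumber", "firstName")).select("eventData.stringVals.*").show()
    spark.stop()
  }
}
```

Fields that are not listed (here `age`) are carried over unchanged, so the list only needs the names being replaced.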

@chethan1212
Author

Hi @alfonsorr ,

Thanks for the quick response. Unfortunately we are still on Spark 2.4.
Is there anything I could do differently with the Optics API?

Thank you,
Che
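
One option that works on Spark 2.4 without the Optics API is to rebuild the nested struct once, using only `struct` and `col` from `org.apache.spark.sql.functions`, instead of applying one modification per field. The sketch below is an assumption-laden illustration and not the sparkOptics approach: `RebuildOnceSketch` and `replaceNested` are hypothetical names, the two-level layout (`eventData.stringVals`) is taken from the schema in the issue, and the sample data is made up. It also assumes that the slowdown comes from each `setDF` call wrapping the previous result in another full struct rebuild, which the thread does not confirm.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, struct}
import org.apache.spark.sql.types.StructType

object RebuildOnceSketch {
  // Hypothetical helper: rebuilds eventData.stringVals in one pass, taking each listed
  // field from the root-level column of the same name and every other field as-is.
  def replaceNested(df: DataFrame, fields: Set[String]): DataFrame = {
    val eventData = df.schema("eventData").dataType.asInstanceOf[StructType]
    val stringVals = eventData("stringVals").dataType.asInstanceOf[StructType]

    // Inner struct: swap in root-level columns where requested, keep the rest.
    val newStringVals = struct(stringVals.fieldNames.map { n =>
      if (fields.contains(n)) col(n).as(n) else col(s"eventData.stringVals.$n").as(n)
    }: _*)

    // Outer struct: keep every sibling of stringVals untouched.
    val newEventData = struct(eventData.fieldNames.map { n =>
      if (n == "stringVals") newStringVals.as(n) else col(s"eventData.$n").as(n)
    }: _*)

    df.withColumn("eventData", newEventData)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("rebuild-once-sketch").getOrCreate()
    import spark.implicits._

    // Made-up sample row shaped like a small slice of the schema in the issue.
    val flat = Seq(("new-id", "new-mail", "old-id", "old-mail", "42"))
      .toDF("idNumber", "email", "oldId", "oldMail", "age")
    val df = flat.select(
      struct(
        struct(col("oldId").as("idNumber"), col("oldMail").as("email"), col("age")).as("stringVals")
      ).as("eventData"),
      col("idNumber"),
      col("email")
    )

    replaceNested(df, Set("idNumber", "email")).select("eventData.stringVals.*").show()
    spark.stop()
  }
}
```

Because the struct is constructed a single time, the query plan contains one `struct` expression per nesting level regardless of how many fields are replaced. Two caveats: rows where `eventData` or `stringVals` is null come back as non-null structs with null fields, and field names containing dots would need backtick quoting.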
