Why is df.coalesce(1) necessary? #57

sunayansaikia · 2019-03-13T17:52:53Z

Hey folks,

Just wanted to understand why 'df.coalesce(1)' is called when writing the DataFrame to DFS.
Please refer to the code here: https://github.com/springml/spark-sftp/blob/master/src/main/scala/com/springml/spark/sftp/DefaultSource.scala#L249

Thanks

samuel-pt · 2019-03-14T06:51:06Z

@sunayansaikia - That was done so that the output is written as a single file, which is then uploaded to SFTP.

sunayansaikia · 2019-03-17T08:21:40Z

Hey @samuel-pt, is this a hard requirement? Can we not download multiple files for upload via SFTP? Could this be made configurable?

samuel-pt · 2019-03-18T08:37:34Z

@sunayansaikia - It's not hard. We can just add a configurable parameter and use it. The following are needed:

sunayansaikia · 2019-03-19T03:45:12Z

OK, cool. I'll check out the code and take a look.

shaikmanu797 · 2019-07-24T20:36:28Z

@sunayansaikia, the PR below should make the coalesce numPartitions configurable:
#68
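
For anyone skimming this thread, a configurable partition count could look roughly like the sketch below. It is not taken from #68 or from spark-sftp itself: the option name "numPartitions", the PartitionOption object, and the default of 1 are assumptions made for illustration only.

```scala
// Sketch only: the option name "numPartitions" and this helper object are
// hypothetical and are not taken from spark-sftp or from the PR.
object PartitionOption {
  val Key = "numPartitions"

  // Returns the requested partition count, or `default` when the option is absent.
  // Defaulting to 1 keeps the existing single-file behaviour.
  def resolve(parameters: Map[String, String], default: Int = 1): Int =
    parameters.get(Key) match {
      case None => default
      case Some(raw) =>
        val n = raw.trim.toInt // throws NumberFormatException on non-numeric input
        require(n > 0, s"$Key must be a positive integer, got $n")
        n
    }
}

// Intended use inside the writer (needs Spark on the classpath, so shown as a comment):
//   val data = df.coalesce(PartitionOption.resolve(parameters))
```

With a value greater than 1, Spark can write several part files, so the upload step would presumably also need to handle more than one file.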

sunayansaikia · 2019-07-26T16:05:11Z

samuel-pt closed this as completed Mar 14, 2019

samuel-pt reopened this Mar 14, 2019
