Number of threads to use for processing
Default Value:
The number of CPU cores on the machine
Text encoding of input files
Default Value:
UTF8
Whether the first line of the CSV files are column names
Default Value:
false
The delimiter of the input files
Default Value:
Method of compression used for input files. Use UNCOMPRESSED for no compression
Default Value:
Application will attempt to deduce the compression used
Where to find the output schema
Default Value:
An all String schema will be generated with column names matching the header. If no header is present, they will be assigned names in the spark style ie _c0, _c1, ect
Splits into multiple output files. null will create a single output partition.
WARNING: More partitions means more memory usage
Default Value:
null
What compression to use for the output parquet files
Default Value:
SNAPPY
Final file name/path
Default Value:
null
How many records to store if a reader is unable to write to the output parquet file. This can be reduced to lower memory usage at the expense of processing time when there are highly skewed partitions; 0 is an acceptable value
Default Value:
4096
If true, the app will write the partitions to disk before attempting to convert them. If false, the partitioning will occur at the same time as conversion.
False will increase performance drastically, but memory usage will scale with the number of partitions.
Default Value:
Will be set based on the configured partitioning strategy.
Where to write temp files.
Default Value:
$outputPath/.tmp/
Whether to terminate if an input file is malformed
Default Value:
fail
{
"configVersion" : "1",
"threads" : null,
"encoding" : "UTF8",
"hasHeader" : false,
"separator" : ",",
"inputCompression" : "detect",
"schemaPath" : null,
"partitionBy" : null,
"outputCompression" : "SNAPPY",
"outputPath" : null,
"backlogLimit" : 0,
"useDiskBuffer" : null,
"workingDirectory" : null,
"inputErrorTolerance" : "fail"
}