# Parquet output plugin for Embulk

## Overview

- **Plugin type**: output
- **Load all or nothing**: no
- **Resume supported**: no
- **Cleanup supported**: no

## Configuration

- **path_prefix**: Prefix of the output path. This is a Hadoop Path URI, so it may include a scheme and authority. (string, required)
- **file_ext**: Extension of the output path. (string, default: `.parquet`)
- **sequence_format**: Format of the sequence number appended to output file names. (string, default: `.%03d`)
- **block_size**: Block size of the Parquet file, in bytes. (int, default: `134217728` (128MB))
- **page_size**: Page size of the Parquet file, in bytes. (int, default: `1048576` (1MB))
- **compression_codec**: Compression codec. Available values: `UNCOMPRESSED`, `SNAPPY`, `GZIP`. (string, default: `UNCOMPRESSED`)
- **default_timezone**: Time zone of timestamp columns. This can be overridden per column using `column_options`.
- **default_timestamp_format**: Format of timestamp columns. This can be overridden per column using `column_options`.
- **column_options**: Specifies the time zone and timestamp format for each column. The format of this option is the same as that of the official CSV formatter; see its documentation.
- **config_files**: List of paths to Hadoop configuration files. (array of strings, default: `[]`)
- **extra_configurations**: Extra entries added to the Hadoop `Configuration` that is passed to `ParquetWriter`.
- **overwrite**: Overwrite output files if they already exist. (default: fail if files exist)
- **enablesigv4**: Enable the Signature Version 4 signing process, required for the S3 eu-central-1 (Frankfurt) region.
- **addUTF8**: If true, string and timestamp columns are stored with `OriginalType.UTF8`. (boolean, default: `false`)
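
Combining several of the options above, a fuller output section might look like the following sketch. The column name `created_at` and the chosen format and time zone are illustrative assumptions, not part of this plugin's defaults:

```yaml
out:
  type: parquet
  path_prefix: file:///data/output
  file_ext: .parquet
  compression_codec: SNAPPY          # SNAPPY instead of the default UNCOMPRESSED
  default_timezone: UTC
  column_options:
    # hypothetical timestamp column; syntax follows the official CSV formatter
    created_at: {format: '%Y-%m-%d %H:%M:%S', timezone: 'Asia/Tokyo'}
```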

## Example

```yaml
out:
  type: parquet
  path_prefix: file:///data/output
```

## How to write Parquet files into S3

```yaml
out:
  type: parquet
  path_prefix: s3a://bucket/keys
  extra_configurations:
    fs.s3a.access.key: 'your_access_key'
    fs.s3a.secret.key: 'your_secret_access_key'
```
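
Instead of inlining credentials in the Embulk config, the same Hadoop settings can be loaded from existing configuration files via `config_files`. The path below is an assumption for illustration; point it at wherever your Hadoop site files actually live:

```yaml
out:
  type: parquet
  path_prefix: s3a://bucket/keys
  config_files:
    # hypothetical path to a core-site.xml containing the fs.s3a.* properties
    - /etc/hadoop/conf/core-site.xml
```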

## Build

```
$ ./gradlew gem
```