Reorganize file source part #95
Conversation
@wcy-fdu let me know if you have any comments, and please help provide the unsupported data types. Thanks!
delivery/overview.mdx
Outdated
@@ -105,7 +105,7 @@ When creating an `upsert` sink, note whether or not you need to specify the prim
<Note>
**PUBLIC PREVIEW**

Sink data in parquet encode is in the public preview stage, meaning it's nearing the final product but is not yet fully stable. If you encounter any issues or have feedback, please contact us through our [Slack channel](https://www.risingwave.com/slack). Your input is valuable in helping us improve the feature. For more information, see our [Public preview feature list](/changelog/product-lifecycle#features-in-the-public-preview-stage).
Sink data in Parquet encode is in the public preview stage, meaning it's nearing the final product but is not yet fully stable. If you encounter any issues or have feedback, please contact us through our [Slack channel](https://www.risingwave.com/slack). Your input is valuable in helping us improve the feature. For more information, see our [Public preview feature list](/changelog/product-lifecycle#features-in-the-public-preview-stage).
@WanYixian Let's always refer to Parquet as a format. "Sinking data in Parquet format..."
Co-authored-by: hengm3467 <[email protected]> Signed-off-by: IrisWan <[email protected]>
@@ -63,6 +63,15 @@ FORMAT [ DEBEZIUM | UPSERT | PLAIN ] ENCODE AVRO (

Note that for `map.handling.mode = 'jsonb'`, the value types can only be: `null`, `boolean`, `int`, `string`, or `map`/`record`/`array` with these types.

### Bytes

RisingWave allows you to read data streams without decoding the data by using the `BYTES` row format. However, the table or source can have exactly one field of `BYTEA` data.
Is it possible to have an example here? I don't get this exactly one field part. :)
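A minimal sketch of what "exactly one field" could look like, assuming a Kafka source. The table name `raw_events`, the column name `payload`, the topic, and the broker address are placeholders for illustration, not values taken from this PR. With `ENCODE BYTES`, each message is kept undecoded, so the schema declares a single `BYTEA` column and nothing else:

```sql
-- Hypothetical table, column, topic, and broker names; adjust to your setup.
-- The schema has exactly one column, and its type is BYTEA.
CREATE TABLE raw_events (payload BYTEA)
WITH (
    connector = 'kafka',
    topic = 'raw_events_topic',
    properties.bootstrap.server = 'localhost:9092'
) FORMAT PLAIN ENCODE BYTES;
```

A schema with a second column, or with a single column of a type other than `BYTEA`, would not meet the requirement described in the sentence above.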
Otherwise LGTM. Thanks!
Co-authored-by: hengm3467 <[email protected]> Signed-off-by: IrisWan <[email protected]>
ingestion/overview.mdx
Outdated
| decimal | decimal |
| Int8 | Int16 |
| UInt8 | Int16 |
| UInt16 | Int32 |
I think `Int16` should be changed into `smallint`, `Int32` -> `int`, `Int64` -> `bigint`?
cc @xiangjinwu to confirm.
The right column for RisingWave shall use the names `smallint`, `int`, `bigint`, `decimal`, `real`, `double precision`.
The left column for Parquet shall also be consistent. Names shall be `int32` and `int64` rather than `integer` and `long`. (And it seems `int16` is missing?)
For unsupported data types, I think `Int96` and `FIXED_LEN_BYTE_ARRAY`; refer to https://parquet.apache.org/docs/file-format/types/
Not sure if there is any omission. Can you help confirm?
ingestion/overview.mdx
Outdated
| UInt16 | Int32 |
| UInt32 | Int64 |
| UInt64 | Decimal |
| Float16 | Float32 |
ditto. `Float32` -> `real`
Co-authored-by: congyi wang <[email protected]> Signed-off-by: IrisWan <[email protected]>
This reverts commit 11018d2.
Description
Summarize file source part, document batch reading and data type mapping.
Related Code PR
risingwavelabs/risingwave#15358
risingwavelabs/risingwave#19561
Related Doc Issues
Resolve #51
Resolve #86
Preview
File source management: https://risingwavelabs-wyx-file-source-related.mintlify.app/ingestion/overview#file-source-management
Supported Parquet format: https://risingwavelabs-wyx-file-source-related.mintlify.app/ingestion/supported-sources-and-formats#parquet