feat(mito): Implement SST format for mito2 #2178
Codecov Report
@@ Coverage Diff @@
## develop #2178 +/- ##
===========================================
- Coverage 84.68% 84.29% -0.39%
===========================================
Files 698 700 +2
Lines 112701 113147 +446
===========================================
- Hits 95437 95377 -60
- Misses 17264 17770 +506
LGTM except the missing tests
LGTM
* chore: update comment
* feat: stream writer takes arrow's types
* feat: Define Batch struct
* feat: arrow_schema_to_store
* refactor: rename
* feat: write parquet in new format with tsids
* feat: reader support projection
* feat: Impl read compat
* refactor: rename SchemaCompat to CompatRecordBatch
* feat: changing sst format
* feat: make it compile
* feat: remove tsid and some structs
* feat: from_sst_record_batch wip
* chore: push array
* chore: wip
* feat: decode batches from RecordBatch
* feat: reader converts record batches
* feat: remove compat mod
* chore: remove some codes
* feat: sort fields by column id
* test: test to_sst_arrow_schema
* feat: do not sort fields
* test: more test helpers
* feat: simplify projection
* fix: projection indices is incorrect
* refactor: define write/read format
* test: test write format
* test: test projection
* test: test convert record batch
* feat: remove unused errors
* refactor: wrap get_field_batch_columns
* chore: clippy
* chore: fix clippy
* feat: build arrow schema from region meta in ReadFormat
* feat: initialize the parquet reader at `build()`
* chore: fix typo
I hereby agree to the terms of the GreptimeDB CLA
What's changed and what's your intention?
This PR implements the SST format for the mito2 engine.

The new SST format encodes the primary keys in a memory-comparable format and stores them as dictionary arrays. While decoding a `RecordBatch`, we distinguish different time series by comparing the keys of the dictionary array.

We store three internal columns in parquet:
- `__primary_key`, the primary key of the row (tags).
- `__sequence`, the sequence number of the row.
- `__op_type`, the op type of the row.

The schema of a parquet file is:
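To illustrate the primary-key handling described in this PR, here is a minimal, std-only Rust sketch. Everything in it is hypothetical and simplified: `encode_primary_key`, `dictionary_encode`, and `split_by_series` are illustrative names, the byte-escaping scheme is a stand-in rather than the memory-comparable codec mito2 actually uses, and plain vectors stand in for arrow's dictionary arrays. It assumes rows are already sorted by primary key, so the rows of one time series are contiguous.

```rust
use std::ops::Range;

/// Simplified memory-comparable encoding of nullable string tags (hypothetical,
/// not mito2's codec). Byte-wise comparison of the output matches lexicographic
/// comparison of the input tuples, with `None` sorting before any `Some`.
pub fn encode_primary_key(tags: &[Option<&str>]) -> Vec<u8> {
    let mut buf = Vec::new();
    for tag in tags {
        match tag {
            // Null marker: sorts before the non-null marker 0x01.
            None => buf.push(0x00),
            Some(s) => {
                buf.push(0x01);
                for &b in s.as_bytes() {
                    if b == 0x00 {
                        // Escape NUL so it cannot be mistaken for the terminator.
                        buf.extend_from_slice(&[0x00, 0xFF]);
                    } else {
                        buf.push(b);
                    }
                }
                // Terminator sorts before any continuation, so "a" < "ab".
                buf.extend_from_slice(&[0x00, 0x00]);
            }
        }
    }
    buf
}

/// Dictionary-encodes the primary keys of rows sorted by primary key:
/// `values` holds each distinct key once, and `keys[i]` indexes row i's key.
pub fn dictionary_encode(row_keys: &[Vec<u8>]) -> (Vec<Vec<u8>>, Vec<u32>) {
    let mut values: Vec<Vec<u8>> = Vec::new();
    let mut keys = Vec::with_capacity(row_keys.len());
    for pk in row_keys {
        // Input is sorted, so a new key can only differ from the last value.
        if values.last() != Some(pk) {
            values.push(pk.clone());
        }
        keys.push((values.len() - 1) as u32);
    }
    (values, keys)
}

/// Splits rows into per-series row ranges by comparing adjacent dictionary
/// keys (cheap u32 comparisons) instead of the encoded key bytes.
pub fn split_by_series(keys: &[u32]) -> Vec<(u32, Range<usize>)> {
    let mut out = Vec::new();
    let mut start = 0;
    for i in 1..=keys.len() {
        // Close the current run at the end of input or when the key changes.
        if i == keys.len() || keys[i] != keys[start] {
            out.push((keys[start], start..i));
            start = i;
        }
    }
    out
}
```

Comparing the `u32` dictionary keys of adjacent rows is cheaper than comparing the encoded key bytes, which is presumably why the decoder works on the keys. Note that this sketch only yields one dictionary entry per series because its input is sorted by primary key; an unsorted input would need a hash map to deduplicate values.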
Checklist
Refer to a related PR or issue link (optional)