feat: vortex CLI convert parquet to Vortex #2649

Merged
merged 7 commits into from
Mar 13, 2025

Conversation

@a10y a10y commented Mar 10, 2025

From our discussion earlier today

@a10y a10y requested a review from gatesn March 10, 2025 21:33
@a10y a10y force-pushed the aduffy/vx-convert branch from 45d9857 to 85f1e76 Compare March 10, 2025 21:35
@a10y a10y force-pushed the aduffy/vx-convert branch from 85f1e76 to 7d310bb Compare March 10, 2025 21:37
@a10y a10y enabled auto-merge (squash) March 10, 2025 21:41
@@ -19,12 +19,19 @@ use vortex::file::VortexWriteOptions;
use vortex::stream::{ArrayStream, ArrayStreamArrayExt};
use vortex::{Array, ArrayRef};

#[derive(Default)]
pub struct Flags {
    pub quiet: bool,
Contributor

Isn't it usually the other way around?!

Contributor Author

I think you usually want the print. See curl or wget, for example.
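
A minimal sketch of the default being discussed: deriving `Default` on `Flags` gives `quiet: false`, so the tool prints unless the user opts out. Only the `Flags` struct and its `quiet` field come from the diff; the `status_line` helper and its message text are hypothetical, added purely for illustration.

```rust
/// Mirrors the `Flags` struct from the diff; only `quiet` is taken from the PR.
#[derive(Default)]
pub struct Flags {
    pub quiet: bool,
}

/// Hypothetical helper (not part of the PR): returns the status line to show,
/// or `None` when the caller asked for quiet output.
pub fn status_line(input: &str, flags: &Flags) -> Option<String> {
    if flags.quiet {
        None
    } else {
        Some(format!("Converting input Parquet file: {}", input))
    }
}
```

With this shape, `Flags::default()` is the verbose case and quiet output is something the caller has to request explicitly.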

);
pub async fn exec_convert(input_path: impl AsRef<Path>, flags: Flags) -> VortexResult<()> {
    if !flags.quiet {
        println!(
Contributor

Maybe eprintln!
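
One way to read this suggestion: status messages go to stderr so that stdout stays free for data or piping. The sketch below is an assumption about how that could be structured, not the code that was merged; `report` is a hypothetical helper that writes to any `std::io::Write` sink, which also makes the behavior easy to test.

```rust
use std::io::{self, Write};

/// Hypothetical helper (not part of the PR): writes a status message to the
/// given sink unless `quiet` is set.
pub fn report<W: Write>(sink: &mut W, quiet: bool, msg: &str) -> io::Result<()> {
    if !quiet {
        writeln!(sink, "{}", msg)?;
    }
    Ok(())
}
```

In a CLI, passing `&mut std::io::stderr()` as the sink would send the message to stderr, much like `eprintln!`, except that write errors are returned to the caller rather than causing a panic.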

for batch in reader.by_ref() {
    let batch = ArrowStructArray::from(batch?);
    let next_chunk = ArrayRef::from_arrow(&batch, true);
    chunks.push(next_chunk);
Contributor

You should be able to just stream these straight into the writer without buffering everything in memory, using the row group count as your progress indicator.
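
The streaming shape suggested here can be sketched with the standard library alone. This is not the Vortex or Parquet API: `ChunkSink`, `stream_convert`, and `RowCounter` are hypothetical names, errors are plain `String`s, and the sketch assumes one chunk per row group so that the chunk count can stand in for the row group count.

```rust
/// Hypothetical stand-in for the file writer: accepts one chunk at a time.
pub trait ChunkSink<T> {
    fn write_chunk(&mut self, chunk: T) -> Result<(), String>;
}

/// Feeds chunks into the sink as they arrive, so at most one chunk is held
/// at a time. Calls `on_progress(done, total)` after each successful write.
pub fn stream_convert<T, I, S, F>(
    chunks: I,
    total: usize,
    sink: &mut S,
    mut on_progress: F,
) -> Result<usize, String>
where
    I: IntoIterator<Item = Result<T, String>>,
    S: ChunkSink<T>,
    F: FnMut(usize, usize),
{
    let mut done = 0;
    for chunk in chunks {
        // Stop at the first read or write error.
        sink.write_chunk(chunk?)?;
        done += 1;
        on_progress(done, total);
    }
    Ok(done)
}

/// Illustrative sink that keeps only a running row count.
pub struct RowCounter {
    pub rows: usize,
}

impl ChunkSink<Vec<i64>> for RowCounter {
    fn write_chunk(&mut self, chunk: Vec<i64>) -> Result<(), String> {
        self.rows += chunk.len();
        Ok(())
    }
}
```

Compared with pushing every chunk into a `Vec` first, peak memory here is bounded by the largest single chunk rather than by the whole input, and the progress callback can drive a progress bar sized to `total`.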

@a10y a10y force-pushed the aduffy/vx-convert branch from 04bce1a to be90b6e Compare March 13, 2025 00:37
@a10y a10y disabled auto-merge March 13, 2025 00:37
@a10y a10y force-pushed the aduffy/vx-convert branch from be90b6e to 8adef64 Compare March 13, 2025 00:41
@a10y a10y merged commit 273de98 into develop Mar 13, 2025
27 checks passed
@a10y a10y deleted the aduffy/vx-convert branch March 13, 2025 01:27
3 participants