Read in multiple csvs when file paths aren't amenable to glob syntax #146

Closed
nicki-dese opened this issue May 2, 2024 · 4 comments

@nicki-dese

nicki-dese commented May 2, 2024

I routinely work with multiple large CSVs whose file paths are a mess and aren't amenable to glob syntax. When working with DuckDB I can supply them as a list, say SELECT * FROM read_csv(['file_1.csv', 'file_2.csv']), and that works. I can't figure out how to do the equivalent in duckplyr.
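For reference, the DuckDB list form can be run from R through DBI. This is a minimal sketch, assuming the duckdb and DBI R packages are installed; the two temporary CSVs are stand-ins for the real, irregularly named files:

```r
library(DBI)

# Two small example CSVs in a fresh temporary directory (stand-ins for real files)
dir <- tempfile("csvs")
dir.create(dir)
file_paths <- file.path(dir, c("file_1.csv", "file_2.csv"))
write.csv(data.frame(id = 1:2, value = c("a", "b")), file_paths[1], row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c("c", "d")), file_paths[2], row.names = FALSE)

con <- dbConnect(duckdb::duckdb())

# Build a DuckDB list literal like ['.../file_1.csv', '.../file_2.csv'];
# dbQuoteString() turns each path into an escaped SQL string literal
path_list <- paste0("[", paste(dbQuoteString(con, file_paths), collapse = ", "), "]")

# read_csv() accepts a list of files and stacks them into one result
combined <- dbGetQuery(con, paste0("SELECT * FROM read_csv(", path_list, ")"))
dbDisconnect(con)

combined
```

Quoting the paths with dbQuoteString() rather than pasting them in raw avoids exactly the missing-quotes problem that is easy to hit when writing the list by hand.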

I've tried:

file_paths <- c("file_1.csv", "file_2.csv")
or
file_paths <- list("file_1.csv", "file_2.csv")

duckplyr_df_from_csv(file_paths) %>% do_something

It doesn't error, but it only reads in the first file.

Is this possible? If so, how? If not, I think there should at least be a warning when a list or vector of multiple file paths is passed.
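The warning suggested above could look roughly like the wrapper below. It is a hypothetical helper (read_one_csv() is not part of duckplyr), assuming duckplyr_df_from_csv() reads only a single path, as described in this report:

```r
# Hypothetical wrapper, not part of duckplyr: warn instead of silently
# ignoring every path after the first
read_one_csv <- function(path, ...) {
  if (length(path) > 1) {
    warning(
      "Got ", length(path), " file paths; only the first will be read. ",
      "Read them individually and combine the results instead.",
      call. = FALSE
    )
  }
  # path[[1]] works for both a character vector and a list of paths
  duckplyr::duckplyr_df_from_csv(path[[1]], ...)
}
```

Calling read_one_csv(c("file_1.csv", "file_2.csv")) would then emit the warning before reading file_1.csv.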

@krlmlr
Member

krlmlr commented May 2, 2024

Thanks. Code like file_paths %>% map(duckplyr_df_from_csv) %>% bind_rows() has worked for me in practice, but I agree that this should be streamlined. Would you like to contribute a PR?
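A self-contained version of that workaround is sketched below, assuming dplyr, purrr and duckplyr (a version that still provides duckplyr_df_from_csv()) are installed; the temporary files are placeholders for the real paths:

```r
library(dplyr)
library(purrr)

# Two example CSVs with identical columns
dir <- tempfile("csvs")
dir.create(dir)
file_paths <- file.path(dir, c("file_1.csv", "file_2.csv"))
write.csv(data.frame(id = 1:2, value = c("a", "b")), file_paths[1], row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c("c", "d")), file_paths[2], row.names = FALSE)

# One duckplyr data frame per file, then stack them row-wise
combined <- file_paths %>%
  map(duckplyr::duckplyr_df_from_csv) %>%
  bind_rows()
```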

@nicki-dese
Author

I hadn't thought to use map; thanks for the tip.

I'm sorry, I don't have the experience or knowledge to open a PR :(

@krlmlr
Member

krlmlr commented Jul 8, 2024

bind_rows() reads everything into memory. %>% reduce(union_all) is better, but it will also read into memory in duckplyr 0.4.0 (it works better in duckplyr 0.3.0): tidyverse/dplyr#7049.
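The reduce(union_all) variant can be sketched as follows, again assuming dplyr, purrr and duckplyr are installed and using temporary files as placeholders. Whether the result stays lazy depends on the duckplyr version, as noted in the comment above:

```r
library(dplyr)
library(purrr)

# Three example CSVs with a single id column
dir <- tempfile("csvs")
dir.create(dir)
file_paths <- file.path(dir, c("file_1.csv", "file_2.csv", "file_3.csv"))
for (i in seq_along(file_paths)) {
  write.csv(data.frame(id = c(2 * i - 1, 2 * i)), file_paths[i], row.names = FALSE)
}

# Combine pairwise: union_all(union_all(df_1, df_2), df_3)
combined <- file_paths %>%
  map(duckplyr::duckplyr_df_from_csv) %>%
  reduce(union_all)
```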

What should work is duckplyr_df_from_csv("file_*.csv"), but I hear that this is not an option here, and I'm seeing mixed results too: duckdb/duckdb#12903.

Action item: Implement bind_rows() to use reduce(union_all) under the hood.
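Until something like that lands, a user-side approximation is possible. The helper below is hypothetical (bind_rows_union_all() is not part of dplyr or duckplyr) and only assumes that dplyr::union_all() and purrr::reduce() are available:

```r
# Hypothetical helper, not part of dplyr or duckplyr: a bind_rows()-style
# interface that combines its inputs with union_all() instead
bind_rows_union_all <- function(...) {
  frames <- list(...)
  # Accept a single list of data frames, as bind_rows() does
  if (length(frames) == 1 && !is.data.frame(frames[[1]])) {
    frames <- frames[[1]]
  }
  # union_all(union_all(frames[[1]], frames[[2]]), frames[[3]]), ...
  purrr::reduce(frames, dplyr::union_all)
}
```

It works on plain data frames, e.g. bind_rows_union_all(data.frame(a = 1:2), data.frame(a = 3:4)), and on a list of duckplyr frames such as file_paths %>% map(duckplyr_df_from_csv).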

@krlmlr
Member

krlmlr commented Jul 8, 2024

The action items here are a subset of those in #181 (comment), let's move the discussion there.

krlmlr closed this as completed Jul 8, 2024