Support for gz zip files (feature request) #351

Open
jhelvy opened this issue Sep 7, 2021 · 3 comments
jhelvy commented Sep 7, 2021

Hi! Loving this package!

I wanted to ask if support for gz files could be added. Right now if I use the zip_to_disk.frame() function on a .gz file I get an error. Here's a simple example:

> library(disk.frame)
> write.csv(mtcars, "mtcars.csv") # Create csv
> system("gzip mtcars.csv") # Zip it to .gz file
> df <- zip_to_disk.frame("mtcars.csv.gz") # Attempt to read .gz file to disk.frame

This produces this error:

Error in unzip(zipfile, list = TRUE) : 
  zip file 'mtcars.csv.gz' cannot be opened

My solution for now is to use {vroom} to read in the .gz and then convert it to a disk.frame:

library(vroom)
df <- as.disk.frame(vroom("mtcars.csv.gz"))

It works just fine, but I have a folder with several hundred .gz files, so it'd be nicer if I could just point to the folder and have {disk.frame} read them in sequentially.

@xiaodaigh
Collaborator

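I see. For now I think this would work:

library(disk.frame)
library(vroom)

# `files` is a character vector of paths to the .gz files
list_of_disk.frames = lapply(files, function(file) {
  as.disk.frame(vroom(file))
})

# Combine the per-file disk.frames into a single disk.frame
fnl_disk.frame = rbindlist.disk.frame(list_of_disk.frames)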

Author

jhelvy commented Sep 7, 2021

Oh, nice workaround! I didn't even know about rbindlist.disk.frame().
This should work as a generic solution for lots of different file types, too.

@xiaodaigh
Collaborator

Yep. The only thing I would add is that you can parallelize the lapply using future.apply.
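A minimal sketch of that parallel version, assuming the {future}, {future.apply}, {vroom} and {disk.frame} packages are installed. The function name `read_gz_folder` and its arguments `folder`, `outdir` and `workers` are hypothetical, chosen only for illustration. Each per-file disk.frame is written under a directory created by the main session, on the assumption that a worker's own temporary directory may not outlive the worker.

```r
# Sketch only: `read_gz_folder`, `folder`, `outdir` and `workers` are
# hypothetical names, not part of {disk.frame}.
read_gz_folder <- function(folder, outdir = tempfile(fileext = ".df"), workers = 2) {
  # Collect the paths of all .gz files in the folder
  files <- list.files(folder, pattern = "\\.gz$", full.names = TRUE)

  # Directory (created in the main session) that holds the per-file disk.frames
  parts_dir <- tempfile("gz_parts_")
  dir.create(parts_dir)

  # Run the reads in background R sessions; restore the previous plan on exit
  old_plan <- future::plan(future::multisession, workers = workers)
  on.exit(future::plan(old_plan), add = TRUE)

  # Same body as the lapply() version, with future_lapply() swapped in
  list_of_disk.frames <- future.apply::future_lapply(files, function(file) {
    disk.frame::as.disk.frame(
      vroom::vroom(file),
      outdir = file.path(parts_dir, paste0(basename(file), ".df"))
    )
  })

  # Combine the per-file disk.frames into a single disk.frame at `outdir`
  disk.frame::rbindlist.disk.frame(list_of_disk.frames, outdir = outdir)
}
```

It would be called as `fnl_disk.frame <- read_gz_folder("path/to/folder")`, where the path is a placeholder. Whether this is faster than the sequential version likely depends on file sizes and available memory, since each worker holds one file's data at a time.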
