Write to a big CSV, but sparse where all values are NA #10

Open
mdsumner opened this issue Sep 27, 2022 · 1 comment

Comments

@mdsumner
Member

mdsumner commented Sep 27, 2022

library(grout) ## remotes::install_github(c("hypertidy/vaster", "hypertidy/vapour", "hypertidy/grout"))
library(vaster)
library(vapour)
library(readr)
library(raadfiles)


## these are files I have around (via raadfiles); substitute any raster source GDAL can read
file <- ibcso_files()$fullname[1]
## file <- oisst_monthly_files()$fullname[1]

info <- vapour_raster_info(file)

info$dimension
info$projstring
info$extent
info$block

## we don't want the file's native blocks, we want to write cells in normal
## row order (x varying fastest), but 200 rows at a time
tiles <- grout(file, blocksize = c(info$dimension[1], 200))

index <- tile_index(tiles)

index
# # A tibble: 96 × 9
#     tile offset_x offset_y  ncol  nrow     xmin    xmax    ymin    ymax
#    <int>    <dbl>    <dbl> <dbl> <dbl>    <dbl>   <dbl>   <dbl>   <dbl>
#  1     1        0        0 19200   200 -4800000 4800000 4700000 4800000
#  2     2        0      200 19200   200 -4800000 4800000 4600000 4700000
#  3     3        0      400 19200   200 -4800000 4800000 4500000 4600000
#  4     4        0      600 19200   200 -4800000 4800000 4400000 4500000
#  5     5        0      800 19200   200 -4800000 4800000 4300000 4400000
#  6     6        0     1000 19200   200 -4800000 4800000 4200000 4300000
#  7     7        0     1200 19200   200 -4800000 4800000 4100000 4200000
#  8     8        0     1400 19200   200 -4800000 4800000 4000000 4100000
#  9     9        0     1600 19200   200 -4800000 4800000 3900000 4000000
# 10    10        0     1800 19200   200 -4800000 4800000 3800000 3900000
# # … with 86 more rows

library(dplyr)
outfile <- "outfile.csv"
## read each strip of rows, build a data frame of x, y and values, and write/append it
for (i in seq_len(nrow(index))) {
  tile <- slice(index, i)
  ext0 <- unlist(select(tile, xmin, xmax, ymin, ymax))
  ## here we could read multiple bands (or visit multiple sources for bands)
  v <- vapour_warp_raster(file, bands = 1,
                          extent = ext0,
                          dimension = unlist(select(tile, ncol, nrow)))
  vals <- tibble::as_tibble(v)
  bad <- rowSums(is.na(vals)) == ncol(vals)
  cell <- cell_from_extent(info$dimension, info$extent, ext0)[!bad]
  xy <- xy_from_cell(info$dimension, info$extent, cell)
  d <- tibble::tibble(x = xy[,1], y = xy[,2])
  d <- bind_cols(d, vals[!bad, ])
  ## we might want to modify the names here
  write_csv(d, file = outfile, append = i > 1)
}
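For anyone without the hypertidy packages installed, the arithmetic the loop relies on can be sketched in base R. This is a minimal sketch, not grout's or vaster's implementation: `strip_index()` and `strip_cells_xy()` are hypothetical helpers, and the sketch assumes `dimension` is `c(ncol, nrow)`, `extent` is `c(xmin, xmax, ymin, ymax)`, and cells are numbered from 1 in row-major order starting at the top-left corner.

```r
# Hypothetical helper (not grout's code): split a grid into full-width strips
# of `nrows` rows, returning the same columns as tile_index() above.
strip_index <- function(dimension, extent, nrows) {
  res_y <- (extent[4] - extent[3]) / dimension[2]    # cell height
  offset_y <- seq(0, dimension[2] - 1, by = nrows)   # 0-based first row of each strip
  tile_nrow <- pmin(nrows, dimension[2] - offset_y)  # the last strip may be shorter
  ymax <- extent[4] - offset_y * res_y
  data.frame(tile = seq_along(offset_y), offset_x = 0, offset_y = offset_y,
             ncol = dimension[1], nrow = tile_nrow,
             xmin = extent[1], xmax = extent[2],
             ymin = ymax - tile_nrow * res_y, ymax = ymax)
}

# Hypothetical helper (not vaster's code): 1-based, row-major cell numbers and
# cell-centre coordinates for the strip starting at 0-based row `offset_y`.
strip_cells_xy <- function(dimension, extent, offset_y, n_rows) {
  res_x <- (extent[2] - extent[1]) / dimension[1]
  res_y <- (extent[4] - extent[3]) / dimension[2]
  rows <- rep(offset_y + seq_len(n_rows), each = dimension[1])  # 1-based rows, top first
  cols <- rep(seq_len(dimension[1]), times = n_rows)            # x varies fastest
  data.frame(cell = (rows - 1) * dimension[1] + cols,
             x = extent[1] + (cols - 0.5) * res_x,
             y = extent[4] - (rows - 0.5) * res_y)
}

idx <- strip_index(c(19200, 19200), c(-4800000, 4800000, -4800000, 4800000), 200)
nrow(idx)  # 96, matching the tile_index() output above

# Toy 4 x 3 grid: drop cells where every band is NA, as the loop does.
xy <- strip_cells_xy(c(4, 3), c(0, 4, 0, 3), offset_y = 0, n_rows = 1)
vals <- data.frame(band1 = c(NA, 2, NA, 5))
bad <- rowSums(is.na(vals)) == ncol(vals)
cbind(xy[!bad, c("x", "y")], vals[!bad, , drop = FALSE])
```

Because each strip spans the full width, its cells are a contiguous run of row-major cell numbers, so appending strips in order writes the CSV in plain x-then-y order.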
@mdsumner
Member Author

mdsumner commented Sep 27, 2022

The result is a hideous, great big CSV: 8 GB for the 19200 x 19200 IBCSO grid.
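A back-of-envelope check on that size: with nothing dropped, every one of the 19200 x 19200 cells becomes a CSV row. The sketch below is plain arithmetic, assuming the reported size is roughly 8e9 bytes and a single value column; the example row string is illustrative (the top-left cell centre of the extent shown in the tile index, with a zero value), not copied from the file.

```r
n_cells <- 19200 * 19200       # 368,640,000 cells, i.e. CSV rows when nothing is dropped
bytes <- 8e9                   # assumption: the reported 8 GB taken as ~8e9 bytes
bytes / n_cells                # about 21.7 bytes per row

# A top-left row with a zero value, written as text (x, y are cell centres):
nchar("-4799750,4799750,0\n")  # 19 characters
```

So the file size tracks the number of rows written, which is why the sparseness in the title matters.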

Here's a sample of 1e6 pixels from the CSV. There's no sparseness, because the grid has no NA values; the background is 0.

[Image: plot of the 1e6-pixel sample from the CSV]
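Because the background is 0 rather than NA, the all-NA filter in the loop keeps every cell. One way to get sparse output would be to treat the background value as missing before filtering. This is a minimal sketch in base R, assuming 0 never occurs as a genuine data value (for a bathymetry grid like IBCSO that needs checking, since 0 m may be a real value); `drop_background()` is a hypothetical helper, not part of vapour or grout.

```r
# Hypothetical helper: drop rows where every value column is NA or equal to
# the background value. Assumes `background` never occurs as real data.
drop_background <- function(d, value_cols, background = 0) {
  v <- as.matrix(d[value_cols])
  empty <- is.na(v) | v == background  # TRUE where a cell carries no data
  d[rowSums(empty) < ncol(v), , drop = FALSE]
}

d <- data.frame(x = 1:4, y = 1:4, band1 = c(0, -120.5, NA, 0))
drop_background(d, "band1")  # keeps only the row with band1 == -120.5
```

Inside the loop, the equivalent change is to compute `bad` from `is.na(vals) | vals == 0` instead of `is.na(vals)`, then subset `cell` and `vals` with it as before.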
