write to a big CSV but sparse on where all values are NA #10

mdsumner · 2022-09-27T12:49:54Z

library(grout) ## remotes::install_github(c("hypertidy/vaster", "hypertidy/vapour", "hypertidy/grout"))
library(vaster)
library(vapour)
library(readr)
library(raadfiles)


## these are files I have around; uncomment one so that `file` is defined
file  <- ibcso_files()$fullname
##file <- oisst_monthly_files()$fullname[1]

info <- vapour_raster_info(file)

info$dimension
info$projstring
info$extent
info$block

## we don't want blocks, we want to write it in normal x*y order
## but let's do it in multiple lines at a time (200)
tiles <- grout(file, 
               blocksize = c(info$dimension[1],  200))

index <- tile_index(tiles)

index
# # A tibble: 96 × 9
#     tile offset_x offset_y  ncol  nrow     xmin    xmax    ymin    ymax
#    <int>    <dbl>    <dbl> <dbl> <dbl>    <dbl>   <dbl>   <dbl>   <dbl>
#  1     1        0        0 19200   200 -4800000 4800000 4700000 4800000
#  2     2        0      200 19200   200 -4800000 4800000 4600000 4700000
#  3     3        0      400 19200   200 -4800000 4800000 4500000 4600000
#  4     4        0      600 19200   200 -4800000 4800000 4400000 4500000
#  5     5        0      800 19200   200 -4800000 4800000 4300000 4400000
#  6     6        0     1000 19200   200 -4800000 4800000 4200000 4300000
#  7     7        0     1200 19200   200 -4800000 4800000 4100000 4200000
#  8     8        0     1400 19200   200 -4800000 4800000 4000000 4100000
#  9     9        0     1600 19200   200 -4800000 4800000 3900000 4000000
# 10    10        0     1800 19200   200 -4800000 4800000 3800000 3900000
# # … with 86 more rows

library(dplyr)
outfile <- "outfile.csv"
## read the raster, create a dataframe write/append that
for (i in seq_len(nrow(index))) {
  tile <- slice(index, i)
  ext0 <- unlist(select(tile, xmin, xmax, ymin, ymax))
  ## here we could read multiple bands (or visit multiple sources for bands)
  v <- vapour_warp_raster(file, bands = 1, 
                          extent = ext0, 
                          dimension  = unlist(select(tile, ncol, nrow)))
  vals <- tibble::as_tibble(v)
  bad <- rowSums(is.na(vals)) == ncol(vals)
  cell <- cell_from_extent(info$dimension, info$extent, ext0)[!bad]
  xy <- xy_from_cell(info$dimension, info$extent, cell)
  d <- tibble::tibble(x = xy[,1], y = xy[,2])
  d <- bind_cols(d, vals[!bad, ])
  ## we might want to modify the names here
  write_csv(d, file = outfile, append = i > 1)
}
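
The row filter inside the loop can be checked on a toy table without touching any raster files. This is only a minimal sketch in base R: the two-band `vals` table, the `cell` vector, and the `all_na()` helper are made up for illustration and are not part of vaster or vapour.

```r
## hypothetical helper: TRUE for rows where every band is NA
all_na <- function(vals) rowSums(is.na(vals)) == ncol(vals)

## toy strip of 4 cells and 2 bands (made-up values)
vals <- data.frame(band1 = c(1.5, NA, NA, 4),
                   band2 = c(NA,  NA, 7,  8))
cell <- 1:4

bad <- all_na(vals)

## only cell 2 is NA in every band, so it is the only row dropped;
## cells that have a value in at least one band are kept
kept <- cbind(cell = cell[!bad], vals[!bad, , drop = FALSE])
print(kept)
```

A row survives if any band has a value, which is why the filter only produces a sparse CSV when missing pixels are NA in every band.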

mdsumner · 2022-09-27T13:32:04Z

Hideous great big CSV: 8 GB for the 19200x19200 IBCSO grid.

Here's a sample of 1e6 pixels from the CSV. There's no sparseness in this case because there are no NA values; the background is 0, so no rows get dropped.
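
One possible workaround, when the background is a sentinel such as 0 rather than NA, is to recode it to NA before the row test. The sketch below is base R on made-up values; `all_background()` and its `background` argument are hypothetical, and it assumes 0 never occurs as a real value, which may not hold for every dataset.

```r
## hypothetical helper: TRUE for rows where every band is NA or
## equal to the assumed background value
all_background <- function(vals, background = 0) {
  m <- as.matrix(vals)
  ## recode the background sentinel to NA, leaving existing NAs alone
  m[!is.na(m) & m == background] <- NA
  rowSums(is.na(m)) == ncol(m)
}

## toy single-band strip (made-up values)
vals <- data.frame(band1 = c(0, -3200, 0, NA))

bad <- all_background(vals)
kept <- vals[!bad, , drop = FALSE]
print(kept)
```

In the loop above this would stand in for the `bad <- rowSums(is.na(vals)) == ncol(vals)` line, with the rest unchanged.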
