Download bop_iip6_q via JSON API #309

Open
jpmurphy69 opened this issue Dec 28, 2024 · 1 comment

@jpmurphy69

bop_iip6_q is too big for me to download without filtering (Error: cannot allocate vector of size 2.3 Gb).
I also can't get the example code from the Eurostat tutorial (https://ropengov.github.io/eurostat/articles/eurostat_tutorial.html) to work for this dataset, although it does work for the tutorial's own example dataset. I've also tried changing the filters and removing them altogether.

Example code from the Eurostat tutorial, altered:

# Download BoP IIP6
BOP_IIP6 <- get_eurostat_json("bop_iip6_q", filters = list(
    geo = "EU28"
))
Error in get_eurostat_json("bop_iip6_q", filters = list(geo = "EU28")) :

@pitkant (Member)

pitkant commented Feb 5, 2025

@jpmurphy69 Thank you for opening this issue.

"Error: cannot allocate vector of size 2.3 Gb" sounds like something related to your local system rather than to the eurostat package, the Eurostat API, or R itself. Here is a Stack Overflow question that deals with a similar problem: https://stackoverflow.com/questions/5171593/r-memory-management-cannot-allocate-vector-of-size-n-mb

As for the dataset bop_iip6_q, I tried to replicate your issue:

BOP_IIP6 <- get_eurostat_json("bop_iip6_q", filters = list(
    geo = "EU28"
))

which resulted in the following error message:

Error in get_eurostat_json("bop_iip6_q", filters = list(geo = "EU28")) : 
  HTTP status: 413 (Request entity too large)
  Error id: 413 
  Error label from API: EXTRACTION_TOO_BIG: The requested extraction is too big, estimated 9691557120 rows, max authorised is 5000000, please change your filters to reduce the extraction size

In the Eurostat API, some queries are indeed too big for download. From the Eurostat documentation:

Depending on the request, a data query can result in a (potentially very) large response in which case data is delivered asynchronously.

For the SDMX APIs, data can be returned either synchronously or asynchronously: […] Asynchronously: the data is not returned directly in the response. Instead a key is returned in the response which allows to access the data through the async API to check for its availability and eventually retrieve it once available.

Also:

The decision whether to deliver the data synchronously or asynchronously is related to factors such as the complexity of the query and the volume of the data (number of cells) to be returned:

  • if the data is cached -> the data is returned synchronously
  • if the data has to be extracted, the "cost" of the request is estimated and:
    • if below 500 000 cells, the data is returned synchronously
    • if between 500 000 cells and 5 000 000 cells, the data is returned asynchronously
  • if above 5 000 000 cells, a client request error is returned and more filters need to be added to the extraction query to reduce its estimated cost.
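The decision rules above can be sketched as a small helper function. This is purely illustrative: the real logic runs server-side in the Eurostat API, and the function name and return values here are my own invention.

```r
# Illustrative sketch of the Eurostat API's delivery-mode decision.
# The thresholds (500 000 and 5 000 000 cells) come from the Eurostat
# documentation quoted above; everything else is hypothetical.
delivery_mode <- function(estimated_cells, cached = FALSE) {
  if (cached) return("synchronous")              # cached data is returned directly
  if (estimated_cells < 5e5) return("synchronous")
  if (estimated_cells <= 5e6) return("asynchronous")
  "HTTP 413: EXTRACTION_TOO_BIG"                 # client must add filters
}

delivery_mode(1e4)          # "synchronous"
delivery_mode(2e6)          # "asynchronous"
delivery_mode(9691557120)   # the unfiltered bop_iip6_q estimate -> 413 error
```

Note that the unfiltered bop_iip6_q estimate (about 9.7 billion rows) exceeds the hard limit by more than three orders of magnitude, which is why merely trimming the time range was not enough below.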

Judging by the _q suffix in the dataset's name, I would guess it contains quarterly data, and a long enough time series can get quite big. I suggest limiting the download with start and end dates:

BOP_IIP6 <- get_eurostat_json("bop_iip6_q", filters = list(
    geo = "EU28",
    sinceTimePeriod = 2016,
    untilTimePeriod = 2022
))

This results in the message:

Error in get_eurostat_json("bop_iip6_q", filters = list(geo = "EU28",  : 
  HTTP status: 413 (Request entity too large)
  Error id: 413 
  Error label from API: EXTRACTION_TOO_BIG: The requested extraction is too big, estimated 2019074400 rows, max authorised is 5000000, please change your filters to reduce the extraction size

Still too big! Let's try one quarter:

BOP_IIP6 <- get_eurostat_json("bop_iip6_q", filters = list(
    geo = "EU28",
    time = "2022-Q1"
))

Error in get_eurostat_json("bop_iip6_q", filters = list(geo = "EU28",  : 
  HTTP status: 413 (Request entity too large)
  Error id: 413 
  Error label from API: EXTRACTION_TOO_BIG: The requested extraction is too big, estimated 57687840 rows, max authorised is 5000000, please change your filters to reduce the extraction size

Ok, it's a big dataset... Let's look at the variables:

> label_eurostat_vars(id = "bop_iip6_q")
 [1] "Time frequency"                        
 [2] "Currency"                              
 [3] "Balance of payments item"              
 [4] "Sector (ESA 2010)"                     
 [5] "Sector (ESA 2010)"                     
 [6] "Stock or flow"                         
 [7] "Geopolitical entity (partner)"         
 [8] "Geopolitical entity (reporting)"       
 [9] "Time"                                  
[10] "Observation status (Flag) V2 structure"
[11] "Confidentiality status (flag)"         
[12] "Observation value"      

Maybe including filters for the balance of payments item would reduce the number of rows significantly?
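Something along these lines might work. Note that the dimension code bop_item and the item value "CA" (current account) are assumptions on my part; verify the actual dimension and item codes against the bop_iip6_q code lists on the dataset's Eurostat page before relying on them.

```r
# Build the filter list first. bop_item = "CA" is a hypothetical example
# value -- check the real dimension code and item codes for bop_iip6_q.
filters <- list(
    geo      = "EU28",
    bop_item = "CA",
    time     = "2022-Q1"
)

# The download itself needs network access and the eurostat package:
if (requireNamespace("eurostat", quietly = TRUE) && interactive()) {
    BOP_IIP6 <- eurostat::get_eurostat_json("bop_iip6_q", filters = filters)
}
```

Each additional dimension you pin down divides the estimated cell count by the number of codes in that dimension, so filtering a large dimension such as the balance of payments item should bring the extraction well under the 5 000 000 cell limit.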
