Download bop_iip6_q via JSON API #309

Open
jpmurphy69 opened this issue Dec 28, 2024 · 1 comment

@jpmurphy69

bop_iip6_q is too big for me to download without filtering (Error: cannot allocate vector of size 2.3 Gb).
I also can't get the example code from the Eurostat tutorial (https://ropengov.github.io/eurostat/articles/eurostat_tutorial.html) to work for this dataset, although it does work for the tutorial's own example dataset. I've also tried changing the filters and removing them altogether.

Example code from the Eurostat tutorial, altered:

# Download BoP IIP6
BOP_IIP6 <- get_eurostat_json("bop_iip6_q", filters = list(
    geo = "EU28"
))
Error in get_eurostat_json("bop_iip6_q", filters = list(geo = "EU28")) :

@pitkant (Member)

pitkant commented Feb 5, 2025

@jpmurphy69 Thank you for opening this issue.

"Error: cannot allocate vector of size 2.3 Gb" sounds like something related to your local system rather than to the eurostat package, the Eurostat API, or R itself. Here is a Stack Overflow question that deals with a similar problem: https://stackoverflow.com/questions/5171593/r-memory-management-cannot-allocate-vector-of-size-n-mb

As for the dataset bop_iip6_q, I tried to replicate your issue:

BOP_IIP6 <- get_eurostat_json("bop_iip6_q", filters = list(
    geo = "EU28"
))

which resulted in the following error message:

Error in get_eurostat_json("bop_iip6_q", filters = list(geo = "EU28")) : 
  HTTP status: 413 (Request entity too large)
  Error id: 413 
  Error label from API: EXTRACTION_TOO_BIG: The requested extraction is too big, estimated 9691557120 rows, max authorised is 5000000, please change your filters to reduce the extraction size

In the Eurostat API, some queries are indeed too big for download. From the Eurostat documentation:

Depending on the request, a data query can result in a (potentially very) large response in which case data is delivered asynchronously.

For the SDMX APIs, data can be returned either synchronously or asynchronously: […] Asynchronously: the data is not returned directly in the response. Instead a key is returned in the response which allows to access the data through the async API to check for its availability and eventually retrieve it once available.

Also:

The decision whether to deliver the data synchronously or asynchronously is related to factors such as the complexity of the query and the volume of the data (number of cells) to be returned:

  • if the data is cached -> the data is returned synchronously
  • if the data has to be extracted, the "cost" of the request is estimated and:
    • if below 500 000 cells, the data is returned synchronously
    • if between 500 000 cells and 5 000 000 cells, the data is returned asynchronously
  • if above 5 000 000 cells, a client request error is returned and more filters need to be added to the extraction query to reduce its estimated cost.
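The decision rules above can be sketched as a small helper function. This is purely illustrative: the real logic runs server-side in the Eurostat API, and the function name and return values here are my own invention.

```r
# Illustrative sketch of the Eurostat API's delivery-mode decision.
# The thresholds (500 000 and 5 000 000 cells) come from the Eurostat
# documentation quoted above; everything else is hypothetical.
delivery_mode <- function(estimated_cells, cached = FALSE) {
  if (cached) return("synchronous")              # cached data is returned directly
  if (estimated_cells < 5e5) return("synchronous")
  if (estimated_cells <= 5e6) return("asynchronous")
  "HTTP 413: EXTRACTION_TOO_BIG"                 # client must add filters
}

delivery_mode(1e4)          # "synchronous"
delivery_mode(2e6)          # "asynchronous"
delivery_mode(9691557120)   # the unfiltered bop_iip6_q estimate -> 413 error
```

Note that the unfiltered bop_iip6_q estimate (about 9.7 billion rows) exceeds the hard limit by more than three orders of magnitude, which is why merely trimming the time range was not enough below.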

Judging by the _q suffix in the dataset's name, I would guess it contains quarterly data, and a long enough time series can get quite big. I suggest limiting the download with start and end dates:

BOP_IIP6 <- get_eurostat_json("bop_iip6_q", filters = list(
    geo = "EU28",
    sinceTimePeriod = 2016,
    untilTimePeriod = 2022
))

This results in the message:

Error in get_eurostat_json("bop_iip6_q", filters = list(geo = "EU28",  : 
  HTTP status: 413 (Request entity too large)
  Error id: 413 
  Error label from API: EXTRACTION_TOO_BIG: The requested extraction is too big, estimated 2019074400 rows, max authorised is 5000000, please change your filters to reduce the extraction size

Still too big! Let's try one quarter:

BOP_IIP6 <- get_eurostat_json("bop_iip6_q", filters = list(
    geo = "EU28",
    time = "2022-Q1"
))

Error in get_eurostat_json("bop_iip6_q", filters = list(geo = "EU28",  : 
  HTTP status: 413 (Request entity too large)
  Error id: 413 
  Error label from API: EXTRACTION_TOO_BIG: The requested extraction is too big, estimated 57687840 rows, max authorised is 5000000, please change your filters to reduce the extraction size

Ok, it's a big dataset... Let's look at the variables:

> label_eurostat_vars(id = "bop_iip6_q")
 [1] "Time frequency"                        
 [2] "Currency"                              
 [3] "Balance of payments item"              
 [4] "Sector (ESA 2010)"                     
 [5] "Sector (ESA 2010)"                     
 [6] "Stock or flow"                         
 [7] "Geopolitical entity (partner)"         
 [8] "Geopolitical entity (reporting)"       
 [9] "Time"                                  
[10] "Observation status (Flag) V2 structure"
[11] "Confidentiality status (flag)"         
[12] "Observation value"      

Maybe including filters for the balance of payments item would reduce the number of rows significantly?
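Something along these lines might work. Note that the dimension code bop_item and the item value "CA" (current account) are assumptions on my part; verify the actual dimension and item codes against the bop_iip6_q code lists on the dataset's Eurostat page before relying on them.

```r
# Build the filter list first. bop_item = "CA" is a hypothetical example
# value -- check the real dimension code and item codes for bop_iip6_q.
filters <- list(
    geo      = "EU28",
    bop_item = "CA",
    time     = "2022-Q1"
)

# The download itself needs network access and the eurostat package:
if (requireNamespace("eurostat", quietly = TRUE) && interactive()) {
    BOP_IIP6 <- eurostat::get_eurostat_json("bop_iip6_q", filters = filters)
}
```

Each additional dimension you pin down divides the estimated cell count by the number of codes in that dimension, so filtering a large dimension such as the balance of payments item should bring the extraction well under the 5 000 000 cell limit.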
