Json parsing error #62

Open
Skealz opened this issue Jan 29, 2024 · 9 comments
Labels
bug Something isn't working

Comments

@Skealz

Skealz commented Jan 29, 2024

🐛 Bug Report

 41%|████      | 50/122 [00:21<00:29,  2.44it/s]ERROR:root:impossible to get data from CDSfor query: https://catalogue
.dataspace.copernicus.eu/odata/v1/Products?$filter=OData.CSC.Intersects(area=geography'SRID=4326;POLYGON((127.25 31.23
333333333333, 127.2432586173411 31.09610933687195, 127.2230993925645 30.96020688251075, 127.1897164700251 30.826934785
17709, 127.1434313455158 30.69757652802221, 127.0846897700877 30.57337790177694, 127.0140574572236 30.45553500710589, 
126.9322146347078 30.34518273550423, 126.8399494936612 30.24338383967217, 126.7381505978291 30.1511186986255, 126.6277
983262274 30.06927587610977, 126.5099554315564 29.99864356324564, 126.3857568053111 29.93990198781754, 126.25639854815
62 29.89361686330824, 126.1231264508226 29.86023394076881, 125.9872239964614 29.84007471599226, 125.85 29.833333333333
34, 125.7127760035386 29.84007471599226, 125.5768735491774 29.86023394076881, 125.4436014518437 29.89361686330824, 125
.3142431946889 29.93990198781753, 125.1900445684436 29.99864356324564, 125.0722016737726 30.06927587610977, 124.961849
4021709 30.1511186986255, 124.8600505063388 30.24338383967217, 124.7677853652922 30.34518273550423, 124.6859425427764 
30.45553500710589, 124.6153102299123 30.57337790177694, 124.5565686544842 30.69757652802221, 124.5102835299749 30.8269
3478517708, 124.4769006074355 30.96020688251075, 124.4567413826589 31.09610933687195, 124.45 31.23333333333333, 124.45
67413826589 31.37055732979472, 124.4769006074355 31.50645978415591, 124.5102835299749 31.63973188148958, 124.556568654
4842 31.76909013864446, 124.6153102299123 31.89328876488973, 124.6859425427764 32.01113165956077, 124.7677853652922 32
.12148393116244, 124.8600505063388 32.2232828269945, 124.9618494021709 32.31554796804117, 125.0722016737726 32.3973907
905569, 125.1900445684436 32.46802310342103, 125.3142431946889 32.52676467884913, 125.4436014518437 32.57304980335843,
 125.5768735491774 32.60643272589785, 125.7127760035386 32.62659195067441, 125.85 32.63333333333333, 125.9872239964614
 32.62659195067441, 126.1231264508226 32.60643272589786, 126.2563985481562 32.57304980335843, 126.3857568053111 32.526
76467884914, 126.5099554315564 32.46802310342103, 126.6277983262274 32.3973907905569, 126.7381505978291 32.31554796804
117, 126.8399494936612 32.22328282699451, 126.9322146347078 32.12148393116244, 127.0140574572236 32.01113165956078, 12
7.0846897700877 31.89328876488974, 127.1434313455158 31.76909013864447, 127.1897164700251 31.63973188148959, 127.22309
93925645 31.50645978415593, 127.2432586173411 31.37055732979473, 127.25 31.23333333333334, 127.25 31.23333333333333))'
) and Collection/Name eq 'SENTINEL-1' and Attributes/OData.CSC.StringAttribute/any(att:att/Name eq 'productType' and a
tt/OData.CSC.StringAttribute/Value eq 'GRD') and ContentDate/Start gt 2022-09-05T06:30:00.000Z and ContentDate/Start l
t 2022-09-05T07:30:00.000Z&$top=1000&$expand=Attributes: Traceback (most recent call last):
  File "/home1/datahome/oarcher/storm_watch/conda3/lib/python3.8/site-packages/cdsodatacli/query.py", line 500, in fet
ch_one_url
    json_data = requests.get(url).json()
  File "/home1/datahome/oarcher/storm_watch/conda3/lib/python3.8/site-packages/requests/models.py", line 898, in json
    return complexjson.loads(self.text, **kwargs)
  File "/home1/datahome/oarcher/storm_watch/conda3/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/home1/datahome/oarcher/storm_watch/conda3/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home1/datahome/oarcher/storm_watch/conda3/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

🔬 How To Reproduce

It does not seem to reproduce consistently at all (even with the same geodataframe as input). I wonder whether it is related to rate limiting, because it seems to happen only when I make several requests more or less back to back.

I need to capture the content of the JSON response...
A first step in the cdsodatacli code would be to display the content returned by the website when an error occurs, before parsing it with json.
I'm trying to do that on my side.
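
A minimal sketch of that idea, not the actual cdsodatacli code: decode the body explicitly and log the raw text when it is not valid JSON. The helper name `decode_json_body` and its parameters are hypothetical, introduced only for illustration.

```python
import json
import logging


def decode_json_body(status_code, body, url):
    """Return the parsed JSON, or None after logging the raw body if it is not JSON."""
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        # Log the status and the start of the body so the real server message is visible
        logging.error(
            "Non-JSON response (HTTP %s) for %s: %r", status_code, url, body[:500]
        )
        return None


# Hypothetical usage with requests (not executed here):
#   response = requests.get(url)
#   json_data = decode_json_body(response.status_code, response.text, url)
```

Logging the status code next to the body should help tell a gateway error page apart from a truncated JSON payload.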

Environment

conda list

_libgcc_mutex             0.1                  main.conda
argon2-cffi               20.1.0           py38h1e0a361_1    conda-forge
async_generator           1.10                       py_0    conda-forge
atcf                      0.0.4.dev8+g3ed850d           <pip>
attrs                     20.2.0             pyh9f0ad1d_0    conda-forge
backcall                  0.2.0              pyh9f0ad1d_0    conda-forge
backports                 1.0                        py_2    conda-forge
backports.functools_lru_cache 1.6.1                      py_0    conda-forge
bleach                    3.2.1              pyh9f0ad1d_0    conda-forge
boost-cpp                 1.74.0               h9359b55_0    conda-forge
brotlipy                  0.7.0           py38h1e0a361_1000    conda-forge
bzip2                     1.0.8                h516909a_3    conda-forge
c-ares                    1.16.1               h516909a_3    conda-forge
ca-certificates           2020.6.20            hecda079_0    conda-forge
cairo                     1.16.0            h3fc0475_1005    conda-forge
cdsodatacli               2023.12.19                <pip>
certifi                   2020.6.20        py38h32f6830_0    conda-forge
cffi                      1.14.3           py38h5bae8af_0    conda-forge
cfitsio                   3.470                hce51eda_6    conda-forge
chardet                   3.0.4           py38h32f6830_1007    conda-forge
click                     8.1.7                     <pip>
click                     7.1.2              pyh9f0ad1d_0    conda-forge
click-plugins             1.1.1                      py_0    conda-forge
cligj                     0.5.0                      py_0    conda-forge
cryptography              3.1              py38h766eaa4_0    conda-forge
curl                      7.71.1               he644dc0_6    conda-forge
cycler                    0.10.0                     py_2    conda-forge
dbus                      1.13.6               he372182_0    conda-forge
decorator                 4.4.2                      py_0    conda-forge
defusedxml                0.6.0                      py_0    conda-forge
descartes                 1.1.0                      py_4    conda-forge
entrypoints               0.3             py38h32f6830_1001    conda-forge
expat                     2.2.9                he1b5a44_2    conda-forge
fiona                     1.9.5                     <pip>
fiona                     1.8.17           py38h676c6b2_0    conda-forge
fontconfig                2.13.1            h1056068_1002    conda-forge
freetype                  2.10.2               he06d7ca_0    conda-forge
freexl                    1.0.5             h516909a_1002    conda-forge
future                    0.18.2                    <pip>
gdal                      3.1.2            py38hb61cb63_1    conda-forge
geo-shapely               0.0.6.dev7+g913bbce           <pip>
geopandas                 0.8.1                      py_0    conda-forge
geopandas                 0.13.2                    <pip>
geopandas-coloc           0.0.4.dev2+g43cb5f3           <pip>
geos                      3.8.1                he1b5a44_0    conda-forge
geotiff                   1.6.0                ha04d9d0_1    conda-forge
gettext                   0.19.8.1          hc5be6a0_1002    conda-forge
giflib                    5.2.1                h516909a_2    conda-forge
glib                      2.66.0               h0dae87d_0    conda-forge
gst-plugins-base          1.14.5               h0935bb2_2    conda-forge
gstreamer                 1.14.5               h36ae1b5_2    conda-forge
hdf4                      4.2.13            hf30be14_1003    conda-forge
hdf5                      1.10.6          nompi_h3c11f04_101    conda-forge
html2text                 2020.1.16                 <pip>
icu                       67.1                 he1b5a44_0    conda-forge
idna                      2.10               pyh9f0ad1d_0    conda-forge
importlib-metadata        1.7.0            py38h32f6830_0    conda-forge
importlib_metadata        1.7.0                         0    conda-forge
ipykernel                 5.3.4            py38h23f93f0_0    conda-forge
ipython                   7.18.1           py38h1cdfbd6_0    conda-forge
ipython_genutils          0.2.0                      py_1    conda-forge
ipywidgets                7.5.1              pyh9f0ad1d_1    conda-forge
jedi                      0.15.2                   py38_0    conda-forge
jinja2                    2.11.2             pyh9f0ad1d_0    conda-forge
jpeg                      9d                   h516909a_0    conda-forge
json-c                    0.13.1            hbfbb72e_1002    conda-forge
jsonschema                3.2.0            py38h32f6830_1    conda-forge
jupyter                   1.0.0                      py_2    conda-forge
jupyter_client            6.1.7                      py_0    conda-forge
jupyter_console           6.2.0                      py_0    conda-forge
jupyter_core              4.6.3            py38h32f6830_1    conda-forge
jupyterlab_pygments       0.1.1              pyh9f0ad1d_0    conda-forge
kealib                    1.4.13               h33137a7_1    conda-forge
kiwisolver                1.2.0            py38hbf85e49_0    conda-forge
krb5                      1.17.1               hfafb76e_3    conda-forge
lcms2                     2.11                 hbd6801e_0    conda-forge
ld_impl_linux-64          2.35                 h769bd43_9    conda-forge
libblas                   3.8.0               17_openblas    conda-forge
libcblas                  3.8.0               17_openblas    conda-forge
libclang                  10.0.1          default_hde54327_1    conda-forge
libcurl                   7.71.1               hcdd3856_6    conda-forge
libdap4                   3.20.6               h1d1bd15_1    conda-forge
libedit                   3.1.20191231    h14c3975_1.conda
libev                     4.33                 h516909a_1    conda-forge
libevent                  2.1.10               hcdb4288_2    conda-forge
libffi                    3.2.1             he1b5a44_1007    conda-forge
libgcc-ng                 9.1.0           hdf63c60_0.conda
libgdal                   3.1.2                hb2a6f5f_1    conda-forge
libgfortran-ng            7.5.0               hdf63c60_16    conda-forge
libiconv                  1.16                 h516909a_0    conda-forge
libkml                    1.3.0             h74f7ee3_1012    conda-forge
liblapack                 3.8.0               17_openblas    conda-forge
libllvm10                 10.0.1               he513fc3_3    conda-forge
libnetcdf                 4.7.4           nompi_h84807e1_105    conda-forge
libnghttp2                1.41.0               h8cfc5f6_2    conda-forge
libopenblas               0.3.10          pthreads_hb3c22a3_4    conda-forge
libpng                    1.6.37               hed695b0_2    conda-forge
libpq                     12.3                 h5513abc_0    conda-forge
libsodium                 1.0.18               h516909a_0    conda-forge
libspatialindex           1.9.3                he1b5a44_3    conda-forge
libspatialite             4.3.0a            h57f1b35_1039    conda-forge
libssh2                   1.9.0                hab1572f_5    conda-forge
libstdcxx-ng              9.1.0           hdf63c60_0.conda
libtiff                   4.1.0                hc7e4089_6    conda-forge
libuuid                   2.32.1            h14c3975_1000    conda-forge
libwebp-base              1.1.0                h516909a_3    conda-forge
libxcb                    1.13              h14c3975_1002    conda-forge
libxkbcommon              0.10.0               he1b5a44_0    conda-forge
libxml2                   2.9.10               h68273f3_2    conda-forge
libxslt                   1.1.33               h572872d_1    conda-forge
lxml                      4.5.2            py38hbb43d70_0    conda-forge
lz4-c                     1.9.2                he1b5a44_3    conda-forge
markupsafe                1.1.1            py38h1e0a361_1    conda-forge
matplotlib                3.3.2                         0    conda-forge
matplotlib-base           3.3.2            py38h91b0d89_0    conda-forge
mistune                   0.8.4           py38h1e0a361_1001    conda-forge
munch                     2.5.0                      py_0    conda-forge
mysql-common              8.0.21                        2    conda-forge
mysql-libs                8.0.21               hf3661c5_2    conda-forge
nbclient                  0.5.0                      py_0    conda-forge
nbconvert                 6.0.3            py38h32f6830_0    conda-forge
nbformat                  5.0.7                      py_0    conda-forge
ncurses                   6.2             he6710b0_1.conda
nest-asyncio              1.4.0                      py_0    conda-forge
notebook                  6.1.4            py38h32f6830_0    conda-forge
nspr                      4.28                 he1b5a44_0    conda-forge
nss                       3.57                 he751ad9_0    conda-forge
numpy                     1.19.1           py38hbc27379_2    conda-forge
olefile                   0.46                       py_0    conda-forge
openjpeg                  2.3.1                h981e76c_3    conda-forge
openssl                   1.1.1g               h516909a_1    conda-forge
packaging                 20.4               pyh9f0ad1d_0    conda-forge
pandas                    1.1.2            py38h950e882_0    conda-forge
pandoc                    2.10.1               h516909a_0    conda-forge
pandocfilters             1.4.2                      py_1    conda-forge
parso                     0.5.2                py_0.conda
pcre                      8.44                 he1b5a44_0    conda-forge
pexpect                   4.8.0            py38h32f6830_1    conda-forge
pickleshare               0.7.5           py38h32f6830_1001    conda-forge
pillow                    7.2.0            py38h9776b28_1    conda-forge
pip                       20.2.2             py38_0.conda
pixman                    0.38.0            h516909a_1003    conda-forge
poppler                   0.89.0               h4190859_1    conda-forge
poppler-data              0.4.9                         1    conda-forge
postgresql                12.3                 h8573dbc_0    conda-forge
proj                      7.1.0                h966b41f_1    conda-forge
prometheus_client         0.8.0              pyh9f0ad1d_0    conda-forge
prompt-toolkit            3.0.7                      py_0    conda-forge
prompt_toolkit            3.0.7                         0    conda-forge
pthread-stubs             0.4               h14c3975_1001    conda-forge
ptyprocess                0.6.0                   py_1001    conda-forge
pycparser                 2.20               pyh9f0ad1d_2    conda-forge
pygments                  2.7.1                      py_0    conda-forge
pyopenssl                 19.1.0                     py_1    conda-forge
pyparsing                 2.4.7              pyh9f0ad1d_0    conda-forge
pyproj                    3.5.0                     <pip>
pyproj                    2.6.1.post1      py38h8e47818_1    conda-forge
pyqt                      5.12.3           py38ha8c2ead_3    conda-forge
pyrsistent                0.17.3           py38h1e0a361_0    conda-forge
pysocks                   1.7.1            py38h32f6830_1    conda-forge
python                    3.8.5           h1103e12_8_cpython    conda-forge
python-dateutil           2.8.1                      py_0    conda-forge
python_abi                3.8                      1_cp38    conda-forge
pytz                      2020.1             pyh9f0ad1d_0    conda-forge
PyYAML                    6.0.1                     <pip>
pyzmq                     19.0.2           py38ha71036d_0    conda-forge
qt                        5.12.9               h1f2b2cb_0    conda-forge
qtconsole                 4.7.7              pyh9f0ad1d_0    conda-forge
qtpy                      1.9.0                      py_0    conda-forge
readline                  8.0             h7b6447c_0.conda
requests                  2.24.0             pyh9f0ad1d_0    conda-forge
rtree                     0.9.4            py38h08f867b_1    conda-forge
send2trash                1.5.0                      py_0    conda-forge
sentinelRequest           0.0.4.dev16+g2562afb           <pip>
setuptools                49.6.0             py38_0.conda
shapely                   1.7.1            py38hc7361b7_0    conda-forge
six                       1.15.0             pyh9f0ad1d_0    conda-forge
sqlite                    3.33.0          h62c20be_0.conda
tbb                       2020.2               hc9558a2_0    conda-forge
terminado                 0.8.3            py38h32f6830_1    conda-forge
testpath                  0.4.4                      py_0    conda-forge
tiledb                    2.0.8                h3effe38_0    conda-forge
tk                        8.6.10          hbc83047_0.conda
tornado                   6.0.4            py38h1e0a361_1    conda-forge
tqdm                      4.49.0                    <pip>
traitlets                 5.0.4                      py_0    conda-forge
tzcode                    2020a                h516909a_0    conda-forge
urllib3                   1.25.10                    py_0    conda-forge
wcwidth                   0.2.5              pyh9f0ad1d_1    conda-forge
webencodings              0.5.1                      py_1    conda-forge
wheel                     0.35.1               py_0.conda
widgetsnbextension        3.5.1            py38h32f6830_1    conda-forge
xerces-c                  3.2.3                hfe33f54_1    conda-forge
xorg-kbproto              1.0.7             h14c3975_1002    conda-forge
xorg-libice               1.0.10               h516909a_0    conda-forge
xorg-libsm                1.2.3             h84519dc_1000    conda-forge
xorg-libx11               1.6.12               h516909a_0    conda-forge
xorg-libxau               1.0.9                h14c3975_0    conda-forge
xorg-libxdmcp             1.1.3                h516909a_0    conda-forge
xorg-libxext              1.3.4                h516909a_0    conda-forge
xorg-libxrender           0.9.10            h516909a_1002    conda-forge
xorg-renderproto          0.11.1            h14c3975_1002    conda-forge
xorg-xextproto            7.3.0             h14c3975_1002    conda-forge
xorg-xproto               7.0.31            h14c3975_1007    conda-forge
xz                        5.2.5           h7b6447c_0.conda
zeromq                    4.3.2                he1b5a44_3    conda-forge
zipp                      3.1.0                      py_0    conda-forge
zlib                      1.2.11          h7b6447c_3.conda
zstd                      1.4.5                h6597ccf_2    conda-forge

Screenshots

📈 Expected behavior

📎 Additional context

@Skealz Skealz added the bug Something isn't working label Jan 29, 2024
@Skealz
Author

Skealz commented Jan 29, 2024

So I modified the cdsodatacli code to see what the site was returning (I printed response.text). Here it is:
reponse data : 'upstream connect error or disconnect/reset before headers. reset reason: connection termination'

@agrouaze
Member

I think it could be related to this: https://dataspace.copernicus.eu/node/1023

@Skealz
Author

Skealz commented Jan 29, 2024

I'm not sure because I just ran it again and still get the error.

@Skealz
Author

Skealz commented Jan 30, 2024

@agrouaze I use the cdsodatacli query command in a script launched with xargs, so several queries run in parallel (5 in my last attempt).

I'll try without multi-process to see if the error still occurs.

@agrouaze
Member

Can you give us the snippet to reproduce your query?

@Skealz
Author

Skealz commented Feb 5, 2024

I deactivated the multi-process, and I still got the issue.
Maybe this is still due to rate limiting? It seems related to how fast I send the requests: without multi-process, my script returned many more results in the end than with it, which suggests I get far fewer JSON parse errors that way. I'm not sure.

To reproduce, you can try:

ls /home/datawork-cersat-public/cache/project/hurricanes/analysis/best-tracks-atcf-merge/b*.dat | grep "$1" | egrep -v 'b..[89].*' | egrep -v 'b....201[01]' | xargs -n 1 -P 1 -r  stdbuf -oL /home1/datahome/oarcher/storm_watch/bt2sar_new.py --minspeed=34 --ddeg=0.4 --ddegcatfactor=2 --outdir=/tmp/bt_test

I just tried it and got the error.

You can use this conda env to launch the code : /home1/datahome/oarcher/storm_watch/conda_bt2sar_new
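
If rate limiting is indeed the cause (this is unconfirmed in the thread), one option is to enforce a minimum interval between consecutive requests. The sketch below is an assumption-laden illustration: `MinIntervalThrottle` is a hypothetical helper, not part of cdsodatacli, and it only paces calls inside a single process, so it would not coordinate several processes started by xargs.

```python
import time


class MinIntervalThrottle:
    """Block so that successive wait() calls are at least min_interval seconds apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock  # injectable for testing
        self._sleep = sleep  # injectable for testing
        self._last = None    # time of the previous call, None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Hypothetical usage: call throttle.wait() right before each HTTP request.
#   throttle = MinIntervalThrottle(0.5)
#   for url in urls:
#       throttle.wait()
#       response = requests.get(url)
```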

@Skealz
Author

Skealz commented Feb 6, 2024

I made a patch of sorts in query.py:

import logging
import time
import traceback

import requests


def get_json_with_retries(url, retries=3, delay=2):
    """Attempt to get JSON data from URL with specified retries and delay between retries."""
    for attempt in range(retries):
        try:
            response = requests.get(url)
            response.raise_for_status()  # Raises HTTPError for 4xx/5xx responses
            return response.json(), True
        except requests.exceptions.HTTPError as e:
            logging.error("HTTP Error for URL %s: %s", url, e)
        except requests.exceptions.ConnectionError as e:
            logging.error("Connection Error for URL %s: %s", url, e)
        except requests.exceptions.Timeout as e:
            logging.error("Timeout Error for URL %s: %s", url, e)
        except requests.exceptions.RequestException as e:
            logging.error("Request Exception for URL %s: %s", url, e)
        except KeyboardInterrupt:
            logging.info("Operation cancelled by user.")
            raise
        except Exception:
            # With older requests versions, a non-JSON body surfaces here
            # as a ValueError raised by response.json()
            logging.error("An error occurred for URL %s: %s", url, traceback.format_exc())

        # Wait before retrying, unless this was the last attempt
        if attempt < retries - 1:
            logging.info("Attempt %d for URL %s failed, retrying in %d seconds...", attempt + 1, url, delay)
            time.sleep(delay)

    return None, False

def fetch_one_url(url, cpt, index, cache_dir):
    """

    Parameters
    ----------
    url (str)
    cpt (defaultdict(int))
    index (int)
    cache_dir (str)

    Returns
    -------
    cpt (defaultdict(int))
    collected_data (pandas.GeoDataframe)

    """
    json_data = None
    collected_data = None
    if cache_dir is not None:
        cache_file = get_cache_filename(url, cache_dir)
        if os.path.exists(cache_file):
            cpt["cache_used"] += 1
            logging.debug("cache file exists: %s", cache_file)
            with open(cache_file, "r") as f:
                json_data = json.load(f)
                collected_data = process_data(json_data)
    if (
        json_data is None
    ):  # means that cache cannot be used (or user used cache_dir=None or there is no associated json file
        logging.debug("no cache file -> go for query CDS")
        cpt["urls_tested"] += 1
        try:
            json_data, success = get_json_with_retries(url, retries=10, delay=2)
            if not success:
                cpt["urls_KO"] += 1
                logging.error("Couldn't get data from API after multiple tries")
            else:
                # json_data = requests.get(url).json()
                cpt["urls_OK"] += 1
... rest of the function is the same
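
The patch above waits a fixed delay between attempts. As a possible variation, here is a sketch of exponential backoff with jitter, which spaces retries further apart each time. It is a swapped-in technique, not what the patch does: `retry_with_backoff`, `fetch`, and the default values are hypothetical, and `fetch` stands for any zero-argument callable that performs the request and returns the parsed JSON.

```python
import random
import time


def retry_with_backoff(fetch, retries=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call fetch() until it succeeds, doubling the wait (plus jitter) after each failure."""
    last_error = None
    for attempt in range(retries):
        try:
            return fetch()
        except Exception as err:  # narrow this to the errors you expect in real code
            last_error = err
            if attempt == retries - 1:
                break  # no point in sleeping after the final attempt
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter keeps parallel workers from retrying in lockstep
            sleep(delay + random.uniform(0, delay / 2))
    raise last_error


# Hypothetical usage (not executed here):
#   json_data = retry_with_backoff(lambda: requests.get(url).json())
```

Unlike the `(data, success)` tuple returned by the patch, this version re-raises the last error once all attempts fail, so the caller decides how to count the failure.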

@agrouaze
Member

agrouaze commented Feb 7, 2024

@Skealz The snippet you provided doesn't seem to be related to cdsodatacli.
As for the proposed source modification, could you open a PR so that we can review it easily?

@Skealz
Author

Skealz commented Feb 7, 2024 via email
