Json parsing error #62
So I modified the cdsodatacli code to see what the site was returning (I printed response.text), here it is:
I think it could be related to this: https://dataspace.copernicus.eu/node/1023
I'm not sure, because I just ran it again and still get the error.
@agrouaze I use the cdsodatacli query command in a script launched via xargs, meaning several queries run in parallel (5 in parallel in my last run). I'll try without multi-processing to see if the error still occurs.
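For context, this kind of parallel launch can be sketched generically with xargs. The command below is only an illustration: `echo querying` stands in for the actual cdsodatacli invocation, whose exact arguments are not shown in this thread.

```shell
# Generic sketch: run up to 5 query processes in parallel, one argument each.
# "echo querying" is a placeholder for the real cdsodatacli query command.
printf '%s\n' area1 area2 area3 area4 area5 | xargs -P 5 -n 1 echo querying
```

With `-P 5`, up to five child processes run concurrently, which is the setup under which the JSON parsing error was observed.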
Can you give us the snippet to reproduce your query?
I disabled the multi-processing, and I still got the issue. To reproduce you can try:
I just tried it and got the error. You can use this conda env to launch the code: /home1/datahome/oarcher/storm_watch/conda_bt2sar_new
I made some kind of patch in query.py:

```python
import json
import logging
import os
import time
import traceback

import requests


def get_json_with_retries(url, retries=3, delay=2):
    """Attempt to get JSON data from URL with specified retries and delay between retries."""
    for attempt in range(retries):
        try:
            response = requests.get(url)
            response.raise_for_status()  # raises HTTPError for bad responses
            return response.json(), True
        except requests.exceptions.HTTPError as e:
            logging.error("HTTP Error for URL %s: %s", url, e)
        except requests.exceptions.ConnectionError as e:
            logging.error("Connection Error for URL %s: %s", url, e)
        except requests.exceptions.Timeout as e:
            logging.error("Timeout Error for URL %s: %s", url, e)
        except requests.exceptions.RequestException as e:
            logging.error("Request Exception for URL %s: %s", url, e)
        except KeyboardInterrupt:
            logging.info("Operation cancelled by user.")
            raise
        except Exception:
            logging.error("An error occurred for URL %s: %s", url, traceback.format_exc())
        # Log the attempt and wait before retrying
        logging.info("Attempt %d for URL %s failed, retrying in %d seconds...", attempt + 1, url, delay)
        time.sleep(delay)
    return None, False


def fetch_one_url(url, cpt, index, cache_dir):
    """
    Parameters
    ----------
    url (str)
    cpt (defaultdict(int))
    index (int)
    cache_dir (str)

    Returns
    -------
    cpt (defaultdict(int))
    collected_data (pandas.GeoDataFrame)
    """
    json_data = None
    collected_data = None
    if cache_dir is not None:
        cache_file = get_cache_filename(url, cache_dir)
        if os.path.exists(cache_file):
            cpt["cache_used"] += 1
            logging.debug("cache file exists: %s", cache_file)
            with open(cache_file, "r") as f:
                json_data = json.load(f)
            collected_data = process_data(json_data)
    if json_data is None:
        # cache cannot be used: cache_dir is None, or there is no associated JSON file
        logging.debug("no cache file -> go for query CDS")
        cpt["urls_tested"] += 1
        try:
            # previously: json_data = requests.get(url).json()
            json_data, success = get_json_with_retries(url, retries=10, delay=2)
            if not success:
                cpt["urls_KO"] += 1
                logging.error("Couldn't get data from API after multiple tries")
            else:
                cpt["urls_OK"] += 1
            # ... rest of the function is the same
```
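Since rate limiting is suspected later in the thread, a variant of this retry helper with exponential backoff might spread requests out better than a fixed delay. This is only a sketch of an alternative, not part of the patch above or of cdsodatacli:

```python
import logging
import time

import requests


def get_json_with_backoff(url, retries=5, base_delay=1.0):
    """Fetch JSON from url, retrying with exponentially growing delays (1s, 2s, 4s, ...).

    Returns (data, True) on success, (None, False) after exhausting retries.
    """
    for attempt in range(retries):
        try:
            response = requests.get(url, timeout=30)
            response.raise_for_status()
            # ValueError also covers JSON decode failures raised by response.json()
            return response.json(), True
        except (requests.exceptions.RequestException, ValueError) as e:
            logging.error("Attempt %d for %s failed: %s", attempt + 1, url, e)
            time.sleep(base_delay * 2 ** attempt)
    return None, False
```

The growing sleep gives a rate-limited server time to recover between attempts, at the cost of a longer worst-case wait.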
@Skealz The snippet you provided doesn't seem to be related to the `cdsodatacli`. About the proposition of source modification, could you open a PR so that we could easily investigate your proposition?
It is related, the fetch_one_url function comes from cdsodatacli.
🐛 Bug Report
🔬 How To Reproduce
It seems the error does not reproduce consistently at all (even with the same GeoDataFrame as input). I wonder whether it is related to rate limiting, because it only seems to happen when I run several queries more or less back to back.
I would need to capture the content of the JSON response...
A first thing to do in the cdsodatacli code would be to print the content the website returns on error, before parsing it with json.
I'll try to do that on my side.
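The idea of printing the site's raw reply before JSON parsing could be sketched like this; `parse_response` is a hypothetical helper, not existing cdsodatacli code:

```python
import logging


def parse_response(response):
    """Return parsed JSON from a requests-style response, logging the raw body
    when the server sends non-JSON (e.g. an HTML error or maintenance page),
    so the real cause of the parse error becomes visible in the logs."""
    try:
        return response.json()
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        logging.error(
            "Non-JSON response (status %s): %s",
            getattr(response, "status_code", "?"),
            response.text[:500],  # truncate to keep logs readable
        )
        raise
```

Logging the first few hundred characters of `response.text` would show whether the API returned an error page instead of JSON, which is exactly the diagnostic missing from the current traceback.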
Environment
conda list
Screenshots
📈 Expected behavior
📎 Additional context