Locale seems to get ignored #183

Open
mwegnr opened this issue Sep 12, 2024 · 5 comments

mwegnr commented Sep 12, 2024

Platform: Arch Linux (also happens in a Debian Docker container)
Python version: 3.12.5
Pyzotero version: 1.5.20

Problem Description

  • What were you trying to do?
    For our project, we refresh a JSON file containing our Zotero library every night for different locales (en-US, de-DE). The files produced with pyzotero are identical for both locales. However, when fetched directly from the API using requests, they differ.

  • What API call did it involve? top items (see code example below)

  • What error was raised? no direct error

More Details

Our group library is public, so the following code should reproduce the error.

Minimal code example:

import json

import requests
from pyzotero import zotero

def fetch_zotero_entries(locale: str) -> list:
    # initialize Text+ library object
    tplus_zotero_library = zotero.Zotero(library_id='4533881', library_type='group', locale=locale)
    tplus_zotero_library.add_parameters(format='json', include='bibtex,bib,csljson,data', linkwrap='1')
    tplus_entries = tplus_zotero_library.top()
    return tplus_entries

def fetch_zotero_entries_requests(offset: int = 0, locale: str = "de-DE"):
    URL = "https://api.zotero.org/groups/4533881/items/top"
    url_with_params = URL + f"?start={offset}&limit=100&format=json&include=bibtex,bib,csljson,data&linkwrap=1&locale={locale}"
    zotero_response = requests.get(url_with_params)
    items = zotero_response.json()
    return items

def write_zotero_json(data, suffix: str, locale: str):
    path = f"zotero-unprocessed-min.{suffix}.{locale}.json"
    with open(path, 'w') as output_file:
        json.dump(data, output_file, indent=2)

def refresh_json(use_lib: bool, locale: str):
    if use_lib:
        zotero_items = fetch_zotero_entries(locale=locale)
    else:
        zotero_items = fetch_zotero_entries_requests(locale=locale)
    if zotero_items is not None:
        suffix = "pyzotero" if use_lib else "requests"
        print(f"Successfully fetched entries for {locale} with {suffix}")
        write_zotero_json(zotero_items, suffix=suffix, locale=locale)

refresh_json(use_lib=True, locale="de-DE")
refresh_json(use_lib=True, locale="en-US")
refresh_json(use_lib=False, locale="de-DE")
refresh_json(use_lib=False, locale="en-US")

Hashes of obtained files:

sha256sum zotero-unprocessed-min*

6df7f88aa966ad47c1cb43e33d87d60f0bdf5ac0f6ead382141cc558513a84af  zotero-unprocessed-min.pyzotero.de-DE.json
6df7f88aa966ad47c1cb43e33d87d60f0bdf5ac0f6ead382141cc558513a84af  zotero-unprocessed-min.pyzotero.en-US.json
f0ff0dc6df7906bb3bfcd32606fd6b2b0f7ebce18d1f3626109e146676ac286f  zotero-unprocessed-min.requests.de-DE.json
6df7f88aa966ad47c1cb43e33d87d60f0bdf5ac0f6ead382141cc558513a84af  zotero-unprocessed-min.requests.en-US.json
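
The same comparison can be done from Python instead of the shell. This is only a minimal sketch using the standard library: it groups files by their SHA-256 digest so that identical outputs are easy to spot. The helper names sha256_of_file and group_by_digest are made up for this illustration and are not part of pyzotero.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_digest(paths: list[Path]) -> dict[str, list[str]]:
    """Map each digest to the sorted names of the files that share it."""
    groups: dict[str, list[str]] = {}
    for path in paths:
        groups.setdefault(sha256_of_file(path), []).append(path.name)
    return {digest: sorted(names) for digest, names in groups.items()}


if __name__ == "__main__":
    # File name pattern taken from the script above.
    files = sorted(Path(".").glob("zotero-unprocessed-min*.json"))
    for digest, names in group_by_digest(files).items():
        print(digest[:12], names)
```

If the locale were respected, the de-DE and en-US files written by pyzotero would end up in different groups.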

@urschrei (Owner)

Could you try again using v1.5.24?


mwegnr commented Sep 12, 2024

Works as expected with v1.5.24

sha256sum zotero-unprocessed-min*
f0ff0dc6df7906bb3bfcd32606fd6b2b0f7ebce18d1f3626109e146676ac286f  zotero-unprocessed-min.pyzotero.de-DE.json
6df7f88aa966ad47c1cb43e33d87d60f0bdf5ac0f6ead382141cc558513a84af  zotero-unprocessed-min.pyzotero.en-US.json
f0ff0dc6df7906bb3bfcd32606fd6b2b0f7ebce18d1f3626109e146676ac286f  zotero-unprocessed-min.requests.de-DE.json
6df7f88aa966ad47c1cb43e33d87d60f0bdf5ac0f6ead382141cc558513a84af  zotero-unprocessed-min.requests.en-US.json

Thank you for the really quick fix!

mwegnr closed this as completed Sep 12, 2024

mwegnr commented Sep 13, 2024

I encountered an error with the added locale when combining top() with everything(). I was using v1.5.25.

It seems that on the second top() call inside everything(), the locale is added again, which produces this invalid request URL once the first 100 items have been fetched:
URL: https://api.zotero.org/groups/4533881/items/top?include=bib%2Cbibtex%2Ccsljson%2Cdata&limit=100&linkwrap=1&locale=de-DE&start=100&locale=de-DE
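
For anyone debugging similar failures, the duplication can be confirmed programmatically. The following is a small sketch using only urllib.parse from the standard library. The function name duplicated_query_keys is hypothetical and chosen for this example.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def duplicated_query_keys(url: str) -> list[str]:
    """Return the query parameter names that occur more than once in url."""
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    counts = Counter(key for key, _ in pairs)
    return sorted(key for key, n in counts.items() if n > 1)


# The failing URL quoted above, split across lines for readability.
failing_url = (
    "https://api.zotero.org/groups/4533881/items/top"
    "?include=bib%2Cbibtex%2Ccsljson%2Cdata&limit=100&linkwrap=1"
    "&locale=de-DE&start=100&locale=de-DE"
)
print(duplicated_query_keys(failing_url))  # ['locale']
```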

Code reproducing this error:
from pyzotero import zotero, zotero_errors

def fetch_zotero_entries(locale: str) -> list:
    # initialize Text+ library object
    tplus_zotero_library = zotero.Zotero(library_id='4533881', library_type='group', locale=locale)
    try:
        tplus_zotero_library.add_parameters(format='json', include='bibtex,bib,csljson,data', linkwrap='1')
        tplus_entries = tplus_zotero_library.everything(tplus_zotero_library.top())
        return tplus_entries
    except zotero_errors.HTTPError as error:
        print(error)

fetch_zotero_entries(locale="de-DE")

mwegnr reopened this Sep 13, 2024

@urschrei (Owner)

Can you install master and try now? The new solution is a bit more robust about adding the locale if it already exists but I want to make sure it works before I push a new release.
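
To illustrate what "adding the locale only if it is not already there" could look like, here is a rough sketch. It is not pyzotero's actual implementation, which is not shown in this thread. The function ensure_locale is hypothetical and works on a plain URL string using the standard library.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def ensure_locale(url: str, locale: str) -> str:
    """Return url with exactly one locale parameter, keeping an existing value.

    Assumption for this sketch: the first occurrence of any repeated
    parameter wins and later duplicates are dropped.
    """
    parts = urlsplit(url)
    params: dict[str, str] = {}
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        params.setdefault(key, value)
    # Only add the locale when the URL does not carry one already.
    params.setdefault("locale", locale)
    return urlunsplit(parts._replace(query=urlencode(params)))


first = ensure_locale(
    "https://api.zotero.org/groups/4533881/items/top?limit=100&start=100",
    "de-DE",
)
# Applying the function a second time leaves the URL unchanged.
assert ensure_locale(first, "de-DE") == first
```

Because the function is idempotent, calling it on every paginated request, as everything() effectively does, would not produce a second locale parameter.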


mwegnr commented Sep 16, 2024

I no longer get an error when using top() inside everything(), but with the code from the original report the locale seems to be ignored again (the en-US and de-DE JSON files have the same hash).

I also noticed that the JSON produced by pyzotero is missing the bib, bibtex and csljson fields, which were requested via the include parameter. Therefore its hash differs from that of the JSON files generated using requests.
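
A quick way to check for the missing fields without comparing hashes is sketched below. It assumes that each format requested via include shows up as a top-level property of every item object in the JSON response. This should be verified against a real response. The function name missing_include_keys and the item keys in the sample are made up for illustration.

```python
EXPECTED_KEYS = ("bib", "bibtex", "csljson", "data")


def missing_include_keys(items: list[dict]) -> dict[str, list[str]]:
    """Map each item key to the requested formats absent from that item."""
    report: dict[str, list[str]] = {}
    for index, item in enumerate(items):
        missing = [name for name in EXPECTED_KEYS if name not in item]
        if missing:
            # Fall back to the list position if the item has no "key" field.
            report[str(item.get("key", index))] = missing
    return report


# Hypothetical sample data: the first item is complete, the second only has "data".
sample = [
    {"key": "AAAA1111", "data": {}, "bib": "<div/>", "bibtex": "@misc{}", "csljson": {}},
    {"key": "BBBB2222", "data": {}},
]
print(missing_include_keys(sample))  # {'BBBB2222': ['bib', 'bibtex', 'csljson']}
```

In practice the items would come from json.load() on one of the generated files rather than from a hand-written list.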

sha256sum zotero-unprocessed*

cf775c6bf78158d780d91665e9cc55a1089795f61ba1ba0dfc67f566380b15db  zotero-unprocessed-min.pyzotero.de-DE.json
cf775c6bf78158d780d91665e9cc55a1089795f61ba1ba0dfc67f566380b15db  zotero-unprocessed-min.pyzotero.en-US.json
f0ff0dc6df7906bb3bfcd32606fd6b2b0f7ebce18d1f3626109e146676ac286f  zotero-unprocessed-min.requests.de-DE.json
6df7f88aa966ad47c1cb43e33d87d60f0bdf5ac0f6ead382141cc558513a84af  zotero-unprocessed-min.requests.en-US.json

pip list
Package            Version
------------------ --------------------
bibtexparser       1.4.1
certifi            2024.8.30
charset-normalizer 3.3.2
feedparser         6.0.11
idna               3.10
pip                24.2
pyparsing          3.1.4
pytz               2024.2
pyzotero           1.5.26.dev4+g12896b5
requests           2.32.3
sgmllib3k          1.0.0
urllib3            2.2.3

Also, here is the complete code that generates the full JSON files. Feel free to use it for testing.

Complete Code
# This script collects entries from Zotero using pyzotero and stores them to a local JSON without any further processing
# This is needed, since the response from the API slows down every Hugo build
# Should run every night scheduled by the GitLab CI to keep the JSON updated
import json
import sys
import time

from pyzotero import zotero, zotero_errors


def fetch_zotero_entries(locale: str) -> list:
    # init some variables
    retry_request_max = 3

    # initialize Text+ library object
    tplus_zotero_library = zotero.Zotero(library_id='4533881', library_type='group', locale=locale)

    for i in range(retry_request_max):
        # this loop is needed, because the zotero library is big and API timeouts occur often
        try:
            # add required formats to request
            tplus_zotero_library.add_parameters(format='json', include='bibtex,bib,csljson,data', linkwrap='1')

            # request top level items and wrap them in zotero.everything
            # a single top() request would only allow up to 100 items per request
            tplus_entries = tplus_zotero_library.everything(tplus_zotero_library.top())
            return tplus_entries
        except zotero_errors.HTTPError:
            # wait for 180 seconds, since the Zotero API sometimes needs time to generate the answer
            print("Zotero API timeout. Trying again in 180 seconds")
            time.sleep(180)
    raise TimeoutError(f"Zotero API did not respond in time after {retry_request_max} retries")


def write_zotero_json(data, locale: str):
    path = f"zotero-unprocessed.{locale}.json"
    with open(path, 'w') as output_file:
        json.dump(data, output_file, indent=2)


def refresh_json(locale: str):
    zotero_items = fetch_zotero_entries(locale=locale)
    if zotero_items is not None:
        print(f"Successfully fetched entries for {locale}")
        write_zotero_json(zotero_items, locale)
    else:
        sys.exit(1)


refresh_json("de-DE")
refresh_json("en-US")

urschrei added a commit that referenced this issue Dec 3, 2024
This didn't work, and is causing other issues: see #195