diff --git a/.github/workflows/pytest.yaml b/.github/workflows/pytest.yaml index 8c355e576..217074c0f 100644 --- a/.github/workflows/pytest.yaml +++ b/.github/workflows/pytest.yaml @@ -50,11 +50,11 @@ jobs: cache: pip cache-dependency-path: "**/pyproject.toml" - - name: Upgrade pip, wheel - run: python -m pip install --upgrade pip wheel + - name: Upgrade pip + run: python -m pip install --upgrade pip - name: Install the Python package and dependencies - run: pip install .[cache,tests] + run: pip install .[tests] - name: Run pytest env: diff --git a/doc/whatsnew.rst b/doc/whatsnew.rst index 3f5523b37..3df64890a 100644 --- a/doc/whatsnew.rst +++ b/doc/whatsnew.rst @@ -68,6 +68,7 @@ Next release - Expand explicit marking of particular data sources that do not support the above endpoints. - Add support for validating SDMX-ML messages; see :func:`.validate_xml` (:issue:`51`; thanks :gh-user:`goatsweater` for :pull:`154`). +- :mod:`sdmx` is fully compatible with pandas 2.2.0, released 2024-01-19 (:pull:`156`). v2.12.1 (2023-12-20) ==================== @@ -113,7 +114,7 @@ All changes v2.10.0 (2023-05-20) ==================== -- Switch from third-party :mod:`.pydantic` to Python standard library :mod:`dataclasses` (:pull:`128`). +- Switch from third-party :py:`pydantic` to Python standard library :mod:`dataclasses` (:pull:`128`). This is a major change to the :mod:`sdmx` internals, but should come with few API changes and some performance improvements. Specific known changes: @@ -243,7 +244,7 @@ v2.6.3 (2022-09-29) - Update :ref:`ILO` web service URL and quirks handling (:pull:`97`, thanks :gh-user:`ethangelbach`). - Use HTTPS for :ref:`ESTAT` (:pull:`97`). -- Bump minimum version of :mod:`pydantic` to 1.9.2 (:pull:`98`). +- Bump minimum version of :py:`pydantic` to 1.9.2 (:pull:`98`). - Always return all objects parsed from a SDMX-ML :class:`.StructureMessage` (:pull:`99`). If two or more :class:`.MaintainableArtefact` have the same ID (e.g. 
"CL_FOO"); :mod:`sdmx` would formerly store only the last one parsed. @@ -313,7 +314,7 @@ v2.5.0 (2021-06-27) - Document how :ref:`Countdown to 2030 ` data can be accessed from the :ref:`UNICEF ` service (:pull:`83`). - Tolerate malformed SDMX-JSON from :ref:`OECD ` (:issue:`64`, :pull:`81`). - Reduce noise when :mod:`requests_cache` is not installed (:issue:`75`, :pull:`80`). - An exception is still raised if (a) the package is not installed and (b) cache-related arguments are passed to :class:`Client`. + An exception is still raised if (a) the package is not installed and (b) cache-related arguments are passed to :class:`.Client`. - Bugfix: `verify` = :obj:`False` was not passed to the preliminary request used to validate a :class:`dict` key for a data request (:pull:`80`; thanks :gh-user:`albertame` for :issue:`77`). - Handle ```` and ``>`` in SDMX-ML headers (:issue:`78`, :pull:`79`). @@ -333,8 +334,8 @@ v2.4.0 (2021-03-28) - Also add :meth:`.ContentConstraint.iter_keys`, :meth:`.DataflowDefinition.iter_keys`. - Implement or improve :meth:`.Constraint.__contains__`, :meth:`.CubeRegion.__contains__`, :meth:`.ContentConstraint.__contains__`, :meth:`.v21.KeyValue.__eq__`, and :meth:`.Key.__eq__`. -- Speed up creation of :class:`.Key` objects by improving :mod:`pydantic` usage, updating :meth:`.Key.__init__`, and adding :meth:`.Key._fast`. -- Simplify :func:`.validate_dictlike`; add :func:`.dictlike_field`, and simplify :mod:`pydantic` validation of :class:`.DictLike` objects, keys, and values. +- Speed up creation of :class:`.Key` objects by improving :py:`pydantic` usage, updating :meth:`.Key.__init__`, and adding :meth:`.Key._fast`. +- Simplify :func:`.validate_dictlike`; add :func:`.dictlike_field`, and simplify :py:`pydantic` validation of :class:`.DictLike` objects, keys, and values. v2.3.0 (2021-03-10) =================== @@ -342,7 +343,7 @@ v2.3.0 (2021-03-10) - :func:`.to_xml` can produce structure-specific SDMX-ML (:pull:`67`). 
- Improve typing of :class:`.Item` and subclasses, e.g. :class:`.Code` (:pull:`66`). :attr:`~Item.parent` and :attr:`~Item.child` elements are typed the same as a subclass. -- Require :mod:`pydantic` >= 1.8.1, and remove workarounds for limitations in earlier versions (:pull:`66`). +- Require :py:`pydantic` >= 1.8.1, and remove workarounds for limitations in earlier versions (:pull:`66`). - The default branch of the :mod:`sdmx` GitHub repository is renamed ``main``. Bug fixes @@ -353,7 +354,7 @@ Bug fixes v2.2.1 (2021-02-27) =================== -- Temporary exclude :mod:`pydantic` versions >= 1.8 (:pull:`62`). +- Temporarily exclude :py:`pydantic` versions >= 1.8 (:pull:`62`). v2.2.0 (2021-02-26) =================== @@ -410,7 +411,7 @@ All changes - The large library of test specimens for :mod:`sdmx` is no longer shipped with the package, reducing the archive size by about 80% (:issue:`18`, :pull:`52`). The specimens can be retrieved for running tests locally; see :ref:`testing`. -- The :class:`Request` class is renamed :class:`.Client` for semantic clarity (:issue:`11`, :pull:`44`): +- The :py:`Request` class is renamed :class:`.Client` for semantic clarity (:issue:`11`, :pull:`44`): A Client can open a :class:`.requests.Session` and might make many :class:`requests.Requests <.requests.Request>` against the same web service. @@ -418,10 +419,10 @@ All changes - Some internal modules are renamed. These should not affect user code; if they do, adjust that code to use the top-level objects. - - :mod:`sdmx.api` is renamed :mod:`sdmx.client`. - - :mod:`sdmx.remote` is renamed :mod:`sdmx.session`. 
+ - :py:`sdmx.reader.sdmxml` is renamed :mod:`sdmx.reader.xml`, to conform with :mod:`sdmx.format.xml` and :mod:`sdmx.writer.xml`. + - :py:`sdmx.reader.sdmxjson` is renamed :mod:`sdmx.reader.json`. v1.7 and earlier ================ @@ -474,7 +475,7 @@ New features - Enhance :func:`.to_xml` to handle :class:`DataMessages <.DataMessage>` (:pull:`13`). In v1.4.0, this feature supports a subset of DataMessages and DataSets. - If you have an example of a DataMessages that :mod:`sdmx1` 1.4.0 cannot write, please `file an issue on GitHub `_ with a file attachment. + If you have an example of a DataMessages that :mod:`sdmx` 1.4.0 cannot write, please `file an issue on GitHub `_ with a file attachment. SDMX-ML features used in such examples will be prioritized for future improvements. - Add ``compare()`` methods to :class:`.DataMessage`, :class:`.DataSet`, and related classes (:pull:`13`). @@ -524,7 +525,7 @@ New features - :attr:`.Item.hierarchical_id` and :meth:`.ItemScheme.get_hierarchical` create and search on IDs like ‘A.B.C’ for Item ‘A’ with child/grandchild Items ‘B’ and ‘C’ (:pull:`4`). - New methods :func:`.parent_class`, :func:`.get_reader_for_path`, :func:`.detect_content_reader`, and :func:`.reader.register` (:pull:`4`). -- :class:`.sdmxml.Reader` uses an event-driven, rather than recursive/tree iterating, parser (:pull:`4`). +- :class:`.sdmxml.Reader <.xml.v21.Reader>` uses an event-driven, rather than recursive/tree iterating, parser (:pull:`4`). - The codebase is improved to pass static type checking with `mypy `_ (:pull:`4`). - Add :func:`.to_xml` to generate SDMX-ML for a subset of the IM (:pull:`3`). @@ -544,11 +545,11 @@ v1.0.0 (2020-05-01) Users familiar with the IM can use :mod:`sdmx` without the need to understand implementation-specific details. - IM classes are no longer tied to :mod:`sdmx.reader` instances and can be created and manipulated outside of a read operation. 
-- :mod:`sdmx.api` and :mod:`sdmx.remote` are reimplemented to (1) match the semantics of the requests_ package and (2) be much thinner. +- :py:`sdmx.api` and :py:`sdmx.remote` are reimplemented to (1) match the semantics of the requests_ package and (2) be much thinner. - Data sources are modularized in :class:`~.source.Source`. - Idiosyncrasies of particular data sources (e.g. ESTAT's process for large requests) are handled by source-specific subclasses. - As a result, :mod:`sdmx.api` is leaner. + As a result, :py:`sdmx.api` is leaner. - Testing coverage is significantly expanded. @@ -636,8 +637,8 @@ v0.7.0 (2017-06-10) - UNESCO (free registration required) - World Bank - World Integrated Trade Solution (WITS) -* new feature: load metadata on data providers from json file; allow the user to add new agencies on the fly by specifying an appropriate JSON file using the :meth:`pandasdmx.api.Request.load_agency_profile`. -* new :meth:`pandasdmx.api.Request.preview_data` providing a powerful fine-grain key validation algorithm by downloading all series-keys of a dataset and exposing them as a pandas DataFrame which is then mapped to the cartesian product of the given dimension values. +* new feature: load metadata on data providers from json file; allow the user to add new agencies on the fly by specifying an appropriate JSON file using the :py:`pandasdmx.api.Request.load_agency_profile`. +* new :meth:`pandasdmx.api.Request.preview_data <.Client.preview_data>` providing a powerful fine-grain key validation algorithm by downloading all series-keys of a dataset and exposing them as a pandas DataFrame which is then mapped to the cartesian product of the given dimension values. Works only with data providers such as ECB and UNSD which support "series-keys-only" requests. This feature could be wrapped by a browser-based UI for building queries. 
* SDMX-JSON reader: add support for flat and cross-sectional datasets, preserve dimension order where possible @@ -677,7 +678,7 @@ New features * new reader module for SDMX JSON data messages * add OECD as data provider (data messages only) -* :class:`pandasdmx.model.Category` is now an iterator over categorised objects. +* :class:`pandasdmx.model.Category <.Category>` is now an iterator over categorised objects. This greatly simplifies category usage. Besides, categories with the same ID while belonging to multiple category schemes are no longer conflated. @@ -685,7 +686,7 @@ API changes ~~~~~~~~~~~ * Request constructor: make agency ID case-insensitive -* As :class:`Category` is now an iterator over categorised objects, :class:`Categorisations` is no longer considered part of the public API. +* As :class:`.Category` is now an iterator over categorised objects, :py:`Categorisations` is no longer considered part of the public API. Bug fixes ~~~~~~~~~ @@ -708,15 +709,15 @@ New features API changes ~~~~~~~~~~~ -* :class:`pandasdmx.api.Request` constructor accepts a ``log_level`` keyword argument which can be set to a log-level for the pandasdmx logger and its children (currently only pandasdmx.api) -* :class:`pandasdmx.api.Request` now has a ``timeout`` property to set the timeout for http requests +* :py:`pandasdmx.api.Request` constructor accepts a ``log_level`` keyword argument which can be set to a log-level for the pandasdmx logger and its children (currently only pandasdmx.api) +* :py:`pandasdmx.api.Request` now has a ``timeout`` property to set the timeout for http requests * extend api.Request._agencies configuration to specify agency- and resource-specific settings such as headers. Future versions may exploit this to provide reader selection information. * api.Request.get: specify http_headers per request. 
Defaults are set according to agency configuration * Response instances expose Message attributes to make application code more succinct -* rename :class:`pandasdmx.api.Message` attributes to singular form. +* rename :class:`pandasdmx.api.Message <.Message>` attributes to singular form. Old names are deprecated and will be removed in the future. -* :class:`pandasdmx.api.Request` exposes resource names such as data, datastructure, dataflow etc. as descriptors calling 'get' without specifying the resource type as string. +* :py:`pandasdmx.api.Request` exposes resource names such as data, datastructure, dataflow etc. as descriptors calling 'get' without specifying the resource type as string. In interactive environments, this saves typing and enables code completion. * data2pd writer: return attributes as namedtuples rather than dict * use patched version of namedtuple that accepts non-identifier strings as field names and makes all fields accessible through dict syntax. @@ -724,7 +725,7 @@ API changes * sdmxml reader: return strings or unicode strings instead of LXML smart strings * sdmxml reader: remove most of the specialized read methods. Adapt model to use generalized methods. This makes code more maintainable. -* :class:`sdmx.model.Representation` for DSD attributes and dimensions now supports text not just code lists. +* :class:`sdmx.model.Representation <.Representation>` for DSD attributes and dimensions now supports text not just code lists. Other changes and enhancements ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -757,7 +758,7 @@ v0.3.0 (2015-09-22) v0.2.2 ------ -* Make HTTP connections configurable by exposing the `requests.get API `_ through the :class:`pandasdmx.api.Request` constructor. +* Make HTTP connections configurable by exposing the `requests.get API `_ through the :py:`pandasdmx.api.Request` constructor. 
Hence, proxy servers, authorisation information and other HTTP-related parameters consumed by ``requests.get`` can be specified for each ``Request`` instance and used in subsequent requests. The configuration is exposed as a dict through a new ``Request.client.config`` attribute. * Responses have a new ``http_headers`` attribute containing the HTTP headers returned by the SDMX server diff --git a/pyproject.toml b/pyproject.toml index ff1dc8977..984761c65 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -34,7 +34,6 @@ dependencies = [ "platformdirs >= 4.1", "python-dateutil", "requests >= 2.7", - "typing_extensions", ] [project.optional-dependencies] @@ -42,10 +41,12 @@ cache = ["requests-cache"] docs = ["IPython", "sphinx >=4", "sphinx-book-theme"] tests = [ "Jinja2", + "pyarrow", # Suppress a warning from pandas >=2.2, <3.0 "pytest >= 5", "pytest-cov", "pytest-xdist", "requests-mock >= 1.4", + "sdmx1[cache]", ] [project.urls] diff --git a/sdmx/client.py b/sdmx/client.py index dab1a7261..9656c63f0 100644 --- a/sdmx/client.py +++ b/sdmx/client.py @@ -1,6 +1,6 @@ import logging from functools import partial -from typing import Any, Dict +from typing import TYPE_CHECKING, Any, Dict from warnings import warn import requests @@ -12,6 +12,9 @@ from sdmx.session import ResponseIO, Session from sdmx.source import NoSource, list_sources, sources +if TYPE_CHECKING: + import sdmx.source + log = logging.getLogger(__name__) @@ -52,7 +55,7 @@ class Client: cache: Dict[str, Message] = {} #: :class:`.source.Source` for requests sent from the instance. - source = None + source: "sdmx.source.Source" #: :class:`.Session` for queries sent from the instance. 
session: requests.Session diff --git a/sdmx/format/__init__.py b/sdmx/format/__init__.py index 6bf52696b..1262a70bf 100644 --- a/sdmx/format/__init__.py +++ b/sdmx/format/__init__.py @@ -2,12 +2,7 @@ from dataclasses import InitVar, dataclass, field from enum import Enum, IntFlag from functools import lru_cache -from typing import List, Optional, Union - -try: - from typing import Literal -except ImportError: # Python 3.7 - from typing_extensions import Literal # type: ignore +from typing import List, Literal, Optional, Union from sdmx.util import parse_content_type @@ -26,8 +21,12 @@ Flag = IntFlag("Flag", "data meta ss ts") f = Flag -#: SDMX standard versions. -Version = Enum("Version", "1.0.0 2.0.0 2.1 3.0.0 unknown") +#: SDMX standard versions. In this enumeration, the strings "3.0.0" and "3.0" evaluate +#: to the same member. +Version = Enum( + "Version", + {"1.0.0": 1, "2.0.0": 2, "2.1": 2.1, "3.0.0": 3, "3.0": 3, "unknown": None}, +) @dataclass(frozen=True) diff --git a/sdmx/format/xml/common.py b/sdmx/format/xml/common.py index e3f638fda..b50693afc 100644 --- a/sdmx/format/xml/common.py +++ b/sdmx/format/xml/common.py @@ -1,14 +1,18 @@ import logging import re +import zipfile from functools import lru_cache from itertools import chain from operator import itemgetter from pathlib import Path +from shutil import copytree from typing import IO, Iterable, List, Mapping, Optional, Tuple, Union from lxml import etree from lxml.etree import QName +from sdmx.format import Version + log = logging.getLogger(__name__) # Tags common to SDMX-ML 2.1 and 3.0 @@ -92,7 +96,7 @@ def validate_xml( msg: Union[Path, IO], schema_dir: Optional[Path] = None, - version: Optional[str] = "2.1", + version: Union[str, Version] = Version["2.1"], ) -> bool: """Validate and SDMX message against the XML Schema (XSD) documents. @@ -113,17 +117,7 @@ def validate_xml( bool True if validation passed. False otherwise. 
""" - import platformdirs - - # Supported versions according to install_schemas() - sdmx_ml_versions = ["2.1", "3.0"] - # Raise an error if the version doesn't match one of the defined values - if version not in sdmx_ml_versions: - raise NotImplementedError(f"SDMX-ML version must be one of {sdmx_ml_versions}") - - # If the user has no preference, get the schemas from the local cache directory - if not schema_dir: - schema_dir = platformdirs.user_cache_path("sdmx") / version + schema_dir, version = _handle_validate_args(schema_dir, version) msg_doc = etree.parse(msg) @@ -160,62 +154,119 @@ def validate_xml( return xml_schema.validate(msg_doc) -def install_schemas( - schema_dir: Optional[Path] = None, version: Optional[str] = "2.1" -) -> None: - """Cache XML Schema documents locally for use during message validation. +def _extracted_zipball(version: Version) -> Path: + """Retrieve, cache, and extract the SDMX-ML schemas for `version`. - Parameters - ---------- - schema_dir - The directory where XSD schemas will be downloaded to. - version - The SDMX-ML schema version to validate against. One of ``2.1`` or ``3.0``. - """ - import io - import zipfile + 1. Query the GitHub REST API to identify a URL for the `version` in zipball format. + 2. Download and cache the zipball. The file is not downloaded if it already exists. + 3. Unpack the archive. + + Actions (2) and (3) are performed in the user's cache directory (for instance, + :file:`$HOME/.cache/sdmx/`). :func:`install_schemas` handles copying the extracted + files to other locations. + Returns + ------- + Path + Path to the root folder of the unpacked archive. 
+ """ import platformdirs import requests # Map SDMX-ML schema versions to repo paths - sdmx_ml_versions = { - "2.1": "sdmx-ml-v2_1", - "3.0": "sdmx-ml", + # Check the latest release to get the URL to the schema zip + url = ( + "https://api.github.com/repos/sdmx-twg/sdmx-ml/releases/tags/" + + {Version["2.1"]: "v2.1", Version["3.0.0"]: "v3.0.0"}[version] + ) + gh_headers = { + "Accept": "application/vnd.github+json", + "X-GitHub-Api-Version": "2022-11-28", } - # Raise an error if the version doesn't match one of the defined values - if version not in sdmx_ml_versions.keys(): + release_json = requests.get(url=url, headers=gh_headers).json() + try: + zipball_url = release_json["zipball_url"] + except KeyError: # pragma: no cover + log.debug(release_json) + raise RuntimeError("Failed to download SDMX-ML schema bundle") + + # Make a request for the zipball + resp = requests.get(url=zipball_url, headers=gh_headers) + + # Filename indicated by the HTTP response + filename = resp.headers["content-disposition"].split("filename=")[-1] + # Location for the cached zipball + target = platformdirs.user_cache_path("sdmx").joinpath(filename) + + # Avoid downloading if the same file is already present + if target.exists(): + log.info(f"Use existing {target}") + resp.close() + else: + # Write response content to file + target.parent.mkdir(parents=True, exist_ok=True) + target.write_bytes(resp.content) + + with zipfile.ZipFile(target) as zf: + # Unpack the entire archive + zf.extractall(target.parent) + # The first name listed is the top-level directory within the archive + subdir = zf.namelist()[0] + + return target.parent.joinpath(subdir) + + +def _handle_validate_args( + schema_dir: Optional[Path], version: Union[str, Version] +) -> Tuple[Path, Version]: + """Handle arguments for :func:`.install_schemas` and :func:`.validate_xml`.""" + import platformdirs + + supported = {Version["2.1"], Version["3.0.0"]} + try: + version = Version[version] if isinstance(version, str) else version + 
assert version in supported + except (AssertionError, KeyError): raise NotImplementedError( - f"SDMX-ML version must be one of {sdmx_ml_versions.keys()}" - ) + f"SDMX-ML version must be one of {supported}; got {version}" + ) from None # If the user has no preference, download the schemas to the local cache directory if not schema_dir: - schema_dir = platformdirs.user_cache_path("sdmx") / version + schema_dir = platformdirs.user_cache_path("sdmx") / version.name schema_dir.mkdir(exist_ok=True, parents=True) - # Check the latest release to get the URL to the schema zip - repo = sdmx_ml_versions.get(version) - release_url = f"https://api.github.com/repos/sdmx-twg/{repo}/releases/latest" - gh_headers = { - "Accept": "application/vnd.github+json", - "X-GitHub-Api-Version": "2022-11-28", - } - resp = requests.get(url=release_url, headers=gh_headers) - zipball_url = resp.json().get("zipball_url") + return schema_dir, version - # Download the zipped content and find the schemas within - resp = requests.get(url=zipball_url, headers=gh_headers) - zipped = zipfile.ZipFile(io.BytesIO(resp.content)) - schemas = [n for n in zipped.namelist() if "schemas" in n and n.endswith(".xsd")] - - # Extract the schemas to the destination directory - # We can't use ZipFile.extract here because it will keep the directory structure - for xsd in schemas: - xsd_path = zipfile.Path(zipped, at=xsd) - target = schema_dir.joinpath(xsd_path.name) - # The encoding needs to be supplied here for Windows to read the file - target.write_text(xsd_path.read_text(encoding="utf-8")) + +def install_schemas( + schema_dir: Optional[Path] = None, + version: Union[str, Version] = Version["2.1"], +) -> Path: + """Install SDMX-ML XML Schema documents for use with :func:`.validate_xml`. + + Parameters + ---------- + schema_dir : Path, optional + The directory where XSD schemas will be downloaded to. Default: a subdirectory + named :file:`sdmx/{version}` within the :meth:`platformdirs.user_cache_path`. 
+ version : str or Version, optional + The SDMX-ML schema version to install. One of :py:`Version["2.1"]` (default), + :py:`Version["3.0.0"]`, or :class:`str` equivalent. + + Returns + ------- + Path + The path containing the installed schemas. If `schema_dir` is given, the return + value is identical to the parameter. + """ + schema_dir, version = _handle_validate_args(schema_dir, version) + + # Copy the entire "schemas" subtree recursively + copytree( + _extracted_zipball(version).joinpath("schemas"), schema_dir, dirs_exist_ok=True + ) + return schema_dir class XMLFormat: diff --git a/sdmx/tests/format/test_format_xml.py b/sdmx/tests/format/test_format_xml.py index fbb718554..30a6184c3 100644 --- a/sdmx/tests/format/test_format_xml.py +++ b/sdmx/tests/format/test_format_xml.py @@ -1,12 +1,12 @@ import io import re -import zipfile +from pathlib import Path import pytest -import requests import sdmx -from sdmx.format import xml +from sdmx.format import Version, xml +from sdmx.format.xml.common import _extracted_zipball from sdmx.message import StructureMessage from sdmx.model import v21 @@ -30,6 +30,35 @@ def test_class_for_tag(): assert xml.v30.class_for_tag("str:DataStructure") is not None +@pytest.fixture(scope="module") +def mock_gh_api(): + """Mock GitHub API responses to avoid hitting rate limits. + + For each API endpoint URL queried by :func:`._extracted_zipball`, return a + pared-down JSON response that contains the required "zipball_url" key. 
+ """ + import requests_mock + + base = "https://api.github.com/repos/sdmx-twg/sdmx-ml" + + with requests_mock.Mocker(real_http=True) as m: + for v in "2.1", "3.0": + m.get( + f"{base}/releases/tags/v{v}", + json=dict(zipball_url=f"{base}/zipball/v{v}"), + ) + yield + + +@pytest.fixture(scope="module") +def installed_schemas(mock_gh_api, tmp_path_factory): + """Fixture that ensures schemas are installed locally in a temporary directory.""" + dir = tmp_path_factory.mktemp("schemas") + sdmx.install_schemas(dir.joinpath("2.1"), Version["2.1"]) + sdmx.install_schemas(dir.joinpath("3.0"), Version["3.0.0"]) + yield dir + + @pytest.mark.parametrize("version", ["1", 1, None]) def test_install_schemas_invalid_version(version): """Ensure invalid versions throw ``NotImplementedError``.""" @@ -39,15 +68,11 @@ def test_install_schemas_invalid_version(version): @pytest.mark.network @pytest.mark.parametrize("version", ["2.1", "3.0"]) -def test_install_schemas(tmp_path, version): +def test_install_schemas(installed_schemas, version): """Test that XSD files are downloaded and ready for use in validation.""" - sdmx.install_schemas(schema_dir=tmp_path, version=version) - # Look for a couple of the expected files - files = ["SDMXCommon.xsd", "SDMXMessage.xsd"] - for schema_doc in files: - doc = tmp_path.joinpath(schema_doc) - assert doc.exists() + for schema_doc in ("SDMXCommon.xsd", "SDMXMessage.xsd"): + assert installed_schemas.joinpath(version, schema_doc).exists() @pytest.mark.network @@ -62,7 +87,7 @@ def test_install_schemas_in_user_cache(): files = ["SDMXCommon.xsd", "SDMXMessage.xsd"] for schema_doc in files: doc = cache_dir.joinpath(schema_doc) - assert doc.exists() + assert doc.exists(), (cache_dir, sorted(cache_dir.glob("*"))) @pytest.mark.parametrize("version", ["1", 1, None]) @@ -73,7 +98,7 @@ def test_validate_xml_invalid_version(version): sdmx.validate_xml("samples/common/common.xml", version=version) -def test_validate_xml_no_schemas(specimen, tmp_path): +def 
test_validate_xml_no_schemas(tmp_path, specimen, installed_schemas): """Check that supplying an invalid schema path will raise ``ValueError``.""" with specimen("IPI-2010-A21-structure.xml", opened=False) as msg_path: with pytest.raises(ValueError): @@ -81,46 +106,32 @@ def test_validate_xml_no_schemas(specimen, tmp_path): @pytest.mark.network -def test_validate_xml_from_v2_1_samples(tmp_path): +def test_validate_xml_from_v2_1_samples(tmp_path, specimen, installed_schemas): """Use official samples to ensure validation of v2.1 messages works correctly.""" - # Grab the latest v2.1 schema release to get the URL to the zip - release_url = "https://api.github.com/repos/sdmx-twg/sdmx-ml-v2_1/releases/latest" - gh_headers = { - "Accept": "application/vnd.github+json", - "X-GitHub-Api-Version": "2022-11-28", - } - resp = requests.get(url=release_url, headers=gh_headers) - zipball_url = resp.json().get("zipball_url") - # Download the zipped content and find the schemas within - resp = requests.get(url=zipball_url, headers=gh_headers) - zipped = zipfile.ZipFile(io.BytesIO(resp.content)) - zipped.extractall(path=tmp_path) - extracted_content = list(tmp_path.glob("sdmx-twg-sdmx-ml*"))[0] + extracted_content = _extracted_zipball(Version["2.1"]) # Schemas as just in a flat directory schema_dir = extracted_content.joinpath("schemas") # Samples are somewhat spread out, and some are known broken so we pick a bunch - samples_dir = extracted_content.joinpath("samples") - samples = [ - samples_dir / "common" / "common.xml", - samples_dir / "demography" / "demography.xml", - samples_dir / "demography" / "esms.xml", - samples_dir / "exr" / "common" / "exr_common.xml", - samples_dir / "exr" / "ecb_exr_ng" / "ecb_exr_ng_full.xml", - samples_dir / "exr" / "ecb_exr_ng" / "ecb_exr_ng.xml", - samples_dir / "query" / "query_cl_all.xml", - samples_dir / "query" / "response_cl_all.xml", - samples_dir / "query" / "query_esms_children.xml", - samples_dir / "query" / "response_esms_children.xml", - 
] - - for sample in samples: - assert sdmx.validate_xml(sample, schema_dir, version="2.1") + for parts in [ + ("v21", "xml", "common", "common.xml"), + ("v21", "xml", "demography", "demography.xml"), + ("v21", "xml", "demography", "esms.xml"), + ("ECB_EXR", "common.xml"), + ("ECB_EXR", "ng-structure-full.xml"), + ("ECB_EXR", "ng-structure.xml"), + ("v21", "xml", "query", "query_cl_all.xml"), + ("v21", "xml", "query", "response_cl_all.xml"), + ("v21", "xml", "query", "query_esms_children.xml"), + ("v21", "xml", "query", "response_esms_children.xml"), + ]: + with specimen(str(Path(*parts))) as sample: + assert sdmx.validate_xml(sample, schema_dir, version="2.1") @pytest.mark.network -def test_validate_xml_invalid_doc(tmp_path): +def test_validate_xml_invalid_doc(tmp_path, installed_schemas): """Ensure that an invalid document fails validation.""" msg_path = tmp_path / "invalid.xml" @@ -149,13 +160,8 @@ def test_validate_xml_invalid_doc(tmp_path): msg_path.write_bytes(sdmx.to_xml(msg)) - # Install schemas for use in validation - schema_dir = tmp_path / "schemas" - schema_dir.mkdir(exist_ok=True, parents=True) - sdmx.install_schemas(schema_dir=schema_dir) - # Expect validation to fail - assert not sdmx.validate_xml(msg_path, schema_dir=schema_dir) + assert not sdmx.validate_xml(msg_path, schema_dir=installed_schemas.joinpath("2.1")) def test_validate_xml_invalid_message_type(): @@ -169,21 +175,9 @@ def test_validate_xml_invalid_message_type(): @pytest.mark.network -def test_validate_xml_from_v3_0_samples(tmp_path): +def test_validate_xml_from_v3_0_samples(tmp_path, installed_schemas): """Use official samples to ensure validation of v3.0 messages works correctly.""" - # Grab the latest v3.0 schema release to get the URL to the zip - release_url = "https://api.github.com/repos/sdmx-twg/sdmx-ml/releases/latest" - gh_headers = { - "Accept": "application/vnd.github+json", - "X-GitHub-Api-Version": "2022-11-28", - } - resp = requests.get(url=release_url, headers=gh_headers) 
- zipball_url = resp.json().get("zipball_url") - # Download the zipped content and find the schemas within - resp = requests.get(url=zipball_url, headers=gh_headers) - zipped = zipfile.ZipFile(io.BytesIO(resp.content)) - zipped.extractall(path=tmp_path) - extracted_content = list(tmp_path.glob("sdmx-twg-sdmx-ml*"))[0] + extracted_content = _extracted_zipball(Version["3.0.0"]) # Schemas as just in a flat directory schema_dir = extracted_content.joinpath("schemas") diff --git a/sdmx/tests/reader/test_reader_xml.py b/sdmx/tests/reader/test_reader_xml.py index 3f546bb71..795373184 100644 --- a/sdmx/tests/reader/test_reader_xml.py +++ b/sdmx/tests/reader/test_reader_xml.py @@ -7,7 +7,7 @@ @pytest.mark.parametrize_specimens("path", format="xml") def test_read_xml(path) -> None: """XML specimens can be read.""" - if "esms_structured" in path.name: + if "esms_structured" in path.name or "query" in str(path): pytest.xfail("Not implemented") result = sdmx.read_sdmx(path) diff --git a/sdmx/tests/test_dataset.py b/sdmx/tests/test_dataset.py index 4b6b4f579..dbf35dda5 100644 --- a/sdmx/tests/test_dataset.py +++ b/sdmx/tests/test_dataset.py @@ -114,7 +114,7 @@ def test_pandas(self, msg): assert isinstance(s3, pd.Series) # Test a particular value - assert s3[0] == 1.2894 + assert s3.iloc[0] == 1.2894 # Length of index assert len(s3.index.names) == 6 diff --git a/sdmx/tests/test_dataset_ss.py b/sdmx/tests/test_dataset_ss.py index 128810a3d..a63692aa5 100644 --- a/sdmx/tests/test_dataset_ss.py +++ b/sdmx/tests/test_dataset_ss.py @@ -128,7 +128,7 @@ def test_pandas(self, msg): s3 = sdmx.to_pandas(data.series[3], attributes="") assert isinstance(s3, pd.Series) # With expected values - assert s3[0] == 1.2894 + assert s3.iloc[0] == 1.2894 # Single series can be converted with attributes s3_attr = sdmx.to_pandas(data.series[3], attributes="osgd") diff --git a/sdmx/tests/writer/test_pandas.py b/sdmx/tests/writer/test_pandas.py index fde0566a1..210739d1d 100644 --- 
a/sdmx/tests/writer/test_pandas.py +++ b/sdmx/tests/writer/test_pandas.py @@ -221,8 +221,8 @@ def expected(df, axis=0, cls=pd.DatetimeIndex): df = sdmx.to_pandas(ds, datetime=dict(dim="TIME_PERIOD", freq="M")) expected(df, cls=pd.PeriodIndex) - # Write with freq='A' works - df = sdmx.to_pandas(ds, datetime=dict(dim="TIME_PERIOD", freq="A")) + # Write with freq='Y' (in older pandas, freq='A') works + df = sdmx.to_pandas(ds, datetime=dict(dim="TIME_PERIOD", freq="Y")) expected(df, cls=pd.PeriodIndex) # …but the index is not unique, because month information was discarded assert not df.index.is_unique diff --git a/sdmx/writer/pandas.py b/sdmx/writer/pandas.py index 08dac0b70..6f6081bbb 100644 --- a/sdmx/writer/pandas.py +++ b/sdmx/writer/pandas.py @@ -3,7 +3,6 @@ import numpy as np import pandas as pd -from pandas.core.indexes.datetimes import prefix_mapping # type: ignore [attr-defined] from sdmx import message from sdmx.dictlike import DictLike @@ -390,14 +389,14 @@ def _maybe_convert_datetime(df, arg, obj, dsd=None): # noqa: C901 From the `obj` argument to :meth:`write_dataset`. 
dsd: ~.DataStructureDefinition, optional """ - # TODO Simplify this method to reduce its McCabe complexity from 27 to <= 13 - if not arg: - # False, None, empty dict: no datetime conversion - return df + # TODO Simplify this method to reduce its McCabe complexity from 23 to <= 13 # Check argument values param = dict(dim=None, axis=0, freq=False) - if isinstance(arg, str): + + if not arg: + return df # False, None, empty dict → no datetime conversion + elif isinstance(arg, str): param["dim"] = arg elif isinstance(arg, DimensionComponent): param["dim"] = arg.id @@ -411,33 +410,22 @@ def _maybe_convert_datetime(df, arg, obj, dsd=None): # noqa: C901 else: raise ValueError(arg) - def _get_dims(): - """Return an appropriate list of dimensions.""" - if len(obj.structured_by.dimensions.components): - return obj.structured_by.dimensions.components - elif dsd: - return dsd.dimensions.components - else: - return [] - - def _get_attrs(): - """Return an appropriate list of attributes.""" - if len(obj.structured_by.attributes.components): - return obj.structured_by.attributes.components + def _get(kind: str): + """Return an appropriate list of dimensions or attributes.""" + if len(getattr(obj.structured_by, kind).components): + return getattr(obj.structured_by, kind).components elif dsd: - return dsd.attributes.components + return getattr(dsd, kind).components else: return [] + # Determine time dimension + if not param["dim"]: + for dim in filter(lambda d: isinstance(d, TimeDimension), _get("dimensions")): + param["dim"] = dim + break if not param["dim"]: - # Determine time dimension - dims = _get_dims() - for dim in dims: - if isinstance(dim, TimeDimension): - param["dim"] = dim - break - if not param["dim"]: - raise ValueError(f"no TimeDimension in {dims}") + raise ValueError(f"no TimeDimension in {_get('dimensions')}") # Unstack all but the time dimension and convert other_dims = list(filter(lambda d: d != param["dim"], df.index.names)) @@ -446,22 +434,24 @@ def _get_attrs(): 
kw = dict(format="mixed") if _HAS_PANDAS_2 else {} df.index = pd.to_datetime(df.index, **kw) - if param["freq"]: - # Determine frequency string, Dimension, or Attribute - freq = param["freq"] - if isinstance(freq, str) and freq not in prefix_mapping: - # ID of a Dimension or Attribute - for component in chain(_get_dims(), _get_attrs()): + # Convert to a PeriodIndex with a particular frequency + if freq := param["freq"]: + try: + # A frequency string recognized by pandas.PeriodDtype + if isinstance(freq, str): + freq = pd.PeriodDtype(freq=freq).freq + except ValueError: + # ID of a Dimension; Attribute; or column of `df` + result = None + for component in chain( + _get("dimensions"), _get("attributes"), map(Dimension, df.columns.names) + ): if component.id == freq: - freq = component + freq = result = component break - # No named dimension in the DSD; but perhaps on the df - if isinstance(freq, str): - if freq in df.columns.names: - freq = Dimension(id=freq) - else: - raise ValueError(freq) + if not result: + raise ValueError(freq) if isinstance(freq, Dimension): # Retrieve Dimension values from pd.MultiIndex level @@ -470,9 +460,8 @@ def _get_attrs(): values = set(df.columns.levels[i]) if len(values) > 1: - values = sorted(values) raise ValueError( - "cannot convert to PeriodIndex with " f"non-unique freq={values}" + f"cannot convert to PeriodIndex with non-unique freq={sorted(values)}" ) # Store the unique value