Implement WIGOS_station_id computed key #80

Merged 4 commits on Jan 14, 2025
97 changes: 86 additions & 11 deletions docs/read_bufr.rst
@@ -51,20 +51,73 @@ BUFR keys

The "count" generated key, which refers to the message index, is also supported but please note that message indexing starts at 1 and not at 0!

Computed BUFR keys
-------------------

There is also a set of **computed keys** that can be used for :func:`read_bufr`:

* "data_datetime" (datetime.datetime): generated from the "year", "month", "day", "hour", "minute", "second" keys in the BUFR data section.
* "typical_datetime" (datetime.datetime): generated from the "typicalYear", "typicalMonth", "typicalDay", "typicalHour", "typicalMinute", "typicalSecond" keys in the BUFR header section.
* "WMO_station_id": generated from the "blockNumber" and "stationNumber" keys as::

.. _key_data_datetime:

data_datetime
+++++++++++++++

Generated from the "year", "month", "day", "hour", "minute", "second" keys in the BUFR data section. The values are converted to a **datetime.datetime** object.
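
As a minimal sketch of the conversion described above, the six data-section keys can be combined into a ``datetime.datetime`` object. The helper name ``build_data_datetime`` and the sample values are hypothetical and only illustrate the idea; they are not part of the library.

```python
import datetime


def build_data_datetime(keys):
    """Hypothetical helper: combine the six BUFR data-section keys into a datetime."""
    return datetime.datetime(
        keys["year"],
        keys["month"],
        keys["day"],
        keys["hour"],
        keys["minute"],
        keys["second"],
    )


# Made-up sample values, for illustration only
sample = {"year": 2025, "month": 1, "day": 14, "hour": 12, "minute": 30, "second": 0}
data_datetime = build_data_datetime(sample)
```

The same pattern would apply to "typical_datetime", using the "typical*" header keys instead.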


.. _key_typical_datetime:

typical_datetime
+++++++++++++++++

Generated from the "typicalYear", "typicalMonth", "typicalDay", "typicalHour", "typicalMinute", "typicalSecond" keys in the BUFR header section. The values are converted to a **datetime.datetime** object.


.. _key_wmo_station_id:

WMO_station_id
++++++++++++++++

Generated from the "blockNumber" and "stationNumber" keys as::

blockNumber*1000+stationNumber
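
The formula can be checked with a short sketch. The function name ``wmo_station_id`` is hypothetical; only the arithmetic comes from the definition above.

```python
def wmo_station_id(block_number: int, station_number: int) -> int:
    """Hypothetical helper applying blockNumber*1000+stationNumber."""
    return block_number * 1000 + station_number


# blockNumber=12 and stationNumber=925 give the id 12925
station_id = wmo_station_id(12, 925)
```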

* "geometry": values extracted as a list of::

.. _key_wigos_station_id:

WIGOS_station_id
++++++++++++++++++

*New in version 0.12.0*

Generated from the "wigosIdentifierSeries", "wigosIssuerOfIdentifier", "wigosIssueNumber" and "wigosLocalIdentifierCharacter" keys as a str in the following format::

"{wigosIdentifierSeries}-{wigosIssuerOfIdentifier}-{wigosIssueNumber}-{wigosLocalIdentifierCharacter}"


For example: "0-705-0-1931".

When using "WIGOS_station_id" in ``filters`` the value can be given as a str or as a tuple/list of 4 values. See: :ref:`filters-section`.

Details about the WIGOS identifiers can be found `here <https://community.wmo.int/en/activity-areas/WIGOS/implementation-WIGOS/FAQ-WSI>`_.
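
The string format and its tuple equivalent can be sketched as follows. The helpers ``format_wigos_id`` and ``parse_wigos_id`` are hypothetical names used for illustration, and the sketch assumes that the local identifier itself contains no "-" character.

```python
from typing import Tuple


def format_wigos_id(series: int, issuer: int, issue_number: int, local: str) -> str:
    """Hypothetical helper: join the four WIGOS components with hyphens."""
    return f"{series}-{issuer}-{issue_number}-{local}"


def parse_wigos_id(text: str) -> Tuple[int, int, int, str]:
    """Hypothetical helper: split an id into three integers and a local string."""
    parts = text.split("-")
    if len(parts) != 4:
        raise ValueError(f"Invalid WIGOS ID string: {text!r}")
    series, issuer, issue_number, local = parts
    # The first three components are integers, the local identifier stays a string
    return int(series), int(issuer), int(issue_number), local


wigos_id = format_wigos_id(0, 705, 0, "1931")
wigos_parts = parse_wigos_id(wigos_id)
```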

.. _key_geometry:

geometry
++++++++++

Values extracted as a list of::

[longitude,latitude,heightOfStationGroundAboveMeanSeaLevel]

as required for geopandas.
* "CRS": generated from the "coordinateReferenceSystem" key using the following mapping:
as required for geopandas.

.. _crs:

CRS
++++

Generated from the "coordinateReferenceSystem" key using the following mapping:

.. list-table::
:header-rows: 1
@@ -91,9 +144,11 @@ BUFR keys
- EPSG:4326


.. note::

The computed keys do not preserve their position in ``columns`` but are placed to the end of the resulting DataFrame.
.. note::

Computed keys do not preserve their position in ``columns`` but are placed at the end of the resulting DataFrame.


.. _filters-section:

@@ -110,16 +165,36 @@ Single value
.. code-block:: python

filters = {"blockNumber": 12}
filters = {"WMO_station_id": 12925}

List of values
++++++++++++++
# The "WIGOS_station_id" can be specified in various ways
# When tuple/list is used the first 3 values must be integers, the last one must be a string.
filters = {"WIGOS_station_id": "0-705-0-1931"}
filters = {"WIGOS_station_id": (0, 705, 0, "1931")}

# However, implicit str to int conversion is done for the first 3 values, so this is also valid.
filters = {"WIGOS_station_id": ("0", "705", "0", "1931")}

List/tuple/set of values
++++++++++++++++++++++++++

A list of values specifies an "in" relation:
A list/tuple/set of values specifies an "in" relation:

.. code-block:: python

filters = {"stationNumber": [843, 925]}
filters = {"blockNumber": range(10, 13)}
filters = {"WMO_station_id": [12925, 12843]}

# The "WIGOS_station_id" can be specified in various ways.
# When tuple/list is used in an id the first 3 values must be integers, the last one must be a string.
filters = {"WIGOS_station_id": ["0-705-0-1931", "0-705-0-1932"]}
filters = {"WIGOS_station_id": ((0, 705, 0, "1931"), (0, 705, 0, "1932"))}

# However, implicit str to int conversion is done for the first 3 values, so this is also valid.
filters = {
"WIGOS_station_id": [("0", "705", "0", "1931"), ("0", "705", "0", "1932")]
}
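
The "in" relation can be pictured with a small stand-alone sketch. The function ``matches_in_filter`` is a hypothetical stand-in, not the library's API; it assumes that the allowed values are collected into a set and that a missing value (``None``) never matches.

```python
def matches_in_filter(value, allowed):
    """Hypothetical stand-in for an "in" filter; missing values never match."""
    if value is None:
        return False
    return value in set(allowed)


station_filter = [12925, 12843]
candidates = [12925, 12000, None, 12843]

# Keep only the values that are members of the filter
kept = [v for v in candidates if matches_in_filter(v, station_filter)]
```

Any iterable works as the collection of allowed values in this sketch, which is why ``range(10, 13)`` can be used in the same way as a list.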

Slices
++++++++
5 changes: 3 additions & 2 deletions docs/release_notes/version_0.12_updates.rst
@@ -6,5 +6,6 @@ Version 0.12 Updates
Version 0.12.0
===============

- fixed issue when the ``filter`` was not applied if it contained a computed key and ``required_columns`` were set when (:pr:`79`)
- fixed issue when the ``filter`` containing a computed key matched subsets/messages where the computed key value was missing
- implemented the ``WIGOS_station_id`` computed key for the `WIGOS station identifier <https://community.wmo.int/en/activity-areas/WIGOS/implementation-WIGOS/FAQ-WSI>`_. See details :ref:`here <key_wigos_station_id>`.
- fixed an issue where ``filters`` was not applied if it contained a computed key and ``required_columns`` was set (:pr:`79`)
- fixed an issue where ``filters`` containing a computed key matched subsets/messages where the computed key value was missing (:pr:`79`)
3 changes: 2 additions & 1 deletion src/pdbufr/__init__.py
@@ -16,9 +16,10 @@
__version__ = "999"


from .bufr_filters import WIGOSId
from .bufr_structure import stream_bufr

__all__ = ["stream_bufr"]
__all__ = ["stream_bufr", "WIGOSId"]

try:
from .bufr_read import read_bufr
197 changes: 165 additions & 32 deletions src/pdbufr/bufr_filters.py
@@ -8,53 +8,186 @@

import logging
import typing as T

import attr # type: ignore
from abc import ABCMeta
from abc import abstractmethod

LOG = logging.getLogger(__name__)

WIGOS_ID_KEY = "WIGOS_station_id"

@attr.attrs(auto_attribs=True, frozen=True)
class BufrFilter:
filter: T.Union[slice, T.Set[T.Any], T.Callable[[T.Any], bool]]

@classmethod
def from_user(cls, user_filter: T.Any) -> "BufrFilter":
filter: T.Union[slice, T.Set[T.Any], T.Callable[[T.Any], bool]]

if isinstance(user_filter, slice):
if user_filter.step is not None:
LOG.warning(f"slice filters ignore the step {user_filter.step}")
filter = user_filter
elif isinstance(user_filter, T.Iterable) and not isinstance(user_filter, str):
filter = set(user_filter)
elif callable(user_filter):
filter = user_filter
def normalise_wigos(value):
if value is not None:
if isinstance(value, WIGOSId):
return value
try:
return WIGOSId.from_user(value)
except Exception:
if isinstance(value, (list, tuple)):
value = [normalise_wigos(x) for x in value]
return value

raise ValueError(f"Invalid WIGOS ID value: {value}")


class BufrFilter(metaclass=ABCMeta):
@abstractmethod
def match(self, value: T.Any) -> bool:
pass

@abstractmethod
def max(self) -> T.Any:
pass

@staticmethod
def from_user(value: T.Any, key: str = None) -> "BufrFilter":
if isinstance(value, slice):
return SliceBufrFilter(value)
elif callable(value):
return CallableBufrFilter(value)
else:
filter = {user_filter}
return cls(filter)
if key == WIGOS_ID_KEY:
value = normalise_wigos(value)
return WigosValueBufrFilter(value)
else:
return ValueBufrFilter(value)


class EmptyBufrFilter(BufrFilter):
def __init__(self) -> None:
# BufrFilter defines no __init__, so no arguments can be passed up to object.__init__
super().__init__()

def match(self, value: T.Any) -> bool:
return True

def max(self) -> T.Any:
return None


class SliceBufrFilter(BufrFilter):
def __init__(self, v: slice) -> None:
self.slice = v
if self.slice.step is not None:
LOG.warning(f"slice filters ignore the step={self.slice.step} in slice={self.slice}")

def match(self, value: T.Any) -> bool:
if value is None:
return False
if isinstance(self.filter, slice):
if self.filter.start is not None and value < self.filter.start:
return False
elif self.filter.stop is not None and value > self.filter.stop:
return False
elif callable(self.filter):
return bool(self.filter(value))
elif value not in self.filter:
if self.slice.start is not None and value < self.slice.start:
return False
elif self.slice.stop is not None and value > self.slice.stop:
return False
return True

def max(self) -> T.Any:
if isinstance(self.filter, slice):
return self.filter.stop
elif callable(self.filter):
return None
return self.slice.stop


class CallableBufrFilter(BufrFilter):
def __init__(self, v: T.Callable[[T.Any], bool]) -> None:
self.callable = v

def match(self, value: T.Any) -> bool:
if value is None:
return False
return bool(self.callable(value))

def max(self) -> T.Any:
return None


class ValueBufrFilter(BufrFilter):
def __init__(self, v: T.Any) -> None:
if isinstance(v, T.Iterable) and not isinstance(v, str):
self.set = set(v)
else:
return max(self.filter)
self.set = {v}

def match(self, value: T.Any) -> bool:
if value is None:
return False
return value in self.set

def max(self) -> T.Any:
return max(self.set)


class WigosValueBufrFilter(ValueBufrFilter):
def match(self, value: T.Any) -> bool:
if value is None:
return False
if isinstance(value, (str, WIGOSId)):
return value in self.set
return False


class WIGOSId:
def __init__(
self,
series: T.Union[str, int],
issuer: T.Union[str, int],
number: T.Union[str, int],
local: str,
) -> None:

def _convert(v):
return int(v) if v is not None else None

self.series = _convert(series)
self.issuer = _convert(issuer)
self.number = _convert(number)
self.local = local

if not isinstance(self.local, str) and self.local is not None:
raise ValueError(f"Invalid WIGOS local identifier={self.local}. Must be a string")

@classmethod
def from_str(cls, v: str) -> "WIGOSId":
v = v.split("-")
if len(v) != 4:
raise ValueError("Invalid WIGOS ID string")

return cls(*v)

@classmethod
def from_iterable(cls, v):
return cls(*v)

def __eq__(self, value):
if isinstance(value, WIGOSId):
return all(
x == y
for x, y in zip(
(self.series, self.issuer, self.number, self.local),
(value.series, value.issuer, value.number, value.local),
)
)
elif isinstance(value, (list, tuple)) and len(value) == 4:
return self == WIGOSId.from_iterable(value)
elif isinstance(value, str):
return self.as_str() == value
return False

@classmethod
def from_user(cls, value: T.Any) -> T.Optional["WIGOSId"]:
if isinstance(value, WIGOSId):
# Already a WIGOSId, return it unchanged
return value
elif isinstance(value, str):
return cls.from_str(value)
elif isinstance(value, (list, tuple)):
return cls.from_iterable(value)

def __hash__(self):
return hash(self.as_str())

def as_tuple(self):
return (self.series, self.issuer, self.number, self.local)

def as_str(self):
def _convert_str(v):
return str(v) if v is not None else "*"

return f"{_convert_str(self.series)}-{_convert_str(self.issuer)}-{_convert_str(self.number)}-{_convert_str(self.local)}"


def is_match(