Implement WIGOS_station_id computed key (#80)
sandorkertesz authored Jan 14, 2025
1 parent 95499c4 commit 67325e9
Showing 11 changed files with 474 additions and 60 deletions.
97 changes: 86 additions & 11 deletions docs/read_bufr.rst
@@ -51,20 +51,73 @@ BUFR keys

The "count" generated key, which refers to the message index, is also supported but please note that message indexing starts at 1 and not at 0!

Computed BUFR keys
-------------------

There is also a set of **computed keys** that can be used for :func:`read_bufr`:

* "data_datetime" (datetime.datetime): generated from the "year", "month", "day", "hour", "minute", "second" keys in the BUFR data section.
* "typical_datetime" (datetime.datetime): generated from the "typicalYear", "typicalMonth", "typicalDay", "typicalHour", "typicalMinute", "typicalSecond" keys in the BUFR header section.
* "WMO_station_id": generated from the "blockNumber" and "stationNumber" keys as::

.. _key_data_datetime:

data_datetime
+++++++++++++++

Generated from the "year", "month", "day", "hour", "minute", "second" keys in the BUFR data section. The values are converted to a **datetime.datetime** object.


.. _key_typical_datetime:

typical_datetime
+++++++++++++++++

Generated from the "typicalYear", "typicalMonth", "typicalDay", "typicalHour", "typicalMinute", "typicalSecond" keys in the BUFR header section. The values are converted to a **datetime.datetime** object.


.. _key_wmo_station_id:

WMO_station_id
++++++++++++++++

Generated from the "blockNumber" and "stationNumber" keys as::

blockNumber*1000+stationNumber
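
As a quick arithmetic check, the formula can be sketched in plain Python. The helper name ``wmo_station_id`` is hypothetical and used only for illustration; it is not part of pdbufr.

```python
def wmo_station_id(block_number: int, station_number: int) -> int:
    """Combine a WMO block number and a station number into a single id."""
    return block_number * 1000 + station_number


# Block 12 and station 925 combine into 12925.
print(wmo_station_id(12, 925))
```

The multiplication by 1000 leaves room for a three-digit station number, so block 12 and station 925 map to 12925.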

* "geometry": values extracted as a list of::

.. _key_wigos_station_id:

WIGOS_station_id
++++++++++++++++++

*New in version 0.12.0*

Generated from the "wigosIdentifierSeries", "wigosIssuerOfIdentifier", "wigosIssueNumber" and "wigosLocalIdentifierCharacter" keys as a str in the following format::

"{wigosIdentifierSeries}-{wigosIssuerOfIdentifier}-{wigosIssueNumber}-{wigosLocalIdentifierCharacter}"


For example: "0-705-0-1931".

When using "WIGOS_station_id" in ``filters`` the value can be given as a str or as a tuple/list of 4 values. See: :ref:`filters-section`.

Details about the WIGOS identifiers can be found `here <https://community.wmo.int/en/activity-areas/WIGOS/implementation-WIGOS/FAQ-WSI>`_.
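
The format string above can be tried out with ordinary Python string formatting. This is only a sketch: the function name ``wigos_station_id`` and the dictionary of key values are assumptions made for illustration, not pdbufr's own code.

```python
WIGOS_FORMAT = (
    "{wigosIdentifierSeries}-{wigosIssuerOfIdentifier}"
    "-{wigosIssueNumber}-{wigosLocalIdentifierCharacter}"
)


def wigos_station_id(keys: dict) -> str:
    """Fill the WIGOS id template from a mapping of BUFR key names to values."""
    return WIGOS_FORMAT.format(**keys)


# Hypothetical key values chosen to reproduce the documented example id.
example_keys = {
    "wigosIdentifierSeries": 0,
    "wigosIssuerOfIdentifier": 705,
    "wigosIssueNumber": 0,
    "wigosLocalIdentifierCharacter": "1931",
}
print(wigos_station_id(example_keys))
```

With these values the result is the id quoted in the text, "0-705-0-1931".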

.. _key_geometry:

geometry
++++++++++

Values extracted as a list of::

[longitude,latitude,heightOfStationGroundAboveMeanSeaLevel]

as required for geopandas.
* "CRS": generated from the "coordinateReferenceSystem" key using the following mapping:
as required for geopandas.

.. _crs:

CRS
++++

Generated from the "coordinateReferenceSystem" key using the following mapping:

.. list-table::
:header-rows: 1
@@ -91,9 +144,11 @@ BUFR keys
- EPSG:4326


.. note::

The computed keys do not preserve their position in ``columns`` but are placed to the end of the resulting DataFrame.
.. note::

Computed keys do not preserve their position in ``columns`` but are placed at the end of the resulting DataFrame.


.. _filters-section:

@@ -110,16 +165,36 @@ Single value
.. code-block:: python
filters = {"blockNumber": 12}
filters = {"WMO_station_id": 12925}
List of values
++++++++++++++
# The "WIGOS_station_id" can be specified in various ways
# When tuple/list is used the first 3 values must be integers, the last one must be a string.
filters = {"WIGOS_station_id": "0-705-0-1931"}
filters = {"WIGOS_station_id": (0, 705, 0, "1931")}
# However, implicit str to int conversion is done for the first 3 values, so this is also valid.
filters = {"WIGOS_station_id": ("0", "705", "0", "1931")}
List/tuple/set of values
++++++++++++++++++++++++++

A list of values specifies an "in" relation:
A list/tuple/set of values specifies an "in" relation:

.. code-block:: python
filters = {"stationNumber": [843, 925]}
filters = {"blockNumber": range(10, 13)}
filters = {"WMO_station_id": [12925, 12843]}
# The "WIGOS_station_id" can be specified in various ways.
# When tuple/list is used in an id the first 3 values must be integers, the last one must be a string.
filters = {"WIGOS_station_id": ["0-705-0-1931", "0-705-0-1932"]}
filters = {"WIGOS_station_id": ((0, 705, 0, "1931"), (0, 705, 0, "1932"))}
# However, implicit str to int conversion is done for the first 3 values, so this is also valid.
filters = {
"WIGOS_station_id": [("0", "705", "0", "1931"), ("0", "705", "0", "1932")]
}
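
That the string and tuple forms describe the same station can be illustrated with a small stand-alone sketch. The ``normalise`` helper is hypothetical and only mirrors the rules stated in the comments above (the first three values are converted to int, the last one must be a string); it is not pdbufr's implementation.

```python
from typing import Sequence, Tuple, Union

WigosLike = Union[str, Sequence[Union[str, int]]]


def normalise(value: WigosLike) -> Tuple[int, int, int, str]:
    """Reduce a str or a 4-item tuple/list to a canonical (int, int, int, str) tuple."""
    parts = value.split("-") if isinstance(value, str) else list(value)
    if len(parts) != 4:
        raise ValueError(f"a WIGOS id needs 4 components, got {len(parts)}")
    series, issuer, issue_number, local = parts
    if not isinstance(local, str):
        raise ValueError("the local identifier must be a string")
    # int() accepts both 0 and "0", which gives the implicit str to int conversion
    return int(series), int(issuer), int(issue_number), local


forms = ["0-705-0-1931", (0, 705, 0, "1931"), ("0", "705", "0", "1931")]
canonical = {normalise(f) for f in forms}
print(canonical)
```

All three spellings collapse to a single canonical tuple, which is why they can be used interchangeably in an "in" filter.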
Slices
++++++++
5 changes: 3 additions & 2 deletions docs/release_notes/version_0.12_updates.rst
@@ -6,5 +6,6 @@ Version 0.12 Updates
Version 0.12.0
===============

- fixed issue when the ``filter`` was not applied if it contained a computed key and ``required_columns`` were set when (:pr:`79`)
- fixed issue when the ``filter`` containing a computed key matched subsets/messages where the computed key value was missing
- implemented the ``WIGOS_station_id`` computed key for the `WIGOS station identifier <https://community.wmo.int/en/activity-areas/WIGOS/implementation-WIGOS/FAQ-WSI>`_. See details :ref:`here <key_wigos_station_id>`.
- fixed issue where ``filters`` was not applied if it contained a computed key and ``required_columns`` was set (:pr:`79`)
- fixed issue where ``filters`` containing a computed key matched subsets/messages in which the computed key value was missing (:pr:`79`)
3 changes: 2 additions & 1 deletion src/pdbufr/__init__.py
@@ -16,9 +16,10 @@
__version__ = "999"


from .bufr_filters import WIGOSId
from .bufr_structure import stream_bufr

__all__ = ["stream_bufr"]
__all__ = ["stream_bufr", "WIGOSId"]

try:
from .bufr_read import read_bufr
197 changes: 165 additions & 32 deletions src/pdbufr/bufr_filters.py
@@ -8,53 +8,186 @@

import logging
import typing as T

import attr # type: ignore
from abc import ABCMeta
from abc import abstractmethod

LOG = logging.getLogger(__name__)

WIGOS_ID_KEY = "WIGOS_station_id"

@attr.attrs(auto_attribs=True, frozen=True)
class BufrFilter:
filter: T.Union[slice, T.Set[T.Any], T.Callable[[T.Any], bool]]

@classmethod
def from_user(cls, user_filter: T.Any) -> "BufrFilter":
filter: T.Union[slice, T.Set[T.Any], T.Callable[[T.Any], bool]]

if isinstance(user_filter, slice):
if user_filter.step is not None:
LOG.warning(f"slice filters ignore the step {user_filter.step}")
filter = user_filter
elif isinstance(user_filter, T.Iterable) and not isinstance(user_filter, str):
filter = set(user_filter)
elif callable(user_filter):
filter = user_filter
def normalise_wigos(value):
    if value is not None:
        if isinstance(value, WIGOSId):
            return value
        try:
            return WIGOSId.from_user(value)
        except Exception:
            if isinstance(value, (list, tuple)):
                value = [normalise_wigos(x) for x in value]
                return value

    raise ValueError(f"Invalid WIGOS ID value: {value}")


class BufrFilter(metaclass=ABCMeta):
@abstractmethod
def match(self, value: T.Any) -> bool:
pass

@abstractmethod
def max(self) -> T.Any:
pass

@staticmethod
def from_user(value: T.Any, key: str = None) -> "BufrFilter":
if isinstance(value, slice):
return SliceBufrFilter(value)
elif callable(value):
return CallableBufrFilter(value)
else:
filter = {user_filter}
return cls(filter)
if key == WIGOS_ID_KEY:
value = normalise_wigos(value)
return WigosValueBufrFilter(value)
else:
return ValueBufrFilter(value)


class EmptyBufrFilter(BufrFilter):
    def __init__(self) -> None:
        # BufrFilter defines no __init__, so no arguments can be passed up to object.__init__
        super().__init__()

    def match(self, value: T.Any) -> bool:
        return True

    def max(self) -> T.Any:
        return None


class SliceBufrFilter(BufrFilter):
def __init__(self, v: slice) -> None:
self.slice = v
if self.slice.step is not None:
LOG.warning(f"slice filters ignore the step={self.slice.step} in slice={self.slice}")

def match(self, value: T.Any) -> bool:
if value is None:
return False
if isinstance(self.filter, slice):
if self.filter.start is not None and value < self.filter.start:
return False
elif self.filter.stop is not None and value > self.filter.stop:
return False
elif callable(self.filter):
return bool(self.filter(value))
elif value not in self.filter:
if self.slice.start is not None and value < self.slice.start:
return False
elif self.slice.stop is not None and value > self.slice.stop:
return False
return True

def max(self) -> T.Any:
if isinstance(self.filter, slice):
return self.filter.stop
elif callable(self.filter):
return None
return self.slice.stop


class CallableBufrFilter(BufrFilter):
    def __init__(self, v: T.Callable[[T.Any], bool]) -> None:
        self.callable = v

    def match(self, value: T.Any) -> bool:
        if value is None:
            return False
        return bool(self.callable(value))

    def max(self) -> T.Any:
        return None


class ValueBufrFilter(BufrFilter):
def __init__(self, v: T.Any) -> None:
if isinstance(v, T.Iterable) and not isinstance(v, str):
self.set = set(v)
else:
return max(self.filter)
self.set = {v}

def match(self, value: T.Any) -> bool:
if value is None:
return False
return value in self.set

def max(self) -> T.Any:
return max(self.set)


class WigosValueBufrFilter(ValueBufrFilter):
    def match(self, value: T.Any) -> bool:
        if value is None:
            return False
        if isinstance(value, (str, WIGOSId)):
            return value in self.set
        return False


class WIGOSId:
    def __init__(
        self,
        series: T.Union[str, int],
        issuer: T.Union[str, int],
        number: T.Union[str, int],
        local: str,
    ) -> None:

        def _convert(v):
            return int(v) if v is not None else None

        self.series = _convert(series)
        self.issuer = _convert(issuer)
        self.number = _convert(number)
        self.local = local

        if not isinstance(self.local, str) and self.local is not None:
            # f-string so that the offending value appears in the message
            raise ValueError(f"Invalid WIGOS local identifier={self.local}. Must be a string")

    @classmethod
    def from_str(cls, v: str) -> "WIGOSId":
        v = v.split("-")
        if len(v) != 4:
            raise ValueError("Invalid WIGOS ID string")

        return cls(*v)

    @classmethod
    def from_iterable(cls, v):
        return cls(*v)

    def __eq__(self, value):
        if isinstance(value, WIGOSId):
            return all(
                x == y
                for x, y in zip(
                    (self.series, self.issuer, self.number, self.local),
                    (value.series, value.issuer, value.number, value.local),
                )
            )
        elif isinstance(value, (list, tuple)) and len(value) == 4:
            return self == WIGOSId.from_iterable(value)
        elif isinstance(value, str):
            return self.as_str() == value
        return False

    @classmethod
    def from_user(cls, value: T.Any) -> T.Optional["WIGOSId"]:
        if isinstance(value, WIGOSId):
            # Already a WIGOSId (no from_id() method is defined), so return it unchanged
            return value
        elif isinstance(value, str):
            return cls.from_str(value)
        elif isinstance(value, (list, tuple)):
            return cls.from_iterable(value)

    def __hash__(self):
        return hash(self.as_str())

    def as_tuple(self):
        return (self.series, self.issuer, self.number, self.local)

    def as_str(self):
        def _convert_str(v):
            return str(v) if v is not None else "*"

        return f"{_convert_str(self.series)}-{_convert_str(self.issuer)}-{_convert_str(self.number)}-{_convert_str(self.local)}"


def is_match(