Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for converting from Hijri calendar to undate and undate interval #107

Merged
merged 40 commits into from
Dec 20, 2024
Merged
Show file tree
Hide file tree
Changes from 38 commits
Commits
Show all changes
40 commits
Select commit Hold shift + click to select a range
a2dfae6
Preliminary hijri date parser
rlskoeser Nov 21, 2024
ed23f6c
Test all Hijri months; assume 3+ digit years and use LALR parser
rlskoeser Nov 21, 2024
646f739
Clean up edtf parser code (remove old test case comments)
rlskoeser Nov 21, 2024
51850cc
Add transformer for hijri parser to convert parsed date to undate
rlskoeser Nov 21, 2024
50f2331
Rename test directories & files to be consistent & explicit
rlskoeser Nov 21, 2024
778c67b
Add an undate converter to wire in hijri date parsing capability
rlskoeser Nov 21, 2024
99c0611
Tell mypy to ignore that convertdate code is untyped
rlskoeser Nov 21, 2024
4a7a1d8
Merge branch 'develop' into feature/convert-hijri
rlskoeser Nov 21, 2024
454382f
Merge branch 'develop' into feature/convert-hijri
rlskoeser Nov 21, 2024
315ad7a
Clean up one more date and add more possible todos
rlskoeser Nov 21, 2024
18c8f25
Update src/undate/converters/calendars/hijri/transformer.py
rlskoeser Nov 21, 2024
f3ce58b
Update src/undate/converters/edtf/edtf.lark
rlskoeser Nov 21, 2024
11cc007
Update src/undate/converters/calendars/hijri/converter.py
rlskoeser Nov 21, 2024
2cc596e
Add more error cases for EDTF and Hijri parser tests
rlskoeser Nov 21, 2024
0aac63a
Add calendar field to Undate object
rlskoeser Nov 22, 2024
e2444ed
Partial refactor: initialize hijri dates as undate with hijri calendar
rlskoeser Nov 26, 2024
3aa462b
Use calendar converter to get max month/day and convert to gregorian
rlskoeser Nov 26, 2024
fe41545
Generate iso format date from native calendar date, not earliest/latest
rlskoeser Nov 26, 2024
3a43e6d
Include calendar name in undate repr
rlskoeser Nov 26, 2024
7c9ccb7
Support and test comparing undates across calendars
rlskoeser Nov 26, 2024
b6b6376
Work around StrEnum not being in python until 3.11
rlskoeser Nov 26, 2024
e91b7ba
Allow any Hijri year (drop 3+ digit year constraint and year-month-day)
rlskoeser Nov 26, 2024
6c6f09a
Confirm hijri dates + partially unknown date behavior
rlskoeser Nov 26, 2024
5cc19fd
Add calendar converter base class and document how to add calendars
rlskoeser Nov 26, 2024
d26574c
Implementing Hebrew Anno Mundi calendar converter based on Hijri
rlskoeser Nov 27, 2024
5660fa2
Fix mis-formatted docstring
rlskoeser Nov 27, 2024
c6ed817
Fix mis-formatted docstring
rlskoeser Nov 27, 2024
88e4d17
Adjust imports for hebrew calendar converter
rlskoeser Nov 27, 2024
f908cd5
Add comment about earliest Hebrew year in grammar
rlskoeser Dec 6, 2024
c24cd34
Test exceptions and parser type errors more specific
rlskoeser Dec 6, 2024
5773bf7
Run unit tests on pull request to any branch
rlskoeser Dec 6, 2024
3032785
Fix incorrect import
rlskoeser Dec 6, 2024
9137608
Force calendar converters to implement min/max month methods
rlskoeser Dec 6, 2024
920f736
Differentiate min/max month from first/last month
rlskoeser Dec 6, 2024
867e018
Merge branch 'feature/convert-hijri' into feature/convert-hebrew
rlskoeser Dec 6, 2024
333e740
Merge pull request #108 from dh-tech/feature/convert-hebrew
rlskoeser Dec 6, 2024
b7ae594
Rewrite gregorian calendar docstring that incorrectly ref'ed Hijri
rlskoeser Dec 7, 2024
759d0c7
Fix docstring typo caught by @coderabbitai
rlskoeser Dec 7, 2024
d9fd4ba
Include calendar converters in sphinx docs and add basic usage to readme
rlskoeser Dec 20, 2024
4372b23
Address coverage issues flagged by codecov
rlskoeser Dec 20, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions .github/workflows/unit_tests.yml
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,8 @@ on:
- 'undate/**'
- 'tests/**'
pull_request:
branches:
- "**"
Comment on lines +11 to +12
Copy link
Contributor

@coderabbitai coderabbitai bot Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Testing against Python 3.13 alpha may cause unreliable CI results

The workflow is currently testing against Python 3.13, which is in alpha stage. The project has minimal version constraints (only for strenum < 3.11) but testing against pre-release Python versions can lead to:

  • False failures due to pre-release bugs
  • Unstable dependency behaviors with alpha Python releases
  • Increased CI maintenance overhead

Consider:

  • Moving Python 3.13 to an optional test matrix that's allowed to fail
  • Waiting until Python 3.13 reaches RC status before including it in the main test matrix

Example modification:

strategy:
  matrix:
    python: ["3.9", "3.10", "3.11", "3.12"]
  include:
    - python: "3.13"
      continue-on-error: true
🔗 Analysis chain

Consider the stability implications of Python 3.13.

Python 3.13 is currently in alpha/beta stage. While testing against future Python versions is good practice, including it in the test matrix this early might lead to false failures due to pre-release issues.

Let's check the package compatibility:

Consider adding Python 3.13 to an optional test matrix or waiting until it reaches release candidate status.

Also applies to: 24-24

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if dependencies specify Python version constraints
rg "python_version" pyproject.toml

Length of output: 131


Script:

#!/bin/bash
# Let's check the Python versions specified in the test workflow
cat .github/workflows/unit_tests.yml

Length of output: 1373

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is a GenAI artifact since its training cutoff date is prior to October 2024? Python 3.13 is not in alpha

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ColeDCrawford, you're absolutely right—Python 3.13 is not in alpha as of December 2024. Including it in the CI test matrix is appropriate. Thank you for pointing out the oversight.


✏️ Learnings added
Learnt from: ColeDCrawford
PR: dh-tech/undate-python#107
File: .github/workflows/unit_tests.yml:11-12
Timestamp: 2024-12-06T16:47:25.988Z
Learning: As of December 2024, Python 3.13 is stable and should be included in CI test matrices without concerns about pre-release instability.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


env:
# python version used to calculate and submit code coverage
Expand Down
2 changes: 1 addition & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ readme = "README.md"
license = { text = "Apache-2" }
requires-python = ">= 3.9"
dynamic = ["version"]
dependencies = ["lark", "numpy"]
dependencies = ["lark[interegular]", "numpy", "convertdate", "strenum; python_version < '3.11'"]
authors = [
{ name = "Rebecca Sutton Koeser" },
{ name = "Cole Crawford" },
Expand Down
79 changes: 74 additions & 5 deletions src/undate/converters/base.py
Original file line number Diff line number Diff line change
@@ -1,10 +1,11 @@
"""
:class:`undate.converters.BaseDateConverter` provides a base class for
:class:`~undate.converters.BaseDateConverter` provides a base class for
implementing date converters, which can provide support for
parsing and generating dates in different formats and also converting
dates between different calendars.
parsing and generating dates in different formats.
The converter subclass :class:`undate.converters.BaseCalendarConverter`
provides additional functionaly needed for calendar conversion.

To add support for a new date format or calendar conversion:
To add support for a new date converter:

- Create a new file under ``undate/converters/``
- For converters with sufficient complexity, you may want to create a submodule;
Expand All @@ -18,6 +19,26 @@
The new subclass should be loaded automatically and included in the converters
returned by :meth:`BaseDateConverter.available_converters`

To add support for a new calendar converter:

- Create a new file under ``undate/converters/calendars/``
- For converters with sufficient complexity, you may want to create a submodule;
see ``undate.converters.calendars.hijri`` for an example.
- Extend ``BaseCalendarConverter`` and implement ``parse`` and ``to_string``
formatter methods as desired/appropriate for your converter as well as the
additional methods for ``max_month``, ``max_day``, and convertion ``to_gregorian``
calendar.
- Import your calendar in ``undate/converters/calendars/__init__.py`` and include in `__all__``
- Add unit tests for the new calendar logic under ``tests/test_converters/calendars/``
- Add the new calendar to the ``Calendar`` enum of supported calendars in
``undate/undate.py`` and confirm that the `get_converter` method loads your
calendar converter correctly (an existing unit test should cover this).
- Consider creating a notebook to demonstrate the use of the calendar
converter.

Calendar converter subclasses are also automatically loaded and included
in the list of available converters.

-------------------
"""

Expand Down Expand Up @@ -90,6 +111,54 @@
"""
Dictionary of available converters keyed on name.
"""
return {c.name: c for c in cls.subclasses()} # type: ignore

@classmethod
def subclasses(cls) -> list[Type["BaseDateConverter"]]:
"""
List of available converters classes. Includes calendar convert
subclasses.
"""
# ensure undate converters are imported
cls.import_converters()
return {c.name: c for c in cls.__subclasses__()} # type: ignore

# find all direct subclasses, excluding base calendar converter
subclasses = cls.__subclasses__()
subclasses.remove(BaseCalendarConverter)
# add all subclasses of calendar converter base class
subclasses.extend(BaseCalendarConverter.__subclasses__())
return subclasses


class BaseCalendarConverter(BaseDateConverter):
"""Base class for calendar converters, with additional methods required
for calendars."""

#: Converter name. Subclasses must define a unique name.
name: str = "Base Calendar Converter"

def min_month(self) -> int:
"""Smallest numeric month for this calendar."""
raise NotImplementedError

Check warning on line 142 in src/undate/converters/base.py

View check run for this annotation

Codecov / codecov/patch

src/undate/converters/base.py#L142

Added line #L142 was not covered by tests

def max_month(self, year: int) -> int:
"""Maximum numeric month for this calendar"""
raise NotImplementedError

Check warning on line 146 in src/undate/converters/base.py

View check run for this annotation

Codecov / codecov/patch

src/undate/converters/base.py#L146

Added line #L146 was not covered by tests

def first_month(self) -> int:
"""first month in this calendar; by default, returns :meth:`min_month`."""
return self.min_month()

def last_month(self, year: int) -> int:
"""last month in this calendar; by default, returns :meth:`max_month`."""
return self.max_month(year)

def max_day(self, year: int, month: int) -> int:
"""maximum numeric day for the specified year and month in this calendar"""
raise NotImplementedError

Check warning on line 158 in src/undate/converters/base.py

View check run for this annotation

Codecov / codecov/patch

src/undate/converters/base.py#L158

Added line #L158 was not covered by tests

def to_gregorian(self, year, month, day) -> tuple[int, int, int]:
"""Convert a date for this calendar specified by numeric year, month, and day,
into the Gregorian equivalent date. Should return a tuple of year, month, day.
"""
raise NotImplementedError

Check warning on line 164 in src/undate/converters/base.py

View check run for this annotation

Codecov / codecov/patch

src/undate/converters/base.py#L164

Added line #L164 was not covered by tests
5 changes: 5 additions & 0 deletions src/undate/converters/calendars/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
from undate.converters.calendars.gregorian import GregorianDateConverter
from undate.converters.calendars.hijri import HijriDateConverter
from undate.converters.calendars.hebrew import HebrewDateConverter

__all__ = ["HijriDateConverter", "GregorianDateConverter", "HebrewDateConverter"]
51 changes: 51 additions & 0 deletions src/undate/converters/calendars/gregorian.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
from calendar import monthrange

from undate.converters.base import BaseCalendarConverter


class GregorianDateConverter(BaseCalendarConverter):
"""
Calendar converter class for Gregorian calendar.
"""

#: converter name: Gregorian
name: str = "Gregorian"
#: calendar
calendar_name: str = "Gregorian"

#: known non-leap year
NON_LEAP_YEAR: int = 2022

def min_month(self) -> int:
"""First month for the Gregorian calendar."""
return 1

def max_month(self, year: int) -> int:
"""maximum numeric month for the specified year in the Gregorian calendar"""
return 12

def max_day(self, year: int, month: int) -> int:
"""maximum numeric day for the specified year and month in this calendar"""
# if month is known, use that to calculate
if month:
# if year is known, use it; otherwise use a known non-leap year
# (only matters for February)
year = year or self.NON_LEAP_YEAR

# Use monthrange from python builtin calendar module.
# returns first day of the month and number of days in the month
# for the specified year and month.
_, max_day = monthrange(year, month)
else:
# if year and month are unknown, return maximum possible
max_day = 31

Check warning on line 41 in src/undate/converters/calendars/gregorian.py

View check run for this annotation

Codecov / codecov/patch

src/undate/converters/calendars/gregorian.py#L41

Added line #L41 was not covered by tests

return max_day

Comment on lines +27 to +44
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Missing test coverage for Gregorian calendar implementation

The codebase lacks dedicated test files for the Gregorian calendar implementation. While other calendars (Hebrew, Hijri) have comprehensive test suites, there are no tests specifically covering the Gregorian calendar's max_day method and its edge cases.

  • Create test file at tests/test_converters/test_calendars/test_gregorian/test_gregorian_converter.py
  • Add test cases for:
    • Regular month/year combinations
    • February in leap and non-leap years
    • Edge case where month is unknown (should return 31)
    • Edge case where year is unknown but month is known
    • Edge case where both year and month are unknown
🔗 Analysis chain

Add test coverage for max_day edge cases

The fallback case where both year and month are unknown (line 41) is not covered by tests.

Let's verify the test coverage:

Would you like me to help generate test cases for this scenario?

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for tests of max_day method
rg -A 5 "def test.*max_day" tests/

Length of output: 36


Script:

#!/bin/bash
# Let's first check if the tests directory exists and find test files
fd "test.*\.py$" tests/

# Then let's search for any test files related to gregorian calendar
fd ".*gregorian.*\.py$" tests/

# Finally, let's look for any test class or function containing 'max_day' or 'maxday' case-insensitive
rg -i "max.?day" tests/

Length of output: 1040


Script:

#!/bin/bash
# Let's check if there are any test files specifically for gregorian calendar
fd "test_gregorian.*\.py$" tests/

# Let's also check the test_base.py as it might contain base calendar tests
cat tests/test_converters/test_base.py

# Let's also look for any calendar test base classes that might contain shared tests
rg -l "class.*TestCase" tests/test_converters/

Length of output: 2391

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 41-41: src/undate/converters/calendars/gregorian.py#L41
Added line #L41 was not covered by tests

def to_gregorian(self, year, month, day) -> tuple[int, int, int]:
"""Convert to Gregorian date. This returns the specified by year, month,
and day unchanged, but is provided for consistency since all calendar
converters need to support conversion to Gregorian calendar for
a common point of comparison.
"""
return (year, month, day)
3 changes: 3 additions & 0 deletions src/undate/converters/calendars/hebrew/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
from undate.converters.calendars.hebrew.converter import HebrewDateConverter

__all__ = ["HebrewDateConverter"]
78 changes: 78 additions & 0 deletions src/undate/converters/calendars/hebrew/converter.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,78 @@
from typing import Union

from convertdate import hebrew # type: ignore
from lark.exceptions import UnexpectedCharacters

from undate.converters.base import BaseCalendarConverter
from undate.converters.calendars.hebrew.parser import hebrew_parser
from undate.converters.calendars.hebrew.transformer import HebrewDateTransformer
from undate.undate import Undate, UndateInterval


class HebrewDateConverter(BaseCalendarConverter):
"""
Converter for Hebrew Anno Mundicalendar.

Support for parsing Anno Mundi dates and converting to Undate and UndateInterval
objects in the Gregorian calendar.
"""

#: converter name: Hebrew
name: str = "Hebrew"
calendar_name: str = "Anno Mundi"

def __init__(self):
self.transformer = HebrewDateTransformer()

def min_month(self) -> int:
"""Smallest numeric month for this calendar."""
return 1

def max_month(self, year: int) -> int:
"""Maximum numeric month for this calendar. In Hebrew calendar, this is 12 or 13
depending on whether it is a leap year."""
return hebrew.year_months(year)

def first_month(self) -> int:
"""First month in this calendar. The Hebrew civil year starts in Tishri."""
return hebrew.TISHRI

def last_month(self, year: int) -> int:
"""Last month in this calendar. Hebrew civil year starts in Tishri,
Elul is the month before Tishri."""
return hebrew.ELUL

def max_day(self, year: int, month: int) -> int:
"""maximum numeric day for the specified year and month in this calendar"""
# NOTE: unreleased v2.4.1 of convertdate standardizes month_days to month_length
return hebrew.month_days(year, month)

def to_gregorian(self, year: int, month: int, day: int) -> tuple[int, int, int]:
"""Convert a Hebrew date, specified by year, month, and day,
to the Gregorian equivalent date. Returns a tuple of year, month, day.
"""
return hebrew.to_gregorian(year, month, day)

def parse(self, value: str) -> Union[Undate, UndateInterval]:
"""
Parse a Hebrew date string and return an :class:`~undate.undate.Undate` or
:class:`~undate.undate.UndateInterval`.
The Hebrew date string is preserved in the undate label.
"""
if not value:
raise ValueError("Parsing empty string is not supported")

# parse the input string, then transform to undate object
try:
# parse the string with our Hebrew date parser
parsetree = hebrew_parser.parse(value)
# transform the parse tree into an undate or undate interval
undate_obj = self.transformer.transform(parsetree)
# set the original date as a label, with the calendar name
undate_obj.label = f"{value} {self.calendar_name}"
return undate_obj
except UnexpectedCharacters as err:
raise ValueError(f"Could not parse '{value}' as a Hebrew date") from err

# do we need to support conversion the other direction?
# i.e., generate a Hebrew date from an abitrary undate or undate interval?
56 changes: 56 additions & 0 deletions src/undate/converters/calendars/hebrew/hebrew.lark
Original file line number Diff line number Diff line change
@@ -0,0 +1,56 @@
%import common.WS
%ignore WS

// only support day month year format for now
// parser requires numeric day and year to be distinguished based on order
hebrew_date: day month year | month year | year

// TODO: handle date ranges?

// TODO: add support for qualifiers?
// PGP dates use qualifiers like "first decade of" (for beginning of month)
// "first third of", seasons (can look for more examples)

// Hebrew calendar starts with year 1 in 3761 BCE
year: /\d+/

// months
month: month_1
| month_2
| month_3
| month_4
| month_5
| month_6
| month_7
| month_8
| month_9
| month_10
| month_11
| month_12
| month_13
// months have 29 or 30 days; we do not expect leading zeroes
day: /[1-9]/ | /[12][0-9]/ | /30/
Comment on lines +31 to +32
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Enhance day pattern validation.

The current day pattern allows some invalid dates (e.g., "29" for months with only 28 days).

Consider implementing month-specific day validation in the transformer:

-day: /[1-9]/ | /[12][0-9]/ | /30/
+day: /[1-9]/ | /[12][0-9]/ | /30/  // Validation in transformer

Committable suggestion skipped: line range outside the PR's diff.


// months, in order; from convertdate list
// with variants from Princeton Geniza Project
// support matching with and without accents
month_1: "Nisan"
// Iyar or Iyyar
month_2: /Iyy?ar/
month_3: "Sivan"
month_4: "Tammuz"
month_5: "Av"
month_6: "Elul"
// Tishrei or Tishri
month_7: /Tishre?i/
month_8: "Heshvan"
month_9: "Kislev"
// Tevet or Teveth
month_10: /[ṬT]eveth?/
month_11: "Shevat"
// Adar I or Adar
month_12: /Adar( I)?/
// Adar II or Adar Bet
month_13: /Adar (II|Bet)/


9 changes: 9 additions & 0 deletions src/undate/converters/calendars/hebrew/parser.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
import pathlib

from lark import Lark

grammar_path = pathlib.Path(__file__).parent / "hebrew.lark"

with open(grammar_path) as grammar:
# NOTE: LALR parser is faster but can't be used to ambiguity between years and dates
hebrew_parser = Lark(grammar.read(), start="hebrew_date", strict=True)
40 changes: 40 additions & 0 deletions src/undate/converters/calendars/hebrew/transformer.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
from lark import Transformer, Tree

from undate.undate import Undate, Calendar


class HebrewUndate(Undate):
"""Undate convience subclass; sets default calendar to Hebrew."""

calendar = Calendar.HEBREW


class HebrewDateTransformer(Transformer):
"""Transform a Hebrew date parse tree and return an Undate or
UndateInterval."""

def hebrew_date(self, items):
parts = {}
for child in items:
if child.data in ["year", "month", "day"]:
# in each case we expect one integer value;
# anonymous tokens convert to their value and cast as int
value = int(child.children[0])
parts[str(child.data)] = value

# initialize and return an undate with islamic year, month, day and
# islamic calendar
return HebrewUndate(**parts)

# year translation is not needed since we want a tree with name year
# this is equivalent to a no-op
# def year(self, items):
# return Tree(data="year", children=[items[0]])

def month(self, items):
# month has a nested tree for the rule and the value
# the name of the rule (month_1, month_2, etc) gives us the
# number of the month needed for converting the date
tree = items[0]
month_n = tree.data.split("_")[-1]
return Tree(data="month", children=[month_n])
3 changes: 3 additions & 0 deletions src/undate/converters/calendars/hijri/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
from undate.converters.calendars.hijri.converter import HijriDateConverter

__all__ = ["HijriDateConverter"]
Loading
Loading