Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use numpy datetime64 instead of datetime.date #77 #84

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

rlskoeser
Copy link
Member

@rlskoeser rlskoeser commented Jun 6, 2024

ref #77

  • new local date object that lets us interact with numpy.datetime64 like a python datetime.date

previous notes

I implemented this as spike to see how hard it would be to use numpy's datetime64 instead of python's builtin datetime.date. Using np.datetime64 gets us way past the year limitation of datetime.date ([2.5e16 BC, 2.5e16 AD] for day-precision dates).

We can make this an optional dependency so you would only install numpy if you need the expanded year range. Making it optional makes the code and the testing more complicated. Do you think numpy as a dependency would be a concern for folks doing e.g web-based projects?

(type check is failing because I haven't bothered updating the types yet)

Summary by CodeRabbit

  • New Features

    • Introduced a custom Date class for enhanced date handling with properties for year, month, and day.
    • Added DatePrecision enumeration for categorizing date precision levels.
    • Included numpy as a required dependency to support numerical computations.
  • Bug Fixes

    • Improved date formatting logic for better handling of month and day portions.
  • Tests

    • Added unit tests for the new Date and DatePrecision classes to ensure functionality.
    • Updated tests for the Undate class to reflect changes in duration handling.

@rlskoeser rlskoeser force-pushed the experiment/numpy-datetime64 branch from 81299bb to 3306bde Compare June 6, 2024 21:09
@rlskoeser rlskoeser marked this pull request as draft June 6, 2024 21:17
@rlskoeser rlskoeser changed the title Experiment: use numpy datetime64 instead of datetime.date Experiment: use numpy datetime64 instead of datetime.date #77 Jun 7, 2024
@rlskoeser rlskoeser marked this pull request as ready for review August 16, 2024 21:56
@rlskoeser rlskoeser changed the title Experiment: use numpy datetime64 instead of datetime.date #77 Use numpy datetime64 instead of datetime.date #77 Aug 19, 2024
@rlskoeser
Copy link
Member Author

@coderabbitai full review

Copy link

coderabbitai bot commented Aug 30, 2024

Actions performed

Full review triggered.

Copy link

coderabbitai bot commented Aug 30, 2024

Walkthrough

The changes involve the addition of type checking with MyPy in pyproject.toml, the inclusion of numpy as a dependency in setup.cfg, and the introduction of a custom Date class in src/undate/date.py. The Undate class in src/undate/undate.py has been modified to utilize the new Date class, and corresponding unit tests have been added in tests/test_date.py and tests/test_undate.py to validate these updates.

Changes

File(s) Change Summary
pyproject.toml Added [tool.mypy] section with plugins = ["numpy.typing.mypy_plugin"].
setup.cfg Added numpy to the install_requires section.
src/undate/date.py Introduced Date class extending numpy.datetime64, with properties and methods for date handling.
src/undate/undate.py Updated Undate class to use Date type for earliest and latest, modified comparison methods.
src/undate/dateformat/iso8601.py Modified to_string method for enhanced date formatting.
tests/test_date.py Added unit tests for Date and DatePrecision classes.
tests/test_undate.py Updated tests for Undate class to accommodate changes in duration handling.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Undate
    participant Date

    User->>Undate: Initialize with dates
    Undate->>Date: Create Date instances
    Date-->>Undate: Return Date objects
    User->>Undate: Call duration()
    Undate->>Date: Calculate duration
    Date-->>Undate: Return duration value
    Undate-->>User: Return duration result
Loading

🐰 In a world of dates and times,
I hop along with joyful rhymes.
With NumPy's help, I leap and play,
Creating dates in a brand new way!
So let us cheer, oh what a sight,
For changes bright, our future's light! 🌟


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

Outside diff range, codebase verification and nitpick comments (2)
setup.cfg (1)

44-44: Approved: Addition of NumPy as a Dependency

The inclusion of numpy in the install_requires section is crucial for the project's functionality, especially given the shift to using numpy.datetime64. However, consider the implications of this addition on installation size and dependency management, particularly for users in constrained environments.

Consider making numpy an optional dependency if feasible, to accommodate users who may not require the extended functionality.

src/undate/undate.py (1)

Line range hint 287-316: Review the duration method implementation.

The duration method has been significantly changed to accommodate the new Date class and potentially return numpy.timedelta64. Ensure that the logic correctly calculates the duration based on the date precision and handles all edge cases, especially with unknown or partially known dates.

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between fe7c795 and 329fa3d.

Files selected for processing (7)
  • pyproject.toml (1 hunks)
  • setup.cfg (1 hunks)
  • src/undate/date.py (1 hunks)
  • src/undate/dateformat/iso8601.py (1 hunks)
  • src/undate/undate.py (9 hunks)
  • tests/test_date.py (1 hunks)
  • tests/test_undate.py (3 hunks)
Additional context used
Ruff
tests/test_date.py

1-1: numpy imported but unused

Remove unused import: numpy

(F401)


3-3: undate.date.ONE_DAY imported but unused

Remove unused import

(F401)


3-3: undate.date.ONE_YEAR imported but unused

Remove unused import

(F401)


3-3: undate.date.ONE_MONTH_MAX imported but unused

Remove unused import

(F401)


24-24: Redefinition of unused test_init_year_month from line 18

(F811)


42-42: Comparison to None should be cond is None

Replace with cond is None

(E711)

src/undate/date.py

4-4: typing.Dict imported but unused

Remove unused import

(F401)


4-4: typing.Union imported but unused

Remove unused import

(F401)


68-68: Use self.dtype != "datetime64[Y]" instead of not self.dtype == "datetime64[Y]"

Replace with != operator

(SIM201)

src/undate/undate.py

6-6: typing.Any imported but unused

Remove unused import: typing.Any

(F401)


8-8: numpy imported but unused

Remove unused import: numpy

(F401)


9-9: numpy.typing.ArrayLike imported but unused

Remove unused import

(F401)


9-9: numpy.typing.DTypeLike imported but unused

Remove unused import

(F401)

Additional comments not posted (3)
pyproject.toml (1)

8-9: Approved: Enhanced Type Checking for NumPy

The addition of the [tool.mypy] section with the NumPy typing plugin is a significant improvement for the project. It ensures enhanced type safety and better integration with NumPy, which is crucial given the project's new dependency on this library.

src/undate/undate.py (2)

267-267: Static method usage.

The from_datetime_date method is correctly updated to handle the datetime.date input and convert it to an Undate object. This method serves as a bridge between the old datetime.date usage and the new Date class, facilitating easier transition and integration.


128-129: Initialization of Date objects.

The earliest and latest attributes of the Undate class are now initialized using the custom Date class. Ensure that the Date class constructor correctly handles the extreme values and any edge cases that might arise from the broad range of years.

Verification successful

Initialization of Date objects is well-handled.

The Date class uses NumPy's datetime64 for date handling, and test cases cover various initialization scenarios, ensuring robustness against edge cases. The __new__ method in Date handles year, month, and day parameters effectively. No further action is needed.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify the Date class constructor with extreme values.
ast-grep --lang python --pattern $'class Date {
  $$$
  __init__($_, $_, $_) {
    $$$
  }
  $$$  
}'

Length of output: 447


Script:

#!/bin/bash
# Correct the pattern to search for the Date class and its __init__ method in Python.
ast-grep --lang python --pattern $'class Date:\n  def __init__(self, $_, $_, $_):\n    $$$'

Length of output: 90


Script:

#!/bin/bash
# Search for any class definitions related to dates in the codebase.
rg 'class\s+\w+' -A 2

# Search for any imports related to date functionality.
rg 'import\s+\w+' -A 2

Length of output: 11688


Script:

#!/bin/bash
# Extract the __init__ method of the Date class from src/undate/date.py to review its implementation.
ast-grep --lang python --pattern $'class Date(np.ndarray):\n  def __init__(self, $_, $_, $_):\n    $$$'

Length of output: 102


Script:

#!/bin/bash
# Search for method definitions within the Date class in src/undate/date.py.
rg 'def\s+\w+' src/undate/date.py -A 2

# Search for test cases that use the Date class to check for handling of extreme values.
rg 'Date\(' tests/ -A 2

Length of output: 1921

assert d.dtype == "datetime64[M]"
assert str(d) == "2010-05"

def test_init_year_month(self):
Copy link

@coderabbitai coderabbitai bot Aug 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct Redefinition of Test Function

The function test_init_year_month is defined twice (lines 18 and 24), which is likely an oversight. This redefinition could cause one of the tests to be skipped, leading to incomplete test coverage.

Consider renaming one of the functions or combining them if they are intended to test different scenarios. Here is a proposed fix to rename the second function to test_init_year_month_day:

-    def test_init_year_month(self):
+    def test_init_year_month_day(self):

Also applies to: 18-18

Tools
Ruff

24-24: Redefinition of unused test_init_year_month from line 18

(F811)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah, I think I found and fixed this in a different branch when I was working on it yesterday

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rlskoeser, thanks for the update! Since you've already fixed this in another branch, no further action is needed here regarding the duplicate test function. I'll proceed with the rest of the review.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Comment on lines +1 to +3
import numpy as np

from undate.date import Date, DatePrecision, ONE_DAY, ONE_YEAR, ONE_MONTH_MAX
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove Unused Imports

The static analysis has correctly identified several unused imports:

  • numpy is imported but not used directly in the tests.
  • ONE_DAY, ONE_YEAR, and ONE_MONTH_MAX are imported but not used.

Consider removing these to clean up the code and reduce confusion about dependencies.

Apply this diff to remove the unused imports:

-import numpy as np
-from undate.date import Date, DatePrecision, ONE_DAY, ONE_YEAR, ONE_MONTH_MAX
+from undate.date import Date, DatePrecision
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
import numpy as np
from undate.date import Date, DatePrecision, ONE_DAY, ONE_YEAR, ONE_MONTH_MAX
from undate.date import Date, DatePrecision
Tools
Ruff

1-1: numpy imported but unused

Remove unused import: numpy

(F401)


3-3: undate.date.ONE_DAY imported but unused

Remove unused import

(F401)


3-3: undate.date.ONE_YEAR imported but unused

Remove unused import

(F401)


3-3: undate.date.ONE_MONTH_MAX imported but unused

Remove unused import

(F401)


def test_properties_day(self):
assert Date(2001).day is None
assert Date(2010, 5).day == None
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use is for None Comparisons

The comparison to None using == is not recommended. Using is is the correct practice in Python for clarity and correctness.

Apply this diff to correct the comparison:

-        assert Date(2010, 5).day == None
+        assert Date(2010, 5).day is None
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
assert Date(2010, 5).day == None
assert Date(2010, 5).day is None
Tools
Ruff

42-42: Comparison to None should be cond is None

Replace with cond is None

(E711)

Comment on lines +1 to +4
from enum import IntEnum

# Pre 3.10 requires Union for multiple types, e.g. Union[int, None] instead of int | None
from typing import Optional, Dict, Union
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove unused imports.

The imports for Dict and Union from the typing module are not used in the file. Removing these will clean up the code and reduce confusion about dependencies.

Apply this diff to remove the unused imports:

-from typing import Optional, Dict, Union
+from typing import Optional
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from enum import IntEnum
# Pre 3.10 requires Union for multiple types, e.g. Union[int, None] instead of int | None
from typing import Optional, Dict, Union
from enum import IntEnum
# Pre 3.10 requires Union for multiple types, e.g. Union[int, None] instead of int | None
from typing import Optional
Tools
Ruff

4-4: typing.Dict imported but unused

Remove unused import

(F401)


4-4: typing.Union imported but unused

Remove unused import

(F401)

Comment on lines +61 to +75
@property
def year(self):
return int(str(self.astype("datetime64[Y]")))

@property
def month(self):
# if date unit is year, don't return a month (only M/D)
if not self.dtype == "datetime64[Y]":
return int(str(self.astype("datetime64[M]")).split("-")[-1])

@property
def day(self):
# only return a day if date unit is in days
if self.dtype == "datetime64[D]":
return int(str(self.astype("datetime64[D]")).split("-")[-1])
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refine logical expressions in property methods.

The properties month and day use a less readable form of checking the data type. Replacing not self.dtype == "datetime64[Y]" with self.dtype != "datetime64[Y]" will improve the readability and maintainability of the code.

Apply this diff to refine the logical expression:

-        if not self.dtype == "datetime64[Y]":
+        if self.dtype != "datetime64[Y]":
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@property
def year(self):
return int(str(self.astype("datetime64[Y]")))
@property
def month(self):
# if date unit is year, don't return a month (only M/D)
if not self.dtype == "datetime64[Y]":
return int(str(self.astype("datetime64[M]")).split("-")[-1])
@property
def day(self):
# only return a day if date unit is in days
if self.dtype == "datetime64[D]":
return int(str(self.astype("datetime64[D]")).split("-")[-1])
@property
def year(self):
return int(str(self.astype("datetime64[Y]")))
@property
def month(self):
# if date unit is year, don't return a month (only M/D)
if self.dtype != "datetime64[Y]":
return int(str(self.astype("datetime64[M]")).split("-")[-1])
@property
def day(self):
# only return a day if date unit is in days
if self.dtype == "datetime64[D]":
return int(str(self.astype("datetime64[D]")).split("-")[-1])
Tools
Ruff

68-68: Use self.dtype != "datetime64[Y]" instead of not self.dtype == "datetime64[Y]"

Replace with != operator

(SIM201)

Comment on lines +257 to +263
return all(
[
self.earliest <= other.earliest,
self.latest >= other.latest,
# is precision sufficient for comparing partially known dates?
self.precision > other.precision,
]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review precision comparison in __contains__ method.

The comparison of self.precision > other.precision in the __contains__ method assumes a certain hierarchy or ordering of precision values. This logic might not be intuitive or could lead to incorrect results if the precision levels are not strictly comparable in a linear fashion.


# Pre 3.10 requires Union for multiple types, e.g. Union[int, None] instead of int | None
from typing import Optional, Dict, Union
from typing import Optional, Dict, Union, Any
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove unused import.

The import for typing.Any is flagged as unused. Removing unused imports helps maintain the cleanliness and readability of the code.

Proposed removal:

-from typing import Optional, Dict, Union, Any
+from typing import Optional, Dict, Union
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from typing import Optional, Dict, Union, Any
from typing import Optional, Dict, Union
Tools
Ruff

6-6: typing.Any imported but unused

Remove unused import: typing.Any

(F401)

Comment on lines +8 to +9
import numpy as np
from numpy.typing import ArrayLike, DTypeLike
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove unused imports.

The imports for numpy, ArrayLike, and DTypeLike are flagged as unused by static analysis tools. If these are not used in the current implementation, they should be removed to clean up the code.

Proposed removal:

-import numpy as np
-from numpy.typing import ArrayLike, DTypeLike
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
import numpy as np
from numpy.typing import ArrayLike, DTypeLike
Tools
Ruff

8-8: numpy imported but unused

Remove unused import: numpy

(F401)


9-9: numpy.typing.ArrayLike imported but unused

Remove unused import

(F401)


9-9: numpy.typing.DTypeLike imported but unused

Remove unused import

(F401)

Comment on lines +64 to +68
elif date_portion == "month":
date_parts.append("%02d" % undate.earliest.month)
elif date_portion == "day":
date_parts.append("%02d" % undate.earliest.day)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enhanced date portion handling in to_string method.

The modifications to explicitly handle "month" and "day" portions improve the method's ability to format dates accurately. The use of string formatting ensures that these portions are always represented with two digits, which enhances consistency.

However, the removal of the previously commented-out code (lines 69-74) reduces the method's flexibility to handle other date portions dynamically. If flexibility in handling various date formats is a requirement, consider reintroducing or refactoring this part of the code.

Comment on lines +289 to +303
# assert isinstance(day_duration, timedelta)
assert day_duration.astype("int") == 1

january_duration = Undate(2022, 1).duration()
assert january_duration.days == 31
assert january_duration.astype("int") == 31
feb_duration = Undate(2022, 2).duration()
assert feb_duration.days == 28
assert feb_duration.astype("int") == 28
# next leap year will be 2024
leapyear_feb_duration = Undate(2024, 2).duration()
assert leapyear_feb_duration.days == 29
assert leapyear_feb_duration.astype("int") == 29

year_duration = Undate(2022).duration()
assert year_duration.days == 365
assert year_duration.astype("int") == 365
leapyear_duration = Undate(2024).duration()
assert leapyear_duration.days == 366
assert leapyear_duration.astype("int") == 366
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated assertions in test_duration method.

The replacement of timedelta assertions with astype("int") reflects a change in the duration method's return type. This update simplifies the assertions and aligns them with the new method implementation.

However, it would be beneficial to add comments or documentation in the Undate class to clarify why the duration method's return type was changed and how it should be used.

Would you like me to help draft the necessary documentation or comments?

@ColeDCrawford
Copy link
Collaborator

ColeDCrawford commented Aug 30, 2024

Observations on coderabbit (will keep updating):

  • It used ruff even though we're using black / flake8. ruff is definitely worth moving too but worth noting that it didn't align with the project's conventions. I think this is going to be one of the negative externalities of gen AI code tools - they will work well on high profile languages / frameworks with lots of existing examples and much less well on those outside of that scope. In this case if you want to use a different formatter or linter that doesn't have support in their agent, then you are SOL.
  • Memaid sequence diagram generation is cool, if not particularly groundbreaking or even useful here.

# adapted from https://stackoverflow.com/a/27129510/9706217

def __new__(cls, year: int, month: Optional[int] = None, day: Optional[int] = None):
if isinstance(year, np.datetime64):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's my non-existing knowledge about typing in python, but shouldn't year be of type int per line 24? Or is datetime64 a subclass of int?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess my question is if the type of year should be a different one :)

Copy link
Collaborator

@ColeDCrawford ColeDCrawford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of my comments are in setup.cfg around NumPy - mostly just thinking through the implications of adding it as a dependency. Overall I think I am sold as it expands the capability of the undate package quite a bit. It would be useful to add some benchmarks to the release notes for the next version to detail the previous and new installed sizes for undate, and the previous / new runtimes for some operations to see how numpy impacts performance and payload.

assert d.dtype == "datetime64[M]"
assert str(d) == "2010-05"

def test_init_year_month(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch

Comment on lines +70 to +74
# numpy datetime is stored as 64-bit integer, so min/max
# depends on the time unit; assume days for now
# See https://numpy.org/doc/stable/reference/arrays.datetime.html#datetime-units
max_year = int(2.5e16)
min_year = int(-2.5e16)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know that I agree with the suggestion that this be configurable or what the advantages of that would be, if we're initializing a 64bit int I don't think making the min/max smaller would save any memory. If we're committing to NumPy then we should use it consistently.

@@ -41,6 +41,7 @@ python_requires = >=3.8
install_requires =
python-dateutil
lark
numpy
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we merge #88 first, we'll have to move this to the pyproject.toml dependencies.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to limit this to NumPy v2, which was released in May? That was the first major release since 2006. If we do limit it to v2, we should also drop support for Python < 3.10. It looks like NumPy v2 requires Python >= 3.10.

I'm pretty sure that there are no direct Python dependencies, just some optional dependencies for speeding up operations. pip should provide pre-built wheels.

NumPy looks like it will add 30-80MB to the payload. For most applications that's not an issue. But maybe we should benchmark the current version of undate and the size with numpy? The environments that I'm typically space-constrained in are Lambdas. I think there is still a 50MB (zipped) and 250MB (unzipped) limit for Lambda functions, so this could take up a significant chunk of that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm still leaning towards using NumPy given the need to support dates outside of 9999. The other way that we could go is with struct_time, which is what python-edtf uses. But it's a little clunky and you'd still have to have a shim class (like you add here) or a conversion function.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to make NumPy optional seems like a nightmare

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking it would be nice to move to Python 3.10+ anyway, it gets us improved python type hinting and 3.9 is getting old enough at this point that it seems reasonable to drop.

@rlskoeser
Copy link
Member Author

I'll update this and reconcile with the other merged PRs, address the flagged comments about type hints, and then do some testing and documentation of benchmarking as suggested before I merge this.

Copy link

@kypso-io kypso-io bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📋 Review Summary

  • Number of files reviewed: 7
  • Number of comments: 3
  • Number of suggestions: 0

📚 File changes

File path File changes
pyproject.toml Added configuration for mypy to include numpy typing plugin.
setup.cfg Added numpy as a required dependency in the install_requires section.
src/undate/date.py Introduced a custom Date class for enhanced date handling with properties for year, month, and day.
src/undate/dateformat/iso8601.py Refactored the date formatting logic to directly append formatted strings for year, month, and day instead of using strftime.
src/undate/undate.py No summary.
tests/test_date.py Added tests for the new Date and DatePrecision classes.
tests/test_undate.py Removed the DatePrecision class and its associated tests, and updated duration tests to use astype('int') instead of checking for timedelta.

Comment on lines +46 to +48
raise Exception(
f"Unable to parse dates adequately as {expected_dtype}: {data}"
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error_handling_issue
The exception raised in the __new__ method does not specify the type of error. It would be better to raise a more specific exception, such as ValueError, to indicate that the input date could not be parsed correctly.

# if date unit is year, don't return a month (only M/D)
if not self.dtype == "datetime64[Y]":
return int(str(self.astype("datetime64[M]")).split("-")[-1])

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error_handling_issue
The month property does not have a return statement if the condition is not met. This could lead to returning None implicitly, which may not be the intended behavior. Consider raising an exception or returning a default value instead.

def day(self):
# only return a day if date unit is in days
if self.dtype == "datetime64[D]":
return int(str(self.astype("datetime64[D]")).split("-")[-1])
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error_handling_issue
The day property also lacks a return statement if the condition is not met. Similar to the month property, consider raising an exception or returning a default value to avoid returning None implicitly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants