Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add fractions for ranges #191

Merged
merged 3 commits into from
Jan 27, 2025
Merged

Conversation

synesthesiam
Copy link
Contributor

@synesthesiam synesthesiam commented Jan 24, 2025

Extends range lists with fractions, either halves or tenths:

lists:
  range_with_halves:
    from: 1
    to: 2
    fractions: halves

  range_with_tenths:
    from: 1
    to: 2
    fractions: tenths

The range_with_halves list will accept 1, 1.5, 2, and 2.5 as well as their spoken equivalents according to unicode-rbnf (e.g., "one point five").

The range_with_tenths list will accept 1, 1.1, ..., 2, 2.1, ..., 2.9 as well as the spoken equivalents.

Summary by CodeRabbit

  • New Features

    • Added support for fractional ranges (halves and tenths) in number recognition and parsing.
    • Enhanced numeric handling to include decimal and fractional values.
    • Expanded range slot list functionality to generate numbers with fractional representations.
    • Updated text cleaning process to preserve punctuation within numeric values.
  • Tests

    • Added test cases for range lists with halves and tenths.
    • Improved test coverage for fractional number parsing.

Copy link

coderabbitai bot commented Jan 24, 2025

📝 Walkthrough

Walkthrough

The pull request introduces a new feature in the hassil library to support fractional ranges. A new enumeration RangeFractionType is added with two types: HALVES and TENTHS. This enhancement allows the RangeSlotList class to generate and handle ranges that include fractional values. The changes span multiple files, updating parsing, matching, sampling, and testing logic to accommodate these new fractional range capabilities. The modifications provide more flexible numeric handling, enabling recognition and generation of ranges with half and tenth increments.

Changes

File Changes
hassil/intents.py - Added RangeFractionType enum
- Updated RangeSlotList with fraction_type attribute
- Added get_fractions() method
- Added get_numbers() method to generate numbers with fractions
hassil/sample.py - Modified sample_expression to use get_numbers() method
- Updated number generation logic
hassil/string_matcher.py - Updated regular expressions for number matching
- Modified _RANGE_TRIE_CACHE to support fraction types
- Updated range matching logic to use get_numbers()
tests/test_recognize.py - Added test_range_list_with_halves()
- Added test_range_list_with_tenths()
tests/test_sample.py - Added test_list_range_halves_words()
- Added test_list_range_tenths_words()
CHANGELOG.md - Added version entry for 2.2.0
- Documented changes related to fractions and text_clean function
hassil/models.py - Updated text_clean implementation to use remove_punctuation function
hassil/util.py - Removed PUNCTUATION_ALL variable
pyproject.toml - Updated version from 2.1.1 to 2.2.0

Sequence Diagram

sequenceDiagram
    participant RangeSlotList
    participant RangeFractionType
    participant StringMatcher
    participant SampleExpression

    RangeSlotList->>RangeFractionType: Specify fraction type (HALVES/TENTHS)
    RangeFractionType-->>RangeSlotList: Return fraction values
    RangeSlotList->>RangeSlotList: Generate numbers with fractions
    RangeSlotList->>StringMatcher: Provide range with fractions
    StringMatcher->>StringMatcher: Match numeric values
    RangeSlotList->>SampleExpression: Generate sample expressions
Loading
✨ Finishing Touches
  • 📝 Generate Docstrings (Beta)

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai or @coderabbitai title anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
tests/test_recognize.py (1)

1784-1815: Encourage a parameterized approach to fractional range tests
The tenths test is thorough, iterating through multiple decimal steps. Consider using a parameterized test framework (e.g., pytest.mark.parametrize) to make it more concise and readable.

hassil/intents.py (2)

58-58: Consider adding documentation for fraction_type
Adding a docstring or inline comment clarifying how fraction_type affects number generation can improve maintainability.


460-464: Ensure robust error handling for invalid fraction types
Currently, any unknown string for fractions_type will raise a ValueError from Enum. Consider catching it or logging a user-friendly error message.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d4ee6d4 and f23a71b.

📒 Files selected for processing (5)
  • hassil/intents.py (5 hunks)
  • hassil/sample.py (2 hunks)
  • hassil/string_matcher.py (5 hunks)
  • tests/test_recognize.py (1 hunks)
  • tests/test_sample.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
tests/test_sample.py (6)

Pattern */**(html|markdown|md): - For instructional content in documentation, use a direct and authoritative tone. Avoid expressions of politeness such as 'may' or 'please', and ensure the goal of the instruction is fronted.

  • Apply the Microsoft Style Guide to ensure documentation maintains clarity and conciseness.
  • In step-by-step instructions, front the location phrase in the instructional sentence.
  • In step-by-step instructions, front the 'goal' in the instructional sentence.
  • In step-by-step instructions, if in doubt what to front, front the 'goal' before the location phrase in the instructional sentence.
  • do not hyphenate terms like 'top-right' or 'bottom-left' with 'corner'

Pattern */**(html|markdown|md): - Use bold to mark UI strings.

  • If "" are used to mark UI strings, replace them by bold.

Pattern */**(html|markdown|md): - Be brief in your replies and don't add fluff like "thank you for..." and "Please let me know if"


Pattern */**(html|markdown|md): - Use sentence-style capitalization also in headings.


Pattern */**(html|markdown|md): do not comment on HTML used for icons


Pattern */**(html|markdown|md): Avoid flagging inline HTML for embedding videos in future reviews for this repository.

tests/test_recognize.py (6)

Pattern */**(html|markdown|md): - For instructional content in documentation, use a direct and authoritative tone. Avoid expressions of politeness such as 'may' or 'please', and ensure the goal of the instruction is fronted.

  • Apply the Microsoft Style Guide to ensure documentation maintains clarity and conciseness.
  • In step-by-step instructions, front the location phrase in the instructional sentence.
  • In step-by-step instructions, front the 'goal' in the instructional sentence.
  • In step-by-step instructions, if in doubt what to front, front the 'goal' before the location phrase in the instructional sentence.
  • do not hyphenate terms like 'top-right' or 'bottom-left' with 'corner'

Pattern */**(html|markdown|md): - Use bold to mark UI strings.

  • If "" are used to mark UI strings, replace them by bold.

Pattern */**(html|markdown|md): - Be brief in your replies and don't add fluff like "thank you for..." and "Please let me know if"


Pattern */**(html|markdown|md): - Use sentence-style capitalization also in headings.


Pattern */**(html|markdown|md): do not comment on HTML used for icons


Pattern */**(html|markdown|md): Avoid flagging inline HTML for embedding videos in future reviews for this repository.

hassil/sample.py (6)

Pattern */**(html|markdown|md): - For instructional content in documentation, use a direct and authoritative tone. Avoid expressions of politeness such as 'may' or 'please', and ensure the goal of the instruction is fronted.

  • Apply the Microsoft Style Guide to ensure documentation maintains clarity and conciseness.
  • In step-by-step instructions, front the location phrase in the instructional sentence.
  • In step-by-step instructions, front the 'goal' in the instructional sentence.
  • In step-by-step instructions, if in doubt what to front, front the 'goal' before the location phrase in the instructional sentence.
  • do not hyphenate terms like 'top-right' or 'bottom-left' with 'corner'

Pattern */**(html|markdown|md): - Use bold to mark UI strings.

  • If "" are used to mark UI strings, replace them by bold.

Pattern */**(html|markdown|md): - Be brief in your replies and don't add fluff like "thank you for..." and "Please let me know if"


Pattern */**(html|markdown|md): - Use sentence-style capitalization also in headings.


Pattern */**(html|markdown|md): do not comment on HTML used for icons


Pattern */**(html|markdown|md): Avoid flagging inline HTML for embedding videos in future reviews for this repository.

hassil/string_matcher.py (6)

Pattern */**(html|markdown|md): - For instructional content in documentation, use a direct and authoritative tone. Avoid expressions of politeness such as 'may' or 'please', and ensure the goal of the instruction is fronted.

  • Apply the Microsoft Style Guide to ensure documentation maintains clarity and conciseness.
  • In step-by-step instructions, front the location phrase in the instructional sentence.
  • In step-by-step instructions, front the 'goal' in the instructional sentence.
  • In step-by-step instructions, if in doubt what to front, front the 'goal' before the location phrase in the instructional sentence.
  • do not hyphenate terms like 'top-right' or 'bottom-left' with 'corner'

Pattern */**(html|markdown|md): - Use bold to mark UI strings.

  • If "" are used to mark UI strings, replace them by bold.

Pattern */**(html|markdown|md): - Be brief in your replies and don't add fluff like "thank you for..." and "Please let me know if"


Pattern */**(html|markdown|md): - Use sentence-style capitalization also in headings.


Pattern */**(html|markdown|md): do not comment on HTML used for icons


Pattern */**(html|markdown|md): Avoid flagging inline HTML for embedding videos in future reviews for this repository.

hassil/intents.py (6)

Pattern */**(html|markdown|md): - For instructional content in documentation, use a direct and authoritative tone. Avoid expressions of politeness such as 'may' or 'please', and ensure the goal of the instruction is fronted.

  • Apply the Microsoft Style Guide to ensure documentation maintains clarity and conciseness.
  • In step-by-step instructions, front the location phrase in the instructional sentence.
  • In step-by-step instructions, front the 'goal' in the instructional sentence.
  • In step-by-step instructions, if in doubt what to front, front the 'goal' before the location phrase in the instructional sentence.
  • do not hyphenate terms like 'top-right' or 'bottom-left' with 'corner'

Pattern */**(html|markdown|md): - Use bold to mark UI strings.

  • If "" are used to mark UI strings, replace them by bold.

Pattern */**(html|markdown|md): - Be brief in your replies and don't add fluff like "thank you for..." and "Please let me know if"


Pattern */**(html|markdown|md): - Use sentence-style capitalization also in headings.


Pattern */**(html|markdown|md): do not comment on HTML used for icons


Pattern */**(html|markdown|md): Avoid flagging inline HTML for embedding videos in future reviews for this repository.

🔇 Additional comments (16)
hassil/string_matcher.py (8)

21-28: Selective import aligns well with fractional support.

The addition of RangeFractionType extends functionality to handle fractional steps effectively. This import block is clean and appears ready for use.


45-46: Regex now supports negative decimals.

Including the optional decimal part matches fractional numbers properly. Consider verifying if negative fraction inputs are expected or should be constrained.


53-55: Cache expansion for fractional ranges.

Adding Optional[RangeFractionType] to the cache key is a solid approach for distinguishing different fraction types. No issues found.


592-592: Use of float parsing for numeric input.

Casting to float correctly accommodates fractional values. This change is essential for supporting fractional content.


595-597: Conditional logic for simple integer steps.

This block’s logic differentiates standard unit steps from fractional ones. Implementation is clear, with no issues detected.


603-604: Delegation to get_numbers() is streamlined.

By delegating all special-step and fraction logic to range_list.get_numbers(), the code remains DRY and more maintainable.


682-682: Fraction type inclusion for trie caching.

Ensuring range_list.fraction_type is part of the key clarifies that each fraction type has its own trie. This is correct and necessary.


826-826: Iterating over range_list.get_numbers() for trie-building.

Constructing trie entries for all fractional and integer permutations enhances coverage of numeric word conversions.

tests/test_sample.py (2)

93-106: Comprehensive test for half increments.

These assertions confirm coverage of 1.5 numeric and word equivalents, ensuring fractional logic is tested for halves.


108-137: Extensive tests for tenth increments.

The test systematically verifies every tenths fraction and its linguistic representation from 1.0 to 1.9, providing excellent coverage.

hassil/sample.py (2)

188-188: Refactored range generation to use get_numbers().

Eliminating direct usage of Python’s built-in range in favor of RangeSlotList.get_numbers() unifies the logic for fractional support. Nicely done.


202-202: Word-based expansion now leverages get_numbers().

This ensures consistent generation of decimal and fractional strings for conversion to words, reducing duplicate code paths.

tests/test_recognize.py (1)

1752-1783: Question about out-of-range fractions in test_range_list_with_halves
This test implicitly permits 2.5 even though the upper bound is set to 2. Currently, get_numbers does not clamp fractional results that exceed stop. Verify whether 2.5 is intentionally within range or if capping is expected.

Below is a script to locate any possible clamping logic. If it returns no matches, the code indeed allows exceeding the range:

✅ Verification successful

Range boundaries apply only to integers, fractions can exceed the upper bound
The current behavior is intentional. The RangeSlotList implementation adds fractions to every integer in the range, including the upper bound. For a range of 1-2 with halves, this correctly yields: 1, 1.5, 2, 2.5. The upper bound (2) only constrains the integers, not the resulting fractional values.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
rg "(clamp|clip|enforce_bounds)" -A 3

Length of output: 39


Script:

#!/bin/bash
# Find range list implementation
ast-grep --pattern 'class $_ {
  $$$
  range($$$) {
    $$$
  }
  $$$
}'

# Also search for range-related code
rg "def.*range" -A 5

# Search for number-related files
fd "number" --type f

Length of output: 5638


Script:

#!/bin/bash
# Find RangeSlotList class definition
ast-grep --pattern 'class RangeSlotList'

# Find RangeFractionType definition
ast-grep --pattern 'class RangeFractionType'

# Search for files containing RangeSlotList
rg "class.*RangeSlotList" -A 10

Length of output: 623


Script:

#!/bin/bash
# Get the full content of intents.py to see implementation
cat hassil/intents.py

# Search for range-related enums
rg "class.*(RangeType|RangeFractionType)" -A 5

# Search for methods that handle range validation
rg "def.*(validate|check|process).*range" -A 5

Length of output: 16717

hassil/intents.py (3)

33-48: Implementation looks solid
The RangeFractionType enum and its static method are simple and clear. This design easily supports future fractional types if needed.


70-80: Potential out-of-bounds fraction values
get_numbers yields integer + fraction even if it exceeds stop. Confirm that generating out-of-range decimals is acceptable. If not, filter out values greater than stop.


447-447: Great extension for fractional support
Reading fractions from YAML is straightforward and clean.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
hassil/string_matcher.py (1)

606-615: Improve the comment for range validation logic.

While the logic is correct, consider enhancing the comment to better explain the two cases:

-            # Unit step
+            # Simple case: Unit step without fractions (e.g., 1, 2, 3)
             in_range = (
                 range_list.start <= word_number <= range_list.stop
             )
         else:
-            # Non-unit step or fractions
+            # Complex case: Non-unit step (e.g., 2, 4, 6) or fractions (e.g., 1.5, 2.5)
             in_range = word_number in range_list.get_numbers()
tests/test_recognize.py (1)

1863-1936: Consider adding more edge cases to punctuation tests.

While the current tests are good, consider adding these edge cases:

  1. Test invalid punctuation patterns (e.g., "1..2", "1.2.")
  2. Test multiple punctuation marks (e.g., "1.2.3")
  3. Test different punctuation marks (e.g., "1,2" or "1/2")
  4. Test whitespace around punctuation (e.g., "1 . 2")

Example test cases:

def test_range_lists_invalid_punctuation() -> None:
    """Test invalid punctuation patterns in range lists."""
    # Implementation details...
    assert not recognize("test 1..2", intents)  # Double dots
    assert not recognize("test 1.2.", intents)  # Trailing dot
    assert not recognize("test 1 . 2", intents)  # Spaced dot
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6cf4407 and dff9920.

📒 Files selected for processing (2)
  • hassil/string_matcher.py (7 hunks)
  • tests/test_recognize.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
hassil/string_matcher.py (6)

Pattern */**(html|markdown|md): - For instructional content in documentation, use a direct and authoritative tone. Avoid expressions of politeness such as 'may' or 'please', and ensure the goal of the instruction is fronted.

  • Apply the Microsoft Style Guide to ensure documentation maintains clarity and conciseness.
  • In step-by-step instructions, front the location phrase in the instructional sentence.
  • In step-by-step instructions, front the 'goal' in the instructional sentence.
  • In step-by-step instructions, if in doubt what to front, front the 'goal' before the location phrase in the instructional sentence.
  • do not hyphenate terms like 'top-right' or 'bottom-left' with 'corner'

Pattern */**(html|markdown|md): - Use bold to mark UI strings.

  • If "" are used to mark UI strings, replace them by bold.

Pattern */**(html|markdown|md): - Be brief in your replies and don't add fluff like "thank you for..." and "Please let me know if"


Pattern */**(html|markdown|md): - Use sentence-style capitalization also in headings.


Pattern */**(html|markdown|md): do not comment on HTML used for icons


Pattern */**(html|markdown|md): Avoid flagging inline HTML for embedding videos in future reviews for this repository.

tests/test_recognize.py (6)

Pattern */**(html|markdown|md): - For instructional content in documentation, use a direct and authoritative tone. Avoid expressions of politeness such as 'may' or 'please', and ensure the goal of the instruction is fronted.

  • Apply the Microsoft Style Guide to ensure documentation maintains clarity and conciseness.
  • In step-by-step instructions, front the location phrase in the instructional sentence.
  • In step-by-step instructions, front the 'goal' in the instructional sentence.
  • In step-by-step instructions, if in doubt what to front, front the 'goal' before the location phrase in the instructional sentence.
  • do not hyphenate terms like 'top-right' or 'bottom-left' with 'corner'

Pattern */**(html|markdown|md): - Use bold to mark UI strings.

  • If "" are used to mark UI strings, replace them by bold.

Pattern */**(html|markdown|md): - Be brief in your replies and don't add fluff like "thank you for..." and "Please let me know if"


Pattern */**(html|markdown|md): - Use sentence-style capitalization also in headings.


Pattern */**(html|markdown|md): do not comment on HTML used for icons


Pattern */**(html|markdown|md): Avoid flagging inline HTML for embedding videos in future reviews for this repository.

🔇 Additional comments (6)
hassil/string_matcher.py (4)

21-28: LGTM! Well-structured foundation for fraction support.

The changes provide a solid foundation for handling fractional values:

  • Clear regex patterns for both integer and floating-point numbers
  • Properly typed cache to support the new fraction types

Also applies to: 45-48, 55-57


582-586: LGTM! Robust number pattern selection.

The logic correctly switches between integer and float patterns based on the fraction type, with proper type conversion to float for handling decimal values.

Also applies to: 592-596, 603-603


646-651: LGTM! Proper wildcard text handling.

The changes correctly handle open wildcards by updating both text and value fields.


841-844: LGTM! Proper range trie caching.

The changes correctly handle fraction types in the trie cache and properly generate all valid numbers including fractions.

tests/test_recognize.py (2)

1754-1809: LGTM! Comprehensive tests for half fractions.

The test function thoroughly validates the half fraction functionality:

  • Tests all valid values (1, 1.5, 2, 2.5)
  • Verifies both numeric and word forms
  • Checks text/value fields
  • Validates rejection of invalid fractions

1811-1861: LGTM! Comprehensive tests for tenth fractions.

The test function thoroughly validates the tenth fraction functionality:

  • Tests all valid values (1.1-1.9, 2.1-2.9)
  • Verifies both numeric and word forms
  • Checks text/value fields
  • Validates rejection of invalid fractions

@synesthesiam synesthesiam merged commit 1bcb09e into main Jan 27, 2025
8 checks passed
@synesthesiam synesthesiam deleted the synesthesiam-20250124-fractions branch January 27, 2025 20:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant