Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: set functions and tests #57

Merged
merged 21 commits into from
Sep 13, 2024
Merged

feat: set functions and tests #57

merged 21 commits into from
Sep 13, 2024

Conversation

ohrechykha
Copy link
Collaborator

No description provided.

@ohrechykha ohrechykha requested review from ianna and jpivarski July 30, 2024 15:38
@ohrechykha ohrechykha marked this pull request as draft July 30, 2024 15:38
@ohrechykha ohrechykha marked this pull request as ready for review July 30, 2024 16:17
Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implements the four unique functions in a way that flattens all nested list structure before returning a result that consists of just the unique values.

  • Does it work for scalars, such as ragged(3.14)? (Note that there are no square brackets in that expression.)
  • I've noticed that these functions are returning tuples. The specification requires namedtuples.

The fact that the output is based on flattened arrays makes me wonder what will happen for unique_inverse, since this needs to return inverse indices that would reconstruct the array, with all of its nested structure. Using NumPy as an example:

>>> import numpy as np
>>> array = np.array([[4, 3, 4, 2], [5, 2, 4, 4], [3, 3, 3, 2]])
>>> values, inverse_indices = np.unique_inverse(array)
>>> values
array([2, 3, 4, 5])
>>> inverse_indices
array([[2, 1, 2, 0],
       [3, 0, 2, 2],
       [1, 1, 1, 0]])
>>> values[inverse_indices]
array([[4, 3, 4, 2],
       [5, 2, 4, 4],
       [3, 3, 3, 2]])

where values[inverse_indices] reconstructs the original array. Our arrays can do that kind of slice only if they have regular dimensions (the shape has no None values in it). Perhaps ragged.array should support the multidimensional unique_inverse case only if all of its dimensions are regular? Other cases should be detected and raise NotImplementedError.

Copy link
Collaborator

@ianna ianna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ohrechykha - good start! Please, do follow up on Jim's comments. Please, also run pre-commit run --all to fix the CI errors locally.

@jpivarski - I'm not sure why the tests with pytest fail in CI. Do we need to update awkward version? or pin numpy to 1.x?

@ohrechykha
Copy link
Collaborator Author

This is funny: when doing [err=f"Expected ragged type but got {type(x)}", raise TypeError(err)] in specified lines, mypy says it's an unreachable statement. When I switch this back to [raise TypeError(f"Expected ragged type but got {type(x)}")] ruff says exception must be assigned to variable first...

@ohrechykha ohrechykha requested review from jpivarski and ianna August 19, 2024 13:33
Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In all, it's a good suite of tests and a good implementation for the common case—more than zero dimensions and more than length 1. (A question that just occurred to me: does it work for length 0 inputs?)

@ianna, we should check this for CUDA arrays, too. Code like

ragged.array([x])

where x is a zero-dimensional NumPy array (from ragged.array._impl when the ragged array's number of dimensions is also zero) would work but it would convert it to a Python object because it's inside of a Python list:

>>> x = np.array(3.14, np.float32)
>>> y = ragged.array([x])
>>> x.dtype, y.dtype
(dtype('float32'), dtype('float64'))

(The above can be a regular test.) If it's a CuPy array, the implicit conversion would also copy the data from the GPU to the CPU in order to make the Python object.

src/ragged/_spec_set_functions.py Outdated Show resolved Hide resolved
src/ragged/_spec_set_functions.py Outdated Show resolved Hide resolved
src/ragged/_spec_set_functions.py Outdated Show resolved Hide resolved
src/ragged/_spec_set_functions.py Outdated Show resolved Hide resolved
src/ragged/_spec_set_functions.py Outdated Show resolved Hide resolved
tests/test_spec_set_functions.py Outdated Show resolved Hide resolved
tests/test_spec_set_functions.py Outdated Show resolved Hide resolved
tests/test_spec_set_functions.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@ianna ianna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ohrechykha - please, check the failing jobs in the CI: it gives some indication how to fix them. Thanks!

src/ragged/_spec_set_functions.py Outdated Show resolved Hide resolved
src/ragged/_spec_set_functions.py Outdated Show resolved Hide resolved
src/ragged/_spec_set_functions.py Outdated Show resolved Hide resolved
@ohrechykha ohrechykha requested a review from ianna September 10, 2024 12:38
Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good! My previous comments were all addressed, and my new comments are all inline:

src/ragged/_spec_array_object.py Outdated Show resolved Hide resolved
src/ragged/_spec_set_functions.py Outdated Show resolved Hide resolved
src/ragged/_spec_set_functions.py Outdated Show resolved Hide resolved
src/ragged/_spec_set_functions.py Outdated Show resolved Hide resolved
src/ragged/_spec_set_functions.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@ianna ianna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ohrechykha - looks great! All comments were addressed and the tests pass. I'll merge it now.

@ianna ianna merged commit 0b31f72 into main Sep 13, 2024
12 checks passed
@ianna ianna deleted the oleksii-unique branch September 13, 2024 15:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants