diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index 204c757..df2322f 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -3,12 +3,6 @@ ci:
   autofix_commit_msg: "style: pre-commit fixes"
 
 repos:
-  - repo: https://github.com/adamchainz/blacken-docs
-    rev: "1.16.0"
-    hooks:
-      - id: blacken-docs
-        additional_dependencies: [black==23.*]
-
   - repo: https://github.com/pre-commit/pre-commit-hooks
     rev: "v4.5.0"
     hooks:
diff --git a/README.md b/README.md
index 499ce63..8b682cd 100644
--- a/README.md
+++ b/README.md
@@ -1,9 +1,10 @@
-# ragged
+# Ragged
 
 [![Actions Status][actions-badge]][actions-link]
 [![PyPI version][pypi-version]][pypi-link]
 [![PyPI platforms][pypi-platforms]][pypi-link]
 [![GitHub Discussion][github-discussions-badge]][github-discussions-link]
+
@@ -25,9 +26,12 @@
 
 ## Introduction
 
-**Ragged** is a library for manipulating ragged arrays as though they were **NumPy** or **CuPy** arrays, following the [Array API specification](https://data-apis.org/array-api/latest/API_specification).
+**Ragged** is a library for manipulating ragged arrays as though they were
+**NumPy** or **CuPy** arrays, following the
+[Array API specification](https://data-apis.org/array-api/latest/API_specification).
 
-For example, this is a [ragged/jagged array](https://en.wikipedia.org/wiki/Jagged_array):
+For example, this is a
+[ragged/jagged array](https://en.wikipedia.org/wiki/Jagged_array):
 
 ```python
 >>> import ragged
@@ -48,20 +52,28 @@ The values are all floating-point numbers, so `a.dtype` is `float64`,
 dtype('float64')
 ```
 
-but `a.shape` has non-integer dimensions to account for the fact that some of its list lengths are non-uniform:
+but `a.shape` has non-integer dimensions to account for the fact that some of
+its list lengths are non-uniform:
 
 ```python
 >>> a.shape
 (4, None, None)
 ```
 
-In general, a `ragged.array` can have any mixture of regular and irregular dimensions, though `shape[0]` (the length) is always an integer. This convention follows the **Array API**'s specification for [array.shape](https://data-apis.org/array-api/latest/API_specification/generated/array_api.array.shape.html#array_api.array.shape), which must be a tuple of `int` or `None`:
+In general, a `ragged.array` can have any mixture of regular and irregular
+dimensions, though `shape[0]` (the length) is always an integer. This convention
+follows the **Array API**'s specification for
+[array.shape](https://data-apis.org/array-api/latest/API_specification/generated/array_api.array.shape.html#array_api.array.shape),
+which must be a tuple of `int` or `None`:
 
 ```python
 array.shape: Tuple[Optional[int], ...]
 ```
 
-(Our use of `None` to indicate a dimension without a single-valued size differs from the **Array API**'s intention of specifying dimensions of _unknown_ size, but it follows the technical specification. **Array API**-consuming libraries can try using **Ragged** to find out if they are ragged-ready.)
+(Our use of `None` to indicate a dimension without a single-valued size differs
+from the **Array API**'s intention of specifying dimensions of _unknown_ size,
+but it follows the technical specification. **Array API**-consuming libraries
+can try using **Ragged** to find out if they are ragged-ready.)
 
 All of the normal elementwise and reducing functions apply, as well as slices:
 
@@ -100,22 +112,42 @@ ragged.array([
 ])
 ```
 
-All of the methods, attributes, and functions in the **Array API** will be implemented for **Ragged**, as well as conveniences that are not required by the **Array API**. See [open issues marked "todo"](https://github.com/jpivarski/ragged/issues?q=is%3Aissue+is%3Aopen+label%3Atodo) for **Array API** functions that still need to be written (out of 120 in total).
+All of the methods, attributes, and functions in the **Array API** will be
+implemented for **Ragged**, as well as conveniences that are not required by the
+**Array API**. See
+[open issues marked "todo"](https://github.com/jpivarski/ragged/issues?q=is%3Aissue+is%3Aopen+label%3Atodo)
+for **Array API** functions that still need to be written (out of 120 in total).
 
-**Ragged** has two `device` values, `"cpu"` (backed by **NumPy**) and `"cuda"` (backed by **CuPy**). Eventually, all operations will be identical for CPU and GPU.
+**Ragged** has two `device` values, `"cpu"` (backed by **NumPy**) and `"cuda"`
+(backed by **CuPy**). Eventually, all operations will be identical for CPU and
+GPU.
 
 ## Implementation
 
-**Ragged** is implemented using **Awkward Array** ([code](https://github.com/scikit-hep/awkward), [docs](https://awkward-array.org/)), which is an array library for arbitrary tree-like (JSON-like) data. Because of its generality, **Awkward Array** cannot follow the **Array API**—in fact, its array objects can't have separate `dtype` and `shape` attributes (the array `type` can't be factorized). **Ragged** is therefore
+**Ragged** is implemented using **Awkward Array**
+([code](https://github.com/scikit-hep/awkward),
+[docs](https://awkward-array.org/)), which is an array library for arbitrary
+tree-like (JSON-like) data. Because of its generality, **Awkward Array** cannot
+follow the **Array API**—in fact, its array objects can't have separate `dtype`
+and `shape` attributes (the array `type` can't be factorized). **Ragged** is
+therefore
 
-- a _specialization_ of **Awkward Array** for numeric data in fixed-length and variable-length lists, and
-- a _formalization_ to adhere to the **Array API** and its fully typed protocols.
+- a _specialization_ of **Awkward Array** for numeric data in fixed-length and
+  variable-length lists, and
+- a _formalization_ to adhere to the **Array API** and its fully typed
+  protocols.
 
-See [Why does this library exist?](https://github.com/jpivarski/ragged/discussions/6) under the [Discussions](https://github.com/jpivarski/ragged/discussions) tab for more details.
+See
+[Why does this library exist?](https://github.com/jpivarski/ragged/discussions/6)
+under the [Discussions](https://github.com/jpivarski/ragged/discussions) tab for
+more details.
 
-**Ragged** is a thin wrapper around **Awkward Array**, restricting it to ragged arrays and transforming its function arguments and return values to fit the specification.
+**Ragged** is a thin wrapper around **Awkward Array**, restricting it to ragged
+arrays and transforming its function arguments and return values to fit the
+specification.
 
-**Awkward Array**, in turn, is time- and memory-efficient, ready for big datasets. Consider the following:
+**Awkward Array**, in turn, is time- and memory-efficient, ready for big
+datasets. Consider the following:
 
 ```python
 import gc  # control for garbage collection
@@ -184,7 +216,11 @@ time: 4.180 sec
 time: 0.082 sec
 ```
 
-**Awkward Array** and **Ragged** are generally smaller and faster than their Python equivalents for the same reasons that **NumPy** is smaller and faster than Python lists. See **Awkward Array** [papers and presentations](https://awkward-array.org/doc/main/getting-started/papers-and-talks.html) for more.
+**Awkward Array** and **Ragged** are generally smaller and faster than their
+Python equivalents for the same reasons that **NumPy** is smaller and faster
+than Python lists. See **Awkward Array**
+[papers and presentations](https://awkward-array.org/doc/main/getting-started/papers-and-talks.html)
+for more.
 
 ## Installation
 
@@ -196,8 +232,13 @@ pip install ragged
 
 and will someday be on conda-forge.
 
-`ragged` is a pure-Python library that only depends on `awkward` (which, in turn, only depends on `numpy` and a compiled extension). In principle (i.e. eventually), `ragged` can be loaded into Pyodide and JupyterLite.
+`ragged` is a pure-Python library that only depends on `awkward` (which, in
+turn, only depends on `numpy` and a compiled extension). In principle (i.e.
+eventually), `ragged` can be loaded into Pyodide and JupyterLite.
 
 # Acknowledgements
 
-Support for this work was provided by NSF grant [OAC-2103945](https://www.nsf.gov/awardsearch/showAward?AWD_ID=2103945) and the gracious help of [Awkward Array contributors](https://github.com/scikit-hep/awkward?tab=readme-ov-file#acknowledgements).
+Support for this work was provided by NSF grant
+[OAC-2103945](https://www.nsf.gov/awardsearch/showAward?AWD_ID=2103945) and the
+gracious help of
+[Awkward Array contributors](https://github.com/scikit-hep/awkward?tab=readme-ov-file#acknowledgements).
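The README text in this patch describes a `shape` convention: `shape[0]` is always the integer length, and each deeper dimension is an `int` when every list at that depth has the same length, or `None` when the lengths differ. The sketch below illustrates that rule in pure Python without **Ragged** itself. The helper `ragged_shape` is hypothetical and written only for illustration; it is not **Ragged**'s implementation or API, the nested list is made up, and it assumes every number sits at the same nesting depth.

```python
from typing import List, Optional, Tuple


def ragged_shape(data: List) -> Tuple[Optional[int], ...]:
    """Hypothetical helper (not part of Ragged): Array-API-style shape of nested lists.

    The first entry is always the integer length. Each deeper entry is an int
    if every list at that depth has the same length, or None if lengths differ.
    Assumes all numbers sit at the same nesting depth.
    """
    shape: List[Optional[int]] = [len(data)]
    level = data
    # Descend one nesting level per iteration while the items are still lists.
    while level and isinstance(level[0], list):
        lengths = {len(sublist) for sublist in level}
        # One distinct length -> regular dimension; several -> ragged (None).
        shape.append(lengths.pop() if len(lengths) == 1 else None)
        # Flatten one level so the next iteration inspects the next dimension.
        level = [item for sublist in level for item in sublist]
    return tuple(shape)


# A made-up 4-element array whose list lengths vary at two depths.
a = [[[1.1, 2.2, 3.3], []], [[4.4]], [], [[5.5, 6.6], [7.7], [8.8, 9.9]]]
print(ragged_shape(a))  # (4, None, None)
print(ragged_shape([[1, 2], [3, 4]]))  # (2, 2)
```

Mixed regular and irregular dimensions fall out of the same rule: `[[], [[1]]]` gives `(2, None, 1)`, because the outer lists have lengths 0 and 1 while the only innermost list has length 1.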