-
Hello, suppose that I want to do an in-place assignment with an Awkward Array. How would I do this? Is there any current support for this operation? Thanks!!
-
In-place assignment isn't supported as a design choice. There are two corners of "parameter space" we could have chosen:
- allow in-place assignment, which means making defensive copies wherever a buffer might be shared between arrays, or
- make everything immutable.
I chose the latter (after initial experience with an early version that did allow in-place assignment). Defensive copies would be prohibitive for large data structures, such as records with many fields (which are common). The choice to make everything immutable was made for performance (both speed and memory), which might sound surprising, considering that in-place mutations are used in NumPy for performance reasons (both speed and memory). NumPy has the underlying problem, too (buffers shared between arrays), but not as acutely: some operations produce views, some produce copies:

>>> import numpy as np
>>> original = np.array([1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9])
>>> sliced = original[::2]
>>> advanced = original[[0, 2, 4, 6, 8]]
>>> sliced
array([1.1, 3.3, 5.5, 7.7, 9.9])
>>> advanced
array([1.1, 3.3, 5.5, 7.7, 9.9])
>>> original[6] = 999
>>> sliced # this is a view; it has changed
array([ 1.1, 3.3, 5.5, 999. , 9.9])
>>> advanced # this is a copy; it has NOT changed
array([1.1, 3.3, 5.5, 7.7, 9.9])

(I think PyTorch has taken a positive step by making all the view operations follow a different naming convention from the copy ones.) This is a bit of an issue in NumPy because you have to be careful to check for view-vs-copy. It's endemic in Pandas (search for "SettingWithCopyWarning"). But it's harder in Awkward Array because, rather than having one buffer that might be a view or might be a copy, almost all operations give you a tree with buffers attached to all the nodes of that tree, in which some nodes are views and other nodes are new buffers. Which nodes are views and which are new buffers is subject to change (this PR, for example). Therefore, the Awkward Array library itself does not include any operations that change the values of these buffers in place.
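To make that tree concrete, here is a small sketch (the node names and the exact printed form depend on the Awkward Array version, so treat the details as illustrative): a jagged array's layout is a list node whose offsets are one buffer, wrapping a numeric node whose flattened data are another buffer.

>>> import awkward as ak
>>> jagged = ak.Array([[1.1, 2.2], [], [3.3]])
>>> jagged.layout  # prints the node tree: a list-offset node (offsets buffer) wrapping a NumPy node (data buffer)
>>> form, length, buffers = ak.to_buffers(jagged)
>>> list(buffers)  # a dict of named buffers, one or more per node of the tree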
You can do this kind of assignment in place:

>>> import awkward as ak
>>> original = ak.Array([{"x": 1}, {"x": 2}, {"x": 3}])
>>> original["y"] = 10
>>> original
<Array [{x: 1, y: 10}, ... {x: 3, y: 10}] type='3 * {"x": int64, "y": int64}'>

But that's actually not changing any buffers in place: it's creating a new tree structure that includes the new buffer. The other exception is that, while Awkward Array doesn't define any in-place operations, nothing is stopping you from casting an Awkward Array (or part of one) as a NumPy array and changing it in place. This can potentially have long-range consequences, so if you do this, you'll have to be aware of its history. For example, it's fine to change in place an array that you have just created: you know exactly where it's been. In the online documentation, Mutability of Awkward Arrays from NumPy and Mutability of Awkward Arrays converted to NumPy discuss how to do this and what the issues are. Note that you can also cast Awkward Arrays as NumPy arrays in Numba and assign to them (PR #550).
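For example, here is a minimal sketch of the kind of long-range consequence to keep in mind (assuming a contiguous, purely numeric array, for which the conversion from NumPy does not copy the buffer): values can't be assigned through the Awkward Array itself, but the NumPy array it was built from still owns the same buffer, so mutating that shows up in the Awkward Array.

>>> import numpy as np
>>> import awkward as ak
>>> buffer = np.array([1.1, 2.2, 3.3, 4.4])
>>> wrapped = ak.Array(buffer)  # wraps the NumPy buffer without copying (for a simple numeric array)
>>> # wrapped[0] = 999 would be refused: only new fields (string keys) can be assigned this way
>>> buffer[0] = 999             # mutate the NumPy array that the Awkward Array was built from
>>> wrapped                     # the change is visible through the Awkward Array
<Array [999, 2.2, 3.3, 4.4] type='4 * float64'>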
-
I also should have given you a direct answer to your question:

>>> a = ak.Array([1000, 2000, 3000, 4000, 5000, 6000])
>>> a
<Array [1000, 2000, 3000, 4000, 5000, 6000] type='6 * int64'>
>>> np.asarray(a)[a > 4000] = 4000
>>> a
<Array [1000, 2000, 3000, 4000, 4000, 4000] type='6 * int64'>
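And if the goal is just the clipped result rather than mutation itself, the same thing can be done without touching any buffers; a sketch using ak.where, which allocates a new array and leaves the original alone:

>>> import awkward as ak
>>> a = ak.Array([1000, 2000, 3000, 4000, 5000, 6000])
>>> clipped = ak.where(a > 4000, 4000, a)  # new array: values above 4000 replaced by 4000
>>> clipped
<Array [1000, 2000, 3000, 4000, 4000, 4000] type='6 * int64'>
>>> a  # unchanged
<Array [1000, 2000, 3000, 4000, 5000, 6000] type='6 * int64'>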