Using a PartitionedArray, accessing its properties #807

drahnreb · 2021-04-07T20:20:21Z

drahnreb
Apr 7, 2021

Descriptors for class PartitionedArray and IrregularlyPartitionedArray are not accessible via:

>>> import awkward as ak
>>> ak.__version__
1.2.0
>>> array = ak.repartition([{"x": x, "y": x * 10} for x in range(10)], 2)
>>> array.layout
<IrregularlyPartitionedArray>
    <partition start="0" stop="2">
        <RecordArray length="2">
            <field index="0" key="x">
                <NumpyArray format="l" shape="2" data="0 1" at="0x00010f038200"/>
            </field>
            <field index="1" key="y">
                <NumpyArray format="l" shape="2" data="0 10" at="0x00010f03a200"/>
            </field>
        </RecordArray>
    </partition>
    <partition start="2" stop="4">
        <RecordArray length="2">
            <field index="0" key="x">
                <NumpyArray format="l" shape="2" data="2 3" at="0x00010f038200"/>
            </field>
            <field index="1" key="y">
                <NumpyArray format="l" shape="2" data="20 30" at="0x00010f03a200"/>
            </field>
        </RecordArray>
    </partition>
</IrregularlyPartitionedArray>

>>> ak.partitions(array)
[2,2]
>>> array.partitions

AttributeError: no field named 'partitions'

According to the docs, this should be a valid repartition?

The tests do not cover ak.repartition with partitioned array methods, but following one test leads to:

>>> one = ak.from_iter([[1.1, 2.2, 3.3], [], [4.4, 5.5]], highlevel=False)
>>> two = ak.from_iter([[6.6], [], [], [], [7.7, 8.8, 9.9]], highlevel=False)
>>> array = ak.partition.IrregularlyPartitionedArray([one, two])
>>> array.layout

AttributeError: no field named 'layout'

According to the docs, I would expect this to work as

[...] it should behave identically to a non-partitioned array [...]

Answered by jpivarski

jpivarski · 2021-04-07T20:38:16Z

jpivarski
Apr 7, 2021
Maintainer

Both of these are problems with mixing high-level and low-level arrays.

In the first example, array is a high-level array whose layout is partitioned. The fact that array is partitioned is not visible from the array level; you'd only know it if you delved into the layout (or used ak.partitions to get the length of each). To access the actual partitions, you could do array.layout.partitions, but this would be a low-level view.

In the second example, you created a low-level IrregularlyPartitionedArray, which has no layout because it is a layout. If wrapped in an ak.Array constructor, it would behave like an unpartitioned high-level array.

You might be trying to iterate over partitions. There isn't actually a function for that (other than using the numbers from ak.partitions to make slices in a for loop; see below). I've been wondering if there need to be more tools like this.

>>> import numpy as np
>>> import awkward as ak
>>> array = ak.repartition(np.arange(100), 10)
>>> # high-level array
>>> array
<Array [0, 1, 2, 3, 4, ... 95, 96, 97, 98, 99] type='100 * int64'>
>>> # low-level partitions
>>> array.layout.partitions
[
    <NumpyArray format="l" shape="10" data="0 1 2 3 4 5 6 7 8 9" at="0x562fa91d9a70"/>,
    <NumpyArray format="l" shape="10" data="10 11 12 13 14 15 16 17 18 19" at="0x562fa91d9a70"/>,
    <NumpyArray format="l" shape="10" data="20 21 22 23 24 25 26 27 28 29" at="0x562fa91d9a70"/>,
    <NumpyArray format="l" shape="10" data="30 31 32 33 34 35 36 37 38 39" at="0x562fa91d9a70"/>,
    <NumpyArray format="l" shape="10" data="40 41 42 43 44 45 46 47 48 49" at="0x562fa91d9a70"/>,
    <NumpyArray format="l" shape="10" data="50 51 52 53 54 55 56 57 58 59" at="0x562fa91d9a70"/>,
    <NumpyArray format="l" shape="10" data="60 61 62 63 64 65 66 67 68 69" at="0x562fa91d9a70"/>,
    <NumpyArray format="l" shape="10" data="70 71 72 73 74 75 76 77 78 79" at="0x562fa91d9a70"/>,
    <NumpyArray format="l" shape="10" data="80 81 82 83 84 85 86 87 88 89" at="0x562fa91d9a70"/>,
    <NumpyArray format="l" shape="10" data="90 91 92 93 94 95 96 97 98 99" at="0x562fa91d9a70"/>
]
>>> # number of entries in each partition
>>> ak.partitions(array)
[10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
>>> # cumbersome way to iterate over partitions
>>> start = 0
>>> for count in ak.partitions(array):
...     stop = start + count
...     print(repr(array[start:stop]))
...     start = stop
... 
<Array [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] type='10 * int64'>
<Array [10, 11, 12, 13, 14, ... 16, 17, 18, 19] type='10 * int64'>
<Array [20, 21, 22, 23, 24, ... 26, 27, 28, 29] type='10 * int64'>
<Array [30, 31, 32, 33, 34, ... 36, 37, 38, 39] type='10 * int64'>
<Array [40, 41, 42, 43, 44, ... 46, 47, 48, 49] type='10 * int64'>
<Array [50, 51, 52, 53, 54, ... 56, 57, 58, 59] type='10 * int64'>
<Array [60, 61, 62, 63, 64, ... 66, 67, 68, 69] type='10 * int64'>
<Array [70, 71, 72, 73, 74, ... 76, 77, 78, 79] type='10 * int64'>
<Array [80, 81, 82, 83, 84, ... 86, 87, 88, 89] type='10 * int64'>
<Array [90, 91, 92, 93, 94, ... 96, 97, 98, 99] type='10 * int64'>
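
The slicing loop above can be wrapped in a small generator so the start/stop bookkeeping lives in one place. This is only a sketch: `iter_partitions` is a hypothetical helper name, not part of Awkward Array, and it assumes that `counts` is the list of partition lengths (for example the result of `ak.partitions(array)` in Awkward 1.x). Treating `None` as "not partitioned" is also an assumption about how the caller wants that case handled. The demonstration uses a plain Python list so that it runs without Awkward installed.

```python
def iter_partitions(array, counts):
    """Yield consecutive slices of `array`, one per entry in `counts`.

    `array` is anything that supports `array[start:stop]`.
    `counts` is a list of partition lengths; with Awkward 1.x this
    would typically be `ak.partitions(array)`.
    """
    if counts is None:
        # Assumption: no partition lengths means "not partitioned",
        # so the whole array is treated as a single chunk.
        yield array
        return

    start = 0
    for count in counts:
        stop = start + count
        yield array[start:stop]
        start = stop


# Demonstration with a plain list standing in for a high-level array.
data = list(range(10))
chunks = list(iter_partitions(data, [2, 2, 2, 2, 2]))
for chunk in chunks:
    print(chunk)

# Irregular partition lengths work the same way, including empty ones.
irregular = list(iter_partitions(data, [3, 0, 7]))
```

With a partitioned Awkward 1.x array, the equivalent call would be `for part in iter_partitions(array, ak.partitions(array)): ...`, where each `part` is a high-level slice rather than a low-level layout.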

The biggest difference that partitions make is that every Awkward operation applies separately to each partition, returning a new partitioned array. The documentation's claim is that partitioning makes no interface-visible difference (in the high-level view), but it can make a performance difference.

1 reply

drahnreb Apr 8, 2021
Author

Thanks, that makes a lot of sense. I was actually recursing on the low-level side for numba. While partitioning for chunked calculations I tried to iterate over the partitions on the wrong level. Must have been too late at night…
