Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spatialdata aggregate 3D support + relatively slow on large images #677

Open
ArneDefauw opened this issue Aug 14, 2024 · 2 comments
Open
Labels

Comments

@ArneDefauw
Copy link
Contributor

ArneDefauw commented Aug 14, 2024

Is your feature request related to a problem? Please describe.

  • spatialdata.aggregate does not support 3D,
  • spatialdata.aggregate can be slow on large images.
  • spatialdata.aggregate does not allow aggregation of a labels layer with a points layer.

Describe the solution you'd like

There is currently no 3D support in spatialdata.aggregate, e.g. aggregate of two raster elements:


from spatialdata.datasets import blobs
from spatialdata import read_zarr

sdata = blobs( length = 2000 )
sdata.write( os.path.join( path, "sdata_blobs.zarr" ) )
sdata = read_zarr( sdata.path )

import spatialdata
import dask.array as da

arr = sdata["blobs_image"].data

arr = da.stack([arr, arr]).transpose(1, 0, 2, 3)

se=spatialdata.models.Image3DModel.parse(
                arr
            )
sdata["blobs_image_3d"] = se
if sdata.is_backed():
    sdata.write_element("blobs_image_3d")
    sdata = read_zarr(sdata.path)


arr = sdata["blobs_labels"].data

arr = da.stack([arr, arr])

se=spatialdata.models.Labels3DModel.parse(
                arr,
            )
sdata["blobs_labels_3d"] = se
if sdata.is_backed():
    sdata.write_element("blobs_labels_3d")
    sdata = read_zarr(sdata.path)


from spatialdata import aggregate

adata = aggregate(
    values=sdata["blobs_image_3d"],
    by=sdata["blobs_labels_3d"],
 )
adata["table"].to_df() #-> raises NotImplementedError

Aggregate can also be slow on large images:


import spatialdata
import dask.array as da

se=spatialdata.models.Image2DModel.parse(
                da.tile( sdata[ "blobs_image" ].data , (1, 20, 20)).rechunk(4000),
            )
sdata["blobs_image_large"] = se
if sdata.is_backed():
    sdata.write_element("blobs_image_large")
    sdata = read_zarr(sdata.path)


se=spatialdata.models.Labels2DModel.parse(
                da.tile( sdata[ "blobs_labels" ].data , ( 20, 20)).rechunk(4000),
            )
sdata["blobs_labels_large"] = se
if sdata.is_backed():
    sdata.write_element("blobs_labels_large")
    sdata = read_zarr(sdata.path)

from spatialdata import aggregate

adata = aggregate(
    values=sdata["blobs_image_large"],
    by=sdata["blobs_labels_large"],
 )
adata["table"].to_df()  # -->takes around 6m on a mac M2, 32Gb ram

While providing support for 3D with the current implementation using xrspatial.zonal_stats is quite straightforward (i.e. just iterate over the z-stacks), I think it makes sense to think about a custom implementation that does not rely on xrspatial.zonal_stats for aggregation of raster data, and that is faster. E.g. the solution here to do aggregation of an image layer with a labels layer ('sum') https://github.com/saeyslab/harpy/blob/4f557433168f5b4cf8b776c884214b3254e9e6d2/src/sparrow/table/_allocation_intensity.py#L251 , is both much faster (30s on same hardware), and supports 3D.

Aggregate does not support aggregation of points layer by a labels layer:

adata = aggregate(
    values=sdata["blobs_points"],
    by=sdata["blobs_labels"],
 )
adata["table"].to_df() # -> raises not Implementederror

This means we have to convert a labels layer first to a shapes layer before we can do aggregation. Also, because shapes layer has limited support in 3D, supporting aggregation using a labels layer makes sense.

I understand that full 3D support is probably not the main priority, but to future prove some features of spatialdata I think it can be worthwhile to already think about it.

@BioinfoTongLI
Copy link

for aggregate points on labels. I've got this walk-around:
https://github.com/BioinfoTongLI/Image-ST/blob/main/bin/to_spatialdata.py#L111-L118
it is however not very memory efficient.

@ArneDefauw
Copy link
Contributor Author

ArneDefauw commented Sep 17, 2024

for aggregate points on labels. I've got this walk-around: https://github.com/BioinfoTongLI/Image-ST/blob/main/bin/to_spatialdata.py#L111-L118 it is however not very memory efficient.

Hi @BioinfoTongLI , for aggregation of points on labels, I have this memory efficient implementation:
https://github.com/saeyslab/harpy/blob/f563bcda27ac0138df94224840496a0d2680008b/src/sparrow/table/_allocation.py#L27

For aggregation of images and labels, we now have this implementation:
https://github.com/saeyslab/harpy/blob/f563bcda27ac0138df94224840496a0d2680008b/src/sparrow/utils/_aggregate.py#L16

which basically does all aggregations xrspatial.zonal_stats and dask_image implement, but much faster

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants