In sinograms/properties.py and sinograms/point_by_point.py the code shares arrays between worker processes via shared memory. This does not scale beyond one node, and it fails on Python 2.7 (a minimal sketch of the pattern is below).
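For reference, the general pattern is roughly the following (a minimal sketch using multiprocessing.shared_memory, assuming this is close to what those modules do, not the exact code; SharedMemory only exists since Python 3.8, which would explain the Python 2.7 failure):

    import numpy as np
    from multiprocessing import shared_memory

    # Parent: copy an array into a named shared-memory block
    src = np.arange(10, dtype=np.float64)
    shm = shared_memory.SharedMemory(create=True, size=src.nbytes)
    shared = np.ndarray(src.shape, dtype=src.dtype, buffer=shm.buf)
    shared[:] = src

    # Worker: attach to the same block by name; no copy is made
    worker_shm = shared_memory.SharedMemory(name=shm.name)
    view = np.ndarray(src.shape, dtype=src.dtype, buffer=worker_shm.buf)

    worker_shm.close()
    shm.close()
    shm.unlink()  # the named block only ever exists on the local node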
For global read-only memory we could use mmap with numpy on a non-compressed hdf5 file (https://gist.github.com/maartenbreddels/09e1da79577151e5f7fec660c209f06e):

    import mmap
    import numpy as np

    # Only works for a contiguous, uncompressed dataset stored in this file:
    assert dset.chunks is None and dset.compression is None
    assert not dset.is_virtual and dset.external is None

    f = open(path, "rb")
    mapping = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    data = np.frombuffer(mapping, dtype=dset.dtype, count=dset.size,
                         offset=dset.id.get_offset()).reshape(dset.shape)
This may be useful for reducing some out-of-memory problems: the mapped pages are shared read-only between processes through the OS page cache, and can be dropped and re-read under memory pressure instead of being duplicated per worker.
Another upgrade path could be to look into dask.dataframe for distributed processing, e.g. the sketch below.
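A sketch only (the file name and column names here are invented for illustration, not taken from the actual pipeline):

    import dask.dataframe as dd

    # Hypothetical peak table; dask partitions the frame and, with a
    # dask.distributed scheduler, can spread the work over many nodes
    peaks = dd.read_parquet("peaks.parquet")
    totals = peaks.groupby("spot3d_id")["sum_intensity"].sum().compute()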
Note: multiprocessing + shared memory seems to be buggy. The "remove from resource tracker" monkeypatch (sketched after the traceback below) does not work. Abandon it.
Exception ignored in: <Finalize object, dead>
Traceback (most recent call last):
  File "/cvmfs/hpc.esrf.fr/software/packages/linux/x86_64/jupyter-slurm/2023.10.7/envs/jupyter-slurm/lib/python3.11/multiprocessing/util.py", line 224, in __call__
    res = self._callback(*self._args, **self._kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/cvmfs/hpc.esrf.fr/software/packages/linux/x86_64/jupyter-slurm/2023.10.7/envs/jupyter-slurm/lib/python3.11/multiprocessing/synchronize.py", line 87, in _cleanup
    sem_unlink(name)
FileNotFoundError: [Errno 2] No such file or directory
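For the record, the monkeypatch in question is presumably along the lines of the widely circulated bpo-38119 workaround, which stops the resource tracker from tracking (and unlinking) shared-memory segments it did not create. A sketch under that assumption; as noted above it does not fix this, since the failing cleanup here is a semaphore finalizer:

    from multiprocessing import resource_tracker

    def remove_shm_from_resource_tracker():
        """Keep the resource tracker away from shared_memory segments."""
        def fix_register(name, rtype):
            if rtype == "shared_memory":
                return
            return resource_tracker._resource_tracker.register(name, rtype)
        resource_tracker.register = fix_register

        def fix_unregister(name, rtype):
            if rtype == "shared_memory":
                return
            return resource_tracker._resource_tracker.unregister(name, rtype)
        resource_tracker.unregister = fix_unregister

        if "shared_memory" in resource_tracker._CLEANUP_FUNCS:
            del resource_tracker._CLEANUP_FUNCS["shared_memory"]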