SpykingCircus2 clustering crash at template estimation #3722

@b-grimaud

I'm trying to run SC2 on 3Brain HD-MEA data (4096 channels, 20 kHz) with mostly default parameters:

p = {
 'apply_motion_correction': False,
 'apply_preprocessing': True,
 'cache_preprocessing': {'delete_cache': True,
                         'memory_limit': 0.5,
                         'mode': 'zarr'},
 'clustering': {'legacy': True},
 'debug': False,
 'detection': {'detect_threshold': 5, 'peak_sign': 'neg'},
 'filtering': {'filter_order': 2,
               'freq_max': 7000,
               'freq_min': 150,
               'ftype': 'bessel',
               'margin_ms': 10},
 'general': {'ms_after': 2, 'ms_before': 2, 'radius_um': 75},
 'job_kwargs': {'n_jobs': 40},  # 'total_memory': '50G'
 'matched_filtering': True,
 'matching': {'method': 'circus-omp-svd'},
 'merging': {'auto_merge': {'corr_diff_thresh': 0.25, 'min_spikes': 10},
             'correlograms_kwargs': {},
             'similarity_kwargs': {'max_lag_ms': 0.2,
                                   'method': 'cosine',
                                   'support': 'union'}},
 'motion_correction': {'preset': 'dredge_fast'},
 'multi_units_only': False,
 'seed': 42,
 'selection': {'method': 'uniform',
               'min_n_peaks': 100000,
               'n_peaks_per_channel': 5000,
               'seed': 42,
               'select_per_channel': False},
 'sparsity': {'amplitude_mode': 'peak_to_peak',
              'method': 'snr',
              'threshold': 0.25},
 'whitening': {'mode': 'local', 'regularize': False}}
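
For completeness, the call itself looks roughly like this (the file path below is a placeholder, not the actual recording):

import spikeinterface.full as si

# placeholder path; the real file is a 4096-channel, 20 kHz 3Brain recording
rec = si.read_biocam("recording.brw")

sorting = si.run_sorter(
    sorter_name="spykingcircus2",
    recording=rec,
    folder="sc2_output",
    verbose=True,
    **p,  # the parameter dict above
)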

The only trace I get is:

spykingcircus2 could benefit from using torch. Consider installing it
Preprocessing the recording (bandpass filtering + CMR + whitening)
noise_level (workers: 20 processes): 100%|███████████████████████████████████████████████████| 20/20 [00:24<00:00,  1.23s/it]
Use zarr_path=/tmp/spikeinterface_cache/tmpdx30z0db/CCLII4NL.zarr
write_zarr_recording 
engine=process - n_jobs=40 - samples_per_chunk=19,753 - chunk_memory=308.64 MiB - total_memory=12.06 GiB - chunk_duration=1.00s (999.96 ms)
write_zarr_recording (workers: 40 processes): 100%|██████████████████████████████████████████| 61/61 [01:34<00:00,  1.54s/it]
detect peaks using locally_exclusive + 1 node (workers: 40 processes): 100%|█████████████████| 61/61 [00:11<00:00,  5.10it/s]
detect peaks using matched_filtering (workers: 40 processes): 100%|██████████████████████████| 61/61 [02:18<00:00,  2.27s/it]
Kept 179242 peaks for clustering
extracting features (workers: 40 processes): 100%|███████████████████████████████████████████| 61/61 [00:06<00:00,  9.63it/s]
split_clusters with local_feature_clustering: 100%|███████████████████████████████████| 4210/4210 [00:00<00:00, 42564.83it/s]
Bus error (core dumped)

I've been able to trace the crash back to a call to estimate_templates (here), which then seems to call estimate_templates_with_accumulator.

From what I could gather, this looks like an out-of-memory error, but I've never seen a crash quite like this with other Python OOM issues.
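
To put numbers on that, here is a rough back-of-the-envelope estimate; the unit count is a guess on my part, while the channel count, sampling rate and 4 ms window come from the config above:

n_channels = 4096
n_samples = int((2 + 2) * 1e-3 * 20_000)   # ms_before + ms_after at 20 kHz -> 80 samples
n_units = 4000                             # guess, roughly the number of split clusters
bytes_per_value = 4                        # assuming float32 templates

accumulator_gib = n_units * n_samples * n_channels * bytes_per_value / 1024**3
print(f"{accumulator_gib:.1f} GiB per template accumulator")   # ~4.9 GiB

If an accumulator of that size is duplicated per worker (I haven't checked whether it is), even two or three copies would exceed the ~14 GB total-vm reported by the OOM killer below.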

The GUI monitor shows a modest 17×10⁶ TB being used:

[Screenshot of the system monitor's memory usage graph]

And dmesg shows the following:

[  954.327570] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0-1,global_oom,task_memcg=/user.slice/user-1000.slice/user@1000.service/tmux-spawn-ad081ad7-fd73-4200-8a54-e76e0ce4b80d.scope,task=python,pid=8624,uid=1000
[  954.327924] Out of memory: Killed process 8624 (python) total-vm:14242184kB, anon-rss:10079372kB, file-rss:6260kB, shmem-rss:0kB, UID:1000 pgtables:20984kB oom_score_adj:0
[  956.739762] systemd-journald[641]: Under memory pressure, flushing caches.
[  957.536557] [drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00007300] Failed to grab modeset ownership
[  957.631410] rfkill: input handler disabled

I'm not sure whether this is expected behavior (and there are simply too many redundant units to hold in memory) or whether there's an actual issue with memory handling.

I've tried passing total_memory to both the sorter's job_kwargs and SpikeInterface's global job_kwargs, but I'm not sure it is taken into account outside of operations on the recording itself.
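
Concretely, what I tried looks roughly like this; I don't know whether either setting actually reaches the template estimation step:

import spikeinterface.full as si

# global job kwargs, which definitely apply to recording I/O
si.set_global_job_kwargs(n_jobs=40, total_memory="50G")

# and/or through the sorter parameters
p["job_kwargs"] = {"n_jobs": 40, "total_memory": "50G"}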
