CPD With Long Period of Missing Data #17

sssaha · 2025-01-20T04:45:30Z

I have data with a long period of missing values. I tried change point detection (CPD) with ClaSPy and it did not work.

It raises an OverflowError:

Traceback (most recent call last):
  File "C:\Github\dash_repo_issue\ClassPYTEST.py", line 22, in <module>
    x = clasp.fit_predict(time_series)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Github\dash_repo_issue\venvdashtest\Lib\site-packages\claspy\segmentation.py", line 331, in fit_predict
    return self.fit(time_series).predict(sparse)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Github\dash_repo_issue\venvdashtest\Lib\site-packages\claspy\segmentation.py", line 205, in fit
    window_sizes.append(max(3, map_window_size_methods(self.window_size)(time_series[:, dim]) // 2))
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Github\dash_repo_issue\venvdashtest\Lib\site-packages\claspy\window_size.py", line 107, in suss
    score = 1 - (_suss_score(time_series, window_size, stats) - min_score) / (max_score - min_score)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Github\dash_repo_issue\venvdashtest\Lib\site-packages\claspy\window_size.py", line 35, in _suss_score
    roll_mean = roll.mean().to_numpy()[window_size:]
                ^^^^^^^^^^^
  File "C:\Github\dash_repo_issue\venvdashtest\Lib\site-packages\pandas\core\window\rolling.py", line 2259, in mean
    return super().mean(
           ^^^^^^^^^^^^^
  File "C:\Github\dash_repo_issue\venvdashtest\Lib\site-packages\pandas\core\window\rolling.py", line 1625, in mean
    return self._apply(window_func, name="mean", numeric_only=numeric_only)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Github\dash_repo_issue\venvdashtest\Lib\site-packages\pandas\core\window\rolling.py", line 619, in _apply
    return self._apply_columnwise(homogeneous_func, name, numeric_only)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Github\dash_repo_issue\venvdashtest\Lib\site-packages\pandas\core\window\rolling.py", line 472, in _apply_columnwise
    return self._apply_series(homogeneous_func, name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Github\dash_repo_issue\venvdashtest\Lib\site-packages\pandas\core\window\rolling.py", line 456, in _apply_series
    result = homogeneous_func(values)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Github\dash_repo_issue\venvdashtest\Lib\site-packages\pandas\core\window\rolling.py", line 614, in homogeneous_func
    result = calc(values)
             ^^^^^^^^^^^^
  File "C:\Github\dash_repo_issue\venvdashtest\Lib\site-packages\pandas\core\window\rolling.py", line 611, in calc
    return func(x, start, end, min_periods, *numba_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "aggregations.pyx", line 262, in pandas._libs.window.aggregations.roll_mean
OverflowError: int too big to convert

I tried other segmentation methods too, and all of them ended with an error. Is it possible to use ClaSPy in scenarios like this?

ermshaua · 2025-01-20T07:36:37Z

Hi,

no, ClaSP cannot handle missing data. You could, however, run ClaSP on the first part of the data and the second part separately and concatenate the found CPs. Does this work for you?
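The splitting approach suggested above can be sketched as follows. This is a minimal sketch, assuming the series is a 1-D NumPy array in which missing samples are stored as NaN. `contiguous_runs`, `segment_with_gaps`, `segment_fn` and `min_length` are hypothetical names introduced for illustration; they are not part of ClaSPy. Each NaN-free run is segmented on its own, and its change points are shifted by the run's start index so they refer to positions in the full series.

```python
import numpy as np


def contiguous_runs(values, min_length=1):
    """Return (start, end) index pairs of NaN-free runs, end exclusive.

    Runs shorter than min_length (a hypothetical threshold) are skipped.
    """
    valid = ~np.isnan(values)
    # Pad with False on both sides so every run has a rising and a falling edge.
    padded = np.concatenate(([False], valid, [False]))
    # Non-zero differences mark run starts (+1) and exclusive run ends (-1).
    edges = np.flatnonzero(np.diff(padded.astype(np.int8)))
    starts, ends = edges[0::2], edges[1::2]
    return [(int(s), int(e)) for s, e in zip(starts, ends) if e - s >= min_length]


def segment_with_gaps(values, segment_fn, min_length=1):
    """Run segment_fn on each NaN-free run and concatenate the change points.

    segment_fn is assumed to take a 1-D array and return change point
    indices relative to that array (hypothetical interface).
    """
    values = np.asarray(values, dtype=float)
    change_points = []
    for start, end in contiguous_runs(values, min_length):
        local_cps = segment_fn(values[start:end])
        # Shift local indices back into the coordinates of the full series.
        change_points.extend(start + int(cp) for cp in local_cps)
    return np.array(change_points, dtype=int)
```

With ClaSPy, `segment_fn` could be something like `lambda x: BinaryClaSPSegmentation().fit_predict(x)`, assuming the `claspy.segmentation.BinaryClaSPSegmentation` class is available in your installed version. `min_length` should be large enough for ClaSP's window-size selection to work on each run; the required length is not stated in this thread. Whether the gap boundaries themselves should also be reported as change points is left to the caller.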

Best
Arik

sssaha · 2025-01-20T21:01:08Z

There can be missing data in other regions too, e.g. one or two timestamps. These are measurements from real sensors, and sensor data can go missing.

ermshaua · 2025-01-21T07:31:50Z

ClaSP expects data sampled at equi-distant timestamps, e.g. every 10 milliseconds. I'd suggest preprocessing the data, e.g. using interpolation, and applying ClaSP afterwards.
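A minimal sketch of that preprocessing, assuming the sensor readings are a pandas Series with a DatetimeIndex. The function name `to_equidistant` and its `max_gap` parameter are hypothetical. Readings are averaged onto a fixed grid, and only gaps of at most `max_gap` consecutive missing samples are linearly interpolated; longer gaps stay NaN rather than being filled with invented values.

```python
import numpy as np
import pandas as pd


def to_equidistant(series, freq, max_gap):
    """Resample a timestamp-indexed series onto a fixed grid and fill short gaps.

    Assumptions (hypothetical helper, not part of ClaSPy):
    freq:    grid spacing as a pandas offset string, e.g. "10ms"
    max_gap: maximum number of consecutive missing samples to interpolate
    """
    # Average samples that fall into the same bin; empty bins become NaN.
    regular = series.resample(freq).mean()
    is_nan = regular.isna()
    # Every NaN run shares a label with the valid sample just before it.
    run_id = (~is_nan).cumsum()
    # Length of the NaN run each position belongs to.
    run_len = is_nan.astype(int).groupby(run_id).transform("sum")
    # Time-weighted linear interpolation of interior gaps only.
    filled = regular.interpolate(method="time", limit_area="inside")
    # Keep interpolated values only where the gap is short enough.
    return filled.where(~is_nan | (run_len <= max_gap))
```

The explicit gap-length mask is used because `interpolate(limit=...)` alone would still fill the first `limit` samples of a long gap instead of leaving it empty. Any long gaps that remain can be handled by splitting the series, as suggested earlier in the thread.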
