CPD With Long Period of Missing Data #17

Open
sssaha opened this issue Jan 20, 2025 · 3 comments

sssaha commented Jan 20, 2025

I have data with a long period of missing values. I tried change point detection (CPD) with ClaSPy and it did not work.

Image

It raises an overflow error:

Traceback (most recent call last):
  File "C:\Github\dash_repo_issue\ClassPYTEST.py", line 22, in <module>
    x = clasp.fit_predict(time_series)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Github\dash_repo_issue\venvdashtest\Lib\site-packages\claspy\segmentation.py", line 331, in fit_predict
    return self.fit(time_series).predict(sparse)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Github\dash_repo_issue\venvdashtest\Lib\site-packages\claspy\segmentation.py", line 205, in fit
    window_sizes.append(max(3, map_window_size_methods(self.window_size)(time_series[:, dim]) // 2))
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Github\dash_repo_issue\venvdashtest\Lib\site-packages\claspy\window_size.py", line 107, in suss
    score = 1 - (_suss_score(time_series, window_size, stats) - min_score) / (max_score - min_score)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Github\dash_repo_issue\venvdashtest\Lib\site-packages\claspy\window_size.py", line 35, in _suss_score
    roll_mean = roll.mean().to_numpy()[window_size:]
                ^^^^^^^^^^^
  File "C:\Github\dash_repo_issue\venvdashtest\Lib\site-packages\pandas\core\window\rolling.py", line 2259, in mean
    return super().mean(
           ^^^^^^^^^^^^^
  File "C:\Github\dash_repo_issue\venvdashtest\Lib\site-packages\pandas\core\window\rolling.py", line 1625, in mean
    return self._apply(window_func, name="mean", numeric_only=numeric_only)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Github\dash_repo_issue\venvdashtest\Lib\site-packages\pandas\core\window\rolling.py", line 619, in _apply
    return self._apply_columnwise(homogeneous_func, name, numeric_only)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Github\dash_repo_issue\venvdashtest\Lib\site-packages\pandas\core\window\rolling.py", line 472, in _apply_columnwise
    return self._apply_series(homogeneous_func, name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Github\dash_repo_issue\venvdashtest\Lib\site-packages\pandas\core\window\rolling.py", line 456, in _apply_series
    result = homogeneous_func(values)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Github\dash_repo_issue\venvdashtest\Lib\site-packages\pandas\core\window\rolling.py", line 614, in homogeneous_func
    result = calc(values)
             ^^^^^^^^^^^^
  File "C:\Github\dash_repo_issue\venvdashtest\Lib\site-packages\pandas\core\window\rolling.py", line 611, in calc
    return func(x, start, end, min_periods, *numba_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "aggregations.pyx", line 262, in pandas._libs.window.aggregations.roll_mean
OverflowError: int too big to convert

I tried other segmentations too, and all of them ended with an error. Is it possible to use ClaSPy in this kind of scenario?

@ermshaua
Owner

Hi,

No, ClaSP cannot handle missing data. You could, however, run ClaSP separately on the part of the data before the gap and the part after it, then concatenate the detected change points (CPs). Does this work for you?

Best
Arik
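
The split-and-concatenate approach suggested above could be sketched roughly as follows. This is a minimal sketch, not part of claspy: `contiguous_segments`, `segment_piecewise`, `clasp_detect`, and `min_length` are hypothetical names introduced here, and the sketch assumes missing values are encoded as NaN in a 1-D array. Only `BinaryClaSPSegmentation().fit_predict(...)` is taken from claspy itself.

```python
import numpy as np


def contiguous_segments(values, min_length=1):
    """Return (start, stop) index pairs of NaN-free runs in a 1-D array."""
    valid = ~np.isnan(np.asarray(values, dtype=float))
    # Pad with False so that every valid run has a detectable start and end.
    padded = np.concatenate(([False], valid, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts, stops = edges[::2], edges[1::2]
    return [(int(a), int(b)) for a, b in zip(starts, stops) if b - a >= min_length]


def segment_piecewise(values, detect, min_length=1):
    """Run `detect` on each NaN-free run and map results to global indices."""
    values = np.asarray(values, dtype=float)
    change_points = []
    for start, stop in contiguous_segments(values, min_length):
        local = detect(values[start:stop])
        # Local change points are offsets within the run; shift them back.
        change_points.extend(start + int(cp) for cp in local)
    return sorted(change_points)


def clasp_detect(segment):
    """Detector backed by claspy, imported lazily so the helpers work without it."""
    from claspy.segmentation import BinaryClaSPSegmentation

    return BinaryClaSPSegmentation().fit_predict(segment)
```

A call would then look like `segment_piecewise(time_series, clasp_detect, min_length=...)`. Very short runs are probably best skipped via `min_length`, since ClaSP needs enough points per segment to work with; a suitable threshold depends on the data and is left open here.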

@sssaha
Author

sssaha commented Jan 20, 2025

Missing data can also occur in other regions, for example one or two timestamps at a time. These are measurements from real sensors, and sensor data can go missing.

@ermshaua
Owner

ClaSP expects data sampled at equidistant timestamps, e.g. every 10 milliseconds. I'd suggest preprocessing the data first, e.g. with interpolation, and applying ClaSP afterwards.
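
One way to do the interpolation step with pandas is sketched below. The helper name `to_regular_grid` and the 10 ms step are illustrative assumptions, and the sketch assumes the measurements are held in a `pandas.Series` with a `DatetimeIndex`. It puts the data on a fixed time grid and fills the resulting holes by linear interpolation in time.

```python
import numpy as np
import pandas as pd


def to_regular_grid(series, freq):
    """Resample a timestamp-indexed series to a fixed step and fill gaps.

    Bins without any measurement become NaN after resampling and are then
    filled by interpolation that is linear in time. Only gaps between two
    valid values are filled; leading or trailing NaNs are left untouched.
    """
    regular = series.sort_index().resample(freq).mean()
    return regular.interpolate(method="time", limit_area="inside")


# Example: readings at 0, 10 and 40 ms, i.e. the 20 and 30 ms samples are missing.
timestamps = pd.to_datetime(
    [
        "2025-01-01 00:00:00.000",
        "2025-01-01 00:00:00.010",
        "2025-01-01 00:00:00.040",
    ]
)
readings = pd.Series([0.0, 1.0, 4.0], index=timestamps)
filled = to_regular_grid(readings, "10ms")
```

The array from `filled.to_numpy()` could then be passed to `fit_predict`. Note that a long gap is replaced by a straight line, which might itself influence the segmentation, so for long outages splitting the series at the gap may be the safer choice.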
