Wrong stationarity alert in time series #1636

Open
3 tasks done
Blackandwhite23 opened this issue Aug 8, 2024 · 2 comments

Comments

@Blackandwhite23

Current Behaviour

I created a profile report for a time series and then ran the following code:
description = profile.get_description()
for col in df:
    var1 = description.variables.get(col)
    stat = var1.get('stationary')
    p = var1.get('addfuller')
    display("Column: " + col + " ; Stationary: " + str(stat) + " ; P: " + str(p))

I analysed a dataset with several columns and got the following result:
Column: Column 1 ; Stationary: False ; P: 8.367848162951153e-15
Column: Column 2 ; Stationary: False ; P: 1.0170622187220445e-11
Column: Column 3 ; Stationary: False ; P: 2.555609761088582e-05
Column: Column 4 ; Stationary: False ; P: 7.172269761903138e-08
Column: Column 5 ; Stationary: False ; P: 9.321131415426812e-18
Column: Column 6 ; Stationary: False ; P: 9.027089348108759e-15
Column: Column 7 ; Stationary: False ; P: 0.02133819126759494
Column: Column 8 ; Stationary: False ; P: 4.406120572138344e-12
Column: Column 9 ; Stationary: False ; P: 0.0028888647417244155
Column: Column 10 ; Stationary: False ; P: 0.00044090523969600784
Column: Column 11 ; Stationary: False ; P: 0.00286260675205775
Column: Column 12 ; Stationary: False ; P: 0.0001708455587419074
Column: Column 13 ; Stationary: False ; P: 9.472249697294651e-30
Column: Column 14 ; Stationary: False ; P: 2.526552913384979e-12
Column: Column 15 ; Stationary: False ; P: 0.000455609981090904
Column: Column 16 ; Stationary: False ; P: 0.0004254554235795494
Column: Column 17 ; Stationary: None ; P: None
Column: Column 18 ; Stationary: None ; P: None
Column: Column 19 ; Stationary: False ; P: 1.2239118466383953e-16
Column: Column 20 ; Stationary: True ; P: 9.06748511005521e-29
Column: Column 21 ; Stationary: True ; P: 0.005396832629069178
Column: Column 22 ; Stationary: True ; P: 1.850847639853015e-11

Expected Behaviour

I would expect every column with a p-value < 0.05 to be marked as "stationary".
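
The rule I expected can be sketched as a small check over the values printed above. This is only an illustration: `find_mismatches` is a hypothetical helper, not part of ydata-profiling, and it assumes a plain dict shaped like the `stationary` / `addfuller` entries read from the description.

```python
def find_mismatches(variables, threshold=0.05):
    """Return columns whose ADF p-value is below `threshold`
    but whose 'stationary' flag is falsy.

    `variables` maps a column name to a dict with the keys
    'stationary' and 'addfuller' (hypothetical input shape).
    """
    mismatches = []
    for col, stats in variables.items():
        p = stats.get("addfuller")
        stat = stats.get("stationary")
        # Skip columns for which no test result was produced.
        if p is None or stat is None:
            continue
        if p < threshold and not stat:
            mismatches.append(col)
    return mismatches


# Three rows copied from the output above.
sample = {
    "Column 1": {"stationary": False, "addfuller": 8.367848162951153e-15},
    "Column 17": {"stationary": None, "addfuller": None},
    "Column 20": {"stationary": True, "addfuller": 9.06748511005521e-29},
}
print(find_mismatches(sample))  # ['Column 1']
```

Every column it lists is one where the flag disagrees with the p-value alone.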

Data Description

I used a private dataset

Code that reproduces the bug

description = profile.get_description()


for col in df:
  var1 = description.variables.get(col)
  stat = var1.get('stationary')
  p = var1.get('addfuller')
  display("Column: " + col + " ; Stationary: " + str(stat) +" ; P: " + str(p))

pandas-profiling version

v4.9.0

Dependencies

   Package          Version
0  ydata_profiling  v4.9.0
1  pandas           2.1.4
2  numpy            1.26.4
3  matplotlib       3.7.1
4  statsmodels      0.14.2
5  Python           3.10.12
6  OS               Linux 6.1.85+

OS

Linux 6.1.85+

Checklist

  • There is not yet another bug report for this issue in the issue tracker
  • The problem is reproducible from this bug report. This guide can help to craft a minimal bug report.
  • The issue has not been resolved by the entries listed under Common Issues.
@Blackandwhite23
Author

Here is some data in the attachment, if needed:
ADF_Test.csv

# Load the dataset into a DataFrame, using the "time" column as the index
import pandas as pd
filename = 'ADF_Test.csv'
data = pd.read_csv(filename, sep=',', decimal='.', parse_dates=["time"], index_col="time")
display(data)

import ydata_profiling

# Create a profile report in time-series mode
profile = ydata_profiling.ProfileReport(data, title="ADF Test", explorative=True, tsmode=True)
profile.to_notebook_iframe()
profile.to_file("ADF_Test.html")

description = profile.get_description()

# Print the stationarity flag and the ADF p-value for each column
for col in data:
    var1 = description.variables.get(col)
    stat = var1.get('stationary')
    p = var1.get('addfuller')
    display("Column: " + col + " ; Stationary: " + str(stat) + " ; P: " + str(p))

And then I get:
Column: Col1 ; Stationary: False ; P: 8.367848162944989e-15

@quant12345
Contributor

quant12345 commented Sep 28, 2024

Hi @Blackandwhite23!
I looked at the file describe_timeseries_pandas.py, specifically the function pandas_describe_timeseries_1d, which returns the stationary flag and the p-value. The function also checks for seasonality: if the series is seasonal, stationary is set to False (line 214):

stats["stationary"] = is_stationary and not stats["seasonal"]
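
To illustrate how that line can turn a very small p-value into a False flag, here is a minimal sketch. The function name `combined_stationary` and the 0.05 threshold are my assumptions for illustration; only the final `and not seasonal` step mirrors the line quoted above.

```python
def combined_stationary(p_value, seasonal, threshold=0.05):
    """Combine an ADF p-value with a seasonality flag.

    Hypothetical helper: the threshold is an assumed default,
    not a value confirmed from the library.
    """
    # The ADF test on its own: a small p-value suggests stationarity.
    adf_stationary = p_value < threshold
    # A series flagged as seasonal is reported as non-stationary,
    # regardless of how small the p-value is.
    return adf_stationary and not seasonal


# P-value of Column 1 from the report above.
p = 8.367848162951153e-15
print(combined_stationary(p, seasonal=True))   # False
print(combined_stationary(p, seasonal=False))  # True
```

If the columns in question were flagged as seasonal, that would explain the output. Printing var1.get('seasonal') next to the other two values should show whether that is the case.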

My knowledge of statistics is modest. From everything I've seen and read, I know that the trend is usually removed first. Here it is written that stationary series have neither trend nor seasonality. There is also a discussion here.
