I made a report of a time series and then used the following code:
description = profile.get_description()
for col in df:
    var1 = description.variables.get(col)
    stat = var1.get('stationary')
    p = var1.get('addfuller')
    display("Column: " + col + " ; Stationary: " + str(stat) + " ; P: " + str(p))
And then I get:
Column: Col1 ; Stationary: False ; P: 8.367848162944989e-15
Hi @Blackandwhite23!
I looked at the function pandas_describe_timeseries_1d in the file describe_timeseries_pandas.py, which sets both the stationary flag and the p-value. The function also checks for seasonality: if the series is seasonal, stationary is set to False (line 214):
stats["stationary"] = is_stationary and not stats["seasonal"]
My knowledge of statistics is modest. From everything I've seen and read, making a series stationary involves removing the trend. Here it is written that stationary series have neither a trend nor seasonality. And there is also a discussion here.
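To make the effect of that line concrete, here is a minimal sketch of how the two conditions combine. The helper name `combined_stationary_flag` is hypothetical, the 0.05 threshold is an assumption taken from the expectation stated in the issue (the library reads its threshold from its own configuration), and the seasonal flags in the examples are made up for illustration; only the p-values come from the issue.

```python
def combined_stationary_flag(p_value, seasonal, significance=0.05):
    """Hypothetical helper mirroring `is_stationary and not stats["seasonal"]`."""
    if p_value is None:
        # No test result available (compare Columns 17 and 18 in the issue)
        return None
    is_stationary = p_value < significance  # result of the ADF test alone
    return is_stationary and not seasonal   # seasonality overrides the test


# p-values copied from the issue; the seasonal flags are assumptions
examples = [
    ("Column 1", 8.367848162951153e-15, True),
    ("Column 20", 9.06748511005521e-29, False),
]
for name, p, seasonal in examples:
    print(name, combined_stationary_flag(p, seasonal))
```

If this reading is right, a column with a very small p-value is still reported as not stationary whenever the seasonality check fires.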
Current Behaviour
I made a report of a time series and then used the following code:
description = profile.get_description()
for col in df:
    var1 = description.variables.get(col)
    stat = var1.get('stationary')
    p = var1.get('addfuller')
    display("Column: " + col + " ; Stationary: " + str(stat) + " ; P: " + str(p))
I analysed a data set with several columns and got the following result:
Column: Column 1 ; Stationary: False ; P: 8.367848162951153e-15
Column: Column 2 ; Stationary: False ; P: 1.0170622187220445e-11
Column: Column 3 ; Stationary: False ; P: 2.555609761088582e-05
Column: Column 4 ; Stationary: False ; P: 7.172269761903138e-08
Column: Column 5 ; Stationary: False ; P: 9.321131415426812e-18
Column: Column 6 ; Stationary: False ; P: 9.027089348108759e-15
Column: Column 7 ; Stationary: False ; P: 0.02133819126759494
Column: Column 8 ; Stationary: False ; P: 4.406120572138344e-12
Column: Column 9 ; Stationary: False ; P: 0.0028888647417244155
Column: Column 10 ; Stationary: False ; P: 0.00044090523969600784
Column: Column 11 ; Stationary: False ; P: 0.00286260675205775
Column: Column 12 ; Stationary: False ; P: 0.0001708455587419074
Column: Column 13 ; Stationary: False ; P: 9.472249697294651e-30
Column: Column 14 ; Stationary: False ; P: 2.526552913384979e-12
Column: Column 15 ; Stationary: False ; P: 0.000455609981090904
Column: Column 16 ; Stationary: False ; P: 0.0004254554235795494
Column: Column 17 ; Stationary: None ; P: None
Column: Column 18 ; Stationary: None ; P: None
Column: Column 19 ; Stationary: False ; P: 1.2239118466383953e-16
Column: Column 20 ; Stationary: True ; P: 9.06748511005521e-29
Column: Column 21 ; Stationary: True ; P: 0.005396832629069178
Column: Column 22 ; Stationary: True ; P: 1.850847639853015e-11
Expected Behaviour
I would expect every column with a p-value < 0.05 to be marked as "stationary".
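One way to check whether seasonality explains the gap between this expectation and the output is to read the seasonal key next to the other two. The sketch below is only an illustration: `summarize_timeseries_flags` is a hypothetical helper, `fake_variables` is a stand-in for `description.variables` with an invented seasonal value, and it assumes the per-column summaries behave like dictionaries, as in the loop above.

```python
def summarize_timeseries_flags(variables, columns):
    """Collect the stationary, seasonal and p-value entries for each column."""
    rows = []
    for col in columns:
        # Fall back to an empty dict when a column has no summary
        var = variables.get(col) or {}
        rows.append({
            "column": col,
            "stationary": var.get("stationary"),
            "seasonal": var.get("seasonal"),
            "p_value": var.get("addfuller"),
        })
    return rows


# Stand-in data: the seasonal value is an assumption, not taken from the report
fake_variables = {
    "Column 1": {"stationary": False, "seasonal": True,
                 "addfuller": 8.367848162951153e-15},
    "Column 17": {},
}
for row in summarize_timeseries_flags(fake_variables, ["Column 1", "Column 17"]):
    print(row)
```

With the real report, the call would presumably be `summarize_timeseries_flags(description.variables, df.columns)`.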
Data Description
I used a private dataset
Code that reproduces the bug
pandas-profiling version
v4.9.0
Dependencies
OS
Linux 6.1.85+
Checklist