matiasscorsetti/drift

Latest commit

be0175e · May 19, 2021

Drift

Data Drift Detection

Drift estimator for multiple columns using cluster sampling and weights

It is based on one ADWIN (ADaptive WINdowing) model per column of a dataframe. ADWIN is an adaptive sliding-window algorithm that detects changes and keeps up-to-date statistics on a data stream. It allows algorithms that were not designed for drifting data to become resistant to drift.

The general idea is to keep statistics over a variable-size window while detecting concept drift.

The algorithm decides the size of the window by cutting the statistics window at different points and comparing the average of some statistic in the two resulting sub-windows. If the absolute value of the difference between the two averages exceeds a predefined threshold, a change is detected at that point and all data before that point is discarded.
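The cut-and-compare step described above can be sketched in plain Python. This is a simplified illustration, not this repository's implementation: the class `SimpleAdaptiveWindow`, the helper `detect_changes`, and the parameters `threshold` and `min_sub_window` are hypothetical names, and the threshold is a fixed constant here, whereas ADWIN itself derives the cut threshold from a confidence parameter and the sub-window sizes.

```python
class SimpleAdaptiveWindow:
    """Simplified ADWIN-style detector with a fixed threshold (illustrative only)."""

    def __init__(self, threshold=0.5, min_sub_window=5):
        self.threshold = threshold            # assumed fixed cut threshold
        self.min_sub_window = min_sub_window  # smallest sub-window compared
        self.window = []

    def update(self, value):
        """Add one value; return True if a change was detected."""
        self.window.append(float(value))
        n = len(self.window)
        total = sum(self.window)
        left_sum = 0.0
        # Try every cut point: window[:cut] is the old part, window[cut:] the new part.
        for cut in range(1, n):
            left_sum += self.window[cut - 1]
            if cut < self.min_sub_window or n - cut < self.min_sub_window:
                continue
            left_mean = left_sum / cut
            right_mean = (total - left_sum) / (n - cut)
            if abs(left_mean - right_mean) > self.threshold:
                # Change detected: discard all data before the cut point.
                self.window = self.window[cut:]
                return True
        return False


def detect_changes(stream, threshold=0.5, min_sub_window=5):
    """Return the stream positions at which a change was flagged."""
    detector = SimpleAdaptiveWindow(threshold, min_sub_window)
    return [i for i, x in enumerate(stream) if detector.update(x)]
```

For a per-column estimator, one such detector would be kept for each column of the dataframe and fed that column's values in order.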

When the model is trained, the size of the resulting dataset is saved. If the training data was sampled, the sample size determines the dataframe size (see the "size" attributes). Results should be evaluated for the dataframe as a whole or per column, not at the row level.

The model always adjusts the size of the input dataframe automatically to match the size of the dataset used in training.
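One way such a size adjustment could work is sketched below with pandas. This is an assumption for illustration only: the function `resize_to_training_size` and its parameters are hypothetical, and it uses simple random sampling, whereas the project description mentions cluster sampling and weights.

```python
import pandas as pd


def resize_to_training_size(df, train_size, random_state=0):
    """Return a dataframe with exactly `train_size` rows drawn from `df`."""
    if len(df) == 0:
        raise ValueError("cannot resample an empty dataframe")
    # Sample with replacement only when the input has fewer rows than the target.
    replace = len(df) < train_size
    sampled = df.sample(n=train_size, replace=replace, random_state=random_state)
    return sampled.reset_index(drop=True)
```

Calling `resize_to_training_size(new_df, train_size)` on a larger dataframe downsamples it, and on a smaller one it upsamples by repeating rows, so each evaluation sees the same number of rows as training did.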
