Investigate cause of bug in River EFDT implementation #11

Open
anjsimmo opened this issue Oct 12, 2023 · 0 comments

When training EFDT on the Skin dataset (from the UCI Machine Learning Repository), EFDT sometimes encounters an error (e.g. on the ninth batch of 1000 points).

This appears to be a bug within the River implementation of EFDT itself. We need to confirm the bug, check whether it exists in the development version of River (there was a recent EFDT bug fix here, which may or may not be related), and report it if it is still an issue (with a minimal reproducible example, ideally on a synthetic dataset to make the bug easier to reproduce).

For our paper, we can work around this by using smaller data samples (up to eight batches of 1000), but it raises questions about the reliability of the River EFDT implementation. Judging by the Git history, the River implementation of EFDT seems to have been written by Saulo Martiello Mastelini during his PhD, without any involvement from the original authors of the EFDT paper, who implemented it in Java.

See the following notebook for a demonstration of the issue: https://github.com/a2i2/tree_diff/blob/re-evaluation/notebooks/EFDT%20Issue.ipynb
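
As a possible starting point for the minimal reproducible example mentioned above, here is a rough sketch of a harness that feeds a stream to a fresh model in batches and reports the first batch in which training raises. The function names `find_first_failing_batch` and `reproduce_with_river` are hypothetical, and the choice of `synth.Agrawal` as the synthetic stream, the seed, and the sample count are assumptions for illustration. It is not taken from the notebook and is not guaranteed to trigger the bug.

```python
import itertools
import traceback
from typing import Any, Callable, Dict, Hashable, Iterable, Optional, Tuple

Sample = Tuple[Dict[str, Any], Hashable]


def find_first_failing_batch(
    make_model: Callable[[], Any],
    stream: Iterable[Sample],
    batch_size: int = 1000,
) -> Optional[Dict[str, Any]]:
    """Train a fresh model batch by batch and report the first failure.

    The model only needs a River-style ``learn_one(x, y)`` method.
    Returns None if the whole stream is consumed without an exception.
    """
    model = make_model()
    samples = iter(stream)
    for batch_idx in itertools.count(1):  # 1-based, so "ninth batch" reads as 9
        batch = list(itertools.islice(samples, batch_size))
        if not batch:
            return None  # stream exhausted, no error seen
        for offset, (x, y) in enumerate(batch):
            try:
                model.learn_one(x, y)
            except Exception as exc:
                return {
                    "batch": batch_idx,
                    "offset": offset,  # 0-based position inside the batch
                    "error": repr(exc),
                    "traceback": traceback.format_exc(),
                }


def reproduce_with_river(n_samples: int = 20_000, seed: int = 42):
    """Hypothetical hookup: EFDT on a synthetic stream (requires River)."""
    # Imported lazily so the harness itself has no third-party dependency.
    from river import tree
    from river.datasets import synth

    # Assumption: Agrawal is an arbitrary synthetic stand-in for the Skin data.
    stream = synth.Agrawal(seed=seed).take(n_samples)
    return find_first_failing_batch(
        tree.ExtremelyFastDecisionTreeClassifier, stream
    )
```

Calling `reproduce_with_river()` against both the released and the development version of River would show whether the error appears on synthetic data at all. If it does not, the same harness can be pointed at the Skin dataset by passing an iterable of `(features, label)` pairs instead.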
