Test the effects of a random noise vector in the model #272

Closed
dfsnow opened this issue Dec 7, 2024 · 4 comments
Labels: method (ML technique or method change)

Comments

@dfsnow
Member

dfsnow commented Dec 7, 2024

Following up on discussions in the office, we should test the perturbation effect of a vector of random noise in the model. Specifically, we should look at feature importance and the stability of estimates.
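One way such a test could be set up is sketched below in Python, using only the standard library: append a column of Gaussian noise to the training rows, refit, and then see which features the model ranks no higher than the noise. The column name `random_noise`, both helper functions, the seed, and the importance scores are hypothetical and purely illustrative; the model-fitting step is omitted because this issue doesn't specify it.

```python
import random


def add_noise_column(rows, name="random_noise", seed=2024):
    """Return copies of rows (dicts) with an extra standard-normal noise feature."""
    rng = random.Random(seed)  # fixed seed so the test is reproducible
    return [{**row, name: rng.gauss(0.0, 1.0)} for row in rows]


def features_below_noise(importance, name="random_noise"):
    """List features whose importance is no higher than the noise column's."""
    floor = importance[name]
    return sorted(f for f, v in importance.items() if f != name and v <= floor)


# Toy feature rows (made up for illustration only).
rows = [{"sqft": 1200, "age": 40}, {"sqft": 2400, "age": 5}]
noisy_rows = add_noise_column(rows)

# Hypothetical importance scores, standing in for what a fitted model reports.
importance = {"sqft": 0.61, "age": 0.30, "random_noise": 0.06, "porch": 0.03}
weak = features_below_noise(importance)
```

Features that land in `weak` contribute no more than pure noise under this assumed scoring, which makes them candidates for a closer look.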

@dfsnow dfsnow added the method ML technique or method change label Dec 7, 2024
@Damonamajor
Contributor

Damonamajor commented Dec 9, 2024

@dfsnow do we want this in something like the equivalent of an EI-issue? Where should it be stored?
The same question for #273

@Damonamajor
Contributor

Damonamajor commented Dec 9, 2024

@dfsnow We are also probably going to have a decent number of comparisons, tests, etc. Do we want to have some set parameters or a baseline to test these things against throughout the modeling season?

@dfsnow
Member Author

dfsnow commented Dec 9, 2024

> @dfsnow do we want this in something like the equivalent of an EI-issue? Where should it be stored? The same question for #273

For something like this issue, I think it's sufficient to just post findings in the issue comments (including any supporting tables and plots). No need for separate analyses.

> @dfsnow We are also probably going to have a decent number of comparisons, tests, etc. Do we want to have some set parameters or a baseline to test these things against throughout the modeling season?

We'll run a baseline model as soon as #263 is merged. We can use that for comparison for now (until we have 2024 data).

@dfsnow
Member Author

dfsnow commented Jan 22, 2025

Adding a random vector of noise was actually pretty useful! It didn't alter the global performance statistics, but it did introduce some instability in individual predictions that helped us debug.

Adding random noise caused a few predictions to flip between very low ($100K) and very high ($10M) values.
This turned out to be caused by some erroneous sales with very high (albeit plausible) prices that made it past our sales validation and into the training data. We've since corrected the underlying sales, which in turn stabilized our predictions.
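A flip like the one described above could be surfaced with a check along these lines. This is a minimal sketch that compares per-property predictions from two runs and flags large ratios; the function name, the property IDs, the dollar values, and the 5x threshold are all illustrative assumptions, not the actual code or data from this analysis.

```python
def flag_unstable(baseline, with_noise, max_ratio=5.0):
    """Return IDs whose two predictions differ by more than max_ratio in either direction."""
    flagged = []
    for pid, base in baseline.items():
        other = with_noise[pid]
        # Symmetric ratio; assumes both predictions are positive dollar amounts.
        ratio = max(base, other) / min(base, other)
        if ratio > max_ratio:
            flagged.append(pid)
    return flagged


# Made-up predictions keyed by a hypothetical property ID.
baseline = {"A": 250_000, "B": 100_000, "C": 410_000}
with_noise = {"A": 255_000, "B": 10_000_000, "C": 405_000}
unstable = flag_unstable(baseline, with_noise)
```

Global metrics can stay flat while a handful of properties swing this much, so a per-property comparison is what exposes them.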

@dfsnow dfsnow closed this as completed Jan 22, 2025