Adding a MedianEncoder on encoding category #568

Open
ricardordb opened this issue Nov 26, 2022 · 4 comments

@ricardordb

Hello!
I am working on a project where we found that median encoding works better for our kind of data.
So I replaced the mean function in MeanEncoder with the median, creating a new encoder.
I have already tested the new encoder on some data and it works perfectly.
I forked it here: feature_engine

What do you think about adding a MedianEncoder to the project?
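
For readers following the thread, here is a minimal sketch of the idea, assuming the only change is the per-category aggregate. It is not the code from the fork. The helper names `fit_target_encoding` and `transform` and the toy data are hypothetical, and it uses only the standard library rather than feature-engine.

```python
from collections import defaultdict
from statistics import mean, median


def fit_target_encoding(categories, targets, agg=median):
    """Map each category to an aggregate of its target values.

    `agg` is the only thing that differs between a mean encoder
    and a median encoder in this sketch.
    """
    groups = defaultdict(list)
    for cat, y in zip(categories, targets):
        groups[cat].append(y)
    return {cat: agg(values) for cat, values in groups.items()}


def transform(categories, mapping):
    """Replace each category with its learned value.

    Categories not seen during fit raise KeyError in this sketch.
    """
    return [mapping[cat] for cat in categories]


# Toy data (made up): "red" has one extreme target value.
colors = ["red", "red", "red", "blue", "blue", "blue"]
target = [1.0, 2.0, 30.0, 4.0, 5.0, 6.0]

median_map = fit_target_encoding(colors, target, agg=median)
mean_map = fit_target_encoding(colors, target, agg=mean)
encoded = transform(["blue", "red"], median_map)
```

On this toy data the extreme value pulls the mean for "red" well away from its median, which is the kind of situation where the two encoders would differ.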

@solegalli
Collaborator

Hi @ricardordb Thanks a lot for your suggestion!

@glevv what do you think about this suggestion?

My thoughts:

In principle, if we use target mean encoding, I don't see why we couldn't also use the target median. It sounds like a small difference from a statistical perspective.

In practice, I don't know how much the median encoder would improve the model performance over the mean encoder (I guess we don't have enough data on that).

I guess we could leave the performance question to the user, but by adding a transformer to the library we are, in a sense, legitimizing its use. Less experienced users may think this is a mainstream encoding method, when that is probably not the case.

The MeanEncoder functionality is based on the article by Micci-Barreca, which explains the logic based on Bayes and also the use of smoothing.
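
To make "smoothing" concrete, here is a rough sketch of one common form: the per-category mean is blended with the global mean, and categories with more observations get more weight. The weight `n / (n + m)`, the parameter name `m`, and the function name are assumptions for illustration. They are not necessarily the formula used in the article or in MeanEncoder.

```python
from collections import defaultdict


def smoothed_mean_encoding(categories, targets, m=1.0):
    """Blend each category mean with the global mean (the prior).

    weight = n / (n + m), where n is the category count and m is an
    assumed smoothing strength; m = 0 gives plain mean encoding.
    """
    prior = sum(targets) / len(targets)

    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1

    mapping = {}
    for cat, n in counts.items():
        weight = n / (n + m)
        category_mean = sums[cat] / n
        mapping[cat] = weight * category_mean + (1 - weight) * prior
    return mapping


# Toy data (made up): "b" appears only once, so it is shrunk
# more strongly towards the global mean than "a".
cats = ["a", "a", "a", "b"]
ys = [1.0, 1.0, 1.0, 0.0]

smoothed = smoothed_mean_encoding(cats, ys, m=1.0)
unsmoothed = smoothed_mean_encoding(cats, ys, m=0.0)
```

A blend like this is defined in terms of means, which is one reason the smoothing logic does not carry over directly when the aggregate is swapped for a median.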

Looking at the class @ricardordb developed, it looks like the smoothing functionality would have to be removed. If we were to include the class, we would also have to change the docstrings substantially, so that people are not misled into thinking the new transformer is based on the same article.

@ricardordb do you have references supporting the use of this class?

@glevv
Contributor

glevv commented Nov 28, 2022

@solegalli

This is a special case of the quantile encoder:
http://contrib.scikit-learn.org/category_encoders/quantile.html

I know of this method, but I have never used it, since I saw no point in using it over target encoding.
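
To illustrate the "special case" point: encoding each category with the q-th quantile of its target values reduces to median encoding when q = 0.5. The sketch below assumes linear interpolation between order statistics. The function names are hypothetical, and it does not reproduce the category_encoders implementation or any regularization that library applies.

```python
from collections import defaultdict
from statistics import median


def quantile(values, q):
    """q-th quantile (0 <= q <= 1) with linear interpolation."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)  # floor, since pos >= 0
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])


def fit_quantile_encoding(categories, targets, q=0.5):
    """Map each category to the q-th quantile of its target values."""
    groups = defaultdict(list)
    for cat, y in zip(categories, targets):
        groups[cat].append(y)
    return {cat: quantile(values, q) for cat, values in groups.items()}


# Toy data (made up), with one odd-sized and one even-sized group.
colors = ["red", "red", "red", "blue", "blue", "blue", "blue"]
target = [1.0, 2.0, 30.0, 4.0, 5.0, 6.0, 7.0]

q50_map = fit_quantile_encoding(colors, target, q=0.5)
q100_map = fit_quantile_encoding(colors, target, q=1.0)

# The per-category medians, computed independently for comparison.
median_red = median([1.0, 2.0, 30.0])
median_blue = median([4.0, 5.0, 6.0, 7.0])
```

With q = 0.5 the mapping matches the per-category medians. Other values of q give encodings that a median-only encoder could not express.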

@solegalli
Collaborator

Thank you @glevv

This tells us 2 things:

First, I still have a lot to learn (lol).

And second, if we were to implement median encoding, we should probably read the references cited for the Quantile encoder in category encoders to better understand its use and functionality. Based on the literature, we could then potentially create a quantile encoder and not just a median encoder.

Since it exists in category encoders, I don't think this is urgent, but if someone thinks it is worth it, I would be happy to make it part of feature-engine as well.

@ricardordb have you used the quantile encoder?

@ricardordb
Author

ricardordb commented Nov 29, 2022 via email
