Description
Feature Enhancement
For encoders that accept the handle_missing argument, allow a callable to be passed that receives the missing value and returns an encoding for it. This lets the user decide which existing encoding is the best match for a given missing label.
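from typing import Any, Dict

def get_best_match(missing_value: Any, available_values: Dict[Any, float]) -> float:
    """Choose which available encoding best represents the missing value."""
    # best_match is a placeholder for the user's own selection logic
    return best_match

# Proposed usage: pass the callable as handle_missing
encoder = OrdinalEncoder(handle_missing=get_best_match)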
Example
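At training time the categories are Nokia 2.1, Nokia 2.2, Samsung A52, and Samsung S10.
At prediction time we also see Nokia 2.3 and Samsung A52s. The function below would give each unseen label the encoding of its closest training label, so Samsung A52s would ideally be encoded like Samsung A52.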
from typing import Any, Dict

from thefuzz import process

def get_best_match(missing_value: Any, available_values: Dict[Any, float]) -> float:
    """Use thefuzz string matching to find the closest known label."""
    # extractOne returns a (match, score) tuple when given a list of choices
    most_similar_label, _score = process.extractOne(missing_value, list(available_values.keys()))
    return available_values[most_similar_label]

# Proposed usage: pass the callable as handle_missing
encoder = OrdinalEncoder(handle_missing=get_best_match)
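
To make the intended behaviour concrete without depending on the proposed API, here is a minimal standalone sketch. It is not how any encoder is implemented today: encode_with_fallback is a hypothetical helper that stands in for the encoder's lookup step, the mapping values are made up for illustration, and difflib from the standard library is swapped in for thefuzz so the snippet runs with no extra installs.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, Iterable, List


def get_best_match(missing_value: str, available_values: Dict[str, float]) -> float:
    """Return the encoding of the known label most similar to missing_value."""
    # Similarity ratio in [0, 1]; ties resolve to the first label in dict order.
    best_label = max(
        available_values,
        key=lambda label: SequenceMatcher(None, missing_value, label).ratio(),
    )
    return available_values[best_label]


def encode_with_fallback(
    values: Iterable[str],
    mapping: Dict[str, float],
    handle_missing: Callable[[str, Dict[str, float]], float],
) -> List[float]:
    """Hypothetical stand-in for an encoder's transform step.

    Known labels use the fitted mapping; unseen labels are delegated
    to the user-supplied handle_missing callable.
    """
    return [
        mapping[v] if v in mapping else handle_missing(v, mapping)
        for v in values
    ]


# Illustrative ordinal mapping, as if learned at training time.
mapping = {"Nokia 2.1": 1.0, "Nokia 2.2": 2.0, "Samsung A52": 3.0, "Samsung S10": 4.0}

encoded = encode_with_fallback(
    ["Samsung A52", "Samsung A52s", "Nokia 2.3"], mapping, get_best_match
)
print(encoded)
```

Here Samsung A52s falls back to the encoding of Samsung A52. Nokia 2.3 is equally similar to Nokia 2.1 and Nokia 2.2 under this metric, which shows why the tie-breaking rule should be left to the user-supplied function.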