Allow a function to be passed as the handle_missing argument so the user can define which group to default to #344

Open
@Fish-Soup


Feature Enhancement

For encoders that have the handle_missing argument, allow a function to be passed that takes the missing value together with the available encodings and computes an encoding for it. This lets the user choose which existing encoding is the best match for a given missing label.

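from typing import Any, Dict

def get_best_match(missing_value: Any, available_values: Dict[Any, float]) -> float:
    """Choose which encoded value best represents the missing value."""
    best_match = ...  # user-defined: pick one of the encodings in available_values
    return best_match


# Proposed usage: pass the callable instead of a string option
encoder = OrdinalEncoder(handle_missing=get_best_match)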

Example

At train time we have the categories Nokia 2.1, Nokia 2.2, Samsung A52 and Samsung S10.
At predict time we also see Nokia 2.3 and Samsung A52s.

from typing import Any, Dict

from thefuzz import process

def get_best_match(missing_value, available_values: Dict[Any, float]) -> float:
    """Perform string matching with thefuzz to get the closest matching string."""
    # extractOne returns a (match, score) tuple when given a list of choices
    most_similar_label, _score = process.extractOne(missing_value, list(available_values.keys()))
    return available_values[most_similar_label]

encoder = OrdinalEncoder(handle_missing=get_best_match)
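
To show what such a callback would return, here is a minimal standalone sketch that runs without the proposed API. It swaps thefuzz for the standard library's difflib so it has no third-party dependencies, and train_mapping is a hypothetical ordinal mapping made up for this example, not something the encoder exposes today.

```python
import difflib
from typing import Dict


def get_best_match(missing_value: str, available_values: Dict[str, float]) -> float:
    """Return the encoding of the known label most similar to missing_value."""
    labels = list(available_values)
    # cutoff=0.0 so a label is always returned, however poor the match
    closest = difflib.get_close_matches(missing_value, labels, n=1, cutoff=0.0)
    return available_values[closest[0]]


# Hypothetical mapping learned at train time (assumption for illustration)
train_mapping = {
    "Nokia 2.1": 1.0,
    "Nokia 2.2": 2.0,
    "Samsung A52": 3.0,
    "Samsung S10": 4.0,
}

# Labels that only appear at predict time
for label in ["Nokia 2.3", "Samsung A52s"]:
    print(label, "->", get_best_match(label, train_mapping))
```

Here Samsung A52s resolves to the encoding of Samsung A52. Nokia 2.3 is equally similar to Nokia 2.1 and Nokia 2.2, so a real callback would probably need its own tie-breaking rule.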

