Add an option to order by ascending/descending prediction in cumulative effect curves #204

Open
HectorLira opened this issue Jul 25, 2022 · 0 comments · May be fixed by #220
Labels
enhancement New feature or request

HectorLira commented Jul 25, 2022

Describe the feature and the current state.

In the causal validation module and the curves file, it would be useful to add an ascending parameter for the cumulative effect and cumulative gain curves.

Currently, rows are always sorted by prediction in descending order:

ordered_df = df.sort_values(prediction, ascending=False).reset_index(drop=True)

If we add an ascending: bool = False argument to cumulative_effect_curve, cumulative_gain_curve, relative_cumulative_gain_curve, and effect_curves, users could choose whether these curves are computed over rows sorted by the prediction column in ascending or descending order.

Will this change a current behavior? How?

Not unless the user explicitly sets ascending=True. In that case, the cumulative effect and cumulative gain curves are computed with rows sorted by the prediction column in ascending order.

A model's prediction is not necessarily positively related to the effect being measured. An option to reverse the ordering would let effects and gains be computed correctly when predictions and outcomes are negatively related.

One current workaround is to do this:

df["prediction"] = -df["prediction"]

and then the curves are computed correctly. But this feels like a hack, and it is probably something we want to solve more cleanly.
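As a quick check of why the workaround gives the same result, the sketch below compares the row order produced by an ascending sort with the order produced by negating the prediction and sorting descending. The toy DataFrame and its column names are made up for illustration, and the comparison assumes distinct prediction values (with ties, the relative order of tied rows may differ).

```python
import pandas as pd

# Toy data (hypothetical): four rows with distinct prediction values.
df = pd.DataFrame({"prediction": [0.9, 0.1, 0.5, 0.3], "outcome": [1, 4, 2, 3]})

# Proposed behaviour: sort ascending by the prediction column.
asc = df.sort_values("prediction", ascending=True).reset_index(drop=True)

# Current workaround: negate the prediction, then sort descending as today.
neg = df.assign(prediction=-df["prediction"])
workaround = neg.sort_values("prediction", ascending=False).reset_index(drop=True)

# Both approaches visit the rows in the same order.
same_order = asc["outcome"].tolist() == workaround["outcome"].tolist()
print(same_order)
```

The difference is that the ascending argument leaves the prediction column untouched, whereas the workaround mutates the data before the curve is computed.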

Additional Information

The new definition of cumulative_effect_curve would look like this:

@curry
def cumulative_effect_curve(df: pd.DataFrame,
                            treatment: str,
                            outcome: str,
                            prediction: str,
                            min_rows: int = 30,
                            steps: int = 100,
                            effect_fn: EffectFnType = linear_effect,
                            ascending: bool = False) -> np.ndarray:
    """
    Orders the dataset by prediction and computes the cumulative effect curve according to that ordering.

    Parameters
    ----------
    df : Pandas' DataFrame
        A Pandas' DataFrame with target and prediction scores.

    treatment : String
        The name of the treatment column in `df`.

    outcome : String
        The name of the outcome column in `df`.

    prediction : String
        The name of the prediction column in `df`.

    min_rows : Integer
        Minimum number of observations needed to have a valid result.

    steps : Integer
        The number of cumulative steps to iterate when accumulating the effect.

    effect_fn : function (df: pandas.DataFrame, treatment: str, outcome: str) -> int or Array of int
        A function that computes the treatment effect given a dataframe, the name of the treatment column and the name
        of the outcome column.

    ascending : bool
        Whether the prediction column should be ordered ascending or not. Default is False.


    Returns
    ----------
    cumulative effect curve: Numpy's Array
        The cumulative treatment effect according to the predictions ordering.
    """

    size = df.shape[0]
    ordered_df = df.sort_values(prediction, ascending=ascending).reset_index(drop=True)
    n_rows = list(range(min_rows, size, size // steps)) + [size]
    return np.array([effect_fn(ordered_df.head(rows), treatment, outcome) for rows in n_rows])
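To illustrate how the new argument would change the result, here is a minimal, self-contained sketch. It does not import the library: cumulative_effect_curve_sketch is a stand-in that mirrors the proposed body without @curry, mean_difference_effect is a hypothetical effect function used in place of linear_effect, and the max(..., 1) guard on the step is an addition of this sketch, not part of the proposal. The toy data is constructed so that the score is negatively related to the true effect.

```python
import numpy as np
import pandas as pd


def mean_difference_effect(df, treatment, outcome):
    # Hypothetical effect function: mean outcome of treated rows minus control rows.
    treated = df.loc[df[treatment] == 1, outcome].mean()
    control = df.loc[df[treatment] == 0, outcome].mean()
    return treated - control


def cumulative_effect_curve_sketch(df, treatment, outcome, prediction,
                                   min_rows=30, steps=100,
                                   effect_fn=mean_difference_effect,
                                   ascending=False):
    # Stand-in for the proposed function: same ordering logic, no currying.
    size = df.shape[0]
    ordered_df = df.sort_values(prediction, ascending=ascending).reset_index(drop=True)
    step = max(size // steps, 1)  # assumption of this sketch: avoid a zero step
    n_rows = list(range(min_rows, size, step)) + [size]
    return np.array([effect_fn(ordered_df.head(rows), treatment, outcome) for rows in n_rows])


# Toy data: the true effect grows with the row index, and the score is its negative.
n = 200
effect = np.linspace(0.0, 2.0, n)
t = np.arange(n) % 2
toy = pd.DataFrame({"t": t, "y": effect * t, "score": -effect})

desc = cumulative_effect_curve_sketch(toy, "t", "y", "score", min_rows=20, steps=10)
asc = cumulative_effect_curve_sketch(toy, "t", "y", "score", min_rows=20, steps=10,
                                     ascending=True)

# With ascending=True the first segment holds the highest-effect rows,
# so the curve starts above the default (descending) curve.
print(asc[0] > desc[0])
```

Both curves end at the same value, since the last point always uses the full dataset; only the path to it depends on the ordering.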
@HectorLira HectorLira added the enhancement New feature or request label Jul 25, 2022
@MarianaBlaz MarianaBlaz linked a pull request Dec 20, 2022 that will close this issue