Describe the feature and the current state.
In the causal validation module, in the curves file, it would be useful to add an ascending parameter to the cumulative effect and cumulative gain curves. The current state is to always order predictions descending.
If we add an ascending: bool = False argument to cumulative_effect_curve, cumulative_gain_curve, relative_cumulative_gain_curve, and effect_curves, a user could choose whether these curves are computed with the prediction column ordered ascending or descending.
Will this change a current behavior? How?
No, unless the user explicitly sets ascending=True. In that case, the cumulative effect and cumulative gain curves will be computed with the prediction column sorted in ascending order.
A model's prediction is not necessarily positively related to the effect being measured. An option to reverse the ordering would let effects and gains be computed correctly when predictions and effects are negatively related.
One current workaround is to negate the prediction column:
    df["prediction"] = -df["prediction"]
The curves are then computed correctly, but this feels like a hack, and it is probably something we want to solve more cleanly.
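The equivalence behind this workaround can be checked with a small pandas sketch. The toy DataFrame and its column names are made up for illustration; the only library call relied on is DataFrame.sort_values. Negating the prediction and sorting descending yields the same row order as sorting the original prediction ascending (for distinct prediction values):

```python
import pandas as pd

# Toy data (hypothetical): four rows with distinct prediction values.
df = pd.DataFrame({
    "prediction": [0.9, 0.1, 0.5, 0.3],
    "outcome": [1, 4, 2, 3],
})

# What ascending=True would do: sort the original prediction ascending.
asc = df.sort_values("prediction", ascending=True).reset_index(drop=True)

# The workaround: negate the prediction, then keep the default descending order.
neg = (
    df.assign(prediction=-df["prediction"])
      .sort_values("prediction", ascending=False)
      .reset_index(drop=True)
)

# Both approaches put the rows in the same order.
same_order = asc["outcome"].tolist() == neg["outcome"].tolist()
```

The workaround gives the right ordering but mutates (or copies) the prediction column just to flip a sort direction, which is what the proposed argument would avoid.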
Additional Information
The new definition of cumulative_effect_curve would look like this:
    @curry
    def cumulative_effect_curve(df: pd.DataFrame,
                                treatment: str,
                                outcome: str,
                                prediction: str,
                                min_rows: int = 30,
                                steps: int = 100,
                                effect_fn: EffectFnType = linear_effect,
                                ascending: bool = False) -> np.ndarray:
        """
        Orders the dataset by prediction and computes the cumulative effect curve according to that ordering

        Parameters
        ----------
        df : Pandas' DataFrame
            A Pandas' DataFrame with target and prediction scores.

        treatment : Strings
            The name of the treatment column in `df`.

        outcome : Strings
            The name of the outcome column in `df`.

        prediction : Strings
            The name of the prediction column in `df`.

        min_rows : Integer
            Minimum number of observations needed to have a valid result.

        steps : Integer
            The number of cumulative steps to iterate when accumulating the effect

        effect_fn : function (df: pandas.DataFrame, treatment: str, outcome: str) -> int or Array of int
            A function that computes the treatment effect given a dataframe, the name of the treatment column and the name
            of the outcome column.

        ascending : bool
            Whether the prediction column should be ordered ascending or not. Default is False.

        Returns
        ----------
        cumulative effect curve: Numpy's Array
            The cumulative treatment effect according to the predictions ordering.
        """

        size = df.shape[0]
        ordered_df = df.sort_values(prediction, ascending=ascending).reset_index(drop=True)
        n_rows = list(range(min_rows, size, size // steps)) + [size]
        return np.array([effect_fn(ordered_df.head(rows), treatment, outcome) for rows in n_rows])
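To illustrate what the new argument would change, here is a self-contained sketch that does not import the library. It copies the body of the proposed function, drops the curry decorator, and swaps in slope_effect, a hypothetical stand-in for linear_effect that computes the regression slope of the outcome on the treatment. The synthetic data and column names are assumptions made for the example: the true effect shrinks as the score grows, so the score is negatively related to the effect.

```python
import numpy as np
import pandas as pd


def slope_effect(df: pd.DataFrame, treatment: str, outcome: str) -> float:
    """Hypothetical stand-in effect function: OLS slope of outcome on treatment."""
    t = df[treatment].to_numpy(dtype=float)
    y = df[outcome].to_numpy(dtype=float)
    return float(np.cov(t, y)[0, 1] / np.var(t, ddof=1))


def cumulative_effect_curve(df, treatment, outcome, prediction,
                            min_rows=30, steps=100,
                            effect_fn=slope_effect, ascending=False):
    # Same body as the proposal above, without the curry decorator.
    size = df.shape[0]
    ordered_df = df.sort_values(prediction, ascending=ascending).reset_index(drop=True)
    n_rows = list(range(min_rows, size, size // steps)) + [size]
    return np.array([effect_fn(ordered_df.head(rows), treatment, outcome) for rows in n_rows])


# Synthetic data (assumption): the treatment effect is 2 * (1 - score),
# so rows with a LOW score have the HIGHEST effect.
rng = np.random.default_rng(0)
n = 1000
score = rng.uniform(0, 1, n)
t = rng.normal(size=n)
y = 2.0 * (1.0 - score) * t + rng.normal(scale=0.1, size=n)
data = pd.DataFrame({"t": t, "y": y, "score": score})

# Default (descending): the curve starts with the high-score, low-effect rows.
desc_curve = cumulative_effect_curve(data, "t", "y", "score")

# Proposed ascending=True: the curve starts with the low-score, high-effect rows.
asc_curve = cumulative_effect_curve(data, "t", "y", "score", ascending=True)
```

In this setup the ascending curve starts at the high-effect rows and the descending curve starts near zero, while both end at the same value because the last point always uses the full dataset.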