Revised metrics proposal #15

Open · wants to merge 44 commits into master

Conversation

riedgar-ms (Member):

Create a fresh proposal for metrics, based on discussions in meetings, and focusing on 'what does the user type?' rather than the implementation.

@romanlutz (Member) left a comment:

Thanks for putting this together! This is very thorough and explains the difficulties very well. I hope my comments make sense :-)


We would provide operations to calculate differences in various ways (all of these results are a Pandas Series):
```python
>>> result.differences()
```
Member:

This is very confusing to me. Does it default to `relative_to="max"`? Why not let the user specify? It reads a lot better with the explicit specification as well.

Comment:

Do you mean you'd like to have no default at all or a different one than "max"?

Member (Author):

The user can specify, but I believe doing relative to max matches our current behaviour. I would be happy with not having a default too, if that was the consensus.

Member:

Our current behavior is that we are taking differences relative to the minimum, not the maximum (that's why all of our differences are positive). That's how I wrote the proposal here. In other words, the "relative to" part is whatever you subtract from all the group-level metric values.

But maybe it's confusing? An alternative term could be "origin". Another idea would be to say "from" instead of "relative_to"?

```python
>>> result.differences(relative_to='overall', aggregate='max')
0.1870
>>> result.differences(relative_to='overall', abs=True, aggregate='max')
0.2436
```
Member:

Why is this different from the line above with no `abs`? The value was positive, so that should be identical, right?

Member (Author):

Without `abs` the values were -0.2436 and 0.1870, so the max is 0.1870. With `abs` they are 0.2436 and 0.1870, so the max is 0.2436.
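
To make the arithmetic concrete, here is a plain pandas sketch of the aggregation being described; the group names are made up, and the two values are the ones quoted above.

```python
import pandas as pd

# Hypothetical per-group differences relative to 'overall' (values from the thread above)
diffs = pd.Series({"group_a": -0.2436, "group_b": 0.1870})

print(diffs.max())        # 0.187  -> aggregate='max' without abs
print(diffs.abs().max())  # 0.2436 -> aggregate='max' with abs=True
```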


There would be a similar method called `ratios()` on the `GroupedMetric` object:
```python
>>> result.ratios()
```
Member:

same thing, don't like the default at all :-(

Member (Author):

And the same from me: I'm happy to change if there's agreement that requiring the argument makes more sense.

- `ratio_order=` determines how to build the ratio. Values are
- `sub_unity` to make larger value the denominator
- `super_unity` to make larger value the numerator
- `from_relative` to make the value specified by `relative_to=` the denominator

Member:

Both this line and the next say `relative_to`; is that intended?

Comment:

I am a bit worried about the complexity of the `ratio_order` argument, especially for new users.

Member (Author):

Yes. The change is in which value goes in the numerator and which in the denominator.

Member (Author):

Is this a question of naming, or whether we need all the cases? I'm not overly happy with the names myself, but I do think that all four cases might be useful.

Comment:

A bit of both, I guess... but I think you're right that you need all four cases.
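
For illustration, here is how the three `ratio_order=` values (plus the fourth case alluded to above) might play out for a single pair of numbers; the metric values and the `to_relative` name are invented for this sketch and are not part of the proposal.

```python
# Hypothetical metric values: one group vs. the value selected by relative_to=
group_value = 0.60
relative_value = 0.75

sub_unity = min(group_value, relative_value) / max(group_value, relative_value)    # 0.8,  always <= 1
super_unity = max(group_value, relative_value) / min(group_value, relative_value)  # 1.25, always >= 1
from_relative = group_value / relative_value    # relative_to= value as the denominator: 0.8
to_relative = relative_value / group_value      # presumed fourth case, reversed: 1.25
```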

This would make the following all equivalent:
```python
>>> r1 = group_summary(sklearn.accuracy_score, y_true, y_pred, sensitive_features=A_sex)
>>> r2 = group_summary('accuracy_score', y_true, y_pred, sensitive_features=A_sex)
Member:

I presume this maps to the sklearn metrics? What if I want to define my own metric? Will that also work (although not with the string, of course)?

Member (Author):

Let me add an 'in addition' to the introductory sentence.
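
For what it's worth, a sketch of the custom-metric case, assuming the proposed `group_summary()` accepts any callable with the usual `(y_true, y_pred)` signature; `selection_rate` is a hypothetical user-defined metric, and `y_true`, `y_pred`, `A_sex` are the arrays from the example above.

```python
import numpy as np

def selection_rate(y_true, y_pred):
    # User-defined metric: fraction of positive predictions
    return np.mean(y_pred)

# Same call pattern as r1/r2 above, just without the string shorthand
r3 = group_summary(selection_rate, y_true, y_pred, sensitive_features=A_sex)
```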


The question is whether users prefer:
```python
>>> diff = group_summary(skm.recall_score, y_true, y_pred, sensitive_features=A).difference(aggregate='max')
```
Member:

Obviously I'm just one data point, but I strongly prefer this over the version below.

Comment:

+1

Member (Author):

Indeed - people were leaning that way, but since there's always the question of 'Do we really need a new concept?' I just wanted to call this out.

@adrinjalali (Member) left a comment:

Thanks for the proposal. I think it's a good step. I'd wrap the lines though 🙈


```python
y_true = [0, 1, 0, 0, 1, 1, ...]
y_pred = [1, 0, 0, 1, 0, 1, ...]
```
Member:

Since we're dealing with metrics, should we separate floating point predictions (`decision_function` or `predict`) from discrete ones? (Assuming some metrics need soft values and some work with hard values.)

Maybe simply add:

Suggested change:
```diff
-y_pred = [1, 0, 0, 1, 0, 1, ...]
+# or soft values, depending on what the underlying metric expects
+y_pred = [1, 0, 0, 1, 0, 1, ...]
```

Member (Author):

I don't think that we want separate group_summary() functions for different types of underlying metrics, but I agree about the need to highlight that we just slice up the y_pred list regardless of the types of its items, if that's what you meant. I was trying to write that below, but I'll make it more explicit.
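
A sketch of what that slicing-agnostic behaviour would mean in practice, assuming the proposed `group_summary()` signature; `y_scores` is a hypothetical array of soft scores standing in for `y_pred`.

```python
import sklearn.metrics as skm

# Soft scores (e.g. from predict_proba or decision_function); group_summary()
# would slice these up per group exactly as it does hard labels -- only the
# underlying metric cares about their type.
y_scores = [0.9, 0.2, 0.4, 0.7, 0.1, 0.8]

auc_by_group = group_summary(skm.roc_auc_score, y_true, y_scores,
                             sensitive_features=A_sex)
```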

```python
weights = [ 1, 2, 3, 2, 2, 1, ...]
```

We actually seek to be very agnostic as to the contents of the `y_true` and `y_pred` arrays.

Member:

Agnostic in what way? I think the proposal later explains this, but it may be easier to follow if that's moved up here.

Member (Author):

I meant it in exactly the same way as your suggested comment. Let me rephrase.

We propose adding an extra `condition_on=` argument to `differences()` and `ratios()`.

Suppose we have a DataFrame, `A_3` with three sensitive features: Sex, Race and Income Band (the latter having values 'Low' and 'High').
This could represent a loan scenario where discrimination based on income is allowed, but within the income bands, other sensitive groups must be treated equally.

Member:

Or use abstract names and avoid mentioning sex, race or income altogether.

This should generalise to the other methods described above.

One open question is how extra arguments should be passed to the individual metric functions, including how to handle the `indexed_params=`.

Member (Author):

If I'm reading that correctly, we would ditch the `**kwargs`, and have an explicit `props=` dictionary, and probably rename `indexed_params=` to `sample_props=`. When evaluating multiple metrics at once, I still think we'd need a list of dictionaries and a list of lists for these, though.


We do not intend to change the API invoked by the user.
What will change is the return type.
Rather than a `Bunch`, we will return a `GroupedMetric` object, which can offer richer functionality.

Comment:

I'm not sure whether we're already at the stage of discussing naming, but since we decided to discard .eval(), we might want to think about a name that better reflects that the class that's currently called GroupedMetric contains results/scores. The current name suggests that the class corresponds to some grouped metric itself rather than results/scores. Similarly, I'm not sure about group_summary(), because in my view GroupedMetric is not really a summary? My suggestion would be to come up with a few alternatives and ask some people who are not as deeply involved with the metrics API so far which one they find most intuitive.

Member (Author):

This is a great time to be discussing naming, I think..... I'll see if I can come up with a few possibilities.


@riedgar-ms (Member, Author) commented Sep 8, 2020:

Meeting 2020-09-08

Thanks to all for attending today. The main agreement is that we should think about a 'data exploration' interface (provided by the object) and a 'scorer' interface (provided by wrapper functions for the objects).

More minor points:

  • `condition_on=` should produce columns (at construction time), and not just be specified at aggregation time. This is because it's natural to look down columns for anomalies, rather than having to jump back and forth to check which row you're on. These could be done as 'simple' columns (and be mutually exclusive with doing multiple metrics at once), or DataFrames can have a MultiIndex on their columns (see the pandas sketch after this list).
  • We don't need most wrapper functions (such as `accuracy_score_group_summary()` etc.). The key ones to have, and to have available for `make_scorer()`, are ones like `demographic_parity_difference()` and `equalized_odds_ratio()`.
  • Perhaps we don't need all the various options for `ratios()` and `differences()`. Indeed, perhaps these would be best handled in the wrapper functions, keeping the methods on the objects constrained?
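
A plain pandas sketch of the 'MultiIndex on columns' idea mentioned in the first bullet; the metric names, feature classes and numbers are all invented for illustration.

```python
import pandas as pd

# Metrics on the outer column level, the condition_on= classes on the inner level
cols = pd.MultiIndex.from_product(
    [['recall_score', 'precision_score'], ['Income Low', 'Income High']])
by_group = pd.DataFrame([[0.70, 0.80, 0.60, 0.90],
                         [0.75, 0.82, 0.58, 0.88]],
                        index=['Female', 'Male'], columns=cols)
print(by_group['recall_score'])   # slice out one metric across both income bands
```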

Next Steps

I will take @MiroDudik's list of scenarios from above, and start to rough out how each might look in the API:

  1. Report one disaggregated metric in a data frame.
  2. Report several disaggregated metrics in a data frame.
  3. Report several performance and fairness metrics of several models in a data frame.
  4. (In abeyance) Report several performance and fairness metrics as well as some disaggregated metrics of several models in a data frame.
  5. Create a fairness-performance raster plot of several models.
  6. Run sklearn.model_selection.cross_validate with demographic parity (as the fairness metric) and balanced error rate (as the performance metric). We should perhaps review this in the context of Adrin's PR in sklearn?
  7. Run GridSearchCV with demographic parity and balanced error, where the goal is to find the lowest-error model whose demographic parity is <= 0.05.

One thing which had been confusing me from before: These are not all expected to be one-liners, but may well be more complex. The idea is to identify which pieces are relevant to the metrics API, and whether we like how that looks.

@riedgar-ms (Member, Author) commented Sep 14, 2020:

Meeting 2020-09-14

Thank you to @hildeweerts, @adrinjalali, @MiroDudik and @amueller for our meeting today.

We agreed that:

  • We should have separate .difference() and .difference_from_overall() methods (similarly for ratios) for producing the scalars
  • Although users could do result.by_group.max(), we should provide .group_max() and .group_min() methods
  • We should keep make_derived_metric() but have the first argument be a string, selecting from the various difference and ratio methods

It also sounded like we should keep the current set of functions built with make_derived_metric() such as accuracy_score_group_min().
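
A sketch of the agreed shape for make_derived_metric(), with the first argument selecting the transformation as a string; the exact strings and the call pattern here are illustrative guesses, not a settled signature.

```python
import sklearn.metrics as skm

# First argument picks the difference/ratio/min/max transformation by name
recall_difference = make_derived_metric('difference', skm.recall_score)
recall_group_min = make_derived_metric('group_min', skm.recall_score)

# The returned callables would then be used like ordinary metrics
d = recall_difference(y_true, y_pred, sensitive_features=A_sex)
m = recall_group_min(y_true, y_pred, sensitive_features=A_sex)
```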

According to @adrinjalali , the PR for SLEP006 Option 4 will probably be in the release of SciKit-Learn next April. We should aim to be compatible with that in our scorer objects.

Next Steps

I will update this proposal based on the above points. I will also start a rough implementation of GroupedMetric so that we have actual code to play with.

@riedgar-ms (Member, Author) commented Sep 21, 2020:

Meeting 2020-09-21

Thanks to @adrinjalali @hildeweerts @amueller and @MiroDudik for our meeting today.

Points agreed:

  • Conditional features should go into the rows (but make sure that they are at a 'higher' level than the sensitive features)
  • Don't have separate .difference() and .difference_from_overall() methods, but instead have a .difference(method='min_max|to_overall') method, where we can add to the allowed values for the method= argument later (similarly for .ratios())
  • The .group_min(), group_max(), .difference() and .ratio() methods should produce one result per conditional feature class
  • Don't worry about supporting dictionaries of functions for now. Single functions and lists of functions are sufficient
  • Rather than have sample_param_names= in the constructor (as at present), better to have sample_params=dict() and slice those up
  • For now, make_derived_metric() should produce a single scalar, and not support conditional features

Still TBD:

  • Do we require users to use functools.partial() rather than having the params= argument to the GroupedMetric constructor? I felt that we were leaning towards keeping the params= argument, but I'm not sure there was consensus

Yet to be discussed:

  • What to do when particular intersections of conditional and sensitive features are empty: do we skip that row entirely, or do we return None?

@adrinjalali (Member):

I was under the impression we're okay with restricting the metrics' signature and asking users to use partial instead.

@riedgar-ms (Member, Author) commented Sep 28, 2020:

Meeting 2020-09-28

Thanks to @adrinjalali @hildeweerts and @MiroDudik for our meeting today.

We started by discussing the .overall property. Currently, when there are no conditional features, this is a DataFrame with a single row with a special name of 'overall':

```python
>>> target.overall
        recall_score precision_score
overall     0.733333        0.717391
```

The motivation behind this decision was to ensure that we return a consistent type. However, doing so comes at the cost of introducing a 'magic' row name of 'overall'. It also causes trouble with operations such as `target.by_group - target.overall`, where one gets results such as:

```
        recall_score precision_score
overall          NaN             NaN
pp               NaN             NaN
q                NaN             NaN
```

The intended operation in this case is actually:

```python
>>> target.by_group - target.overall.iloc[0]
     recall_score precision_score
SF 0
pp      0.0057971     -0.00905797
q     -0.00606061      0.00988142
```

Continuing with the .difference() method, there are a certain number of behind-the-scenes machinations. However, perhaps these are not necessary:

```python
>>> target.difference(method='to_overall')
        recall_score precision_score
overall   0.00606061      0.00988142
>>> (target.by_group - target.overall.iloc[0]).abs().max()
recall_score       0.006061
precision_score    0.009881
dtype: float64
```

These are not a problem when there are conditional features, which ensures that there are matching portions of the DataFrame index. Together, these suggest that when there are no conditional features, the return type of .overall should be a Series, with entries corresponding to the metric names. @MiroDudik is going to experiment some more, to verify that operations performed regularly by data scientists are then simple.
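
To see why the Series return would simplify things, here is the same comparison in plain pandas, using values reconstructed (to the precision shown) from the outputs above.

```python
import pandas as pd

overall = pd.Series({'recall_score': 0.733333, 'precision_score': 0.717391})
by_group = pd.DataFrame({'recall_score': [0.739130, 0.727273],
                         'precision_score': [0.708333, 0.727273]},
                        index=['pp', 'q'])

print(by_group - overall)                # broadcasts row-wise; no .iloc[0] needed
print((by_group - overall).abs().max())  # recall_score ~0.006, precision_score ~0.010
```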

We agreed that for now, we should drop the params= argument from the GroupedMetric constructor, and require users to use functools.partial if they need to set other parameters for the metrics (such as the required beta= for fbeta_score()).

When specifying multiple functions in the GroupedMetric constructor, the current pattern is:

```python
fns = [skm.recall_score, skm.precision_score, skm.fbeta_score]
sample_params = [{'sample_weight': wgts},
                 {'sample_weight': wgts},
                 {'sample_weight': wgts}]

result = metrics.GroupedMetric(fns, Y_test, Y_pred,
                               sensitive_features=...,
                               conditional_features=...,
                               sample_params=sample_params)
```

We agreed that using dictionaries instead of arrays would make more sense. Something like:

```python
fns = {
    'recall': skm.recall_score,
    'prec': skm.precision_score,
    'fbeta_0.5': functools.partial(skm.fbeta_score, beta=0.5)
}
sample_params = {
    'recall': {'sample_weight': wgts},
    'prec': {'sample_weight': wgts},
    'fbeta_0.5': {'sample_weight': wgts}
}
```

There is one question as to whether `sample_params=` should be a dictionary-of-dictionaries, or use double underscores (i.e. `sample_params={'recall__sample_weight': wgts, 'prec__sample_weight': wgts, ...}`) as happens in SciKit-Learn. There was a preference for the dictionary-of-dictionaries, though. We agreed that trying to have a special 'broadcast' key (to avoid having to repeat sample_weight multiple times) was probably not worth the extra complications and pitfalls.

Now that the API is pretty much finalised, I will open a new issue to discuss the naming.

@riedgar-ms (Member, Author):

Meeting 2020-10-06

Thanks to @adrinjalali @hildeweerts and @MiroDudik for our discussion today.

We had a long talk about make_derived_metric() and how it would be used. @MiroDudik has shared a hackmd with his concerns and a proposal, and we need to iterate on that. We did decide that the callable object returned from make_derived_metric() should support all the arguments of the base metric (even if we still require functools.partial() for GroupedMetric itself). We therefore need to tell make_derived_metric() which arguments need to be sliced up and which to just pass through. Since there are generally more of the latter, it is simpler to name the sliced arguments explicitly, so I will restore sample_param_names= to make_derived_metric().

One question which did arise: how to specify the method= argument for the aggregation approaches supported by make_derived_metric(). Should it be specified in the arguments of make_derived_metric(), or in the callable? The consensus was going towards the latter, but I would highlight that that 'reserves' another argument name which then can't be used by the base metric.

Related to these, although not essential for API design: do we provide all the accuracy_score_group_min() wrappers for users, or expect them to create these themselves (one line for each wrapper they want)?

We agreed to keep 'empty' combinations of features in the result tables as NaNs; users can easily drop those rows if they want. In future, we might add an optional argument to drop them when the objects are created.

When it comes to naming, we still have issue #17. I have the impression that no-one particularly likes GroupedMetric as a name, but there's a lack of better alternatives.

Next steps:

  • Continue with discussion of issues @MiroDudik raised. We need to close on this ASAP
  • I will move the new code out of experimental and replace the existing metrics implementation

@riedgar-ms (Member, Author) commented Oct 12, 2020:

Meeting 2020-10-12

Thanks to @MiroDudik and @adrinjalali for our meeting today. We agreed on the following:

  • The class will be renamed from GroupedMetric to MetricsFrame. We're using the plural 'Metrics' to make it plain that there won't be similar 'US' and 'Imperial' frames as well
  • conditional_feature= is still not quite agreed. @MiroDudik is going to ask for more feedback around the lab, but it appears there isn't much agreement. We may just have to pick something and stick with it
  • group_min() will stay as it is. We can't think of anything concise which has any less potential for ambiguity
  • Definitely keep the method= argument with that name
  • Have the valid values for method= be between_pairs and to_overall (see the sketch after this list). We felt that minmax was reusing a different term, and that max_to_min for difference would need min_over_max for ratio, which is inconsistent. The new proposal has the form 'preposition-noun' for both. Again, we could be more precise, but at a huge conciseness cost
  • I will restore a default of method=between_pairs to match the current behaviour
  • We talked about sample_params= vs sample_props= but decided to stick with the former
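
A plain pandas sketch of how I read the two method= values agreed above; the numbers are invented, between_pairs is interpreted as max-minus-min across groups, and to_overall as the largest absolute difference from the overall value (matching the target.difference(method='to_overall') example in an earlier comment).

```python
import pandas as pd

# Made-up per-group values for a single metric, plus an overall value
by_group = pd.Series({'group_a': 0.60, 'group_b': 0.75, 'group_c': 0.70})
overall = 0.68

between_pairs = by_group.max() - by_group.min()  # 0.15: largest difference between any pair of groups
to_overall = (by_group - overall).abs().max()    # 0.08: largest absolute difference from overall
```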

We also discussed make_derived_metric() but have come to realise that it is a lot thornier than we'd appreciated. @adrinjalali highlighted some potential future problems we could be laying for ourselves when it comes to passing arguments around. The simplest example of this is the method= argument on the returned callable, and how that can conflict with arguments to the base metric (it gets more complicated than that). This is an interaction point with the work @adrinjalali is currently doing in SciKit-Learn around routing arguments through pipelines, so we need to ensure that we're compatible and consistent with that.

Next steps

  • @riedgar-ms will start on the renaming of GroupedMetric to MetricsFrame tomorrow (13th) unless there are objections
  • @riedgar-ms will start writing proper docstrings, for review by all interested parties
  • @riedgar-ms will remove make_derived_metric() from the PR for now

Personally, I would even suggest that make_derived_metric() does not need to be in the v0.5.0 release. Yes, we'd lose some functionality compared to now, but we're almost certain to break existing code anyway. So long as the MetricFrame is present, users can implement their own versions of make_derived_metric() which work in their particular cases (while we work on the general case). We can even provide a sample in the docs, with a note that there are pitfalls.

@riedgar-ms (Member, Author):

Meeting 2020-10-19

Thanks to @adrinjalali , @hildeweerts and @MiroDudik for our meeting today. We were reviewing the current active PR implementing the new metrics API.

  • This PR does not include make_derived_metric() and we agreed to keep it that way
  • We appear to be converging on make_derived_metric() but we agreed not to hold up this PR for that
  • We also agreed that make_derived_metric() is not essential for a v0.5.0 release, although it would be good if we could get it ready in time. We can always do a v0.5.1 release afterwards

For the PR itself the following issues were raised:

  • @MiroDudik is unsure about the default naming of the sensitive and control features when the user does not supply names. @MiroDudik is going to experiment, and propose an alternative (to the current S/CF [n] names)
  • @riedgar-ms will add the method= argument to the DP and EO functions
  • @riedgar-ms will fix the documentation of the control_feature argument to the MetricFrame constructor, so that it contains a reminder of what control features actually are. Right now, you need to read the main body of the constructor help text to see this
  • @riedgar-ms will add sensitive_levels and control_levels properties to the MetricFrame, which will report the number of sensitive and control features in the data. This is to help users who want to stack and unstack the overall and by_group DataFrames

@hildeweerts raised other issues with the documentation format, pointing out that the entire fairlearn.metrics API page is a bit of a mess (for example, it would be better to group properties and methods separately within a class). @hildeweerts pointed to the SciKit-Learn documentation as a possible model. However, we agreed that these criticisms, while absolutely important, were broader than just this PR. In the current PR, we will just make sure that the text which is there is correct.

@MiroDudik and @riedgar-ms will work to resolve the issues raised above, and get the PR merged. @adrinjalali and @hildeweerts will also review, but the merge will not be delayed for that (we can make further tweaks next week).

@riedgar-ms (Member, Author):

Meeting 2020-10-26

Thanks to @hildeweerts @adrinjalali and @MiroDudik for another productive meeting. We discussed the following:

What to do with control_levels when there are none

If the user wants to do their own manipulations of the by_group DataFrame, they are going to be calling Pandas methods such as max(level=lvls), groupby(level=lvls) and unstack(level=lvls), and they are likely to want to set lvls to the value of the control_levels property. Unfortunately, Pandas isn't consistent in how these different methods behave when lvls is None or []. This raises the question of what control_levels should return if none were specified by the user.

We decided that control_levels should be None if none were passed. @adrinjalali pointed out that this is safer due to the mutability of lists, and also conceptually consistent with the user not specifying a control_levels argument to the constructor.
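
A sketch of the kind of user code this decision targets, assuming a MetricFrame-like object (here called result, hypothetically) whose control_levels property is None when no control features were given.

```python
lvls = result.control_levels  # None when the user supplied no control features
if lvls is None:
    per_metric_max = result.by_group.max()
else:
    # groupby(level=...) accepts a list of level names consistently
    per_metric_max = result.by_group.groupby(level=lvls).max()
```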

Whether to allow strings for level names

This is related to the previous point. The level= argument in Pandas can usually be a list of level names or indices, but Pandas allows names to be integers, giving a potential for ambiguity. We may revisit this in future, but we shall require our levels to be strings.

How to handle single callables

@MiroDudik pointed out that if a user makes a call such as:

```python
acc = MetricsFrame(skm.accuracy_score, y_true, y_pred, sensitive_features=A)
```

then having to write things like

```python
acc.overall['accuracy_score']
acc.by_group['accuracy_score']['group_0']
acc.group_max()['accuracy_score']
```

is tedious for people messing around with notebooks. It would be good to drop the extra indirection. However, @riedgar-ms was concerned that if a more general library is attempting to consume Fairlearn programmatically, then any logic we put in to handle the single-function case would have to be undone by that higher level library (introducing maintenance overhead and places for bugs to hide).

We agreed that if the user passes in a dictionary of callables, then we keep the same behaviour, even if there's only one entry in the dictionary. However, if the user passes in a bare callable, then DataFrames become Series, and Series become scalars, so that users don't have to keep typing [metric_name] to get at their results. @adrinjalali pointed out that this would behave like an overloaded function in languages which support such things.
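
A sketch of the agreed single-callable behaviour, continuing the example above; the return shapes simply follow the rule just described (DataFrame -> Series, Series -> scalar) and are not final API.

```python
# Single callable (not a dict of callables): one level of indexing is dropped
acc = MetricsFrame(skm.accuracy_score, y_true, y_pred, sensitive_features=A)

acc.overall               # a scalar, instead of a Series keyed by metric name
acc.by_group              # a Series indexed by group, instead of a DataFrame
acc.by_group['group_0']   # a scalar
acc.group_max()           # a scalar
```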

The initial implementation of this is likely to be messy (at this point in time, just set a flag and process all output before it's returned to the user). However, @riedgar-ms will provide a full set of tests, so that the internals can be improved in the future.

@riedgar-ms (Member, Author) commented Nov 2, 2020:

Meeting 2020-11-02

Thank you to @adrinjalali @hildeweerts and @MiroDudik for another productive discussion.

We agreed that the best way forward is to release what we have as v0.5.0 as soon as possible. @hildeweerts and @adrinjalali will take a look from a user perspective, but we won't hold a release except for showstopper bugs.

We also discussed make_derived_metric() now that it's reappearing. We won't hold v0.5.0 for this; it can go out with v0.5.1 a few weeks later. In the call, we adjusted some arguments. @riedgar-ms is going to make named arguments required, and add documentation. Ways of getting rid of the manipulations of globals() will also be investigated, as will reflecting on the user-provided function for a method= argument.
