Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a function for extracting a test statistic #24

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ararslan
Copy link
Member

This is added to StatsAPI rather than to HypothesisTests since StatsAPI also houses HypothesisTest, pvalue, and other relevant functionality.

Defining this function will bring a resolution to the 7-year-old issue JuliaStats/HypothesisTests.jl#79, which has received a number of duplicates over the years, suggesting that it would be of general interest.

I considered the name statistic, and still rather prefer that (teststatistic has too many s's and t's 😩), but wasn't sure whether it was insufficiently descriptive. I'd be interested in hearing thoughts both on naming and on whether this should be required for HypothesisTest as pvalue is or whether it should be optional (what I have currently).

This is added to StatsAPI rather than to HypothesisTests since StatsAPI
also houses `HypothesisTest`, `pvalue`, and other relevant
functionality.

Defining this function will bring a resolution to the 7-year-old issue
JuliaStats/HypothesisTests.jl#79, which has
received a number of duplicates over the years, suggesting that it would
be of general interest.
@ararslan ararslan requested a review from nalimilan July 29, 2023 19:13
@codecov-commenter
Copy link

codecov-commenter commented Jul 29, 2023

Codecov Report

Patch and project coverage have no change.

Comparison is base (64d7d28) 100.00% compared to head (27e6dc6) 100.00%.

Additional details and impacted files
@@            Coverage Diff            @@
##              main       #24   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            3         3           
  Lines           37        37           
=========================================
  Hits            37        37           
Files Changed Coverage Δ
src/StatsAPI.jl 100.00% <ø> (ø)

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ararslan
Copy link
Member Author

@nalimilan, did you have thoughts on the function name? I find statistic appealing. Looks like @babaq would prefer teststat (at least over teststatistic), which seems okay I guess but I generally like to avoid shortening words when possible.

@nalimilan
Copy link
Member

I'm afraid statistic is too general so I prefer teststatistic or teststat. I'm not sure whether it's best to abbreviate or not, as we are quite inconsistent in that regard (e.g. confint vs. loglikelihood).

@devmotion
Copy link
Member

I like teststatistic most, I think. I agree that statistic seems a bit too general, and I prefer not shortening the function name (but I'd be fine with teststat as well).

@palday
Copy link
Member

palday commented Jul 31, 2023

statistic feels like it’s asking for a name collision and teststatistic is long and awkward. My ranking: teststat > statistic >> teststatistic

@ararslan
Copy link
Member Author

statistic feels like it’s asking for a name collision

Only two registered packages define a function called statistic: Bootstrap and Hecke. Bootstrap's definition could/should extend this one from StatsAPI. Hecke defines statistic but doesn't use, document, or export it; it seems to be dead code.

MixedAnova, AnovaBase, and WildBootTests all define a teststat function.

Amusingly, HypothesisTests defines teststatistic, which I didn't realize. It's only defined for VarianceEqualityTest and isn't documented nor exported though.

@ararslan
Copy link
Member Author

So actually, wouldn't the generality of statistic be a reasonable thing for an interface function? After all, the whole point of StatsAPI is for packages to share names. 😄 The meaning becomes unambiguous in the context of the type of the input.

@palday
Copy link
Member

palday commented Jul 31, 2023

After a little bit of thinking, statistic actually sounds nice because it could also be used in other, non-testing contexts.

@nalimilan
Copy link
Member

People often complain that we abuse generic functions by overloading them with methods which actually have little of nothing in common, so I though using a more specific name like teststat would be more appropriate.

Maybe ping the authors of the packages you mentioned to get their opinion? We definitely want all packages to use the same function so we need some of them to agree switching to the new function.

@ararslan
Copy link
Member Author

ararslan commented Aug 4, 2023

People often complain that we abuse generic functions by overloading them with methods which actually have little of nothing in common

I was not aware of this. What are other examples?

@ararslan
Copy link
Member Author

ararslan commented Aug 4, 2023

Maybe ping the authors of the packages you mentioned to get their opinion? We definitely want all packages to use the same function so we need some of them to agree switching to the new function.

I think present company have HypothesisTests covered and I don't think this is relevant for Hecke, but @yufongpeng for AnovaBase/MixedAnova, @droodman for WildBootTests, and @juliangehring for Bootstrap: hello! Thanks for contributing to the Julia ecosystem. We're thinking of introducing a function in StatsAPI which, if named generically, could be useful to your respective packages. For AnovaBase, MixedAnova, and WildBootTests, it would correspond to the function @yufongpeng and @droodman have called teststat. For Bootstrap, it would correspond to what @juliangehring has called statistic. Since your respective packages all have StatsAPI as a transitive dependency already (by way of Distributions for WildBootTests and StatsBase for the others), extending the function defined here would not require taking on additional dependencies that wouldn't already need to be loaded. If you would be interested in integrating in this way, we would love your input here! What would be your preferred name for this function? Current contenders are:

  • statistic
  • teststatistic
  • teststat

This was originally motivated by the need for a generic accessor function to extract the value of a test statistic from a HypothesisTest object but its scope does not need to be limited to that.

@yufongpeng
Copy link

Since this function is for test statistics, I prefer a more specific name.
teststat or teststatistic is good, but statistic is too general.

@droodman
Copy link

droodman commented Aug 5, 2023 via email

@ararslan
Copy link
Member Author

ararslan commented Aug 6, 2023

Thank you @yufongpeng and @droodman for your input!

Since this function is for test statistics

It isn't necessarily, that was just initial use case that prompted this discussion. Another notable example is Bootstrap, which defines a statistic function that returns the function used to compute the statistic (which needn't be a test statistic) on each sample. For example, statistic(bootstrap(mean, randn(20), BasicSampling(100))) == mean. Bootstrap could extend statistic from StatsAPI but it probably wouldn't make sense to extend teststatistic/teststat as that's insufficiently general.

@nalimilan
Copy link
Member

I was not aware of this. What are other examples?

Probably the most problematic function is fit, for which we don't even document possible arguments. But most other functions in StatsAPI have a well defined signature so that's really an exception.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants