Added Minimal Support for Reliability Scores (aka Cronbach's alpha) #701

Merged
merged 35 commits into from
Aug 24, 2021

Conversation

storopoli
Contributor

This is a minimal implementation. It is based on a covariance matrix.
I created a struct Reliability to hold the total reliability score and the calculations for each reliability score if a certain item (i.e. column) was dropped from the covariance matrix.
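For readers following along, the computation described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper names `alpha_from_cov` and `alpha_if_dropped` are hypothetical, the 4×4 matrix is example data chosen for illustration, and the sketch assumes the usual formula alpha = k/(k-1) * (1 - tr(C)/sum(C)) for a k×k covariance matrix C.

```julia
using LinearAlgebra  # provides tr

# Cronbach's alpha from a k×k covariance matrix (hypothetical helper name).
function alpha_from_cov(C::AbstractMatrix{<:Real})
    k = size(C, 1)
    k == size(C, 2) || throw(ArgumentError("covariance matrix must be square"))
    k > 1 || throw(ArgumentError("need at least two items"))
    return (k / (k - 1)) * (1 - tr(C) / sum(C))
end

# Alpha recomputed with each item (its row and column) removed in turn.
# Needs at least three items so that two remain after dropping one.
function alpha_if_dropped(C::AbstractMatrix{<:Real})
    k = size(C, 1)
    k > 2 || throw(ArgumentError("need at least three items"))
    return map(1:k) do i
        keep = [j for j in 1:k if j != i]
        alpha_from_cov(C[keep, keep])
    end
end

# Example data for illustration only.
C = [10 6 6 6;
     6 11 6 6;
     6 6 12 6;
     6 6 6 13]

total = alpha_from_cov(C)      # alpha using all four items
dropped = alpha_if_dropped(C)  # one value per dropped item
```

Working from the covariance matrix keeps the function independent of how the raw data are stored, which is the trade-off discussed later in this thread.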

There is a set of test functions, and I used the Wikipedia example of a covariance matrix both in the tests and in the docstrings.

This is my first PR to a Julia package, so please let me know what I can do to improve it...

Some worthy mentions:

R's psych::alpha has the following output:

Reliability analysis   
Call: alpha(x = covmatrix)
  raw_alpha std.alpha G6(smc) average_r S/N median_r
      0.78      0.78    0.74      0.47 3.5     0.45
 Reliability if an item is dropped:
     raw_alpha std.alpha G6(smc) average_r S/N   var.r med.r
var1      0.71      0.71    0.62      0.45 2.4 0.00022  0.44
var2      0.72      0.72    0.64      0.46 2.6 0.01126  0.44
var3      0.71      0.71    0.63      0.45 2.4 0.00806  0.43
var4      0.77      0.77    0.69      0.53 3.3 0.00321  0.54
 Item statistics 
          r r.cor r.drop
var1   0.80  0.71   0.62
var2   0.79  0.68   0.60
var3   0.80  0.71   0.62
var4   0.72  0.56   0.50

If we were to include variable names we would have to include some Tables.jl or DataFrames.jl dependency.

Also, I am only calculating the "vanilla" Cronbach's alpha. The std.alpha, G6(smc), average_r, S/N, var.r and med.r statistics are not implemented.

@bkamins
Contributor

bkamins commented Jul 25, 2021

In general maybe it would be also good to add other reliability measures like https://en.wikipedia.org/wiki/Congeneric_reliability.
Also for the Cronbach alpha maybe it should be reported if the data satisfies tau-equivalence?

@nalimilan
Member

Thanks for the PR. Unfortunately I don't have time to review it right now, but I just wanted to note that the API design is faced with similar challenges as things like pairwise correlation and distances: how to pass variables (matrix, iterator of vectors, and/or Tables.jl object), and whether/how/when to report variable names. See #627 and nalimilan/FreqTables.jl#54.

@storopoli
Contributor Author

Great to see that this kind of API is being worked on. I will work on @bkamins's suggestions and make it ready once you are ready to review.

One quick question: Do any of you think we should expect a covariance matrix or should we expect a matrix and do the covariance matrix construction inside the crombach_alpha function?

@bkamins could you point me towards how to implement a test if the data satisfies tau-equivalence? I'm not a psychometrics specialist...

@bkamins
Contributor

bkamins commented Jul 26, 2021

Do any of you think we should expect a covariance matrix or should we expect a matrix and do the covariance matrix construction inside the crombach_alpha function?

I would assume this function should take a covariance matrix. However, the challenge is then with column names, as @nalimilan said, since AFAICT we currently have no set standard for passing them.

how to implement a test if the data satisfies tau-equivalence

All off-diagonal values of the covariance matrix should be the same. But if other software does not do this check, we can also skip it and rely on the user to test it.
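As a rough illustration of the check described here, the sketch below tests whether all off-diagonal entries of a covariance matrix are approximately equal. The function name `offdiag_equal` and the tolerance are assumptions made for this example. With sample covariances the entries will rarely match exactly, so this is only a naive screen, not a formal test of tau-equivalence.

```julia
# Returns true when all off-diagonal entries of C are approximately equal.
# `offdiag_equal` is a hypothetical name; the default tolerance is arbitrary.
function offdiag_equal(C::AbstractMatrix{<:Real}; rtol::Real = 1e-8)
    k = size(C, 1)
    k == size(C, 2) || throw(ArgumentError("matrix must be square"))
    # Collect every entry that is not on the diagonal.
    vals = [C[i, j] for i in 1:k for j in 1:k if i != j]
    isempty(vals) && return true
    ref = first(vals)
    return all(v -> isapprox(v, ref; rtol = rtol), vals)
end

equal_case = offdiag_equal([10 6 6; 6 11 6; 6 6 12])
unequal_case = offdiag_equal([10 6 5; 6 11 6; 5 6 12])
```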

@storopoli
Contributor Author

Ok, I also agree that the user should pass a covariance matrix.

Regarding tau-equivalence, R's psych cannot handle that in the alpha function. It is instead something that only SEM packages deal with. See here

@bkamins
Contributor

bkamins commented Jul 26, 2021

It is instead something that only SEM packages deal with.

That is why I have said above that we might consider adding other reliability scores. But we can leave it for later and skip checking the assumptions of the method here.

@storopoli
Contributor Author

Ok, I think all suggestions and reviews were addressed.

One thing we need to decide is whether to use the struct to hold the results and offer a nice printout via Base.show (the current implementation), or just return a tuple (with a Dict or a vector of Pair) and let the user deal with it.

Contributor

@bkamins bkamins left a comment

Nice - it looks good now. I hope you enjoyed it :). Let us wait for other reviewers now.

@storopoli
Contributor Author

Yeah, I learned a ton, thanks! I already had huge respect for you and all of your work. After this PR it increased by orders of magnitude!


struct Reliability{T <: Real}
alpha::T
dropped::Vector{Pair{Int, T}}
Contributor

The first parameter here should be Any not Int

Suggested change
dropped::Vector{Pair{Int, T}}
dropped::Vector{Pair{Any, T}}

I think.

This would allow passing names to items. The Int we have now is not useful (it is just a position in a vector so it is redundant).

@storopoli - I would hold off on working on this until @nalimilan comments on the API and on passing item names here.

@nalimilan - I believe that the computational part in this PR is finished. We just need to decide on the way we handle the item names.

Member

I suggest we only store a Vector{T} of values for now. Names can be stored in a separate field if we add support for them later. Anyway it's more practical to store values separately from names than having to access the fields in a pair.

@mschauer
Member

Bump

Member

@nalimilan nalimilan left a comment

Sorry for the delay!

@storopoli
Contributor Author

Thanks @nalimilan. I've implemented the desired tests, converted the Vector of Pairs to a simple Vector, renamed the function to crombachalpha, and replaced the manual rounding in the printing code with Printf.
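A minimal sketch of the design summarized above (a result struct holding a plain `Vector` of values, printed through `Printf`) might look like the following. The type name `AlphaResult`, the exact output layout, and the numbers are assumptions made for illustration; they are not taken from the merged code.

```julia
using Printf

# Hypothetical result type mirroring the design discussed in this thread:
# one total score plus a plain vector of "alpha if item i is dropped" values.
struct AlphaResult{T<:Real}
    alpha::T
    dropped::Vector{T}
end

# Custom display; @printf handles the rounding to four decimals.
function Base.show(io::IO, r::AlphaResult)
    @printf(io, "Cronbach's alpha for all items: %.4f\n", r.alpha)
    println(io)
    println(io, "Cronbach's alpha if an item is dropped:")
    for (i, a) in enumerate(r.dropped)
        @printf(io, "item %d: %.4f\n", i, a)
    end
end

# Illustrative values only.
r = AlphaResult(0.8136, [0.75, 0.7606, 0.7714, 0.7826])
text = sprint(show, r)  # capture the printed form as a String
```

Keeping the values in a `Vector{T}` lets callers index `r.dropped[i]` directly, and item names could later live in a separate field, as suggested in the review.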

Member

@nalimilan nalimilan left a comment

Thanks. Just a few more comments.

Can you also add a reference to the function in the docs so that it's included in the manual? Maybe a new section under Scalar Statistics?

function crombach_alpha(covmatrix::AbstractMatrix{<:Real})
    isposdef(covmatrix) || throw(ArgumentError("Covariance matrix is not positive definite!"))
Member

What about my second comment?

@storopoli
Contributor Author

Ready for a review. See if the docs implementation is enough (never done it before, but followed from the examples in code)

Member

@nalimilan nalimilan left a comment

See if the docs implementation is enough (never done it before, but followed from the examples in code)

Thanks, looks good.

@storopoli
Contributor Author

@nalimilan please check whether it is fine now. I've replaced all "reliability" with "Cronbach's alpha".

@bkamins
Contributor

bkamins commented Aug 23, 2021

Looks good to me.

@bkamins
Contributor

bkamins commented Aug 23, 2021

(I am OOO so I have not checked if the documentation renders properly)

@nalimilan
Member

Thanks!

@nalimilan nalimilan merged commit 7fcea24 into JuliaStats:master Aug 24, 2021
@storopoli
Contributor Author

Great! My Pleasure!
