add proptable #19
Changes from 3 commits
961525e
6268227
f65d113
a9070ae
4e715b6
495ce57
a660551
@@ -5,5 +5,5 @@ module FreqTables

 include("freqtable.jl")

-export freqtable
+export freqtable, prop
 end # module
@@ -131,3 +131,6 @@ function freqtable(d::AbstractDataFrame, x::Symbol...; args...)
     setdimnames!(a, x)
     a
 end
+
+prop(tbl::AbstractArray{<:Number}) = tbl / sum(tbl)

Could you add a docstring?

I wanted to add a docstring for

+prop(tbl::AbstractArray{<:Number}, dims) = tbl ./ sum(tbl, dims)
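
To make the behavior of the two methods concrete, here is a minimal sketch. It assumes Julia 1.x, where reduction dimensions are passed through the `dims` keyword rather than positionally as in the diff above, and it uses the hypothetical name `prop_sketch` so that it does not clash with the package's `prop`. Note that, as with `sum`, the dimensions passed are the ones that get collapsed.

```julia
# Hypothetical stand-in for the PR's `prop`, rewritten for Julia 1.x syntax.
# One-argument form: every cell divided by the grand total.
prop_sketch(tbl::AbstractArray{<:Number}) = tbl / sum(tbl)
# Two-argument form: `dims` are the dimensions summed over (collapsed).
prop_sketch(tbl::AbstractArray{<:Number}, dims) = tbl ./ sum(tbl, dims=dims)

tbl = [30 20; 20 30]

p_all = prop_sketch(tbl)      # cells divided by the total of 100
p_1   = prop_sketch(tbl, 1)   # dimension 1 collapsed: each column sums to 1
p_2   = prop_sketch(tbl, 2)   # dimension 2 collapsed: each row sums to 1
```

With these semantics, `prop_sketch(tbl, 1)` divides by the column sums, which is the point raised in the review discussion at the end of this page.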
@@ -8,6 +8,7 @@ y = repeat(["D", "C", "A", "B"], inner=[10], outer=[10]);
 tab = @inferred freqtable(x)
 @test tab == [100, 100, 100, 100]
 @test names(tab) == [["a", "b", "c", "d"]]
+@test prop(tab) == [0.25, 0.25, 0.25, 0.25]
 tab = @inferred freqtable(y)
 @test tab == [100, 100, 100, 100]
 @test names(tab) == [["A", "B", "C", "D"]]
@@ -17,6 +18,31 @@ tab = @inferred freqtable(x, y)
               20 30 30 20;
               20 30 30 20]
 @test names(tab) == [["a", "b", "c", "d"], ["A", "B", "C", "D"]]
+@test prop(tab) == [0.075 0.05 0.05 0.075;
+                    0.075 0.05 0.05 0.075;
+                    0.05 0.075 0.075 0.05;
+                    0.05 0.075 0.075 0.05]
+@test prop(tab, (1,2)) == [0.075 0.05 0.05 0.075;
+                           0.075 0.05 0.05 0.075;
+                           0.05 0.075 0.075 0.05;
+                           0.05 0.075 0.075 0.05]
+@test prop(tab, 1) == [0.3 0.2 0.2 0.3;
+                       0.3 0.2 0.2 0.3;
+                       0.2 0.3 0.3 0.2;
+                       0.2 0.3 0.3 0.2]
+@test prop(tab, 2) == [0.3 0.2 0.2 0.3;
+                       0.3 0.2 0.2 0.3;
+                       0.2 0.3 0.3 0.2;
+                       0.2 0.3 0.3 0.2]
+@test prop(tab, ()) == [1.0 1.0 1.0 1.0;
+                        1.0 1.0 1.0 1.0;
+                        1.0 1.0 1.0 1.0;
+                        1.0 1.0 1.0 1.0]

+@test_throws MethodError prop()
+@test_throws MethodError prop([1,2,3], ("a","b"))
+@test_throws MethodError prop(("a","b"))
+@test_throws MethodError prop((1, 2))

 tab =freqtable(x, y,
                subset=1:20,
@@ -26,7 +52,13 @@ tab =freqtable(x, y,
               3.0 2.0
               1.5 1.0]
 @test names(tab) == [["a", "b", "c", "d"], ["C", "D"]]

+@test prop(tab) == [4 6; 2 3; 6 4; 3 2] / 30.0

Wouldn't hurt to have

done

+@test prop(tab, 1) == [8 12; 4 6; 12 8; 6 4] / 30.0
+@test prop(tab, (1,)) == [8 12; 4 6; 12 8; 6 4] / 30.0
+@test prop(tab, 2) == [6 9; 6 9; 9 6; 9 6] / 15.0
+@test prop(tab, (2,)) == [6 9; 6 9; 9 6; 9 6] / 15.0
+@test prop(tab, ()) == [1.0 1.0; 1.0 1.0; 1.0 1.0; 1.0 1.0]
+@test prop(tab, (1, 2)) == [4 6; 2 3; 6 4; 3 2] / 30.0

 using CategoricalArrays
 cx = CategoricalArray(x)

This behavior (inherited from `sum`) is the opposite of R's: the passed dimensions are those to collapse, while in R they are ones to retain. I wonder whether it's appropriate for computing proportions: it's probably more natural to think "I want to compute proportion by rows" than "I want to divide each row by the column sums", isn't it? After all, we say "row profiles/percents" and "column profiles/percents".

I was thinking about it, initially I wanted to be consistent with `sum`. But let us call the argument `margin` then and make it work like in R.
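
The R-style semantics proposed above could look roughly like the sketch below. It is an illustration only, not the code that was merged. The name `prop_margin`, the varargs signature, and the Julia 1.x `dims` keyword are all assumptions. The idea is that `margin` lists the dimensions to retain, so the sum runs over every other dimension.

```julia
# Hypothetical sketch: `prop_margin` is an illustrative name, not the package's API.
# `margin` lists the dimensions to RETAIN, as the `margin` argument does in R's `prop.table`.
function prop_margin(tbl::AbstractArray{<:Number,N}, margin::Integer...) where {N}
    # Collapse (sum over) every dimension that is not listed in `margin`.
    collapse = Tuple(setdiff(1:N, margin))
    return tbl ./ sum(tbl, dims=collapse)
end

tbl = [30 20; 10 40]

p_total = prop_margin(tbl)      # no margin: proportions of the grand total
p_rows  = prop_margin(tbl, 1)   # margin 1: row profiles, each row sums to 1
p_cols  = prop_margin(tbl, 2)   # margin 2: column profiles, each column sums to 1
```

Under this convention "proportions by rows" is requested with margin 1, which matches the "row profiles" wording in the comment, whereas the `sum`-style version needs `dims = 2` for the same result.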