Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add proptable #19

Merged
merged 7 commits into from
Dec 22, 2017
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -53,7 +53,7 @@ b │ 3 2
c │ 2 3
d │ 2 3

julia> prop(tbl2, 1)
julia> prop(tbl2, 2)
4×2 Named Array{Float64,2}
Dim1 ╲ Dim2 │ A B
────────────┼─────────
Expand Down
43 changes: 42 additions & 1 deletion src/freqtable.jl
Original file line number Diff line number Diff line change
Expand Up @@ -132,5 +132,46 @@ function freqtable(d::AbstractDataFrame, x::Symbol...; args...)
a
end

"""
prop(tbl, [margin])

Create table of proportions from a frequency table `tbl` with margins generated for
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily a frequency table: any numeric table will do.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

dimensions specified by `margin`. Does not check if `tbl` contains non-negative frequencies.
Calculating `sum` over the result of `prop` over dimensions that are complement of `margin`
produces `AbstractArray` containing only approximately `1.0`, see last example below.

**Arguments**
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring recommendations in the manual say that we should avoid using a separate section for arguments then possible. Here, it would be clearer to mention the expected types of arguments in the signature, and explain what they do directly in the description.

It would also be be nice to be less technical and explain in very simple terms how to compute row and column proportions. No need to mention "approximately" nor duplicated entries.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

although margin allows to many types to specify it in the signature. I left it in the description.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry - given your comment below I can put it into the signature :)


* `tbl` : `AbstractArray{<:Number}` containing frequency table
* `margin` : `Integer` or collection of `Integer`s of dimensions to generate proportions by;
duplicated entries are ignored, e.g. [1, 1, 2] is the same as `[1, 2]`;
if omitted proportions over the whole `tbl` are computed

**Result**

* `::AbstractArray` : array containing calculated proportions

**Examples**

```julia
prop([1 2; 3 4])
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better use jldoctest and put the full REPL output.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

prop([1 2; 3 4], ())
prop([1 2; 3 4], 1)
prop([1 2; 3 4], [1])
prop([1 2; 3 4], (1,2))
tbl = prop(rand(5, 5, 5, 5), (1, 2))
sumtbl = sum(tbl, (3,4))
all(x -> x ≈ 1.0, sumtbl)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to show that, just print the output of sum.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK given the change above

```

"""

prop(tbl::AbstractArray{<:Number}) = tbl / sum(tbl)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a docstring?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to add a docstring for freqtable and prop in a separate PR when we have settled the implementation.

prop(tbl::AbstractArray{<:Number}, dims) = tbl ./ sum(tbl, dims)

function prop(tbl::AbstractArray{<:Number,N}, margin) where N
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use margin::Integer...? Also, what's minimum?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Integer... - good idea, changing
minimum - removed, I wanted to check for negative entries but decided that it is not needed

if !all(x -> isa(x, Integer) && (1 <= x <= N), margin)
throw(ArgumentError("invalid margin argument; expected integers in a valid dimension range"))
end
minimum
tbl ./ sum(tbl, tuple(setdiff(1:N, margin)...))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that inferrable? It would be nice to add @inferred to tests. If not, I wonder whether we could figure out a solution based on dispatch.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is; added check to tests

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

end
39 changes: 22 additions & 17 deletions test/freqtable.jl
Original file line number Diff line number Diff line change
Expand Up @@ -22,27 +22,32 @@ tab = @inferred freqtable(x, y)
0.075 0.05 0.05 0.075;
0.05 0.075 0.075 0.05;
0.05 0.075 0.075 0.05]
@test prop(tab, (1,2)) == [0.075 0.05 0.05 0.075;
0.075 0.05 0.05 0.075;
0.05 0.075 0.075 0.05;
0.05 0.075 0.075 0.05]
@test prop(tab, 1) == [0.3 0.2 0.2 0.3;
@test prop(tab, ()) == [0.075 0.05 0.05 0.075;
0.075 0.05 0.05 0.075;
0.05 0.075 0.075 0.05;
0.05 0.075 0.075 0.05]
@test prop(tab, 2) == [0.3 0.2 0.2 0.3;
0.3 0.2 0.2 0.3;
0.2 0.3 0.3 0.2;
0.2 0.3 0.3 0.2]
@test prop(tab, 2) == [0.3 0.2 0.2 0.3;
@test prop(tab, 1) == [0.3 0.2 0.2 0.3;
0.3 0.2 0.2 0.3;
0.2 0.3 0.3 0.2;
0.2 0.3 0.3 0.2]
@test prop(tab, ()) == [1.0 1.0 1.0 1.0;
1.0 1.0 1.0 1.0;
1.0 1.0 1.0 1.0;
1.0 1.0 1.0 1.0]
@test prop(tab, (1, 2)) == [1.0 1.0 1.0 1.0;
1.0 1.0 1.0 1.0;
1.0 1.0 1.0 1.0;
1.0 1.0 1.0 1.0]

tbl = prop(rand(5, 5, 5, 5), (1, 2))
sumtbl = sum(tbl, (3,4))
@test all(x -> x ≈ 1.0, sumtbl)

@test_throws MethodError prop()
@test_throws MethodError prop([1,2,3], ("a","b"))
@test_throws MethodError prop(("a","b"))
@test_throws MethodError prop((1, 2))
@test_throws ArgumentError prop([1,2,3], ("a","b"))
@test_throws ArgumentError prop([1,2,3], 2)

tab =freqtable(x, y,
subset=1:20,
Expand All @@ -53,12 +58,12 @@ tab =freqtable(x, y,
1.5 1.0]
@test names(tab) == [["a", "b", "c", "d"], ["C", "D"]]
@test prop(tab) == [4 6; 2 3; 6 4; 3 2] / 30.0
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't hurt to have @inferred here too.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@test prop(tab, 1) == [8 12; 4 6; 12 8; 6 4] / 30.0
@test prop(tab, (1,)) == [8 12; 4 6; 12 8; 6 4] / 30.0
@test prop(tab, 2) == [6 9; 6 9; 9 6; 9 6] / 15.0
@test prop(tab, (2,)) == [6 9; 6 9; 9 6; 9 6] / 15.0
@test prop(tab, ()) == [1.0 1.0; 1.0 1.0; 1.0 1.0; 1.0 1.0]
@test prop(tab, (1, 2)) == [4 6; 2 3; 6 4; 3 2] / 30.0
@test prop(tab, 2) == [8 12; 4 6; 12 8; 6 4] / 30.0
@test prop(tab, (2,)) == [8 12; 4 6; 12 8; 6 4] / 30.0
@test prop(tab, 1) == [6 9; 6 9; 9 6; 9 6] / 15.0
@test prop(tab, (1,)) == [6 9; 6 9; 9 6; 9 6] / 15.0
@test prop(tab, (1, 2)) == [1.0 1.0; 1.0 1.0; 1.0 1.0; 1.0 1.0]
@test prop(tab, ()) == [4 6; 2 3; 6 4; 3 2] / 30.0

using CategoricalArrays
cx = CategoricalArray(x)
Expand Down