-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add proptable #19
add proptable #19
Conversation
Thanks. Looks good, I just wonder about the name |
An attempt for positional argument. Loos better? |
Clever! It's tempting, but it's also very non-standard so I'm really hesitant. Can you have a look at what similar functions in Base or in packages do in similar cases? |
4eda4f9
to
7c63df4
Compare
In all cases those are positional arguments. I have not found the use of Finally the option is to define
then the definition is even simpler:
This is the way R works. Actually, when I wanted |
Yeah, implementation-wise it's simpler to require It's still tempting to add a direct way to compute a table of proportions, either via a dedicated function or via a keyword argument. Or maybe we should wait until we add a more general convenience function, like Stata's |
The way to go probably depends on the way you want FreqTables to be developed (actually if you would outline what else you would see in this package I could contribute - I work with such type of analysis a lot). For me things like Given the above - if we named the function |
The scope of the package isn't completely defined. Originally I hoped that the functions would be included in StatsBase once we agreed on a standard for named arrays/axis arrays, but that hasn't happened (yet?). Anyway I wouldn't say that these would be too heavy: by what definition are they "heavy" for a package dedicated to frequency tables? :-) As for tests, I'd rather put them in HypothesisTests, at least if they can support
Let's go with |
I have commited As for other things - I put |
README.md
Outdated
@@ -44,6 +53,15 @@ b │ 3 2 | |||
c │ 2 3 | |||
d │ 2 3 | |||
|
|||
julia> prop(tbl2, 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This behavior (inherited from sum
) is the opposite of R's: the passed dimensions are those to collapse, while in R they are ones to retain. I wonder whether it's appropriate for computing proportions: it's probably more natural to think "I want to compute proportion by rows" than "I want to divide each row by the column sums", isn't it? After all, we say "row profiles/percents" and "column profiles/percents".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about it, initially I wanted to be consistent with sum
.
But let us call the argument margin
then and make it work like in R.
src/freqtable.jl
Outdated
@@ -131,3 +131,6 @@ function freqtable(d::AbstractDataFrame, x::Symbol...; args...) | |||
setdimnames!(a, x) | |||
a | |||
end | |||
|
|||
prop(tbl::AbstractArray{<:Number}) = tbl / sum(tbl) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a docstring?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to add a docstring for freqtable
and prop
in a separate PR when we have settled the implementation.
src/freqtable.jl
Outdated
prop(tbl::AbstractArray{<:Number}) = tbl / sum(tbl) | ||
prop(tbl::AbstractArray{<:Number}, dims) = tbl ./ sum(tbl, dims) | ||
|
||
function prop(tbl::AbstractArray{<:Number,N}, margin) where N |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use margin::Integer...
? Also, what's minimum
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Integer...
- good idea, changing
minimum
- removed, I wanted to check for negative entries but decided that it is not needed
src/freqtable.jl
Outdated
throw(ArgumentError("invalid margin argument; expected integers in a valid dimension range")) | ||
end | ||
minimum | ||
tbl ./ sum(tbl, tuple(setdiff(1:N, margin)...)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that inferrable? It would be nice to add @inferred
to tests. If not, I wonder whether we could figure out a solution based on dispatch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is; added check to tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!
src/freqtable.jl
Outdated
""" | ||
prop(tbl, [margin]) | ||
|
||
Create table of proportions from a frequency table `tbl` with margins generated for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily a frequency table: any numeric table will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
src/freqtable.jl
Outdated
**Examples** | ||
|
||
```julia | ||
prop([1 2; 3 4]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better use jldoctest
and put the full REPL output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
src/freqtable.jl
Outdated
prop([1 2; 3 4], (1,2)) | ||
tbl = prop(rand(5, 5, 5, 5), (1, 2)) | ||
sumtbl = sum(tbl, (3,4)) | ||
all(x -> x ≈ 1.0, sumtbl) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to show that, just print the output of sum
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK given the change above
src/freqtable.jl
Outdated
Calculating `sum` over the result of `prop` over dimensions that are complement of `margin` | ||
produces `AbstractArray` containing only approximately `1.0`, see last example below. | ||
|
||
**Arguments** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring recommendations in the manual say that we should avoid using a separate section for arguments then possible. Here, it would be clearer to mention the expected types of arguments in the signature, and explain what they do directly in the description.
It would also be be nice to be less technical and explain in very simple terms how to compute row and column proportions. No need to mention "approximately" nor duplicated entries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
although margin
allows to many types to specify it in the signature. I left it in the description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry - given your comment below I can put it into the signature :)
1 similar comment
test/freqtable.jl
Outdated
@@ -26,7 +59,10 @@ tab =freqtable(x, y, | |||
3.0 2.0 | |||
1.5 1.0] | |||
@test names(tab) == [["a", "b", "c", "d"], ["C", "D"]] | |||
|
|||
@test prop(tab) == [4 6; 2 3; 6 4; 3 2] / 30.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't hurt to have @inferred
here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
src/freqtable.jl
Outdated
1.0 1.0 | ||
1.0 1.0 | ||
|
||
julia> tbl = prop(rand(2, 2, 2, 2), 1, 2); sum(tbl, (3,4)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you show this with a simple 3D array, e.g. reshape(1:12, (2, 2, 3))
, and print the result of prop
first? I think that's very helpful to understand what this does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
0.075 0.05 0.05 0.075; | ||
0.05 0.075 0.075 0.05; | ||
0.05 0.075 0.075 0.05] | ||
pt = @inferred prop(tab, 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this test fails. Maybe inference doesn't work after all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I tested it on Vector
and it works. It fails NamedArray
- I will investigate what can be done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, weird. Looks like the problem might be in sum(a::NamedArray, dim)
:
julia> @code_warntype sum(NamedArray([1 2; 3 4]), 1)
Variables:
#self# <optimized out>
a::NamedArrays.NamedArray{Int64,2,Array{Int64,2},Tuple{DataStructures.OrderedDict{String,Int64},DataStructures.OrderedDict{String,Int64}}}
d::Int64
Body:
begin
return $(Expr(:invoke, MethodInstance for sum(::NamedArrays.NamedArray{Int64,2,Array{Int64,2},Tuple{DataStructures.OrderedDict{String,Int64},DataStructures.OrderedDict{String,Int64}}}, ::Tuple{Int64}), :(NamedArrays.sum), :(a), :((Core.tuple)(d)::Tuple{Int64})))
end::Any
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is - I have patched it here, but see the comment below.
In order to ensure type stability I had to use internals of
|
OK. Do you think there would be a way to make these constructors type-stable in NamedArrays itself? That would be useful in general. |
If it does not get fixed earlier I put it to my TODO, but I have some other things I would like to patch before :). Anyway - even if |
BTW: do you know why in CI |
Let's go with that, a two-line function isn't too bad as a temporary solution. But we really need to fix these constructors, as performance suffers a lot.
0.7 is going to require lots of small changes. |
I've just tagged a release: JuliaLang/METADATA.jl#12595 |
following #8.