Move CategoricalValue and CategoricalPool into separate package #64
Comments
I'm really not a fan of this idea. The two parts are tightly coupled, and keeping them synchronized across two packages would make further development more cumbersome. What's the problem with depending on CategoricalArrays? It's a small pure-Julia dependency. I don't think you'd notice the size difference if we split the package in two parts.
My eventual plan for In some sense this is analogous to the
The difference with I don't really understand why PooledArrays or DataArrays would have to depend on
And that pulls in
Actually, it would be If the sink for
If that's the problem, then it can easily be fixed. Actually, we've discussed getting rid of this dependency in the past, as it's only used for the return type of
I think we need a more general abstraction for
Are you referring to the pointer to the pool? I'll have to mull this whole area over a bit more; it is not really clear to me what we want here. Especially when it comes to queries, there seems to be another lifting-like issue, i.e. what to return if one applies some function to a categorical value.
Yes, but more generally it will be much more efficient to work directly with integer codes than creating a
Yeah, it's not very easy to decide what to do with operations on
Hm, I would have thought that the compiler could optimize that away, but you probably have better insight into that than me. For the whole Here is a crazy idea: could the levels be encoded as a type parameter? Something like this:

    immutable CategoricalValue{LEVELS}
        index::Int
    end
    getvalue{LEVELS}(a::CategoricalValue{LEVELS}) = string(LEVELS.parameters[a.index].parameters[1])
    a = CategoricalValue{Tuple{Val{:level1},Val{:level2}}}(2)
    println(getvalue(a))

It does feel like a gross abuse of the type system and I could easily imagine that this is a really terrible idea, but maybe worth investigating a bit?
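The snippet in this comment uses pre-1.0 syntax (`immutable`, `f{T}(...)`). For readers on a current Julia version, the same idea can be sketched as follows. The type name `LevelValue` is hypothetical, chosen only to avoid clashing with the real `CategoricalValue`; this is an illustration of the proposal, not how CategoricalArrays is implemented.

```julia
# Hypothetical sketch in Julia 1.x syntax. `LevelValue` is a made-up name,
# not the CategoricalArrays.jl type.
struct LevelValue{LEVELS}
    index::Int
end

# LEVELS is a Tuple type such as Tuple{Val{:a},Val{:b}}; each level name is
# carried as the symbol parameter of a Val type.
function getvalue(a::LevelValue{LEVELS}) where {LEVELS}
    level_type = fieldtype(LEVELS, a.index)   # e.g. Val{:level2}
    return string(level_type.parameters[1])
end

a = LevelValue{Tuple{Val{:level1},Val{:level2}}}(2)
println(getvalue(a))   # prints "level2"
```

Because the levels live in the type, two values with different level sets have different concrete types, so any method that takes them is compiled once per level set.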
I like that a lot.
I think the way forward here is to start moving to
This would be essentially equivalent to creating an enum for each variable and storing an array of enum values. This can be efficient for some very specific use cases, but in general I don't think it would be a good idea, as recompiling functions for each set of levels would be wasteful. This has been discussed before at JuliaStats/DataArrays.jl#50 and JuliaStats/DataArrays.jl#73.
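To make the enum comparison concrete, here is a minimal sketch using Base's `@enum`. The variable names (`Color`, `Size`) and the helper `count_first` are hypothetical, introduced only for illustration. Each categorical variable gets its own enum type, so a generic function is specialized separately for each one.

```julia
# Hypothetical variables: one enum type per categorical variable.
@enum Color red green blue
@enum Size small large

colors = [red, blue, blue, green]   # Vector{Color}, backed by Int32 codes
sizes  = [small, large]             # Vector{Size}, a different element type

# A generic function is compiled once per concrete argument type, so each
# new set of levels (each new enum type) triggers its own specialization.
count_first(v) = count(==(first(v)), v)

count_first(colors)   # specialized for Vector{Color}
count_first(sizes)    # specialized again for Vector{Size}
```

With a handful of variables this cost is negligible; the concern raised in the comment is about tables with many categorical columns, each with its own level set.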
This seems a bit moot since the small unions era.
I'm starting to think about queryverse/IterableTables.jl#2, and the whole design would be a lot easier if packages could take a dependency on CategoricalValue without taking a dependency on the whole CategoricalArrays package. Maybe a package called CategoricalValues.jl would work?