
support for passing custom types? #138


Closed
mkborregaard opened this issue Aug 18, 2016 · 39 comments · Fixed by #192
Comments

@mkborregaard

mkborregaard commented Aug 18, 2016

Hi,
RCall offers a great many standard conversions - julia arrays to R arrays, R lists to julia Dicts etc. But there are also numerous more complex types defined within R and julia packages. Could RCall support an API for implicitly converting between R and julia objects by allowing users to specify a conversion function?

An example: both languages have (several) representations for defining phylogenies: BioJulia has e.g. the Phylo submodule, and the ape R package has the phylo type. It is possible to use RCall code to split the julia Phylogeny type into components (arrays and such), pass them to R, then construct an ape phylo object. It is thus easy to define a conversion function.

Ideally, one could define such a conversion method through a standard RCall API, placing it in the relevant julia package, in RCall, or in an RCall_typeextensions package, and then use the conversion implicitly, e.g.

R"plot($phy)"
@simonbyrne
Member

That's a good idea. We should be able to do this using dispatch somehow.

@simonbyrne
Member

I should say that though it's not documented, this is already possible in the Julia -> R direction by overloading RCall.sexp: just define a new method that takes your Julia type and gives the appropriate Ptr to an R object back. Unfortunately this requires keeping track of R garbage collection references, so perhaps isn't the best way in the long term, but I'd be happy to help out if you want to have a go at doing it.

The other direction is more complicated. At the moment we only dispatch on the R core "SEXPTYPE"s, but we could change this to dispatch on the class attribute. We would also have to figure out how to incorporate S4 classes and Reference Classes (though these seem to be less common in the wild).

@mkborregaard
Author

I think that might be worth giving a go, but I don't think I am up to the task of dealing with R's garbage collector references.
One thing to keep in mind is how to avoid forcing every package that implements a useful type to depend on RCall. One way might be for RCall to define methods that take an appropriate julia Dict containing all the subobjects of an R type, and then build the R type in R. Packages that want this functionality would then only need to define a function that creates the specified julia Dict.

@simonbyrne
Member

One solution would be an "RCallRecipes" approach, being a lightweight repo which gets included in your package as a dependency, but doesn't require loading or installing RCall.
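
A minimal sketch of how such a split might look, in current Julia syntax. The module names `RecipesSketch` and `PhyloSketch`, the function `rcomponents`, and the `Tree` type are all hypothetical, and the Dict holds only some of the fields an ape phylo object needs; nothing here is part of RCall.

```julia
# Hypothetical lightweight interface package: it declares a function with no
# methods and has no dependency on R or RCall.
module RecipesSketch
    export rcomponents
    """Return a Dict describing how to rebuild `x` as an R object."""
    function rcomponents end
end

# A package that owns a custom type depends only on the interface package.
module PhyloSketch
    import ..RecipesSketch: rcomponents

    struct Tree
        tiplabels::Vector{String}
        branchlengths::Vector{Float64}
    end

    # Describe the tree as plain components; a bridge package could turn
    # this Dict into an R object.
    rcomponents(t::Tree) = Dict{String,Any}(
        "class" => "phylo",
        "tip.label" => t.tiplabels,
        "edge.length" => t.branchlengths,
    )
end

using .RecipesSketch

tree = PhyloSketch.Tree(["t1", "t2"], [0.5, 1.5])
parts = rcomponents(tree)
```

The point of the design is that `PhyloSketch` never loads R: only the package that actually talks to R would consume the Dict.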

@mkborregaard
Author

I like that idea a lot.

@nalimilan
Member

That, together with a system similar to FileIO.jl, would be very useful to automatically choose what type to use when getting a special class from R.

@simonbyrne
Member

I have been thinking about this a little, but still don't have a concrete way forward yet.

As a reference (partly for myself) here are a couple of pages on how R's S3 objects work:

The main challenge with the R -> Julia direction is that R S3 objects can have multiple classes, set via the class attribute:

class(x) <- c("aa", "bb", "cc")

(Complicating matters is that there are some "fallback" classes for when no "class" attribute is defined, e.g. "numeric" in the case of a RealSxp, and "matrix" if the dim attribute is set. Unfortunately, the exact way this is computed is via the R_data_class function, and I'm not really sure that this is really part of the C API.)

R's generic methods then check if foo.class exists, falling back on foo.default. So now the question is how we could implement something similar in Julia for rcopy.

One way is to add an extra "type" argument to rcopy, say RClass:

rcopy(::Type{Matrix{Float64}}, ::RClass{:matrix}, x::RObject{RealSxp}) = ...
rcopy(::RClass{:matrix}, x::RObject{RealSxp}) = ...

and then define the generic rcopy as:

function rcopy{S}(x::RObject{S})
    for class in getclasses(x)
        classsym = symbol(class)
        if method_exists(rcopy, Tuple{RClass{classsym}, RObject{S}})
            return rcopy(RClass{classsym}(), x)
        end
    end
    return rcopy(RClass{:default}(), x)
end
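
The same dispatch loop can be tried without R. The sketch below uses current Julia syntax (`hasmethod` in place of `method_exists`, `Symbol` in place of `symbol`) and a hypothetical `MockRObject` standing in for `RObject`; the `RClass` tag and `getclasses` are redefined locally and are assumptions for illustration, not RCall's definitions.

```julia
# Hypothetical stand-ins; none of these definitions come from RCall.
struct RClass{C} end                 # tag type carrying an R class name

struct MockRObject                   # fake R object: payload plus class vector
    data::Vector{Float64}
    classes::Vector{String}
end

getclasses(x::MockRObject) = x.classes

# Class-specific conversions, analogous to R's foo.bb and foo.default.
rcopy(::RClass{:bb}, x::MockRObject) = (kind = :bb, total = sum(x.data))
rcopy(::RClass{:default}, x::MockRObject) = copy(x.data)

# Generic entry point: try each class in order, then fall back to :default.
function rcopy(x::MockRObject)
    for class in getclasses(x)
        tag = RClass{Symbol(class)}
        if hasmethod(rcopy, Tuple{tag, MockRObject})
            return rcopy(tag(), x)
        end
    end
    return rcopy(RClass{:default}(), x)
end

x = MockRObject([1.0, 2.0, 3.0], ["aa", "bb", "cc"])
y = MockRObject([4.0, 5.0], String[])

r1 = rcopy(x)   # no method for "aa", so the :bb method is used
r2 = rcopy(y)   # no classes at all, so the :default method is used
```

As in S3 dispatch, the first class with a matching method wins, so the order of the class vector matters.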

@dmbates
Member

dmbates commented Dec 5, 2016

I understand your point, @simonbyrne, but the example is not a good one because "matrix" isn't a class in R. The Matrix package defines S4 classes for various types of matrices and a Matrix class which would correspond to an AbstractArray{T,2}, but base R and S3 classes only know about matrices through the dim attribute. This sort-of makes sense in R because all data objects are vectors so a matrix or higher dimensional array is just a matter of indexing into a vector.

R was created to be "not unlike S" and these decisions go back to the original S formulation.

@randy3k
Member

randy3k commented Dec 5, 2016

Is it really necessary to generalize the rcopy function for arbitrary classes? I guess we could keep rcopy as low level as possible. A user with a particular application could tailor-make a function to convert an RObject to their own Julia type using the rcopy functions, e.g.,

function convert(::Type{Foo}, x::RObject)
      # using rcopy function to access x
end
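
A runnable sketch of this explicit-conversion style, using a hypothetical `MockRList` in place of `RObject` so that it works without R. One caveat for current Julia versions: constructors no longer fall back on `convert`, so the sketch defines the constructor form explicitly as well.

```julia
# Hypothetical stand-in for an R list; not an RCall type.
struct MockRList
    fields::Dict{Symbol,Any}
end

struct Foo
    x::Float64
    y::String
end

# Explicit conversion from the mock R object to the user's type.
Base.convert(::Type{Foo}, r::MockRList) = Foo(r.fields[:x], r.fields[:y])

# Constructor form, defined separately because current Julia constructors
# do not fall back on convert.
Foo(r::MockRList) = convert(Foo, r)

r = MockRList(Dict{Symbol,Any}(:x => 1.5, :y => "label"))
a = convert(Foo, r)
b = Foo(r)

# convert is still applied implicitly in some places, e.g. when pushing
# into a typed vector.
foos = Foo[]
push!(foos, r)
```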

@simonbyrne
Member

@dmbates: while looking into it I realised that matrix is kind of special (see #159), but the idea is to mimic S3 dispatch. I don't really understand S4 classes all that well, I need to do some more reading to figure out how that would work.

@randy3k Fair enough, but I would like to be able to get rid of the DataFrames dependency and this provides a path to do that.

@randy3k
Member

randy3k commented Dec 5, 2016

I like the idea of getting rid of the dependency on DataFrames and all related packages. How about keeping only the rcopy to Array{T, 2} dispatches and moving all the other dispatches to RCallRecipes? In RCallRecipes, we could define all the other conversions:

function convert(::Type{DataFrame}, x::RObject)
      ### ...
end

@dmbates
Member

dmbates commented Dec 5, 2016

@simonbyrne Anyone familiar with Julia's multiple dispatch and type system will recognize that S4 classes and methods are an elegant idea. However, the implementation is not quite as elegant. The isS4 function checks for a flag in the first 32-bit word of the object. If the object is S4 then the slots are part of the attributes cons list.

@dmbates
Member

dmbates commented Dec 5, 2016

@randy3k I think you need to rcopy to an Array{T, 1} don't you? The columns in a DataFrame can be heterogeneous and don't always fit into an Array{T, 2}.

Data frames are pretty fundamental to R and I think they warrant special consideration in RCall. An alternative may be to use the DataStreams formulation.

@nalimilan
Member

IIUC this would allow getting rid of the DataFrames dependency only by moving the code and the dependency to DataFrames. It sounds much more reasonable for RCall to depend on DataFrames than the other way around... That would be even worse for NullableArrays. What we really need is optional dependencies.

@dmbates
Member

dmbates commented Dec 5, 2016

@nalimilan, @quinnj Would DataStreams be a lighter weight dependency?

As I said earlier, I think that data frames are sufficiently fundamental to R that they should be given special consideration in RCall. However, it may be better to go through an intermediate representation of a data source as long as categories and missing data values are retrievable.

@quinnj

quinnj commented Dec 5, 2016

Seeing how DataStreams currently depends on DataFrames, no, it would not be a more light-weight dependency :)

That said, the DataFrames code in DataStreams actually belongs in DataFrames, and DataStreams should really be its own standalone package with no dependencies. It's on my todo list to get all the code moved out.

@randy3k
Member

randy3k commented Dec 5, 2016

@dmbates You are right! It should be Array{T, 1}. I meant to keep the rcopy(::Type{Array{T, N}}, x) and rcopy(::Type{Dict}, x) functions.

My logic was to keep rcopy as low level as possible so that it only deals with SEXP objects directly. For higher level operations, such as handling the R class data.frame or, in general, any class foo, there should be a different function, say convert. I agree that data frames are fundamental to R, so we should provide the function convert(::Type{DataFrame}, x::RObject), or the constructor DataFrame(x::RObject).

EDIT
For completeness, I guess we should also define all the default constructors, say Int(), String() and Array(). That way, users do not need to use the rcopy function directly.

@randy3k
Member

randy3k commented Dec 5, 2016

This is what I am thinking:

  • remove rcopy(::Type{DataFrame}, x) and sexp(x::DataFrame), same for NullableArray, CategoricalArray and NullableCategoricalArray.
  • add the default constructors: Int(x::RObject), Array(x::RObject) etc.
  • define data frame related constructors: DataFrame(x::RObject), NullableArray(x::RObject) etc.

So now all the conversions are explicit. To allow implicit conversion, we can then define a version of rcopy along the lines of @simonbyrne's proposal.

@simonbyrne
Member

I think that seems reasonable, though should we define them as convert methods instead (which are called implicitly by constructors)?

@nalimilan My idea was that we would have a lightweight package, called say RCallRecipes which would basically just define RObject (and I guess the various Sxp types). Other packages, (e.g. DataFrames) would then depend on it, and define the appropriate conversion methods.

@randy3k
Member

randy3k commented Dec 5, 2016

My bad, I just could not recall if constructor calls convert or the other way.

@mkborregaard
Author

Hey thanks for the progress here, the direction this takes looks really promising.

@TransGirlCodes

TransGirlCodes commented Jan 3, 2017

The main challenge with the R -> Julia direction is that R S3 objects can have multiple classes, set via the class attribute:

class(x) <- c("aa", "bb", "cc")

@simonbyrne Can't this be alleviated by noting that the order of the class attribute matters in R? It is normally used to implement a mechanism resembling inheritance during S3 method dispatch.

So can one route to solving this be to say, well much as with julia, the actual concrete type/class (which I think is usually the first element of class(x)), is the actual solid class of the thing, and therefore determines how RCall or a recipe will try to handle it?

Edit: Ah actually your way is a lot better, I'm clearly tired as I didn't see it on my first pass:

One way is to add an extra "type" argument to rcopy, say RClass:

rcopy(::Type{Matrix{Float64}}, ::RClass{:matrix}, x::RObject{RealSxp}) = ...
rcopy(::RClass{:matrix}, x::RObject{RealSxp}) = ...

and then define the generic rcopy as:

function rcopy{S}(x::RObject{S})
    for class in getclasses(x)
        classsym = symbol(class)
        if method_exists(rcopy, Tuple{RClass{classsym}, RObject{S}})
            return rcopy(RClass{classsym}(), x)
        end
    end
    return rcopy(RClass{:default}(), x)
end

@richardreeve

@simonbyrne I'm certainly interested in your earlier offer to give pointers for overloading RCall.sexp. I have a type that I'd like to automatically convert to an R S3 class through @rput if my package is loaded along with RCall, and I hope that will be fairly easy... ? Obviously this will depend on something like JuliaLang/julia#6195 for incorporation into the package as I have no desire for my package to have a dependency on RCall, but it would be useful code for testing in the meanwhile.

Also, in the longer term, we are developing a package in R and Julia simultaneously, and validating them against each other. At the moment, this is working okay(ish) using rcall() and testing results, but I'd like to be able to convert the S4 classes in R to Julia and vice-versa in order to test the object contents directly. You suggest that S4 conversion will be harder in general though - are there any examples of it in this code?

Finally, I'd like to do this automatically via @rput and @rget. I appreciate that this may be more contentious in some cases, where multiple options exist for what the translation should be, but it would certainly be nice to allow it in some restricted cases (such as where the same group is developing the same package!)... and while making the copy constructors / convert() functions work with RObjects seems fine for more technically minded people, it rather destroys the utility of @rput / @rget to have to switch away from it as soon as anything is not in Base R. An @r...-related syntax like:

  • @rput obj metacommunity
  • @rget obj Metacommunity{Float64}

would seem like a better approach for consistency if ambiguities exist, even if all it does is call the copy constructor. This would also help with built in constructors like DataFrames, where you might wish to be able to choose to convert R data frames to DataTables, but there is (and should be!) a default option provided. Similarly just for arrays, you might(?) want to make them (e.g.) an ArrayFire array directly rather than copying them through a standard Julia Array, and you certainly don't want to remove the default behaviour for an array...
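
A rough sketch of how a typed variant of @rget could behave, with a default conversion kept alongside an explicit target type. Everything here is hypothetical: `MOCK_R_ENV` is a plain Dict standing in for R's global environment, and `fetch_r` and `@rget_sketch` are illustrative names, not part of RCall.

```julia
# Hypothetical stand-in for R's global environment.
const MOCK_R_ENV = Dict{Symbol,Any}(:obj => [1, 2, 3])

# Default conversion: hand back the value as stored.
fetch_r(name::Symbol) = MOCK_R_ENV[name]

# Typed conversion: the caller chooses the target type.
fetch_r(name::Symbol, ::Type{T}) where {T} = convert(T, MOCK_R_ENV[name])

# `@rget_sketch obj` uses the default conversion.
macro rget_sketch(name::Symbol)
    return :($(esc(name)) = fetch_r($(QuoteNode(name))))
end

# `@rget_sketch obj T` converts to the requested type T.
macro rget_sketch(name::Symbol, T)
    return :($(esc(name)) = fetch_r($(QuoteNode(name)), $(esc(T))))
end

@rget_sketch obj
default_obj = obj

@rget_sketch obj Vector{Float64}
typed_obj = obj
```

Keeping the one-argument form means the default behaviour is untouched, while the two-argument form resolves any ambiguity about the target type.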

@richardreeve

Ah, just realised the significance of your RCallRecipes package proposal - maybe I'd be able to do things sooner than I thought, but nonetheless JuliaLang/julia#6195 seems like a better long term solution which would obviate the need for that package, and it is planned for 1.0 as far as I understand from that thread.

@randy3k
Member

randy3k commented Apr 27, 2017

For the RCallRecipes approach, besides the essentials like RObject and rcopy, we might need to export a number of functions in https://github.com/JuliaInterop/RCall.jl/blob/master/src/methods.jl as helpers to make it useful.
It is not clear to me which functions are needed at this point. In the end, we might just make RCallRecipes a lightweight version of RCall (except the Rf_initEmbeddedR part?).

@richardreeve

If there's a definite point at which RCallRecipes will no longer be needed which is only a year away, then something quite hacky may suffice... and talking of hacky, I also wondered whether it's possible for the same repository to provide two different packages by having two entries in METADATA, with corresponding src/RCall.jl file and src/RCallRecipes.jl files which would be loaded depending on which package the repository thought it was?

@richardreeve

I now have working R translation code for my package using convert(::Type{MyType}, ::RObject) and sexp(::MyType) to move things back and forth. That gives me constructors in both directions (implicitly MyType(::RObject) and RObject(::MyType)). What I don't understand is why that doesn't give me a free pass for using @rput? Surely if I do:

j = MyType()
@rput j

it should be able to recognise that there's a way of translating to R? Instead I have to do:

j = MyType()
r = RObject(j)
@rput r

which seems unnecessarily clumsy. I also presume that the suggestion of @simonbyrne above to use rcopy(::RClass{:matrix}, x::RObject{RealSxp}) = ... isn't implemented yet, because implicit conversion in the opposite direction would also be great?

@randy3k
Member

randy3k commented Jun 9, 2017

Is there any error message when you do @rput j? It would be nice if you could provide a minimal example.

@richardreeve

Oops, sorry. No problem. The package is here, the branch with the RCall code in is called R, and the code is here. Despite what it says on the README, it's not yet made it into METADATA (the PR is pending - JuliaLang/METADATA.jl#9701), so you'll have to clone it if you want to use it.

julia> using RCall
WARNING: Method definition ==(Base.Nullable{S}, Base.Nullable{T}) in module Base at nullable.jl:238 overwritten in module NullableArrays at /Users/richardr/.julia/v0.6/NullableArrays/src/operators.jl:128.

julia> using Phylo

R> library(ape)

julia> cd(Pkg.dir("Phylo", "src"))

julia> include("rcall.jl")
sexp (generic function with 114 methods)

julia> nu = Nonultrametric(5);

julia> jt = rand(nu)
NamedTree phylogenetic tree with 9 nodes and 8 branches
Leaf names:
String["tip 1", "tip 2", "tip 3", "tip 4", "tip 5"]

julia> @rput jt
ERROR: MethodError: no method matching protect(::RCall.RObject{RCall.VecSxp})
Closest candidates are:
  protect(::Ptr{S<:RCall.Sxp}) where S<:RCall.Sxp at /Users/richardr/.julia/v0.6/RCall/src/types.jl:297
Stacktrace:
 [1] setindex!(::Ptr{RCall.EnvSxp}, ::Phylo.BinaryTree{Phylo.LeafInfo,Void}, ::Symbol) at /Users/richardr/.julia/v0.6/RCall/src/methods.jl:452
 [2] setindex!(::RCall.RObject{RCall.EnvSxp}, ::Phylo.BinaryTree{Phylo.LeafInfo,Void}, ::Symbol) at /Users/richardr/.julia/v0.6/RCall/src/methods.jl:461

julia> rt = RObject(jt)
RCall.RObject{RCall.VecSxp}

Phylogenetic tree with 5 tips and 4 internal nodes.

Tip labels:
[1] "tip 1" "tip 2" "tip 3" "tip 4" "tip 5"

Rooted; includes branch lengths.


julia> @rput rt
RCall.RObject{RCall.VecSxp}

Phylogenetic tree with 5 tips and 4 internal nodes.

Tip labels:
[1] "tip 1" "tip 2" "tip 3" "tip 4" "tip 5"

Rooted; includes branch lengths.


R> rt

Phylogenetic tree with 5 tips and 4 internal nodes.

Tip labels:
[1] "tip 1" "tip 2" "tip 3" "tip 4" "tip 5"

Rooted; includes branch lengths.

@richardreeve

@randy3k Please let me know if you need more. As I'm confident is obvious from the code, I have no idea what I'm doing with the RCall interface - I just played with things that I found in the package until I got the kind of behaviour I wanted. Any suggestions are very welcome!

@randy3k
Member

randy3k commented Jun 9, 2017

Thanks for providing the example. Unfortunately, I won't have time to look it up until next week.

@richardreeve

No problem, thanks for being interested anyway!

@randy3k
Member

randy3k commented Jun 9, 2017

I have quickly skimmed over your package. The issue is that sexp should return a SEXP object rather than an RObject. This may not be easy for those who are not familiar with the R API. I could prepare a PR to your package once I have time.

@richardreeve

Ah, fab - thanks for the pointer. And you're right - I'm struggling with it. I tried calling sexp(tor) to get the SEXP back (instead of RObject(tor)), but the next rcall()s returned RObjects anyway, so I just return the p field from the RObject at the end of the function, and that now works... does that seem reasonable or have I committed some terrible faux pas?

@randy3k
Member

randy3k commented Jun 9, 2017

There is a version of rcall, rcall_p, which returns a SEXP object. The difference between RObject and SEXP is that RObjects are protected from R garbage collection.

sexp functions convert any object to the internal SEXP object. Creating RObjects inside a sexp method should be avoided. To protect temporary SEXP objects from GC, the functions protect and unprotect should be used.

@richardreeve

richardreeve commented Jun 12, 2017

@randy3k - Thanks very much for all of your help. That all seems to be working seamlessly in the Julia -> R direction now, and it makes a lot more sense. Is there any problem with the protection from R garbage collection when I repeatedly overwrite a variable (in a loop) that I'm exporting to R? I do that a lot in testing between our R and Julia packages, and I don't want to be leaking memory... hopefully the R gc happens automatically when the underlying julia objects are garbage collected?

@randy3k
Member

randy3k commented Jun 12, 2017

When an RObject is created, a finalizer is also registered so that when it is freed by the Julia GC, its memory in R will be released too. But this mechanism is a bit heavy. Within a function stack frame, the memory should be protected manually using the protect and unprotect pair.

It is just not efficient to have RObjects everywhere within a function.
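
The pairing discipline can be illustrated without R. In the sketch below, `PROTECTED`, `protect` and `unprotect` are a pure-Julia mock of a protection stack, and `build_pair` is a hypothetical helper; RCall's real protect and unprotect act on R's own stack instead.

```julia
# Mock protection stack; a stand-in for R's, for illustration only.
const PROTECTED = Any[]

protect(x) = (push!(PROTECTED, x); x)

function unprotect(n::Integer)
    for _ in 1:n
        pop!(PROTECTED)
    end
    return nothing
end

# Build a composite value from two temporaries, protecting each while it is
# in use and releasing exactly as many entries as were protected.
function build_pair(a, b)
    nprotected = 0
    try
        left = protect(copy(a))
        nprotected += 1
        right = protect(copy(b))
        nprotected += 1
        return (left, right)
    finally
        unprotect(nprotected)   # balance the stack even if an error occurs
    end
end

pair = build_pair([1, 2], [3, 4])
```

Counting the protections and releasing them in a `finally` block keeps the stack balanced on every exit path, which is the property a manual protect and unprotect pair has to guarantee.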

@richardreeve

Okay, thanks. So now I have conversion working transparently from Julia to R, I'm still a bit away from the opposite direction:

julia> using RCall
WARNING: Method definition ==(Base.Nullable{S}, Base.Nullable{T}) in module Base at nullable.jl:238 overwritten in module NullableArrays at /Users/richardr/.julia/v0.6/NullableArrays/src/operators.jl:128.

julia> using Phylo

julia> include(Pkg.dir("Phylo", "src/rcall.jl"));

R> library(ape)

R> rt <- rtree(4)

julia> jt=NamedTree(R"rt")
NamedTree phylogenetic tree with 7 nodes and 6 branches
Leaf names:
String["t2", "t1", "t4", "t3"]

julia> @rput jt;

R> jt

Phylogenetic tree with 4 tips and 3 internal nodes.

Tip labels:
[1] "t2" "t1" "t4" "t3"

Rooted; includes branch lengths.

R> all.equal(rt, jt)
[1] TRUE

Apart from the RCallRecipes issue, it would be extremely handy to provide an @rget-like replacement for the currently hacky jt=NamedTree(R"rt"). Are there any thoughts about providing some interface like @rget{NamedTree} rt or whatever to do this? Or ideally something even cleaner like a mechanism for registering converters by R class as suggested above so that @rget rt will work on its own?

@randy3k
Member

randy3k commented Jun 13, 2017

#192 has implemented a version of @simonbyrne's RClass idea.

The idea is that a user should define an explicit rcopy method, e.g. rcopy{S <: Sxp}(::Type{Foo}, s::Ptr{S}), which converts a SEXP to a julia object of type Foo. This definition allows one to execute rcopy(Foo, r), convert(Foo, r) and Foo(r) (unless they are defined somewhere else).

However, the default rcopy call, rcopy(r), doesn't know what type it should convert to.
This is where RClass comes in: if the corresponding R class of Foo is Bar, the conversion rule can be defined via

rcopytype{S <: Sxp}(::Type{RClass{:Bar}}, s::Ptr{S}) = Foo

This will allow rcopy(r) to dispatch rcopy(Foo, r) if r has an R class Bar.
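
The lookup can be mimicked in plain Julia to see how the pieces fit. This is a sketch in current syntax under stated assumptions: `MockSxp` is a hypothetical stand-in for a SEXP pointer, `RClass`, `rcopy` and `rcopytype` are redefined locally, and the fallback returning `nothing` is a choice made for this illustration rather than what #192 does.

```julia
# Hypothetical stand-ins; these definitions are local to the sketch.
struct RClass{C} end

struct MockSxp
    fields::Dict{Symbol,Any}
    classes::Vector{Symbol}
end

struct Foo
    x::Float64
    y::String
end

# Explicit conversion: the caller names the target type.
rcopy(::Type{Foo}, s::MockSxp) = Foo(s.fields[:x], s.fields[:y])

# Conversion rule: R class Bar maps to the Julia type Foo.
rcopytype(::Type{RClass{:Bar}}, s::MockSxp) = Foo

# Fallback for classes with no registered rule (an assumption of this sketch).
rcopytype(::Type{RClass{C}}, s::MockSxp) where {C} = nothing

# Default conversion: look up the target type from the object's classes.
function rcopy(s::MockSxp)
    for c in s.classes
        T = rcopytype(RClass{c}, s)
        T === nothing || return rcopy(T, s)
    end
    return s.fields   # no rule found, so return the raw fields
end

bar = MockSxp(Dict{Symbol,Any}(:x => 2.0, :y => "b"), [:Bar])
other = MockSxp(Dict{Symbol,Any}(:x => 3.0), [:Unknown])

foo = rcopy(bar)     # dispatches to rcopy(Foo, bar) via the Bar rule
raw = rcopy(other)   # no rule for Unknown, so the raw fields are returned
```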

8 participants