cuDF support (CUDA Data Frame) #665

Open
archenroot opened this issue Dec 27, 2018 · 9 comments

Comments

@archenroot
Member

As I understand it, this could become the de facto standard for integrating Spark with RAPIDS for GPU acceleration in the future. I will work on this API; I am just opening this issue for reference.

@saudet
Member

saudet commented Jun 21, 2019

Well, it looks like these guys like to do it the hard way too after all: rapidsai/cudf#1995

Nevertheless, there are probably a lot of features from the C++ API that are not accessible with those wrappers, so, similarly to OpenCV, I think it's still worthwhile to maintain separate wrappers for the C++ API.

@archenroot
Member Author

archenroot commented Jun 21, 2019

Shame on me, I waited for more stability and ended up missing the right moment. I am now finishing a job in the UAE and returning to the EU for some vacation, and I will look at that then. (I also still haven't pushed the gunrock APIs, shame on me :-) )

@razajafri

@archenroot have you started work on the presets for cudf?

I started reading about JavaCPP and it seems like I need a list of headers in order of precedence. Is that true? If so, have you already started compiling a list that I can add to?

@archenroot
Member Author

@razajafri - For now I am mostly monitoring what the RAPIDS team is going to decide in rapidsai/cudf#1995

@razajafri

razajafri commented Nov 18, 2019

@archenroot I am a contributor to the RAPIDS Java bindings, and the reason I reached out is so that I could evaluate JavaCPP. Please let me know if you have done any work on it that I can build on top of.

@saudet
Member

saudet commented Nov 18, 2019

@razajafri I see! Thanks for reaching out. The C++ API itself looks pretty clean, so it shouldn't be harder to map than CUDA itself, which is basically these presets files here:
https://github.com/bytedeco/javacpp-presets/tree/master/cuda/src/main/java/org/bytedeco/cuda/presets

We do need to list the header files that we wish to map in an order that makes sense with respect to C++, yes, a topological sort of sorts. (This is something that could be automated up to a point, which will happen when I get the chance to work on it, but probably not for a couple of years...)
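To illustrate what "a topological sort of sorts" means for header ordering, here is a minimal sketch in plain Java using Kahn's algorithm. It is not part of JavaCPP and is not how JavaCPP orders headers today; the class name `HeaderOrder`, the `sort` method, and the header names are all hypothetical, and the include relationships are entered by hand rather than parsed from real cuDF headers.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of JavaCPP: orders headers so that each one
// comes after every header it includes.
class HeaderOrder {
    static List<String> sort(Map<String, List<String>> includes) {
        // header -> number of its includes that have not been emitted yet
        Map<String, Integer> pending = new LinkedHashMap<>();
        // header -> headers that include it
        Map<String, List<String>> dependents = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : includes.entrySet()) {
            pending.putIfAbsent(e.getKey(), 0);
            for (String dep : e.getValue()) {
                pending.putIfAbsent(dep, 0);
                pending.merge(e.getKey(), 1, Integer::sum);
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(e.getKey());
            }
        }

        // Start with headers that include nothing else.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : pending.entrySet()) {
            if (e.getValue() == 0) ready.add(e.getKey());
        }

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String header = ready.remove();
            order.add(header);
            // Emitting this header may unblock the headers that include it.
            for (String user : dependents.getOrDefault(header, List.of())) {
                if (pending.merge(user, -1, Integer::sum) == 0) ready.add(user);
            }
        }
        if (order.size() != pending.size()) {
            throw new IllegalStateException("circular include detected");
        }
        return order;
    }

    public static void main(String[] args) {
        // Made-up header names and include lists, for illustration only.
        Map<String, List<String>> includes = new LinkedHashMap<>();
        includes.put("table.hpp", List.of("column.hpp", "types.hpp"));
        includes.put("column.hpp", List.of("types.hpp"));
        includes.put("types.hpp", List.of());
        System.out.println(sort(includes)); // prints [types.hpp, column.hpp, table.hpp]
    }
}
```

The resulting list is the order in which the headers would be listed in a presets file. Automating the part that is done by hand here, collecting the include relationships, would mean scanning the `#include` directives of the real headers.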

Now, if I understand correctly, cuDF depends on Arrow, so we would need to map that one first. The official Java wrappers for Arrow are pretty limited and not very efficient, so we are already considering providing our own wrappers for the C++ API. In other words, it's something I will probably get to do as part of my work anyway (and then other developers will easily be able to start providing more idiomatic APIs on top of that). Do you guys have a timetable in mind for this?

/cc @agibsonccc

@razajafri

@saudet thanks for the detailed explanation. cudf doesn't directly expose any Arrow APIs that I know of. Do we still need to provide Java presets for it? Can it not just be a library dependency instead?

TBH we don't have the bandwidth for this. I was going to spend a couple of hours on it to see how easy or difficult it is to add Java presets for cudf. I am still willing to contribute as much as I can, as a personal goal of mine. I would love to get on a call with you or anyone else willing to go over the process of creating presets. I have already read the documentation, and I would like to know more about automating the header topological sort.

@saudet
Member

saudet commented Nov 19, 2019

If it doesn't expose any data structures from Arrow, then yes, we don't need to do anything for that separately. Mapping Arrow would still enhance interoperability, though, so it's worth doing at some point in any case.

Listing the header files really isn't an issue. What takes most of the time is figuring out the right "info" to make everything parse and compile. The other big time sink is understanding how to make the library actually build. cuDF doesn't appear to be easy to build; see, for example, issue rapidsai/cudf#2770. Imagine you were a newbie and had to build cuDF on all supported platforms. I estimate that it would take about as long as we need to tinker with the header files to get them working properly: at least a few days. Does that sound like too much work? On the other hand, wrapping everything manually with JNI would take much, much longer, while a limited Java API wouldn't be useful for many use cases.

Another way we could split the workload is to have me write the presets for cuDF, and have someone like you get the builds passing on, for example, Travis CI, and write the high-level APIs on top of the presets. This is something that I would be able to do as part of my work, since we'd in effect be collaborating with NVIDIA on it. We can have calls too if you'd like, that's fine. Please send me an email anytime you'd like to schedule something!

@saudet
Member

saudet commented Dec 9, 2019

@razajafri I've finished creating initial presets for Arrow here:
https://github.com/bytedeco/javacpp-presets/tree/master/arrow
Presets for cuDF would most likely be very similar to that, and once created they are very easy to maintain. If this looks like something you would like to use, either as is or as the base for a high-level API, please let us know!
