Set of operations to benchmark #1

Open
ChrisRackauckas opened this issue Oct 8, 2017 · 5 comments

Comments

@ChrisRackauckas

I think we should come up with a pretty comprehensive set of operations to benchmark. Is there a standard list for the field? Probably something with some groups, joins, etc. @davidanthoff or @explodingman might have some good resources?

@xiaodaigh
Owner

Yeah, we can test:

left join, inner join, outer join, and anti join.

transpose, melt, etc.

Basically we can look into dplyr and data.table operations and just add one operation at a time.
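To make the four join variants concrete, here is a rough plain-Python sketch of what each one returns. It is not code from dplyr, data.table, or any Julia package; the toy tables, the `id` key column, and the `join` and `keys` helpers are all hypothetical names made up for the illustration, and unmatched rows simply keep their own columns rather than being padded with missing values.

```python
# Toy tables as lists of dicts; "id" is the join key (hypothetical column names).
left = [{"id": 1, "x": "a"}, {"id": 2, "x": "b"}, {"id": 3, "x": "c"}]
right = [{"id": 2, "y": 20}, {"id": 3, "y": 30}, {"id": 4, "y": 40}]


def index_by(rows, key):
    """Group rows by key value so lookups during the join are cheap."""
    out = {}
    for row in rows:
        out.setdefault(row[key], []).append(row)
    return out


def join(left, right, key, how):
    """Naive hash join supporting 'inner', 'left', 'outer', and 'anti'."""
    ridx = index_by(right, key)
    lkeys = {row[key] for row in left}
    result = []
    for lrow in left:
        matches = ridx.get(lrow[key], [])
        if how == "anti":
            # Anti join: keep left rows that have no partner on the right.
            if not matches:
                result.append(dict(lrow))
        elif matches:
            # Matched rows appear in inner, left, and outer joins.
            result.extend({**lrow, **rrow} for rrow in matches)
        elif how in ("left", "outer"):
            # Unmatched left rows survive only in left and outer joins.
            result.append(dict(lrow))
    if how == "outer":
        # Outer join also keeps right rows that have no partner on the left.
        result.extend(dict(r) for r in right if r[key] not in lkeys)
    return result


def keys(rows):
    """Sorted key values of a result, handy for eyeballing which rows survived."""
    return sorted(row["id"] for row in rows)


for how in ("inner", "left", "outer", "anti"):
    print(how, keys(join(left, right, "id", how)))
```

A benchmark suite would swap this toy `join` for each library's own implementation and time it on much larger tables; the sketch only pins down what result each operation is expected to produce.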

@ChrisRackauckas
Author

Here's something to look at and track: JuliaData/DataTables.jl#17

@ChrisRackauckas
Author

Could we get them generating one big table like https://github.com/Rdatatable/data.table/wiki/Benchmarks-:-Grouping#code-to-reproduce-the-timings-above-? We can do Pandas through Pandas.jl and data.frame + dplyr through

@xiaodaigh
Owner

xiaodaigh commented Oct 8, 2017

I've tried that before: Rdatatable/data.table#974

I am actually interested in benchmarks using real-life data instead of relying purely on synthetic data.

@ChrisRackauckas
Author

> I am actually interested in benchmarks using real-life data instead of purely relying on synthetics

A mix is usually a good idea.
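For the synthetic half of such a mix, a benchmark usually boils down to generating a reproducible table and timing one operation on it. The following is only a minimal stdlib sketch under assumed sizes (100,000 rows, 100 groups); `make_table`, `group_sum`, and `time_once` are hypothetical helper names, and the pure-Python group-by stands in for whatever library call would actually be measured.

```python
import random
import time
from collections import defaultdict


def make_table(n_rows, n_groups, seed=0):
    """Build two synthetic columns: an integer group id and a float value."""
    rng = random.Random(seed)  # fixed seed so every run sees the same data
    ids = [rng.randrange(n_groups) for _ in range(n_rows)]
    vals = [rng.random() for _ in range(n_rows)]
    return ids, vals


def group_sum(ids, vals):
    """Sum `vals` within each group id (stand-in for a library group-by)."""
    totals = defaultdict(float)
    for g, v in zip(ids, vals):
        totals[g] += v
    return dict(totals)


def time_once(fn, *args):
    """Run fn once and return (elapsed seconds, result)."""
    start = time.perf_counter()
    result = fn(*args)
    return time.perf_counter() - start, result


# Assumed sizes for illustration only; real benchmarks would scale these up.
ids, vals = make_table(100_000, 100)
elapsed, totals = time_once(group_sum, ids, vals)
print(f"group-by sum over {len(ids)} rows, {len(totals)} groups: {elapsed:.4f} s")
```

A single timed run is noisy, so a real comparison would repeat each measurement several times and run the same harness over a real-life dataset as well.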

2 participants