-
Notifications
You must be signed in to change notification settings - Fork 933
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add multi-source reading to JSON reader benchmarks #17688
Add multi-source reading to JSON reader benchmarks #17688
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
Null probability has been set to zero in the random table generator so that the JSON reader benchmark does not fail for multi-source multi-batch reading. Once issue #17689 is resolved, the change to the table generator will be reverted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One small question otherwise looks good
|
||
json_read_common(source_sink, num_rows, state, comptype, data_size); | ||
std::vector<char> out_buffer; | ||
auto sink = cudf::io::sink_info(&out_buffer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passing a pointer to a vector in modern C++ is generally odd. Are you referring to out_buffer.data()
or just out_buffer
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm using the data sink constructor that accepts a pointer to a output host vector -
cudf/cpp/include/cudf/io/types.hpp
Line 555 in 39bcd16
explicit sink_info(std::vector<char>* buffer) : _type(io_type::HOST_BUFFER), _buffers({buffer}) {} |
It is indeed a little awkward to pass a pointer to a vector, but I'm not sure if there's a cleaner way to construct a
sink_info
object.@vuule do you have any suggestions on a better approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that the _buffer
data member of sink_info
should be a vector of char spans, but addressing this is beyond the scope of the current PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did I miss these comments?
The writer resizes the vector, as the output size is not known in advance. Passing a span unfortunately does not work here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah OK, then the buffer owner could use a shared pointer to avoid passing raw pointers around.
/merge |
Description
Depends on #17708
Enables benchmarking of multi-source multi-batch JSON reader by adding another axis for number of input sources in the benchmark.
Checklist