Rz buffer optimize #4779
base: dev
Conversation
Have you tested the changes already with `perf` or similar?

I am not sure if this is enough, since the compiler does not necessarily know the `b->type` value; it depends on how the user of `RzBuffer` calls the buffer API.
It might require some `if (b->type != RZ_BUF_....) return` in the calling function, to signal the compiler which type the `RzBuffer` object is.
But we should try this out.
How should I test it against test cases which contain hot paths? If this doesn't work, then one of the solutions could be something like this (although I'm not sure):

```c
static bool buf_init(RzBuffer *b, const void *user) {
	if (b->type != RZ_BUFFER_BYTES && b->type != RZ_BUFFER_MMAP) {
		rz_return_val_if_fail(b && b->methods, false);
		return b->methods->init ? b->methods->init(b, user) : true;
	}
	// Otherwise handle the bytes/mmap case directly.
	return buf_bytes_init(b, user);
}
```

We can also create a wrapper function which handles this.
I think the easiest is to write some small C program which you link against the Rizin libraries. You have to install Rizin every time you make a change to it.
```c
ut64 sz;
const ut8 *buf;
```
Define these at the beginning of the function and remove the other definitions within the if/else.
@Rot127 I ran tests for an input of 100.8 MB, writing it to the buffer once and reading through it 10,000 times, and here is what I found:
```c
if (len > result) {
	memset(buf + result, b->Oxff_priv, len - result);
}
```

Can you please check once whether the case you were implementing is covered?
@tushar3q34 Thank you for the preliminary results! Can you share your test code please, so we have the same basis to work with?
Yeah sure, here is the test code for RzBuffer. Cases I tested:

```c
switch (b->type) {
case RZ_BUFFER_BYTES:
case RZ_BUFFER_MMAP:
	return buf_bytes_read(b, buf, len);
default:
	rz_return_val_if_fail(b->methods, -1);
	return b->methods->read ? b->methods->read(b, buf, len) : -1;
}
```

and the plain direct call:

```c
return buf_bytes_read(b, buf, len);
```

where `buf_bytes_read` does:

```c
struct buf_bytes_priv *priv = get_priv_bytes(b);
if (!priv->buf) {
	// This can happen when called from buf_mmap.c, when you open a 0-length
	// file.
	return -1;
}
ut64 real_len = priv->length < priv->offset ? 0 : RZ_MIN(priv->length - priv->offset, len);
memmove(buf, priv->buf + priv->offset, real_len);
priv->offset += real_len;
return real_len;
```
Ok, this is pretty funny. I rewrote your testing a little, to be more similar to my slow use case from before, but I couldn't replicate it. Thanks so far for checking. I will implement my working minimal example tomorrow and send it here.
Just a comment: please be careful when using it.
@tushar3q34 Ok, so here is the branch with the problem. You can build it and run the following command on some larger file. The functions which are relevant are the buffer read functions.

If you simply time it:

```shell
$ time rizin -qc "/x a158f101" ./input.data
Searching... hits: 0

real	0m1.618s
user	0m5.596s
sys	0m0.104s
```

Now, this is a branch which uses the direct calls:

```shell
$ time rizin -qc "/x a158f101" ./input.data
Searching... hits: 0

real	0m0.835s
user	0m2.376s
sys	0m0.105s
```

I don't know how people do performance testing on Windows or MacOS, but on Linux you usually use `perf`. You use it like this:

```shell
perf record rizin -qc "/x a158f101" ./input.data
perf report # Opens details on which functions took how much CPU time
```

In the slow branch, a noticeable share of CPU time goes into the indirect calls; the (faster) branch avoids that on the hot path.

In general it is ok. I hope you can work from there on. If you have questions please let me know. Also, feel free to question the assumption I made, that it is indeed the fetching of function pointers which leads to the performance hit. Also please note that the "fixed" faster branch has some other edits, but it still does logically more or less the same.
@Rot127 I ran the performance tests on the branch and modified the code in different ways to get the following results. I ran the performance tests using the Time Profiler instrument in Xcode, since my system runs macOS.

How would you like to proceed from here? Should I push the latest changes to the PR for review, or would you prefer a discussion or further refinements?
@tushar3q34 looks like the 5th option is the best one. |
@tushar3q34 Nice work!

fyi: inlining only gets us so far. As it looks, we would get around +5% of runtime spent on it; those ~23% remain elsewhere.

This is the only idea I have currently. Of course we could replace it. It is an interesting problem, but I honestly don't know how to design it nicely. If you have any ideas, please drop them.
There is very little chance that it is possible, because of this.
I will think about it, try a few more things, and get back to you. Thanks.
It's possible to inline an API if it's done like in the header.
Another option you could try is forcing inlining via a GCC/Clang function attribute:

```c
#ifdef _MSC_VER
#define forceinline __forceinline
#elif defined(__clang__)
#if __has_attribute(__always_inline__)
#define forceinline inline __attribute__((__always_inline__))
#else
#define forceinline inline
#endif
#elif defined(__GNUC__)
#define forceinline inline __attribute__((__always_inline__))
#else
#define forceinline inline
#endif
```

Note: the Clang branch must test `__clang__` (not `__CLANG__`), and it has to come before the `__GNUC__` check, since Clang also defines `__GNUC__`.
Yes, I tried that, but the issue is that we need the definition of the function visible at the call site. If I use the attribute alone, the compiler still cannot inline it without the definition.
Hi @Rot127, I tried these methods but none of them could lead to inlining of the read path:

```c
buf = buf_bytes_get_whole_buf(b, &sz);
if (sz <= start) {
	return 0;
}
return fwd_scan(buf + start, RZ_MIN(sz - start, amount), user);
```

But in all these, the hot loop only does:

```c
buf = buffer->methods->get_whole_buf(buffer, &sz); // or other methods to read whole buffer
bytes_pattern_compare(buf + offset, RZ_MIN(sz - offset, leftovers), user);
```

So we can use the indirect call there.
Your checklist for this pull request

Detailed description

Inefficiency in `RzBuffer` occurs due to failure to inline methods like `read`, `write`, `get_whole_buf`, etc., as mentioned in the issue. The changes I made include:

- Bypassing the `RzBufferMethods` function-pointer dispatch for `RZ_BUFFER_BYTES` and `RZ_BUFFER_MMAP`.
- Instead of `b->methods->init`, for byte-type buffers `buf_bytes_init()` is used, which should get inlined.

Test plan

...

Closing issues

closes #4769

...