[GPU] prevent too long sort in experimental detectron generate proposals single image #28422
LGTM
Looks like the issue is not fixed; it still hangs in a specific case.
Works.
The commit title is a bit too long. The second part of it should go in the description instead.
...src/kernel_selector/cl_kernels/experimental_detectron_generate_proposals_single_image_ref.cl
static int static_counter = 0;
static_counter++;
int pivot_idx = l;
if(static_counter%3 == 0) {
Why 3 here? Maybe it should be defined as a named constant.
Looks to me like a cyclic pivot-selection rotation scheme, meant to avoid the worst-case scenario in quicksort.
I guess a simple comment would suffice to make it clearer.
done
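For readers following the thread, here is a minimal sketch in plain C of the kind of rotating pivot selection discussed above. It is not the kernel code from this PR. The names `PIVOT_ROTATION_PERIOD`, `choose_pivot`, `partition_desc` and `quicksort_desc` are hypothetical, and the rotation order (middle, right, left) is an assumption, since the snippet under review only shows the `% 3` check. The counter is passed explicitly instead of living in a `static` variable.

```c
#include <stddef.h>

/* Hypothetical named constant replacing the literal 3 from the review. */
#define PIVOT_ROTATION_PERIOD 3

static void swap_float(float *a, float *b) {
    float t = *a;
    *a = *b;
    *b = t;
}

/* Assumed rotation: middle, right, left, chosen by a running call counter. */
static int choose_pivot(int l, int h, unsigned counter) {
    switch (counter % PIVOT_ROTATION_PERIOD) {
        case 0:  return l + (h - l) / 2;
        case 1:  return h;
        default: return l;
    }
}

/* Lomuto partition for descending order; the pivot is moved to a[h] first. */
static int partition_desc(float *a, int l, int h, unsigned counter) {
    int p = choose_pivot(l, h, counter);
    swap_float(&a[p], &a[h]);
    float pivot = a[h];
    int i = l;
    for (int j = l; j < h; ++j) {
        if (a[j] > pivot) {
            swap_float(&a[i], &a[j]);
            ++i;
        }
    }
    swap_float(&a[i], &a[h]);
    return i;
}

/* Recurse into the smaller side and loop on the larger one,
   which keeps the recursion depth logarithmic in the range size. */
static void quicksort_rec(float *a, int l, int h, unsigned *counter) {
    while (l < h) {
        int p = partition_desc(a, l, h, (*counter)++);
        if (p - l < h - p) {
            quicksort_rec(a, l, p - 1, counter);
            l = p + 1;
        } else {
            quicksort_rec(a, p + 1, h, counter);
            h = p - 1;
        }
    }
}

/* Sorts n floats in descending order. */
void quicksort_desc(float *a, int n) {
    unsigned counter = 0;
    quicksort_rec(a, 0, n - 1, &counter);
}
```

Rotating the pivot position makes it less likely that an already sorted input hits the quadratic path on every partition step, but it gives no guarantee: an input with many equal values can still degrade with this partition scheme.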
LGTM.
On instance-segmentation-security-0050/FP32/instance-segmentation-security-0050.xml, throughput improves from 1.55 FPS (0010b3b, reference) to 4.26 FPS (36bab31) on a UHD 770 GPU with infer_precision=f32 (the reference version crashes with f16).
Details:
segmentation-security-0010.xml: speedup from 0.14 FPS to 0.19 FPS
Tickets: