feat(autoware_probabilistic_occupancy_grid_map): cuda accelerated implementation #9542
Note to self: run this with the evaluator
perception/autoware_probabilistic_occupancy_grid_map/CMakeLists.txt
This does not seem to be caused by the Autoware development container.
Error: ponent_container-1] [ERROR] [1733273274.264398228] [occupancy_grid_map_container]: Component constructor threw an exception: cudaErrorInsufficientDriver (35)@/__w/autoware.universe/autoware.universe/install/autoware_cuda_utils/include/autoware/cuda_utils/cuda_unique_ptr.hpp#L43: CUDA driver version is insufficient for CUDA runtime version
The error indicates that the CUDA runtime library is missing when loading into the component container during execution.
Therefore, it is believed that referencing other CUDA-dependent packages and modifying the component loading should resolve the issue.
@youtalk I tried using the docker environment and ran:
with no errors (in the relevant package). Should I expect the CI/CD build and tests to pass if it works in Autoware's Docker installation?
@knzo25 Sorry, I noticed your comments. For the P/C workflow, no worries.
@knzo25 To test the laserscan method, run
, then place ego and an NPC, and visualize the OccupancyGridMap (you can find the checkbox under perception). Then check whether the region behind the NPC, as seen from ego, is gray.
You mean we can remove the obstacle pointcloud filtering with raw data that I explained last Friday?
@YoshiRi
Except for the conflict, this PR worked in my local environment.
Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>
64d3030
to
5bfa282
Compare
I accidentally force-pushed to your repo. I am sorry about this.
It worked on my environment. I left some trivial comments to fix.
int offset_dx = dx > 0 ? 1 : -1;  // sign(dx);
int offset_dy = dy > 0 ? size_x : -size_x;  // sign(dy) * size_x;

float scale = (dist == 0.0) ? 1.0 : min(1.f, max_length / dist);
for safety, how about using epsilon?
std::abs(dist) < epsilon
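The epsilon suggestion above could look roughly like the following host-side sketch. The helper name `clampRayScale` and the threshold `kDistEpsilon` are hypothetical and do not appear in the PR; the actual kernel may prefer a different tolerance.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical tolerance; the review comment does not specify a value.
constexpr float kDistEpsilon = 1e-6f;

// Returns the factor that shortens a ray of length `dist` to at most `max_length`.
// Near-zero distances return 1 instead of dividing by a tiny number.
inline float clampRayScale(float dist, float max_length)
{
  if (std::abs(dist) < kDistEpsilon) {
    return 1.0f;
  }
  return std::min(1.0f, max_length / dist);
}
```

Compared with the exact `dist == 0.0` test, this also covers distances that are tiny but nonzero. For such inputs the original expression already clamps to 1 when `max_length` is positive, so the guard mainly makes the intent explicit and avoids a division with a near-zero denominator.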
sm_lower_left_y * sm_size_x + sm_lower_left_x + region_y * sm_size_x + region_x;
const int dm_index =
dm_lower_left_y * dm_size_x + dm_lower_left_x + region_y * dm_size_x + region_x;
[imho]
Add index check for safety
if(dm_index >= dm_size_x * dm_size_y) return;
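One way to express the suggested index check is sketched below as a plain C++ helper. The function name `regionCellIndex` and the `-1` sentinel are assumptions for illustration, not part of the PR. The sketch checks each axis separately, which is a slightly stricter variation of the flat-index comparison proposed in the comment: it also rejects negative coordinates and an `x` that would wrap into the next row.

```cpp
// Computes the flat index of cell (lower_left + region offset) in a row-major
// map of size_x * size_y cells. Returns -1 (hypothetical sentinel) when the
// cell lies outside the map, so the caller can return early.
inline int regionCellIndex(
  int lower_left_x, int lower_left_y, int region_x, int region_y, int size_x, int size_y)
{
  const int x = lower_left_x + region_x;
  const int y = lower_left_y + region_y;
  if (x < 0 || x >= size_x || y < 0 || y >= size_y) {
    return -1;
  }
  // Same value as lower_left_y * size_x + lower_left_x + region_y * size_x + region_x.
  return y * size_x + x;
}
```

In a kernel, the equivalent would be an early `return` when the computed index is invalid, before any read or write to the destination map.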
const double p1_norm = std::max(EPSILON_PROB, std::min(1.0 - EPSILON_PROB, p1));
log_odds += std::log(p1_norm / (1.0 - p1_norm));

const double p2_norm = std::max(EPSILON_PROB, std::min(1.0 - EPSILON_PROB, p1));
Sorry, maybe my mistake
- const double p2_norm = std::max(EPSILON_PROB, std::min(1.0 - EPSILON_PROB, p1));
+ const double p2_norm = std::max(EPSILON_PROB, std::min(1.0 - EPSILON_PROB, p2));
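To make the effect of the `p1`/`p2` fix concrete, here is a minimal sketch of a two-probability log-odds fusion with clamping. The names `fuseLogOdds`, `clampProbability`, and the value of `kEpsilonProb` are hypothetical, and the final conversion back to a probability is an assumption about the part of the function not shown in the diff.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical value; the PR's EPSILON_PROB is not shown in the diff.
constexpr double kEpsilonProb = 1e-3;

// Keeps p strictly inside (0, 1) so the log-odds stay finite.
inline double clampProbability(double p)
{
  return std::max(kEpsilonProb, std::min(1.0 - kEpsilonProb, p));
}

// Fuses two occupancy probabilities by summing their log-odds,
// then maps the sum back to a probability with the logistic function.
inline double fuseLogOdds(double p1, double p2)
{
  const double p1_norm = clampProbability(p1);
  const double p2_norm = clampProbability(p2);  // uses p2, as in the suggested fix
  const double log_odds =
    std::log(p1_norm / (1.0 - p1_norm)) + std::log(p2_norm / (1.0 - p2_norm));
  return 1.0 / (1.0 + std::exp(-log_odds));
}
```

With this formulation, fusing an uninformative 0.5 with 0.8 yields about 0.8. If `p1` were clamped twice, as in the original line, the second observation would be ignored and the same inputs would yield 0.5.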
__host__ __device__ __forceinline__ std::uint8_t convertProbabilityToChar(const double value)
{
return static_cast<std::uint8_t>(value * 255.0);
Clip before casting will be safer.
- return static_cast<std::uint8_t>(value * 255.0);
+ return static_cast<std::uint8_t>(std::max(0.0, std::min(1.0, value)) * 255.0);
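The clip-before-cast suggestion can be checked in isolation with a host-only sketch like the one below. The name `convertProbabilityToCharClamped` is hypothetical, and the CUDA qualifiers from the PR are omitted so the snippet compiles as plain C++.

```cpp
#include <algorithm>
#include <cstdint>

// Maps a probability to the 0..255 range, clamping to [0, 1] first.
// Without the clamp, a value such as 1.5 would scale to 382.5, and converting
// an out-of-range floating-point value to std::uint8_t is undefined behavior.
inline std::uint8_t convertProbabilityToCharClamped(const double value)
{
  const double clamped = std::max(0.0, std::min(1.0, value));
  return static_cast<std::uint8_t>(clamped * 255.0);
}
```

The cast truncates toward zero, so 0.5 maps to 127 rather than 128. That matches the behavior of the original expression for in-range inputs.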
Description
This PR introduces a CUDA accelerated version of the OGM.
Several improvements are possible, but this PR attempts to replicate, wherever possible, all the functionality of the original OGM that was actively used.
Algorithms that were not reimplemented in CUDA:
extractCommonPointCloud: this option was off in all configs and launchers and, after talking with @YoshiRi, it would not be used
Algorithms that were not tested:
Related links
Parent Issue:
How was this PR tested?
Tested with our internal reference design. The OGM will not give the exact same results, but visual inspection shows almost the same characteristics.
With a VLS128, the latency drops from ~20 ms to 2-3 ms.
Notes for reviewers
@soblin
This PR introduces CUDA dependencies. If this would break any P/C workflows, please let me know and I will try to make a version that works even without CUDA installed (sadly, via macros).
In addition, I was told that I should not accelerate the laserscan method. If this is wrong, please let me know.
None.
Interface changes
None.
Effects on system behavior
None.