feat(autoware_probabilistic_occupancy_grid_map): cuda accelerated implementation #9542

knzo25 · 2024-12-03T02:53:00Z

Description

This PR introduces a CUDA accelerated version of the OGM.
Several improvements are possible, but this PR attempts to replicate whenever possible all the functionalities of the original OGM that were actively used.

Algorithms that were not reimplemented in CUDA:

extractCommonPointCloud: this option was off in all configs and launchers, and talking with @YoshiRi, it would not be used

Algorithms that were not tested:

Fusion. No project actively uses it, but it should work since it uses a non accelerated version

How was this PR tested?

Tested with our internal reference design. The OGM will not give the exact same results, but visual inspection shows almost the same characteristics.
The latency with a VLS128 goes from ~20ms to 2-3ms

Notes for reviewers

@soblin
This PR introduces CUDA dependencies. If this would break any P/C workflows, please let me know and I will try to make a version that works even without CUDA installes (sadly, via macros).

In addition, I was told that I should not accelerate the laserscan method. If this is wrong , please let me know

None.

Interface changes

None.

Effects on system behavior

None.

github-actions · 2024-12-03T02:53:18Z

Thank you for contributing to the Autoware project!

🚧 If your pull request is in progress, switch it to draft mode.

Please ensure:

You've checked our contribution guidelines.
Your PR follows our pull request guidelines.
All required CI checks pass before marking the PR ready for review.

knzo25 · 2024-12-03T02:57:58Z

Note to self: run this with the evaluator

knzo25 · 2024-12-03T05:55:39Z

@YoshiRi
Using CUDA we can delete the 2 subsample filters from the raw and obstacle pointcloud. Would you prefer a different PR for that?

@soblin
Here or internally, can you give a few pointers on how to check that I did not break the laserscan method?

perception/autoware_probabilistic_occupancy_grid_map/CMakeLists.txt

youtalk

This does not seem to be caused by the Autoware development container.

https://github.com/autowarefoundation/autoware.universe/actions/runs/12150740163/job/33884090279?pr=9542#step:5:7660

Error: ponent_container-1] [ERROR] [1733273274.264398228] [occupancy_grid_map_container]: Component constructor threw an exception: cudaErrorInsufficientDriver (35)@/__w/autoware.universe/autoware.universe/install/autoware_cuda_utils/include/autoware/cuda_utils/cuda_unique_ptr.hpp#L43: CUDA driver version is insufficient for CUDA runtime version

The error indicates that the CUDA runtime library is missing when loading into the component container during execution.
Therefore, it is believed that referencing other CUDA-dependent packages and modifying the component loading should resolve the issue.

knzo25 · 2024-12-04T06:58:56Z

@youtalk
In the past and in my own environment, I had only seen that issue when I messed up my environment (cuda and driver installations).

I tried using the docker environment
https://autowarefoundation.github.io/autoware-documentation/main/installation/autoware/docker-installation/

and ran:

colcon test --packages-select autoware_probabilistic_occupancy_grid_map
colcon test

with no errors (to the relevant package). Should I expect the CI/CD to pass build and test if it works in autoware's docker installation?

soblin · 2024-12-05T09:54:56Z

@knzo25 Sorry I noticed your comments. For P/C workflow, no worry.

soblin · 2024-12-05T09:57:26Z

@knzo25 For testing laserscan method, run

ros2 launch autoware_launch planning_simulator.launch.xml map_path:=<> vehicle_model:=<> sensor_model:=<>

, then place ego and a NPC, and visualize the OccupancyGridMap (you can find the checkbox under perception). And check if the region behind of NPC from ego is gray

YoshiRi · 2024-12-12T00:51:10Z

@knzo25

Using CUDA we can delete the 2 subsample filters from the raw and obstacle pointcloud. Would you prefer a different PR for that?

You mean we can remove obstacle pointcloud filtering with raw data I explained last Friday?
If it is not complicated, I think it's okay to include that feature within this PR.

knzo25 · 2024-12-13T01:09:28Z

@YoshiRi
I mean, we can delete the Pickup subsample (the extra node). It takes about 10ms for the raw pointcloud currently. If we forego that one, of course, the new implementation will be a bit slower, but not 10ms (sorry, I haven't had the chance to provide all the evidence required here 🙇 )

technolojin

Except the conflict, this PR worked in my local environment.

Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>

technolojin · 2024-12-27T00:45:47Z

I accidentally pushed (-forced) into your repo. I am sorry for this.
The changes are resolving the conflict and update relatively latest main(rebase).

YoshiRi

It worked on my environment. I left some trivial comments to fix.

YoshiRi · 2025-01-07T07:28:04Z

perception/autoware_probabilistic_occupancy_grid_map/lib/utils/utils_kernel.cu

+  int offset_dx = dx > 0 ? 1 : -1;            // sign(dx);
+  int offset_dy = dy > 0 ? size_x : -size_x;  // sign(dy) * size_x;
+
+  float scale = (dist == 0.0) ? 1.0 : min(1.f, max_length / dist);


for safety, how about using epsilon?

std::abs(dist) < epsilon

YoshiRi · 2025-01-07T07:29:43Z

perception/autoware_probabilistic_occupancy_grid_map/lib/utils/utils_kernel.cu

+    sm_lower_left_y * sm_size_x + sm_lower_left_x + region_y * sm_size_x + region_x;
+  const int dm_index =
+    dm_lower_left_y * dm_size_x + dm_lower_left_x + region_y * dm_size_x + region_x;
+


[imho]
Add index check for safety

if(dm_index >= dm_size_x * dm_size_y) return;

YoshiRi · 2025-01-07T07:42:21Z

...utoware_probabilistic_occupancy_grid_map/lib/updater/log_odds_bayes_filter_updater_kernel.cu

+  const double p1_norm = std::max(EPSILON_PROB, std::min(1.0 - EPSILON_PROB, p1));
+  log_odds += std::log(p1_norm / (1.0 - p1_norm));
+
+  const double p2_norm = std::max(EPSILON_PROB, std::min(1.0 - EPSILON_PROB, p1));


Sorry, maybe my mistake

Suggested change

const double p2_norm = std::max(EPSILON_PROB, std::min(1.0 - EPSILON_PROB, p1));

const double p2_norm = std::max(EPSILON_PROB, std::min(1.0 - EPSILON_PROB, p2));

YoshiRi · 2025-01-07T07:43:25Z

...utoware_probabilistic_occupancy_grid_map/lib/updater/log_odds_bayes_filter_updater_kernel.cu

+
+__host__ __device__ __forceinline__ std::uint8_t convertProbabilityToChar(const double value)
+{
+  return static_cast<std::uint8_t>(value * 255.0);


Clip before casting will be safer.

Suggested change

return static_cast<std::uint8_t>(value * 255.0);

return static_cast<std::uint8_t>(std::max(0.0, std::min(1.0, value)) * 255.0);;

knzo25 requested review from soblin, YoshiRi and yukkysaito as code owners December 3, 2024 02:53

github-actions bot added component:perception Advanced sensor data processing and environment understanding. (auto-assigned) tag:require-cuda-build-and-test labels Dec 3, 2024

knzo25 self-assigned this Dec 3, 2024

knzo25 added the run:build-and-test-differential Mark to enable build-and-test-differential workflow. (used-by-ci) label Dec 3, 2024

youtalk reviewed Dec 3, 2024

View reviewed changes

perception/autoware_probabilistic_occupancy_grid_map/CMakeLists.txt Outdated Show resolved Hide resolved

youtalk reviewed Dec 4, 2024

View reviewed changes

technolojin self-requested a review December 24, 2024 01:22

technolojin approved these changes Dec 25, 2024

View reviewed changes

knzo25 added 7 commits December 27, 2024 09:36

feat: implemented a cuda accelerated ogm

7dfcaca

Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>

chore: fixed cspells

5b65b6f

Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>

chore: unused header and variable names

0b37e18

Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>

chore: cspell fixes

abca6e3

Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>

chore: cppcheck fix attempt

43d79eb

Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>

fix: attempting to fix ci/cd regarding the cuda library

7905835

Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>

chore: fixed the order of the cuda check in the cmakelist

83871b0

Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>

knzo25 added 2 commits December 27, 2024 09:36

fix: removed cuda as a required dep for cpu only builds

9b58df0

Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>

fix: missing cuda linking (?)

5bfa282

Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>

technolojin force-pushed the feat/cuda_probabilistic_occupancy_grid_map branch from 64d3030 to 5bfa282 Compare December 27, 2024 00:43

YoshiRi reviewed Jan 7, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(autoware_probabilistic_occupancy_grid_map): cuda accelerated implementation #9542

feat(autoware_probabilistic_occupancy_grid_map): cuda accelerated implementation #9542

knzo25 commented Dec 3, 2024 •

edited

Loading

github-actions bot commented Dec 3, 2024 •

edited

Loading

knzo25 commented Dec 3, 2024

knzo25 commented Dec 3, 2024

youtalk left a comment

knzo25 commented Dec 4, 2024 •

edited

Loading

soblin commented Dec 5, 2024

soblin commented Dec 5, 2024

YoshiRi commented Dec 12, 2024

knzo25 commented Dec 13, 2024

technolojin left a comment

technolojin commented Dec 27, 2024 •

edited

Loading

YoshiRi left a comment

YoshiRi Jan 7, 2025

YoshiRi Jan 7, 2025 •

edited

Loading

YoshiRi Jan 7, 2025

YoshiRi Jan 7, 2025

	const double p2_norm = std::max(EPSILON_PROB, std::min(1.0 - EPSILON_PROB, p1));
	const double p2_norm = std::max(EPSILON_PROB, std::min(1.0 - EPSILON_PROB, p2));

	return static_cast<std::uint8_t>(value * 255.0);
	return static_cast<std::uint8_t>(std::max(0.0, std::min(1.0, value)) * 255.0);;

feat(autoware_probabilistic_occupancy_grid_map): cuda accelerated implementation #9542

Are you sure you want to change the base?

feat(autoware_probabilistic_occupancy_grid_map): cuda accelerated implementation #9542

Conversation

knzo25 commented Dec 3, 2024 • edited Loading

Description

Related links

How was this PR tested?

Notes for reviewers

Interface changes

Effects on system behavior

github-actions bot commented Dec 3, 2024 • edited Loading

knzo25 commented Dec 3, 2024

knzo25 commented Dec 3, 2024

youtalk left a comment

Choose a reason for hiding this comment

knzo25 commented Dec 4, 2024 • edited Loading

soblin commented Dec 5, 2024

soblin commented Dec 5, 2024

YoshiRi commented Dec 12, 2024

knzo25 commented Dec 13, 2024

technolojin left a comment

Choose a reason for hiding this comment

technolojin commented Dec 27, 2024 • edited Loading

YoshiRi left a comment

Choose a reason for hiding this comment

YoshiRi Jan 7, 2025

Choose a reason for hiding this comment

YoshiRi Jan 7, 2025 • edited Loading

Choose a reason for hiding this comment

YoshiRi Jan 7, 2025

Choose a reason for hiding this comment

YoshiRi Jan 7, 2025

Choose a reason for hiding this comment

knzo25 commented Dec 3, 2024 •

edited

Loading

github-actions bot commented Dec 3, 2024 •

edited

Loading

knzo25 commented Dec 4, 2024 •

edited

Loading

technolojin commented Dec 27, 2024 •

edited

Loading

YoshiRi Jan 7, 2025 •

edited

Loading