Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(autoware_probabilistic_occupancy_grid_map): cuda accelerated implementation #9542

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

knzo25
Copy link
Contributor

@knzo25 knzo25 commented Dec 3, 2024

Description

This PR introduces a CUDA accelerated version of the OGM.
Several improvements are possible, but this PR attempts to replicate whenever possible all the functionalities of the original OGM that were actively used.

Algorithms that were not reimplemented in CUDA:

  • extractCommonPointCloud: this option was off in all configs and launchers, and talking with @YoshiRi, it would not be used

Algorithms that were not tested:

  • Fusion. No project actively uses it, but it should work since it uses a non accelerated version

Related links

Parent Issue:

  • Link

How was this PR tested?

Tested with our internal reference design. The OGM will not give the exact same results, but visual inspection shows almost the same characteristics.
The latency with a VLS128 goes from ~20ms to 2-3ms

Notes for reviewers

@soblin
This PR introduces CUDA dependencies. If this would break any P/C workflows, please let me know and I will try to make a version that works even without CUDA installes (sadly, via macros).

In addition, I was told that I should not accelerate the laserscan method. If this is wrong , please let me know

None.

Interface changes

None.

Effects on system behavior

None.

Copy link

github-actions bot commented Dec 3, 2024

Thank you for contributing to the Autoware project!

🚧 If your pull request is in progress, switch it to draft mode.

Please ensure:

@github-actions github-actions bot added component:perception Advanced sensor data processing and environment understanding. (auto-assigned) tag:require-cuda-build-and-test labels Dec 3, 2024
@knzo25
Copy link
Contributor Author

knzo25 commented Dec 3, 2024

Note to self: run this with the evaluator

@knzo25 knzo25 self-assigned this Dec 3, 2024
@knzo25
Copy link
Contributor Author

knzo25 commented Dec 3, 2024

@YoshiRi
Using CUDA we can delete the 2 subsample filters from the raw and obstacle pointcloud. Would you prefer a different PR for that?

@soblin
Here or internally, can you give a few pointers on how to check that I did not break the laserscan method?

@knzo25 knzo25 added the run:build-and-test-differential Mark to enable build-and-test-differential workflow. (used-by-ci) label Dec 3, 2024
Copy link
Member

@youtalk youtalk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not seem to be caused by the Autoware development container.

https://github.com/autowarefoundation/autoware.universe/actions/runs/12150740163/job/33884090279?pr=9542#step:5:7660

Error: ponent_container-1] [ERROR] [1733273274.264398228] [occupancy_grid_map_container]: Component constructor threw an exception: cudaErrorInsufficientDriver (35)@/__w/autoware.universe/autoware.universe/install/autoware_cuda_utils/include/autoware/cuda_utils/cuda_unique_ptr.hpp#L43: CUDA driver version is insufficient for CUDA runtime version

The error indicates that the CUDA runtime library is missing when loading into the component container during execution.
Therefore, it is believed that referencing other CUDA-dependent packages and modifying the component loading should resolve the issue.

@knzo25
Copy link
Contributor Author

knzo25 commented Dec 4, 2024

@youtalk
In the past and in my own environment, I had only seen that issue when I messed up my environment (cuda and driver installations).

I tried using the docker environment
https://autowarefoundation.github.io/autoware-documentation/main/installation/autoware/docker-installation/

and ran:

colcon test --packages-select autoware_probabilistic_occupancy_grid_map
colcon test

with no errors (to the relevant package). Should I expect the CI/CD to pass build and test if it works in autoware's docker installation?

@soblin
Copy link
Contributor

soblin commented Dec 5, 2024

@knzo25 Sorry I noticed your comments. For P/C workflow, no worry.

@soblin
Copy link
Contributor

soblin commented Dec 5, 2024

@knzo25 For testing laserscan method, run

ros2 launch autoware_launch planning_simulator.launch.xml map_path:=<> vehicle_model:=<> sensor_model:=<> 

, then place ego and a NPC, and visualize the OccupancyGridMap (you can find the checkbox under perception). And check if the region behind of NPC from ego is gray

@YoshiRi
Copy link
Contributor

YoshiRi commented Dec 12, 2024

@knzo25

Using CUDA we can delete the 2 subsample filters from the raw and obstacle pointcloud. Would you prefer a different PR for that?

You mean we can remove obstacle pointcloud filtering with raw data I explained last Friday?
If it is not complicated, I think it's okay to include that feature within this PR.

@knzo25
Copy link
Contributor Author

knzo25 commented Dec 13, 2024

@YoshiRi
I mean, we can delete the Pickup subsample (the extra node). It takes about 10ms for the raw pointcloud currently. If we forego that one, of course, the new implementation will be a bit slower, but not 10ms (sorry, I haven't had the chance to provide all the evidence required here 🙇 )

Copy link
Contributor

@technolojin technolojin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except the conflict, this PR worked in my local environment.

Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>
Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>
Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>
Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>
Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>
@technolojin technolojin force-pushed the feat/cuda_probabilistic_occupancy_grid_map branch from 64d3030 to 5bfa282 Compare December 27, 2024 00:43
@technolojin
Copy link
Contributor

technolojin commented Dec 27, 2024

I accidentally pushed (-forced) into your repo. I am sorry for this.
The changes are resolving the conflict and update relatively latest main(rebase).

Copy link
Contributor

@YoshiRi YoshiRi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It worked on my environment. I left some trivial comments to fix.

int offset_dx = dx > 0 ? 1 : -1; // sign(dx);
int offset_dy = dy > 0 ? size_x : -size_x; // sign(dy) * size_x;

float scale = (dist == 0.0) ? 1.0 : min(1.f, max_length / dist);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for safety, how about using epsilon?

std::abs(dist) < epsilon

sm_lower_left_y * sm_size_x + sm_lower_left_x + region_y * sm_size_x + region_x;
const int dm_index =
dm_lower_left_y * dm_size_x + dm_lower_left_x + region_y * dm_size_x + region_x;

Copy link
Contributor

@YoshiRi YoshiRi Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[imho]
Add index check for safety

if(dm_index >= dm_size_x * dm_size_y) return;

const double p1_norm = std::max(EPSILON_PROB, std::min(1.0 - EPSILON_PROB, p1));
log_odds += std::log(p1_norm / (1.0 - p1_norm));

const double p2_norm = std::max(EPSILON_PROB, std::min(1.0 - EPSILON_PROB, p1));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, maybe my mistake

Suggested change
const double p2_norm = std::max(EPSILON_PROB, std::min(1.0 - EPSILON_PROB, p1));
const double p2_norm = std::max(EPSILON_PROB, std::min(1.0 - EPSILON_PROB, p2));


__host__ __device__ __forceinline__ std::uint8_t convertProbabilityToChar(const double value)
{
return static_cast<std::uint8_t>(value * 255.0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clip before casting will be safer.

Suggested change
return static_cast<std::uint8_t>(value * 255.0);
return static_cast<std::uint8_t>(std::max(0.0, std::min(1.0, value)) * 255.0);;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component:perception Advanced sensor data processing and environment understanding. (auto-assigned) run:build-and-test-differential Mark to enable build-and-test-differential workflow. (used-by-ci) tag:require-cuda-build-and-test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants