Semantic Labeling #203

Draft
wants to merge 40 commits into
base: ros2-devel
Commits
8ed6683
init
sriramk117 Aug 7, 2024
dcb74b7
added neccessary callback functions
sriramk117 Aug 11, 2024
c611e78
implemented functionality to run sam and groundingdino
sriramk117 Aug 13, 2024
4fb0258
wrote vision pipeline and execute callback
sriramk117 Aug 14, 2024
5d81216
created result message returned by vision pipeline
sriramk117 Aug 14, 2024
bfcaeaa
modified launch file and created yaml file for parameters
sriramk117 Aug 14, 2024
19e9275
updated setup.py and modified parameters
sriramk117 Aug 16, 2024
c29520d
Merge branch 'ros2-devel' into sriramk/semantic-labeling
sriramk117 Aug 16, 2024
4e7391f
added requirements to install and fixed imports
sriramk117 Aug 16, 2024
9c43fc4
changed grounding dino path and added checkpoint
sriramk117 Sep 12, 2024
31551e4
Added config file + fixed image transformations
sriramk117 Sep 12, 2024
3ec7b50
Added GroundingDINO visualization function
sriramk117 Sep 14, 2024
9dc9a40
created GroundingDINO publisher for testing
sriramk117 Sep 16, 2024
929e570
added more testing code for bbox visualization
sriramk117 Sep 16, 2024
e1ebf8b
fixed groundingdino results visualization
sriramk117 Sep 18, 2024
704caa1
corrected image preprocessing?
sriramk117 Sep 19, 2024
4f9305d
groundingdino works!
sriramk117 Sep 23, 2024
024c71c
masks are now displayable
sriramk117 Sep 23, 2024
c78cd4a
record vision pipeline inference time
sriramk117 Sep 24, 2024
e503800
wrote code to generate mask messages during action calls
sriramk117 Sep 27, 2024
648a46e
masks msgs are generated but action keeps aborting
sriramk117 Sep 30, 2024
3032f65
Added gpt-4o query functionality
sriramk117 Nov 7, 2024
85f9577
groundingdino can be downloaded via github url
Nov 8, 2024
0049598
updated comments/code quality changes
Nov 8, 2024
e9fd4d5
invoking gpt-4o has been transformed into a service
sriramk117 Nov 8, 2024
9d52d98
segment all items action now takes a single string as input
sriramk117 Nov 8, 2024
30bc036
added env variables
sriramk117 Nov 9, 2024
4d3b27c
environment variables not loading?
sriramk117 Nov 9, 2024
94af48e
ran black formatter
sriramk117 Nov 9, 2024
29ed345
Merge branch 'ros2-devel' into sriramk/semantic-labeling
sriramk117 Nov 9, 2024
23577ae
changes to segmentallitems node initializing it as a perception node
sriramk117 Nov 9, 2024
3688541
fixed error of topics not being received by segmentallitems action
sriramk117 Nov 9, 2024
195b123
code cleanup
Nov 9, 2024
b8a4ccb
running gpt-4o inference is now an action not a service
sriramk117 Dec 3, 2024
5363732
cleaned up some comments
sriramk117 Dec 3, 2024
d73c983
goal status cancellation
sriramk117 Dec 5, 2024
2326742
temporary changes for running testing procedures
sriramk117 Dec 11, 2024
b95ac8e
republisher.yaml reverted to original
sriramk117 Jan 2, 2025
4bf52ea
fixed cv2 visualization merge conflict
sriramk117 Jan 2, 2025
9382b67
segmentation inference optimization workin
sriramk117 Jan 3, 2025
3 changes: 3 additions & 0 deletions .gitignore
@@ -2,6 +2,9 @@
build/
__pycache__/

# Environment Variables file
.env

Comment on lines +5 to +7
Contributor

I don't see a .env file added in this PR, but I'm guessing this was more for personal use. I'd recommend omitting this change unless it's relevant for the functionality of the PR.

Contributor

I noticed more references to an env file later in the code. Where exactly does this come into play?

Contributor Author

I'm adding environment variable functionality to our codebase so we can privately store API keys without exposing them publicly on GitHub. In this particular case, it is for accessing the PRL OpenAI API key to invoke GPT-4o.

Contributor Author

I'm assuming this may come in handy later on as well if we power perception w/ foundation models in the future.
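
For readers following this thread, here is a minimal sketch of how a key kept in a git-ignored `.env` file could be read at startup. The loading code from this PR is not shown on this page, so everything below is an assumption for illustration: the helper names `load_env_file` and `get_api_key` and the variable name `OPENAI_API_KEY` are hypothetical, and only the Python standard library is used.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse KEY=VALUE lines from a .env-style file into a dict.

    Blank lines and lines starting with '#' are skipped. A missing file
    yields an empty dict, so a checkout without secrets does not crash here.
    """
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def get_api_key(name: str = "OPENAI_API_KEY", env_file: str = ".env") -> str:
    """Return the key from the process environment, else from the .env file.

    The variable name is a hypothetical default, not taken from the PR.
    """
    key = os.environ.get(name) or load_env_file(env_file).get(name)
    if not key:
        raise RuntimeError(f"{name} is not set in the environment or {env_file}")
    return key
```

Checking the process environment first means a deployment can inject the key without any file on disk, while the `.env` fallback covers local development.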

# Compiled Object files
*.slo
*.lo
@@ -336,7 +336,7 @@ def update(self) -> py_trees.common.Status:
x_unit.vector, x_pos.vector
)

-# # If you need to send a fixed food frame to the robot arm, e.g., to
+# # If you need to send a fixed food frame to the robot arm, e.g., to
sriramk117 marked this conversation as resolved.
# # debug off-centering issues, uncomment this and modify the translation.
# deg = 90 # fork roll
# world_to_food_transform.transform.translation.x = 0.26262263022586224
2 changes: 2 additions & 0 deletions ada_feeding_msgs/CMakeLists.txt
@@ -24,10 +24,12 @@ rosidl_generate_interfaces(${PROJECT_NAME}

"action/AcquireFood.action"
"action/ActivateController.action"
"action/GenerateCaption.action"
"action/MoveTo.action"
"action/MoveToConfiguration.action"
"action/MoveToMouth.action"
"action/SegmentAllItems.action"
"action/SegmentFromBox.action"
"action/SegmentFromPoint.action"
"action/Teleoperate.action"
"action/Trigger.action"
28 changes: 28 additions & 0 deletions ada_feeding_msgs/action/GenerateCaption.action
@@ -0,0 +1,28 @@
# The interface for an action that takes in a list of input labels
# describing the food items on a plate and returns a sentence caption compiling
# these labels used as a query for GroundingDINO detection.

# A list of semantic labels corresponding to each of the masks of detected
# items in the image
string[] input_labels
---
# Possible return statuses
uint8 STATUS_SUCCEEDED=0
uint8 STATUS_FAILED=1
uint8 STATUS_CANCELED=3
uint8 STATUS_UNKNOWN=99

# Whether the vision pipeline succeeded and if not, why
uint8 status

# The header for the image that the generated caption by GPT-4o
# corresponds to
std_msgs/Header header
# The camera intrinsics
sensor_msgs/CameraInfo camera_info
# A sentence caption compiling the semantic labels used as a query for
# GroundingDINO to perform bounding box detections.
string caption
---
# How much time the action has spent running inference on GPT-4o
builtin_interfaces/Duration elapsed_time
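
As an illustration of the `input_labels` to `caption` mapping this interface describes, the sketch below builds a query string without calling GPT-4o. It is a plain string-joining stand-in, not the method used in this PR. The function name `labels_to_caption` is hypothetical, and the period-separated phrase format is an assumption based on the prompt style shown in the GroundingDINO README.

```python
def labels_to_caption(input_labels):
    """Join food labels into one period-separated text query.

    Labels are stripped, lowercased, and de-duplicated in first-seen order,
    then joined with " . " and terminated with " ." so that each phrase is
    delimited. An empty input yields an empty caption.
    """
    seen = set()
    phrases = []
    for label in input_labels:
        phrase = label.strip().lower()
        if phrase and phrase not in seen:
            seen.add(phrase)
            phrases.append(phrase)
    if not phrases:
        return ""
    return " . ".join(phrases) + " ."


if __name__ == "__main__":
    print(labels_to_caption(["Grapes", " carrot ", "grapes"]))
```

A deterministic fallback like this may be useful for testing the downstream detection step when the language model is unavailable.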
15 changes: 10 additions & 5 deletions ada_feeding_msgs/action/SegmentAllItems.action
@@ -1,22 +1,27 @@
# The interface for an action that gets an image from the camera and returns
-# the masks of all segmented items within that image.
+# the bounding boxes of all items within that image.

+# The list of input semantic labels for the food items on the plate
+string caption
Comment on lines +4 to +5
Contributor

The comment seems misleading; I suspect it was an old comment for item_labels.

---
# Possible return statuses
uint8 STATUS_SUCCEEDED=0
uint8 STATUS_FAILED=1
uint8 STATUS_CANCELED=3
uint8 STATUS_UNKNOWN=99

-# Whether the segmentation succeeded and if not, why
+# Whether the vision pipeline succeeded and if not, why
uint8 status

# The header for the image that the masks corresponds to
std_msgs/Header header
# The camera intrinsics
sensor_msgs/CameraInfo camera_info
-# Masks of all the detected items in the image
-ada_feeding_msgs/Mask[] detected_items
+# Bounding boxes of all the detected items in the image
+sensor_msgs/RegionOfInterest[] detected_items
+# A list of semantic labels corresponding to each of the masks of detected
+# items in the image
+string[] item_labels
---
-# How much time the action has spent segmenting the food item
+# How much time the action has spent running the vision pipeline
builtin_interfaces/Duration elapsed_time
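
The result above carries two parallel arrays, `detected_items` and `item_labels`. The sketch below shows one way a detector's output could be turned into that pair. It assumes the detector returns boxes as normalized `(cx, cy, w, h)` tuples, which is how GroundingDINO's inference utilities appear to report them. The `RegionOfInterest` dataclass is a local stand-in for the pixel fields of `sensor_msgs/RegionOfInterest`, so the example runs without ROS, and the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RegionOfInterest:
    """Stand-in with the pixel fields of sensor_msgs/RegionOfInterest."""

    x_offset: int
    y_offset: int
    height: int
    width: int


def cxcywh_to_roi(box, image_width, image_height):
    """Convert a normalized (cx, cy, w, h) box to a pixel ROI clipped to the image."""
    cx, cy, w, h = box
    x_min = max(0, round((cx - w / 2) * image_width))
    y_min = max(0, round((cy - h / 2) * image_height))
    x_max = min(image_width, round((cx + w / 2) * image_width))
    y_max = min(image_height, round((cy + h / 2) * image_height))
    return RegionOfInterest(
        x_offset=x_min,
        y_offset=y_min,
        height=max(0, y_max - y_min),
        width=max(0, x_max - x_min),
    )


def pair_items(boxes, labels, image_width, image_height):
    """Build parallel detected_items / item_labels lists of equal length."""
    if len(boxes) != len(labels):
        raise ValueError("boxes and labels must have the same length")
    detected_items = [cxcywh_to_roi(b, image_width, image_height) for b in boxes]
    return detected_items, list(labels)
```

Validating the lengths up front keeps index `i` of `item_labels` aligned with index `i` of `detected_items`, which is the invariant a client of this action would rely on.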
28 changes: 28 additions & 0 deletions ada_feeding_msgs/action/SegmentFromBox.action
@@ -0,0 +1,28 @@
# The interface for an action that gets an image from the camera and a bounding
# box of the desired item to segment, and then returns the pixel-wise mask
# of that item

# The region of interest (bounding box) to seed the segmentation algorithm with
sensor_msgs/RegionOfInterest region_of_interest

# The semantic label describing the item bounded by the region of interest
string label
---
# Possible return statuses
uint8 STATUS_SUCCEEDED=0
uint8 STATUS_FAILED=1
uint8 STATUS_CANCELED=3
uint8 STATUS_UNKNOWN=99

# Whether the segmentation succeeded and if not, why
uint8 status

# The header for the image that the masks corresponds to
std_msgs/Header header
# The camera intrinsics
sensor_msgs/CameraInfo camera_info
# Top contender mask segmented given a bounding box of an item
ada_feeding_msgs/Mask detected_item
---
# How much time the action has spent segmenting the food item
builtin_interfaces/Duration elapsed_time
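
The same four `STATUS_*` constants appear in each of the action results in this PR. A client might mirror them as below so that an unexpected code degrades to `UNKNOWN` instead of raising. This is only a sketch: the `SegmentationStatus` class and `describe_status` helper are hypothetical names and are not part of the PR.

```python
from enum import IntEnum


class SegmentationStatus(IntEnum):
    """Mirror of the STATUS_* constants declared in the action result."""

    SUCCEEDED = 0
    FAILED = 1
    CANCELED = 3
    UNKNOWN = 99

    @classmethod
    def from_code(cls, code):
        """Map a raw uint8 status to a member, falling back to UNKNOWN."""
        try:
            return cls(code)
        except ValueError:
            return cls.UNKNOWN


def describe_status(code):
    """Return a lowercase, human-readable name for a raw status code."""
    return SegmentationStatus.from_code(code).name.lower()
```

Note that the value 2 is not defined in the interface, so `from_code(2)` returns `UNKNOWN` under this sketch.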
3 changes: 3 additions & 0 deletions ada_feeding_msgs/msg/Mask.msg
@@ -19,6 +19,9 @@ float64 average_depth
# An arbitrary ID that defines the segmented item
string item_id

# An ID that semantically labels a specific, segmented item
string object_id

# A score that indicates how confident the segemntation algorithm is in
# this mask.
float64 confidence
@@ -170,6 +170,7 @@ def main(args=None):
# pylint: disable=import-outside-toplevel
from ada_feeding_perception.face_detection import FaceDetectionNode
from ada_feeding_perception.food_on_fork_detection import FoodOnForkDetectionNode
from ada_feeding_perception.segment_all_items import SegmentAllItemsNode
from ada_feeding_perception.segment_from_point import SegmentFromPointNode
from ada_feeding_perception.table_detection import TableDetectionNode

@@ -178,6 +179,7 @@ def main(args=None):
node = ADAFeedingPerceptionNode("ada_feeding_perception")
face_detection = FaceDetectionNode(node)
food_on_fork_detection = FoodOnForkDetectionNode(node)
segment_all_items = SegmentAllItemsNode(node) # pylint: disable=unused-variable
segment_from_point = SegmentFromPointNode(node) # pylint: disable=unused-variable
table_detection = TableDetectionNode(node)
executor = MultiThreadedExecutor(num_threads=16)