Semantic Labeling #203
Draft
sriramk117 wants to merge 40 commits into ros2-devel from sriramk/semantic-labeling
Commits
8ed6683 init (sriramk117)
dcb74b7 added neccessary callback functions (sriramk117)
c611e78 implemented functionality to run sam and groundingdino (sriramk117)
4fb0258 wrote vision pipeline and execute callback (sriramk117)
5d81216 created result message returned by vision pipeline (sriramk117)
bfcaeaa modified launch file and created yaml file for parameters (sriramk117)
19e9275 updated setup.py and modified parameters (sriramk117)
c29520d Merge branch 'ros2-devel' into sriramk/semantic-labeling (sriramk117)
4e7391f added requirements to install and fixed imports (sriramk117)
9c43fc4 changed grounding dino path and added checkpoint (sriramk117)
31551e4 Added config file + fixed image transformations (sriramk117)
3ec7b50 Added GroundingDINO visualization function (sriramk117)
9dc9a40 created GroundingDINO publisher for testing (sriramk117)
929e570 added more testing code for bbox visualization (sriramk117)
e1ebf8b fixed groundingdino results visualization (sriramk117)
704caa1 corrected image preprocessing? (sriramk117)
4f9305d groundingdino works! (sriramk117)
024c71c masks are now displayable (sriramk117)
c78cd4a record vision pipeline inference time (sriramk117)
e503800 wrote code to generate mask messages during action calls (sriramk117)
648a46e masks msgs are generated but action keeps aborting (sriramk117)
3032f65 Added gpt-4o query functionality (sriramk117)
85f9577 groundingdino can be downloaded via github url
0049598 updated comments/code quality changes
e9fd4d5 invoking gpt-4o has been transformed into a service (sriramk117)
9d52d98 segment all items action now takes a single string as input (sriramk117)
30bc036 added env variables (sriramk117)
4d3b27c environment variables not loading? (sriramk117)
94af48e ran black formatter (sriramk117)
29ed345 Merge branch 'ros2-devel' into sriramk/semantic-labeling (sriramk117)
23577ae changes to segmentallitems node initializing it as a perception node (sriramk117)
3688541 fixed error of topics not being received by segmentallitems action (sriramk117)
195b123 code cleanup
b8a4ccb running gpt-4o inference is now an action not a service (sriramk117)
5363732 cleaned up some comments (sriramk117)
d73c983 goal status cancellation (sriramk117)
2326742 temporary changes for running testing procedures (sriramk117)
b95ac8e republisher.yaml reverted to original (sriramk117)
4bf52ea fixed cv2 visualization merge conflict (sriramk117)
9382b67 segmentation inference optimization workin (sriramk117)
@@ -2,6 +2,9 @@
 build/
 __pycache__/
 
+# Environment Variables file
+.env
+
 # Compiled Object files
 *.slo
 *.lo
@@ -0,0 +1,28 @@
+# The interface for an action that takes in a list of input labels
+# describing the food items on a plate and returns a sentence caption compiling
+# these labels used as a query for GroundingDINO detection.
+
+# A list of semantic labels corresponding to each of the masks of detected
+# items in the image
+string[] input_labels
+---
+# Possible return statuses
+uint8 STATUS_SUCCEEDED=0
+uint8 STATUS_FAILED=1
+uint8 STATUS_CANCELED=3
+uint8 STATUS_UNKNOWN=99
+
+# Whether the vision pipeline succeeded and if not, why
+uint8 status
+
+# The header for the image that the generated caption by GPT-4o
+# corresponds to
+std_msgs/Header header
+# The camera intrinsics
+sensor_msgs/CameraInfo camera_info
+# A sentence caption compiling the semantic labels used as a query for
+# GroundingDINO to perform bounding box detections.
+string caption
+---
+# How much time the action has spent running inference on GPT-4o
+builtin_interfaces/Duration elapsed_time
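
The action above turns a list of labels into a single caption string. In this PR that step is delegated to GPT-4o; the sketch below is only a deterministic stand-in that is handy for offline testing, not the PR's method. The function name `compile_caption` and the `" . "` separator are assumptions made for illustration (period-separated phrases are a common way to write GroundingDINO text prompts).

```python
from typing import Iterable


def compile_caption(input_labels: Iterable[str]) -> str:
    """Join labels into one caption string, dropping blanks and duplicates.

    Hypothetical helper: the PR itself asks GPT-4o to write the caption.
    """
    seen = set()
    phrases = []
    for label in input_labels:
        # Normalize case and collapse repeated whitespace.
        phrase = " ".join(label.lower().split())
        if phrase and phrase not in seen:
            seen.add(phrase)
            phrases.append(phrase)
    if not phrases:
        return ""
    # Assumption: phrases are separated by " . " and the caption ends with " ."
    return " . ".join(phrases) + " ."


if __name__ == "__main__":
    print(compile_caption(["Grape", "grape ", "cantaloupe  chunk"]))
```

A stand-in like this can keep the rest of the pipeline testable when no API key is available.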
@@ -1,22 +1,27 @@
 # The interface for an action that gets an image from the camera and returns
-# the masks of all segmented items within that image.
+# the bounding boxes of all items within that image.
 
+# The list of input semantic labels for the food items on the plate
+string caption

Review comment on lines +4 to +5: Comment seems misleading, I suspect this was an old comment for

 ---
 # Possible return statuses
 uint8 STATUS_SUCCEEDED=0
 uint8 STATUS_FAILED=1
 uint8 STATUS_CANCELED=3
 uint8 STATUS_UNKNOWN=99
 
-# Whether the segmentation succeeded and if not, why
+# Whether the vision pipeline succeeded and if not, why
 uint8 status
 
 # The header for the image that the masks corresponds to
 std_msgs/Header header
 # The camera intrinsics
 sensor_msgs/CameraInfo camera_info
-# Masks of all the detected items in the image
-ada_feeding_msgs/Mask[] detected_items
+# Bounding boxes of all the detected items in the image
+sensor_msgs/RegionOfInterest[] detected_items
+# A list of semantic labels corresponding to each of the masks of detected
+# items in the image
+string[] item_labels
 ---
-# How much time the action has spent segmenting the food item
+# How much time the action has spent running the vision pipeline
 builtin_interfaces/Duration elapsed_time
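
Since the result now carries `sensor_msgs/RegionOfInterest` entries instead of masks, each detection has to be expressed as a pixel-space offset plus width and height. The following is a minimal sketch of that conversion, assuming the detector emits boxes as normalized `(cx, cy, w, h)` (which is how GroundingDINO boxes are commonly described; treat that as an assumption here). The `RegionOfInterest` dataclass is a plain-Python stand-in that mirrors the ROS message fields so the example runs without ROS, and `box_to_roi` is a hypothetical helper name.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class RegionOfInterest:
    """Stand-in mirroring the fields of sensor_msgs/RegionOfInterest."""

    x_offset: int = 0
    y_offset: int = 0
    height: int = 0
    width: int = 0
    do_rectify: bool = False


def box_to_roi(
    box_cxcywh: Tuple[float, float, float, float],
    image_width: int,
    image_height: int,
) -> RegionOfInterest:
    """Convert a normalized center-format box to a pixel-space ROI.

    Assumption: the box is (cx, cy, w, h) with every value in [0, 1].
    """
    cx, cy, w, h = box_cxcywh
    # Corner coordinates in pixels, clamped to the image bounds.
    x_min = min(max((cx - w / 2) * image_width, 0.0), float(image_width))
    y_min = min(max((cy - h / 2) * image_height, 0.0), float(image_height))
    x_max = min(max((cx + w / 2) * image_width, 0.0), float(image_width))
    y_max = min(max((cy + h / 2) * image_height, 0.0), float(image_height))
    x0, y0 = int(round(x_min)), int(round(y_min))
    x1, y1 = int(round(x_max)), int(round(y_max))
    return RegionOfInterest(x_offset=x0, y_offset=y0, width=x1 - x0, height=y1 - y0)


if __name__ == "__main__":
    print(box_to_roi((0.5, 0.5, 0.5, 0.25), image_width=640, image_height=480))
```

Clamping before rounding keeps the ROI inside the image, which matters because the message's offset and size fields are unsigned.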
@@ -0,0 +1,28 @@
+# The interface for an action that gets an image from the camera and a bounding
+# box of the desired item to segment, and then returns the pixel-wise mask
+# of that item
+
+# The region of interest (bounding box) to seed the segmentation algorithm with
+sensor_msgs/RegionOfInterest region_of_interest
+
+# The semantic label describing the item bounded by the region of interest
+string label
+---
+# Possible return statuses
+uint8 STATUS_SUCCEEDED=0
+uint8 STATUS_FAILED=1
+uint8 STATUS_CANCELED=3
+uint8 STATUS_UNKNOWN=99
+
+# Whether the segmentation succeeded and if not, why
+uint8 status
+
+# The header for the image that the masks corresponds to
+std_msgs/Header header
+# The camera intrinsics
+sensor_msgs/CameraInfo camera_info
+# Top contender mask segmented given a bounding box of an item
+ada_feeding_msgs/Mask detected_item
+---
+# How much time the action has spent segmenting the food item
+builtin_interfaces/Duration elapsed_time
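
The result above returns only the "top contender" mask for a bounding box. One plausible reading is that the segmenter proposes several candidate masks with confidence scores and the highest-scoring one is kept. The sketch below illustrates that selection under this assumption; `select_top_mask` is a hypothetical helper, and masks are represented as nested lists of booleans so the example needs no third-party packages.

```python
from typing import List, Sequence, Tuple

# A mask is a row-major grid of booleans (True = pixel belongs to the item).
Mask = List[List[bool]]


def select_top_mask(
    masks: Sequence[Mask], scores: Sequence[float]
) -> Tuple[Mask, float]:
    """Return the candidate mask with the highest score, plus that score.

    Assumption: the segmenter yields one score per candidate mask.
    """
    if not masks or len(masks) != len(scores):
        raise ValueError("masks and scores must be non-empty and the same length")
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return masks[best_index], scores[best_index]


if __name__ == "__main__":
    candidates = [
        [[True, False], [False, False]],
        [[True, True], [True, False]],
        [[True, True], [True, True]],
    ]
    mask, score = select_top_mask(candidates, [0.71, 0.93, 0.64])
    print(score, mask)
```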
I don't see a `.env` file added in this PR, but I'm guessing this was more for personal use. I'd recommend omitting this change unless it's relevant for the functionality of the PR.

I noticed more references to an `env` file later in the code, where exactly does this come into play?

I'm adding environment variable functionality to our codebase so we can privately store API keys without exposing them publicly on GitHub. In this particular case, it is for accessing the PRL OpenAI API key to invoke GPT-4o.

I'm assuming this may come in handy later on as well if we power perception w/ foundation models in the future.
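
To make the discussion concrete, here is a minimal sketch of how a git-ignored `.env` file can supply an API key at runtime. It uses only the standard library; the PR's actual loading code is not shown on this page and may well use a package such as python-dotenv instead. The function name `load_env_file` and the variable name `EXAMPLE_API_KEY` are hypothetical.

```python
import os
from pathlib import Path
from typing import Dict


def load_env_file(path: str = ".env", override: bool = False) -> Dict[str, str]:
    """Parse KEY=VALUE lines from a file and export them to os.environ.

    Returns the pairs parsed from the file. Existing environment variables
    are left untouched unless override is True. A missing file is not an
    error, so the node can still start when the key is set another way.
    """
    parsed: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return parsed
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")  # drop optional surrounding quotes
        if not key:
            continue
        parsed[key] = value
        if override or key not in os.environ:
            os.environ[key] = value
    return parsed


if __name__ == "__main__":
    load_env_file()
    # EXAMPLE_API_KEY is a placeholder name, not one taken from the PR.
    print("key present:", "EXAMPLE_API_KEY" in os.environ)
```

Because the file stays out of version control, each machine needs its own copy, and a missing key should surface as a clear startup error rather than a failed API call later on.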