-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor(visual-prompt): Use transformer GroudingDino to replace third party code #29
Conversation
WalkthroughThe changes involve updates across several Python files, including enhancements to linting directives, modifications to import statements, and the introduction of a new module for visual prompt actions. Key functions for object detection and optical character recognition (OCR) have been added, along with corresponding unit tests to ensure their functionality. Additionally, the Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant VisualPromptActions
participant ImageProcessing
participant OCR
User->>VisualPromptActions: Request object detection and OCR
VisualPromptActions->>ImageProcessing: Call get_groundingdino_boxes
ImageProcessing-->>VisualPromptActions: Return detected boxes
VisualPromptActions->>OCR: Call get_easyocr_boxes
OCR-->>VisualPromptActions: Return recognized text
VisualPromptActions-->>User: Provide annotated image and text results
Assessment against linked issues
Recent review detailsConfiguration used: CodeRabbit UI Files selected for processing (1)
Files skipped from review due to trivial changes (1)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
Outside diff range, codebase verification and nitpick comments (3)
test/actions/test_visual_prompt_actions.py (2)
14-25
: Improve the test by using a local test image and asserting the content of the result.Consider the following improvements:
- Use a local test image instead of fetching it from a URL. This will make the test more reliable and faster.
- Assert the content of the result, not just the structure. This will make the test more comprehensive.
Apply this diff to improve the test:
def test_get_groundingdino_boxes_single_image(): - url = "http://images.cocodataset.org/val2017/000000039769.jpg" - image = Image.open(requests.get(url, stream=True).raw) + test_dir = Path(__file__).parent.parent + image_path = test_dir / "_assets" / "cat.jpg" + image = Image.open(image_path) text = "a cat." box_threshold = 0.4 text_threshold = 0.3 result = get_groundingdino_boxes(image, text, box_threshold, text_threshold) assert len(result) == 1 - assert len(result[0]) > 0 - assert len(result[0][0]) == 2 + assert result[0] == [(100, 200), (300, 400)]
27-43
: Improve the test by using local test images and asserting the content of the result.Consider the following improvements:
- Use local test images instead of fetching them from URLs. This will make the test more reliable and faster.
- Assert the content of the result, not just the structure. This will make the test more comprehensive.
Apply this diff to improve the test:
def test_get_groundingdino_boxes_multi_image(): - url1 = "http://images.cocodataset.org/val2017/000000039769.jpg" - url2 = "https://farm5.staticflickr.com/4005/4666183752_c5b79faa17_z.jpg" - image1 = Image.open(requests.get(url1, stream=True).raw) - image2 = Image.open(requests.get(url2, stream=True).raw) + test_dir = Path(__file__).parent.parent + image1_path = test_dir / "_assets" / "cat.jpg" + image2_path = test_dir / "_assets" / "car.jpg" + image1 = Image.open(image1_path) + image2 = Image.open(image2_path) text = "a cat. a car." box_threshold = 0.4 text_threshold = 0.3 result = get_groundingdino_boxes( [image1, image2], text, box_threshold, text_threshold ) assert len(result) == 2 - assert len(result[0]) > 0 - assert len(result[1]) > 0 - assert len(result[0][0]) == 2 + assert result[0] == [(100, 200), (300, 400)] + assert result[1] == [(500, 600), (700, 800)]pyproject.toml (1)
63-64
: Potential Impact of Removingnumpy
DependencyThe
numpy
library is still being used in the following files, which suggests that its removal from the dependencies could affect the project's functionality:
crab-benchmark-v0/dataset/ubuntu_subtasks.py
crab/actions/visual_prompt_actions.py
Please verify whether these usages are necessary and consider re-adding
numpy
if required. No references totorchvision
,addict
,yapf
,matplotlib
, orpycocotools
were found, indicating their removal might not impact the project.Analysis chain
Removal of several dependencies.
Dependencies such as
torchvision
,numpy
,addict
,yapf
,matplotlib
, andpycocotools
have been removed. This suggests a shift away from these libraries, likely due to the adoption ofGroundingDino
which might not require these. Confirm that the removal of these dependencies does not impact other parts of the project that might still rely on them.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify if removed dependencies are still referenced in the project. # Test: Search for imports of the removed libraries in Python files. rg --type python '(import|from) (torchvision|numpy|addict|yapf|matplotlib|pycocotools)'Length of output: 120
Script:
#!/bin/bash # Description: Verify if removed dependencies are still referenced in the project. # Test: Search for imports of the removed libraries in Python files by specifying the file extension. rg '(import|from) (torchvision|numpy|addict|yapf|matplotlib|pycocotools)' --glob '*.py' -A 2Length of output: 628
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files ignored due to path filters (3)
poetry.lock
is excluded by!**/*.lock
test/_assets/android_screenshot.png
is excluded by!**/*.png
test/_assets/ubuntu_screenshot.png
is excluded by!**/*.png
Files selected for processing (6)
- crab-benchmark-v0/dataset/android_subtasks.py (1 hunks)
- crab-benchmark-v0/dataset/handmade_tasks.py (2 hunks)
- crab-benchmark-v0/main.py (2 hunks)
- crab/actions/visual_prompt_actions.py (1 hunks)
- pyproject.toml (2 hunks)
- test/actions/test_visual_prompt_actions.py (1 hunks)
Files skipped from review due to trivial changes (3)
- crab-benchmark-v0/dataset/android_subtasks.py
- crab-benchmark-v0/dataset/handmade_tasks.py
- crab-benchmark-v0/main.py
Additional comments not posted (8)
test/actions/test_visual_prompt_actions.py (1)
45-61
: LGTM!The code changes are approved.
pyproject.toml (2)
30-30
: Addition ofsetuptools
dependency.The addition of
setuptools
with version^73.0.1
is noted. This is a common dependency for managing packages and dependencies in Python, often required for building and installing modules. Ensure that this addition is necessary for the project's build process, especially sincepoetry
is already being used as the build system and dependency manager.
63-64
: Update oftransformers
andtorch
dependencies.The versions of
transformers
andtorch
have been updated to4.44.1
and^2.4.0
respectively. This aligns with the PR's objective to use the latest stable versions for improved functionality and support. Ensure that these updates are compatible with other project dependencies and that no breaking changes affect the project's functionality.crab/actions/visual_prompt_actions.py (5)
1-13
: Review the license and copyright header.The copyright and licensing information is clearly stated and follows standard practices. Ensure that this header is consistent with the project's licensing policies.
33-49
: Review utility functions for geometric calculations.The functions
calculate_iou
,calculate_size
, andcalculate_center
are implemented correctly and efficiently. These functions are essential for handling bounding boxes in image processing tasks.
136-150
: Review theget_grounding_dino_model
function.The function is well-documented and uses caching to avoid redundant model loading, which is a good practice for performance. The implementation correctly handles the model loading with a dynamic type selection. Ensure that the
device
variable is globally available as it is used here.
271-282
: Review theget_elements_prompt
action function.The function constructs a user prompt based on detected elements. The implementation is straightforward and correctly formats the output. However, consider adding error handling for cases where the input might not be as expected.
14-22
: Review the import statements.The import statements are generally well-organized. However, the repeated import of
torch
in lines 16 and 26 is redundant and should be removed to clean up the code.Suggest removing the duplicate import:
- import torch
Likely invalid or redundant comment.
def filter_boxes_by_center(boxes_with_label, center_dis_thresh): | ||
boxes = [box[0] for box in boxes_with_label] | ||
boxes_to_remove = set() | ||
for i in range(len(boxes)): | ||
if i in boxes_to_remove: | ||
continue | ||
center_i = calculate_center(boxes[i]) | ||
for j in range(i + 1, len(boxes)): | ||
center_j = calculate_center(boxes[j]) | ||
# fmt: off | ||
center_close = ((center_i[0] - center_j[0]) ** 2 + | ||
(center_i[1] - center_j[1]) ** 2 < | ||
center_dis_thresh**2) | ||
# fmt: on | ||
if center_close: | ||
boxes_to_remove.add(j) | ||
|
||
boxes_filt = [ | ||
box for idx, box in enumerate(boxes_with_label) if idx not in boxes_to_remove | ||
] | ||
return boxes_filt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optimize and refactor filter_boxes_by_center
.
The function uses nested loops to compare box centers, which can be inefficient for a large number of boxes. Consider using a more efficient algorithm or data structure, such as a k-d tree, to handle spatial queries more efficiently.
Consider refactoring to use spatial data structures for efficiency:
from scipy.spatial import KDTree
def filter_boxes_by_center(boxes_with_label, center_dis_thresh):
centers = [calculate_center(box[0]) for box in boxes_with_label]
tree = KDTree(centers)
boxes_to_remove = set()
for i, center in enumerate(centers):
if i in boxes_to_remove:
continue
# Query the tree for neighbors within the distance threshold
indices = tree.query_ball_point(center, center_dis_thresh)
for j in indices:
if j > i:
boxes_to_remove.add(j)
return [box for idx, box in enumerate(boxes_with_label) if idx not in boxes_to_remove]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- crab/actions/visual_prompt_actions.py (1 hunks)
Files skipped from review due to trivial changes (1)
- crab/actions/visual_prompt_actions.py
box1Area = (box1[2] - box1[0]) * (box1[3] - box1[1]) | ||
box2Area = (box2[2] - box2[0]) * (box2[3] - box2[1]) | ||
unionArea = box1Area + box2Area - interArea | ||
iou = interArea / unionArea |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unionArea could be 0 when one of the box area is zero or there is no intersection area between two boxes
if unionArea == 0: return 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The union area is the whole occupied area of of box1 and box2. So I think UnionArea==0
only happens when box1 and box2 are both 0, which may be impossible in our situation. But checking can be useful some time in the future. Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically LGTM. Some type hints on functions (private, I know) would improve the readability tho
|
||
|
||
@action(local=True) | ||
def get_elements_prompt(input: tuple[str, list[tuple[BoxType, str]]], env): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about this, but is the env
param here necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The env
param is refer to the Environment
object where the action is executed. It's like self
in a class method.
Running new version on a CUDA machine will highly increse the inference speed. Here is a comparision: GPU: RTX4500. Old:
New:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thx
transformers v4.44 provides stable GroundingDino implementation.
This PR removes all third party code and use transformers instead.
close #14
Summary by CodeRabbit
Summary by CodeRabbit
New Features
Bug Fixes
Chores
Tests