-
Notifications
You must be signed in to change notification settings - Fork 8
feat(tracking): Implement SimpleElementTracker with optimal matching #27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Replaces the placeholder _match_elements method in SimpleElementTracker with an implementation based on scipy.optimize.linear_sum_assignment. - Matches elements based on type and squared Euclidean distance between centers. - Applies configured distance threshold and type constraints to the cost matrix. - Basic unit tests in test_tracking.py pass with this implementation. Completes the core logic for SimpleElementTracker as specified in Issue #8.
Enhances the `_match_elements` method in `SimpleElementTracker` to improve matching robustness. - Incorporates a check for relative width and height differences between elements, using a configurable `size_rel_threshold`. This adds size similarity as a constraint alongside type and proximity. - Cost matrix for `linear_sum_assignment` now penalizes matches where elements differ significantly in size. - Removes defensive try/except blocks for `scipy`/`types` imports, assuming dependencies are met. Basic unit tests in `test_tracking.py` continue to pass with this refined matching logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.
Comments suppressed due to low confidence (2)
omnimcp/tracking.py:122
- The current structure first continues when the cost exceeds the threshold and then applies the distance constraint again, which can bypass the subsequent size constraint. Consider consolidating these checks so that the size constraint is consistently applied before deciding to mark a match as invalid.
elif cost_matrix[i, j] > self.match_threshold_sq:
omnimcp/tracking.py:114
- [nitpick] Consider using float('inf') instead of the magic number 1e8 for infinity_cost to improve clarity and reduce potential edge case issues.
infinity_cost = 1e8 # Use a large number for invalid assignments
- Updates mocks in test_core.py to return the correct LLMAnalysisAndDecision structure expected by the refactored plan_action_for_ui. - Corrects assertion strings checking for prompt headings to exactly match the PROMPT_TEMPLATE content. Resolves test failures introduced during planner refactoring. All tests now pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.
Comments suppressed due to low confidence (1)
omnimcp/tracking.py:118
- The condition 'elif cost_matrix[i, j] > self.match_threshold_sq:' is redundant because a similar check is performed immediately after, causing the size constraint branch to be unreachable for those indices. Consider removing this duplicate check to ensure that the size constraint is evaluated correctly for all valid assignments.
elif cost_matrix[i, j] > self.match_threshold_sq:
Implements the SimpleElementTracker (Issue #8) and integrates it throughout the perception-planning-execution loop to provide temporal context. Refactors the planner (core.py) to utilize tracking context and output a structured ActionDecision (aligning with Issue #26), improving maintainability by using PydanticPrompt for schema generation instead of hardcoding in the prompt. Key Changes: Core Functionality: - Add SimpleElementTracker (tracking.py) with scipy-based matching (type, proximity, size). - Integrate tracker into VisualState.update (visual_state.py). - Refactor plan_action_for_ui (core.py) to accept tracking_info, update prompt template to use tracking and generated schemas, call LLM expecting LLMAnalysisAndDecision, and return ActionDecision directly. - Refactor AgentExecutor (agent_executor.py) to pass tracking_info to planner, handle ActionDecision return type, add wait/finish action handlers, and update existing handlers. Data Structures & Schemas (types.py): - Add ElementTrack, ScreenAnalysis, ActionDecision, LLMAnalysisAndDecision, LoggedStep Pydantic models. - Add PydanticPrompt decorator and docstrings to generate LLM schema documentation. Observability (agent_executor.py, utils.py, __init__.py): - Implement structured metrics collection saved to run_metrics.json. - Implement structured step logging (LoggedStep format) saved to run_log.jsonl. - Refactor setup_run_logging function into utils.py. Testing (tests/): - Add unit tests for SimpleElementTracker (test_tracking.py). - Update tests for core, agent_executor, visual_state to reflect refactoring and fix issues. All tests pass. Environment & Dependencies: - Update Python requirement to >=3.11 in pyproject.toml. - Add pydantic-prompt, scipy, numpy dependencies. - Update uv.lock. - Update CI workflow for Python 3.11/3.12.
Integrates the
SimpleElementTracker
to provide temporal context to the planning LLM, aiming to improve state awareness and reliability, as outlined in Issue #8. This PR includes the tracker implementation, integration into the perception and planning pipeline, associated data models, updated tests, and metrics/logging infrastructure.Resolves: #8 (Implements core tracking, integration, and planner update)
Relates to: #26 (Defines
ActionDecision
structure used internally by the planner)Key Changes:
omnimcp/types.py
):ElementTrack
,ScreenAnalysis
,ActionDecision
,LLMAnalysisAndDecision
,LoggedStep
.omnimcp/tracking.py
):SimpleElementTracker
class usingscipy.optimize.linear_sum_assignment
for optimal matching based on elementtype
, center proximity, and relative size similarity.update
method for track lifecycle management.omnimcp/visual_state.py
):SimpleElementTracker
withinVisualState
.VisualState.update
to call the tracker and expose thetracked_elements_view: List[ElementTrack]
.omnimcp/core.py
):plan_action_for_ui
to accepttracking_info: List[ElementTrack]
.PROMPT_TEMPLATE
to include tracked element context and request JSON output withScreenAnalysis
andActionDecision
.call_llm_api
to expect theLLMAnalysisAndDecision
model.ActionDecision
back toLLMActionPlan
for compatibility with currentAgentExecutor
handlers.omnimcp/agent_executor.py
):PerceptionInterface
andPlannerCallable
type hints.tracked_elements_view
from perception and passes it to the planner.run_metrics.json
).LoggedStep
(run_log.jsonl
).tests/
):SimpleElementTracker
(test_tracking.py
).core
,visual_state
, andagent_executor
to reflect changes and ensure they pass.numpy
andscipy
.Current Status:
VisualState
.core.py
) updated to use tracking data & new LLM I/O (with temp conversion).AgentExecutor
updated for data flow & logging.Next Steps:
main
branch (or a dedicated testing branch) usingcli.py
with real tasks (e.g., calculator).run.log
,run_log.jsonl
) and metrics (run_metrics.json
) from testing.AgentExecutor
handlers to directly useActionDecision
(removing the temporary conversion incore.py
).Testing Done:
tracking
,core
,visual_state
, andagent_executor
pass.