feat(tracking): Implement SimpleElementTracker with optimal matching #27

abrichr · 2025-04-06T05:02:26Z

Integrates the SimpleElementTracker to provide temporal context to the planning LLM, aiming to improve state awareness and reliability, as outlined in Issue #8. This PR includes the tracker implementation, integration into the perception and planning pipeline, associated data models, updated tests, and metrics/logging infrastructure.

Resolves: #8 (Implements core tracking, integration, and planner update)
Relates to: #26 (Defines ActionDecision structure used internally by the planner)

Key Changes:

Data Structures (omnimcp/types.py):
- Added Pydantic models: ElementTrack, ScreenAnalysis, ActionDecision, LLMAnalysisAndDecision, LoggedStep.
Tracker Implementation (omnimcp/tracking.py):
- Added SimpleElementTracker class using scipy.optimize.linear_sum_assignment for optimal matching based on element type, center proximity, and relative size similarity.
- Implemented update method for track lifecycle management.
Tracker Integration (omnimcp/visual_state.py):
- Instantiated SimpleElementTracker within VisualState.
- Modified VisualState.update to call the tracker and expose the tracked_elements_view: List[ElementTrack].
Planner Update (omnimcp/core.py):
- Refactored plan_action_for_ui to accept tracking_info: List[ElementTrack].
- Updated PROMPT_TEMPLATE to include tracked element context and request JSON output with ScreenAnalysis and ActionDecision.
- Modified call_llm_api to expect the LLMAnalysisAndDecision model.
- Added temporary conversion from the LLM's ActionDecision back to LLMActionPlan for compatibility with current AgentExecutor handlers.
Executor Update (omnimcp/agent_executor.py):
- Updated PerceptionInterface and PlannerCallable type hints.
- Retrieves tracked_elements_view from perception and passes it to the planner.
- Implemented structured metrics collection (run_metrics.json).
- Implemented structured step logging based on LoggedStep (run_log.jsonl).
Testing (tests/):
- Added unit tests for SimpleElementTracker (test_tracking.py).
- Updated tests for core, visual_state, and agent_executor to reflect changes and ensure they pass.
Dependencies: Added numpy and scipy.

Current Status:

Next Steps:

Merge this PR (after review, if applicable).
Perform end-to-end testing on the main branch (or a dedicated testing branch) using cli.py with real tasks (e.g., calculator).
Analyze logs (run.log, run_log.jsonl) and metrics (run_metrics.json) from testing.
Create follow-up issues/PRs for:
- Tuning tracker parameters and refining prompts based on E2E results.
- Refactoring AgentExecutor handlers to directly use ActionDecision (removing the temporary conversion in core.py).
- Implementing performance optimizations like Active Window Cropping.

Testing Done:

Unit tests for tracking, core, visual_state, and agent_executor pass.
End-to-end testing with the integrated system is the immediate next step after merging.

Replaces the placeholder _match_elements method in SimpleElementTracker with an implementation based on scipy.optimize.linear_sum_assignment. - Matches elements based on type and squared Euclidean distance between centers. - Applies configured distance threshold and type constraints to the cost matrix. - Basic unit tests in test_tracking.py pass with this implementation. Completes the core logic for SimpleElementTracker as specified in Issue #8.

Enhances the `_match_elements` method in `SimpleElementTracker` to improve matching robustness. - Incorporates a check for relative width and height differences between elements, using a configurable `size_rel_threshold`. This adds size similarity as a constraint alongside type and proximity. - Cost matrix for `linear_sum_assignment` now penalizes matches where elements differ significantly in size. - Removes defensive try/except blocks for `scipy`/`types` imports, assuming dependencies are met. Basic unit tests in `test_tracking.py` continue to pass with this refined matching logic.

Copilot

Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (2)

omnimcp/tracking.py:122

The current structure first continues when the cost exceeds the threshold and then applies the distance constraint again, which can bypass the subsequent size constraint. Consider consolidating these checks so that the size constraint is consistently applied before deciding to mark a match as invalid.

elif cost_matrix[i, j] > self.match_threshold_sq:

omnimcp/tracking.py:114

[nitpick] Consider using float('inf') instead of the magic number 1e8 for infinity_cost to improve clarity and reduce potential edge case issues.

infinity_cost = 1e8  # Use a large number for invalid assignments

- Updates mocks in test_core.py to return the correct LLMAnalysisAndDecision structure expected by the refactored plan_action_for_ui. - Corrects assertion strings checking for prompt headings to exactly match the PROMPT_TEMPLATE content. Resolves test failures introduced during planner refactoring. All tests now pass.

Copilot

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (1)

omnimcp/tracking.py:118

The condition 'elif cost_matrix[i, j] > self.match_threshold_sq:' is redundant because a similar check is performed immediately after, causing the size constraint branch to be unreachable for those indices. Consider removing this duplicate check to ensure that the size constraint is evaluated correctly for all valid assignments.

elif cost_matrix[i, j] > self.match_threshold_sq:

Implements the SimpleElementTracker (Issue #8) and integrates it throughout the perception-planning-execution loop to provide temporal context. Refactors the planner (core.py) to utilize tracking context and output a structured ActionDecision (aligning with Issue #26), improving maintainability by using PydanticPrompt for schema generation instead of hardcoding in the prompt. Key Changes: Core Functionality: - Add SimpleElementTracker (tracking.py) with scipy-based matching (type, proximity, size). - Integrate tracker into VisualState.update (visual_state.py). - Refactor plan_action_for_ui (core.py) to accept tracking_info, update prompt template to use tracking and generated schemas, call LLM expecting LLMAnalysisAndDecision, and return ActionDecision directly. - Refactor AgentExecutor (agent_executor.py) to pass tracking_info to planner, handle ActionDecision return type, add wait/finish action handlers, and update existing handlers. Data Structures & Schemas (types.py): - Add ElementTrack, ScreenAnalysis, ActionDecision, LLMAnalysisAndDecision, LoggedStep Pydantic models. - Add PydanticPrompt decorator and docstrings to generate LLM schema documentation. Observability (agent_executor.py, utils.py, __init__.py): - Implement structured metrics collection saved to run_metrics.json. - Implement structured step logging (LoggedStep format) saved to run_log.jsonl. - Refactor setup_run_logging function into utils.py. Testing (tests/): - Add unit tests for SimpleElementTracker (test_tracking.py). - Update tests for core, agent_executor, visual_state to reflect refactoring and fix issues. All tests pass. Environment & Dependencies: - Update Python requirement to >=3.11 in pyproject.toml. - Add pydantic-prompt, scipy, numpy dependencies. - Update uv.lock. - Update CI workflow for Python 3.11/3.12.

abrichr added 2 commits April 6, 2025 00:54

abrichr marked this pull request as draft April 6, 2025 05:02

abrichr requested a review from Copilot April 6, 2025 05:02

Copilot AI reviewed Apr 6, 2025

View reviewed changes

abrichr requested a review from Copilot April 6, 2025 06:17

Copilot AI reviewed Apr 6, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(tracking): Implement SimpleElementTracker with optimal matching #27

feat(tracking): Implement SimpleElementTracker with optimal matching #27

Uh oh!

abrichr commented Apr 6, 2025 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

feat(tracking): Implement SimpleElementTracker with optimal matching #27

Are you sure you want to change the base?

feat(tracking): Implement SimpleElementTracker with optimal matching #27

Uh oh!

Conversation

abrichr commented Apr 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

abrichr commented Apr 6, 2025 •

edited

Loading