Skip to content

feat(tracking): Implement SimpleElementTracker with optimal matching #27

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

abrichr
Copy link
Member

@abrichr abrichr commented Apr 6, 2025

Integrates the SimpleElementTracker to provide temporal context to the planning LLM, aiming to improve state awareness and reliability, as outlined in Issue #8. This PR includes the tracker implementation, integration into the perception and planning pipeline, associated data models, updated tests, and metrics/logging infrastructure.

Resolves: #8 (Implements core tracking, integration, and planner update)
Relates to: #26 (Defines ActionDecision structure used internally by the planner)

Key Changes:

  • Data Structures (omnimcp/types.py):
    • Added Pydantic models: ElementTrack, ScreenAnalysis, ActionDecision, LLMAnalysisAndDecision, LoggedStep.
  • Tracker Implementation (omnimcp/tracking.py):
    • Added SimpleElementTracker class using scipy.optimize.linear_sum_assignment for optimal matching based on element type, center proximity, and relative size similarity.
    • Implemented update method for track lifecycle management.
  • Tracker Integration (omnimcp/visual_state.py):
    • Instantiated SimpleElementTracker within VisualState.
    • Modified VisualState.update to call the tracker and expose the tracked_elements_view: List[ElementTrack].
  • Planner Update (omnimcp/core.py):
    • Refactored plan_action_for_ui to accept tracking_info: List[ElementTrack].
    • Updated PROMPT_TEMPLATE to include tracked element context and request JSON output with ScreenAnalysis and ActionDecision.
    • Modified call_llm_api to expect the LLMAnalysisAndDecision model.
    • Added temporary conversion from the LLM's ActionDecision back to LLMActionPlan for compatibility with current AgentExecutor handlers.
  • Executor Update (omnimcp/agent_executor.py):
    • Updated PerceptionInterface and PlannerCallable type hints.
    • Retrieves tracked_elements_view from perception and passes it to the planner.
    • Implemented structured metrics collection (run_metrics.json).
    • Implemented structured step logging based on LoggedStep (run_log.jsonl).
  • Testing (tests/):
    • Added unit tests for SimpleElementTracker (test_tracking.py).
    • Updated tests for core, visual_state, and agent_executor to reflect changes and ensure they pass.
  • Dependencies: Added numpy and scipy.

Current Status:

  • Tracker logic implemented and unit-tested.
  • Data structures defined.
  • Metrics/logging infrastructure added.
  • Tracker integrated into VisualState.
  • Planner (core.py) updated to use tracking data & new LLM I/O (with temp conversion).
  • AgentExecutor updated for data flow & logging.
  • All relevant unit tests updated and passing.
  • End-to-end testing and validation needed.
  • Refactor planner return / executor handlers (remove temp conversion).
  • Tune tracker/prompt parameters based on testing.
  • (Separate Task/Issue) Implement Active Window Cropping.

Next Steps:

  1. Merge this PR (after review, if applicable).
  2. Perform end-to-end testing on the main branch (or a dedicated testing branch) using cli.py with real tasks (e.g., calculator).
  3. Analyze logs (run.log, run_log.jsonl) and metrics (run_metrics.json) from testing.
  4. Create follow-up issues/PRs for:
    • Tuning tracker parameters and refining prompts based on E2E results.
    • Refactoring AgentExecutor handlers to directly use ActionDecision (removing the temporary conversion in core.py).
    • Implementing performance optimizations like Active Window Cropping.

Testing Done:

  • Unit tests for tracking, core, visual_state, and agent_executor pass.
  • End-to-end testing with the integrated system is the immediate next step after merging.

abrichr added 2 commits April 6, 2025 00:54
Replaces the placeholder _match_elements method in SimpleElementTracker
with an implementation based on scipy.optimize.linear_sum_assignment.

- Matches elements based on type and squared Euclidean distance between centers.
- Applies configured distance threshold and type constraints to the cost matrix.
- Basic unit tests in test_tracking.py pass with this implementation.

Completes the core logic for SimpleElementTracker as specified in Issue #8.
Enhances the `_match_elements` method in `SimpleElementTracker` to improve matching robustness.

- Incorporates a check for relative width and height differences between elements, using a configurable `size_rel_threshold`. This adds size similarity as a constraint alongside type and proximity.
- Cost matrix for `linear_sum_assignment` now penalizes matches where elements differ significantly in size.
- Removes defensive try/except blocks for `scipy`/`types` imports, assuming dependencies are met.

Basic unit tests in `test_tracking.py` continue to pass with this refined matching logic.
@abrichr abrichr marked this pull request as draft April 6, 2025 05:02
@abrichr abrichr requested a review from Copilot April 6, 2025 05:02
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (2)

omnimcp/tracking.py:122

  • The current structure first continues when the cost exceeds the threshold and then applies the distance constraint again, which can bypass the subsequent size constraint. Consider consolidating these checks so that the size constraint is consistently applied before deciding to mark a match as invalid.
elif cost_matrix[i, j] > self.match_threshold_sq:

omnimcp/tracking.py:114

  • [nitpick] Consider using float('inf') instead of the magic number 1e8 for infinity_cost to improve clarity and reduce potential edge case issues.
infinity_cost = 1e8  # Use a large number for invalid assignments

- Updates mocks in test_core.py to return the correct LLMAnalysisAndDecision structure expected by the refactored plan_action_for_ui.
- Corrects assertion strings checking for prompt headings to exactly match the PROMPT_TEMPLATE content.

Resolves test failures introduced during planner refactoring. All tests now pass.
@abrichr abrichr requested a review from Copilot April 6, 2025 06:17
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (1)

omnimcp/tracking.py:118

  • The condition 'elif cost_matrix[i, j] > self.match_threshold_sq:' is redundant because a similar check is performed immediately after, causing the size constraint branch to be unreachable for those indices. Consider removing this duplicate check to ensure that the size constraint is evaluated correctly for all valid assignments.
elif cost_matrix[i, j] > self.match_threshold_sq:

Implements the SimpleElementTracker (Issue #8) and integrates it throughout the
perception-planning-execution loop to provide temporal context. Refactors the
planner (core.py) to utilize tracking context and output a structured
ActionDecision (aligning with Issue #26), improving maintainability by using
PydanticPrompt for schema generation instead of hardcoding in the prompt.

Key Changes:

Core Functionality:
- Add SimpleElementTracker (tracking.py) with scipy-based matching (type, proximity, size).
- Integrate tracker into VisualState.update (visual_state.py).
- Refactor plan_action_for_ui (core.py) to accept tracking_info, update prompt template to use tracking and generated schemas, call LLM expecting LLMAnalysisAndDecision, and return ActionDecision directly.
- Refactor AgentExecutor (agent_executor.py) to pass tracking_info to planner, handle ActionDecision return type, add wait/finish action handlers, and update existing handlers.

Data Structures & Schemas (types.py):
- Add ElementTrack, ScreenAnalysis, ActionDecision, LLMAnalysisAndDecision, LoggedStep Pydantic models.
- Add PydanticPrompt decorator and docstrings to generate LLM schema documentation.

Observability (agent_executor.py, utils.py, __init__.py):
- Implement structured metrics collection saved to run_metrics.json.
- Implement structured step logging (LoggedStep format) saved to run_log.jsonl.
- Refactor setup_run_logging function into utils.py.

Testing (tests/):
- Add unit tests for SimpleElementTracker (test_tracking.py).
- Update tests for core, agent_executor, visual_state to reflect refactoring and fix issues. All tests pass.

Environment & Dependencies:
- Update Python requirement to >=3.11 in pyproject.toml.
- Add pydantic-prompt, scipy, numpy dependencies.
- Update uv.lock.
- Update CI workflow for Python 3.11/3.12.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant