Enhance Testing Infrastructure and Documentation #599

Open · wants to merge 8 commits into `main`
183 changes: 7 additions & 176 deletions CONTRIBUTING.md
@@ -91,184 +91,15 @@ Even if you're not ready to contribute code, we'd love to hear your thoughts. Dr

## Testing

We use a comprehensive testing stack to ensure code quality and reliability. Our testing framework includes pytest and several specialized testing tools.

### Testing Dependencies

Install all testing dependencies:
```bash
pip install -e ".[dev]"
```

We use the following testing packages:
- `pytest==7.4.0`: Core testing framework
- `pytest-depends`: Manage test dependencies
- `pytest-asyncio`: Test async code
- `pytest-vcr`: Record and replay HTTP interactions
- `pytest-mock`: Mocking functionality
- `pyfakefs`: Mock filesystem operations
- `requests_mock==1.11.0`: Mock HTTP requests
- `tach~=0.9`: Enforce module dependency boundaries to prevent circular dependencies

### Using Tox

We use tox to automate and standardize testing. Tox:
- Creates isolated virtual environments for testing
- Tests against multiple Python versions (3.7-3.12)
- Runs all test suites consistently
- Ensures dependencies are correctly specified
- Verifies the package installs correctly

Run tox:
```bash
tox
```

This will:
1. Create fresh virtual environments
2. Install dependencies
3. Run pytest with our test suite
4. Generate coverage reports

### Running Tests

1. **Run All Tests**:
```bash
tox
```

2. **Run Specific Test File**:
```bash
pytest tests/llms/test_anthropic.py -v
```

3. **Run with Coverage**:
```bash
coverage run -m pytest
coverage report
```

### Writing Tests

1. **Test Structure**:
```python
import pytest
from pytest_mock import MockerFixture
from unittest.mock import Mock, patch

@pytest.mark.asyncio  # For async tests
async def test_async_function():
    ...  # Test implementation

@pytest.mark.depends(on=['test_prerequisite'])  # Declare test dependencies
def test_dependent_function():
    ...  # Test implementation
```

2. **Recording HTTP Interactions**:
```python
@pytest.mark.vcr()  # Records HTTP interactions
def test_api_call():
    response = client.make_request()
    assert response.status_code == 200
```

3. **Mocking Filesystem**:
```python
import os

def test_file_operations(fs):  # fs fixture provided by pyfakefs
    fs.create_file('/fake/file.txt', contents='test')
    assert os.path.exists('/fake/file.txt')
```

4. **Mocking HTTP Requests**:
```python
def test_http_client(requests_mock):
    requests_mock.get('http://api.example.com', json={'key': 'value'})
    response = make_request()
    assert response.json()['key'] == 'value'
```

### Testing Best Practices

1. **Test Categories**:
- Unit tests: Test individual components
- Integration tests: Test component interactions
- End-to-end tests: Test complete workflows
- Performance tests: Test response times and resource usage
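
As a sketch of how these categories can be kept selectable at run time, tests can be tagged with pytest markers and filtered with `pytest -m`. The marker names below are illustrative assumptions, not markers this repository necessarily defines:
```python
import pytest

# Illustrative markers; they would need to be registered under
# [tool.pytest.ini_options] "markers" to avoid unknown-mark warnings.
@pytest.mark.unit
def test_event_serialization():
    ...

@pytest.mark.integration
def test_session_records_llm_event():
    ...
```
Running `pytest -m unit` would then select only the unit tests.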

2. **Fixtures**:
Create reusable test fixtures in `conftest.py`:
```python
@pytest.fixture
def mock_llm_client():
    client = Mock()
    client.chat.completions.create.return_value = Mock()
    return client
```

3. **Test Data**:
- Store test data in `tests/data/`
- Use meaningful test data names
- Document data format and purpose
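
For example, a small helper along these lines can load fixtures from `tests/data/` (the file name and JSON format here are hypothetical; adapt to the actual data files):
```python
import json
from pathlib import Path

DATA_DIR = Path(__file__).parent / "data"  # resolves to tests/data/ from a test module

def load_json_fixture(name: str) -> dict:
    """Load and parse a JSON fixture stored under tests/data/."""
    return json.loads((DATA_DIR / name).read_text())

def test_completion_payload_shape():
    payload = load_json_fixture("chat_completion_example.json")  # hypothetical file
    assert "choices" in payload
```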

4. **VCR Cassettes**:
- Store in `tests/cassettes/`
- Sanitize sensitive information
- Update cassettes when API changes

5. **Dependency Boundaries**:
Use `tach` to check module boundaries and catch circular dependencies:
```bash
tach check
```

### CI Testing Strategy

We use Jupyter notebooks as integration tests for LLM providers. This approach:
- Tests real-world usage patterns
- Verifies end-to-end functionality
- Ensures examples stay up-to-date
- Tests against actual LLM APIs

1. **Notebook Tests**:
- Located in `examples/` directory
- Each LLM provider has example notebooks
- CI runs notebooks on PR merges to main
- Tests run against multiple Python versions

2. **Test Workflow**:
The `test-notebooks.yml` workflow:
```yaml
name: Test Notebooks
on:
  pull_request:
    paths:
      - "agentops/**"
      - "examples/**"
      - "tests/**"
```
- Runs on PR merges and manual triggers
- Sets up environment with provider API keys
- Installs AgentOps from main branch
- Executes each notebook
- Excludes specific notebooks that require manual testing
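
To illustrate the execution step, the notebooks can be run headlessly as sketched below. This is illustrative only; the actual commands live in `test-notebooks.yml`, and the excluded path shown is hypothetical:
```python
# Sketch of a notebook-execution step for CI.
import pathlib
import subprocess

EXCLUDED = {"examples/some_provider/manual_only.ipynb"}  # hypothetical exclusion

for nb in sorted(pathlib.Path("examples").rglob("*.ipynb")):
    if str(nb) in EXCLUDED:
        continue
    subprocess.run(
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", str(nb)],
        check=True,  # fail the job if any notebook raises
    )
```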

3. **Provider Coverage**:
Each provider should have notebooks demonstrating:
- Basic completion calls
- Streaming responses
- Async operations (if supported)
- Error handling
- Tool usage (if applicable)

4. **Adding Provider Tests**:
- Create a notebook in `examples/provider_name/`
- Include all provider functionality
- Add necessary secrets to GitHub Actions
- Update `exclude_notebooks` in the workflow if manual testing is needed

We maintain comprehensive testing documentation in [tests/README.md](tests/README.md). This includes:

- Test structure and organization
- How to run tests
- Using VCR.py for HTTP interaction testing
- Writing new tests
- Test dependencies and setup

For detailed testing instructions and best practices, please refer to the testing documentation.

## Adding LLM Providers

9 changes: 5 additions & 4 deletions pyproject.toml
@@ -9,7 +9,8 @@ authors = [
{ name="Alex Reibman", email="[email protected]" },
{ name="Shawn Qiu", email="[email protected]" },
{ name="Braelyn Boynton", email="[email protected]" },
{ name="Howard Gil", email="[email protected]" }
{ name="Howard Gil", email="[email protected]" },
{ name="Constantin Teodorescu", email="[email protected]"}
]
description = "Observability and DevTool Platform for AI Agents"
readme = "README.md"
@@ -51,6 +52,8 @@ langchain = [
"langchain==0.2.14; python_version >= '3.8.1'"
]

[project.scripts]
agentops = "agentops.cli:main"

[project.urls]
Homepage = "https://github.com/AgentOps-AI/agentops"
@@ -59,12 +62,10 @@ Issues = "https://github.com/AgentOps-AI/agentops/issues"
[tool.autopep8]
max_line_length = 120

[project.scripts]
agentops = "agentops.cli:main"

[tool.pytest.ini_options]
asyncio_mode = "strict"
asyncio_default_fixture_loop_scope = "function"
asyncio_default_fixture_loop_scope = "function" # WARNING: Changing this may break tests. A module-scoped loop might be faster, but is less stable.
testpaths = [
"tests",
]
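
For context on the asyncio settings above: with `asyncio_mode = "strict"` and function-scoped loops, async tests and fixtures are declared explicitly. A minimal sketch, not taken from this test suite:
```python
import asyncio
import pytest
import pytest_asyncio

@pytest_asyncio.fixture  # function-scoped by default, matching the config above
async def message_queue():
    queue: asyncio.Queue = asyncio.Queue()
    yield queue

@pytest.mark.asyncio
async def test_queue_roundtrip(message_queue):
    await message_queue.put("ping")
    assert await message_queue.get() == "ping"
```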
121 changes: 121 additions & 0 deletions tests/README.md
@@ -0,0 +1,121 @@
# Testing AgentOps

This directory contains the test suite for AgentOps. We use a comprehensive testing stack including pytest and several specialized testing tools.

## Running Tests

1. **Run All Tests**:
```bash
pytest
```

2. **Run Specific Test File**:
```bash
pytest tests/providers/test_openai_integration.py
```

3. **Run with Coverage**:
```bash
coverage run -m pytest
coverage report
```

## Writing Tests

1. **Test Structure**:
```python
import pytest
from pytest_mock import MockerFixture
from unittest.mock import Mock, patch

@pytest.mark.asyncio  # For async tests
async def test_async_function():
    ...  # Test implementation

@pytest.mark.depends(on=['test_prerequisite'])  # Declare test dependencies
def test_dependent_function():
    ...  # Test implementation
```

2. **Using Fixtures**:
```python
def test_with_mocks(llm_event_spy):
    # Use the spy to track LLM events
    pass
```

3. **Using VCR**:
```python
def test_api_call(vcr_cassette):
    # Make API calls - they will be recorded/replayed automatically
    response = client.make_api_call()
```

## Test Categories

### Core Tests
- Unit tests for core functionality
- Integration tests for SDK features
- Performance benchmarks

### Provider Tests
Tests for LLM provider integrations. See [providers/README.md](providers/README.md) for details on:
- VCR.py configuration for recording API interactions
- Provider-specific test configuration
- Recording and managing API fixtures

### Manual Tests
Located in `core_manual_tests/`:
- API server tests
- Multi-session scenarios
- Provider-specific canary tests
- Time travel debugging tests

## Test Dependencies

Required packages are included in the dev dependencies:
```bash
pip install -e ".[dev]"
```

Key testing packages:
- `pytest`: Core testing framework
- `pytest-depends`: Manage test dependencies
- `pytest-asyncio`: Test async code
- `pytest-vcr`: Record and replay HTTP interactions
- `pytest-mock`: Mocking functionality
- `pyfakefs`: Mock filesystem operations
- `requests_mock`: Mock HTTP requests

## Best Practices

1. **Recording API Fixtures**:
- Use VCR.py to record API interactions
- Fixtures are stored in `.cassettes` directories
- VCR automatically filters sensitive headers and API keys
- New recordings are summarized at the end of test runs

2. **Test Isolation**:
- Use fresh sessions for each test
- Clean up resources in test teardown
- Avoid test interdependencies
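
A sketch of one way to get per-test isolation with a fixture that always tears down its session. The `agentops.start_session` / `end_session` calls are assumptions about the SDK surface and may need adapting; client initialization is assumed to happen elsewhere (e.g. in `conftest.py`):
```python
import pytest
import agentops

@pytest.fixture
def isolated_session():
    # Fresh session per test; always end it, even if the test fails.
    session = agentops.start_session(tags=["pytest"])
    try:
        yield session
    finally:
        session.end_session("Success")
```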

3. **Async Testing**:
- Use `@pytest.mark.asyncio` for async tests
- Handle both sync and async variants
- Test streaming responses properly
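
For streaming, a test typically drains the stream and asserts on the collected chunks. A sketch using a hypothetical `async_llm_client` fixture and `stream` method (not ones this suite necessarily provides):
```python
import pytest

@pytest.mark.asyncio
async def test_streaming_completion(async_llm_client):  # hypothetical fixture
    chunks = []
    async for chunk in async_llm_client.stream("Say hello"):  # hypothetical API
        chunks.append(chunk)
    assert chunks, "expected the stream to yield at least one chunk"
```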

## VCR Configuration

The VCR setup automatically:
- Records API interactions on first run
- Replays recorded responses on subsequent runs
- Filters sensitive information (API keys, tokens)
- Ignores AgentOps API and package management calls
- Creates `.cassettes` directories as needed
- Reports new recordings in the test summary
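
If the setup follows pytest-vcr's standard hooks, the filtering described above is typically expressed through a `vcr_config` fixture in `conftest.py`. A sketch; the exact keys and header names used by this project may differ:
```python
import pytest

@pytest.fixture(scope="module")
def vcr_config():
    # Passed through to VCR; headers listed here are scrubbed from cassettes.
    return {
        "filter_headers": ["authorization", "x-api-key"],
        "record_mode": "once",  # record on first run, replay afterwards
    }
```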

To update existing cassettes:
1. Delete the relevant cassette file from the `.cassettes` directory
2. Run the tests
3. Verify the new recordings in the VCR summary
23 changes: 23 additions & 0 deletions tests/conftest.py
@@ -0,0 +1,23 @@
from typing import TYPE_CHECKING
import vcr
from vcr.record_mode import RecordMode
import pytest
import os
from collections import defaultdict

if TYPE_CHECKING:
    from pytest_mock import MockerFixture


@pytest.fixture(scope="function")
def llm_event_spy(agentops_client, mocker: "MockerFixture"):
    """Fixture that provides spies on each provider's response handling"""
    from agentops.llms.providers.anthropic import AnthropicProvider
    from agentops.llms.providers.litellm import LiteLLMProvider
    from agentops.llms.providers.openai import OpenAiProvider

    return {
        "litellm": mocker.spy(LiteLLMProvider(agentops_client), "handle_response"),
        "openai": mocker.spy(OpenAiProvider(agentops_client), "handle_response"),
        "anthropic": mocker.spy(AnthropicProvider(agentops_client), "handle_response"),
    }
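
As a usage note (illustrative only), a test that requests this fixture receives a dict of spies keyed by provider name:
```python
# Hypothetical example of consuming the fixture above.
def test_llm_event_spies_are_wired(llm_event_spy):
    assert set(llm_event_spy) == {"litellm", "openai", "anthropic"}
    assert llm_event_spy["openai"].call_count == 0  # nothing has been handled yet
```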