Enhance Testing Infrastructure and Documentation #599

Open · wants to merge 8 commits into `main`
183 changes: 7 additions & 176 deletions CONTRIBUTING.md
@@ -91,184 +91,15 @@ Even if you're not ready to contribute code, we'd love to hear your thoughts. Dr

## Testing

We use a comprehensive testing stack to ensure code quality and reliability. Our testing framework includes pytest and several specialized testing tools.

### Testing Dependencies

Install all testing dependencies:
```bash
pip install -e ".[dev]"
```

We use the following testing packages:
- `pytest==7.4.0`: Core testing framework
- `pytest-depends`: Manage test dependencies
- `pytest-asyncio`: Test async code
- `pytest-vcr`: Record and replay HTTP interactions
- `pytest-mock`: Mocking functionality
- `pyfakefs`: Mock filesystem operations
- `requests_mock==1.11.0`: Mock HTTP requests
- `tach~=0.9`: Enforce module dependency boundaries to prevent circular dependencies

### Using Tox

We use tox to automate and standardize testing. Tox:
- Creates isolated virtual environments for testing
- Tests against multiple Python versions (3.7-3.12)
- Runs all test suites consistently
- Ensures dependencies are correctly specified
- Verifies the package installs correctly

Run tox:
```bash
tox
```

This will:
1. Create fresh virtual environments
2. Install dependencies
3. Run pytest with our test suite
4. Generate coverage reports

### Running Tests

1. **Run All Tests**:
```bash
tox
```

2. **Run Specific Test File**:
```bash
pytest tests/llms/test_anthropic.py -v
```

3. **Run with Coverage**:
```bash
coverage run -m pytest
coverage report
```

### Writing Tests

1. **Test Structure**:
```python
import pytest
from pytest_mock import MockerFixture
from unittest.mock import Mock, patch

@pytest.mark.asyncio  # For async tests
async def test_async_function():
    ...  # Test implementation

@pytest.mark.depends(on=['test_prerequisite'])  # Declare test dependencies
def test_dependent_function():
    ...  # Test implementation
```

2. **Recording HTTP Interactions**:
```python
@pytest.mark.vcr()  # Records HTTP interactions
def test_api_call():
    response = client.make_request()
    assert response.status_code == 200
```

3. **Mocking Filesystem**:
```python
import os

def test_file_operations(fs):  # fs fixture provided by pyfakefs
    fs.create_file('/fake/file.txt', contents='test')
    assert os.path.exists('/fake/file.txt')
```

4. **Mocking HTTP Requests**:
```python
def test_http_client(requests_mock):
    requests_mock.get('http://api.example.com', json={'key': 'value'})
    response = make_request()
    assert response.json()['key'] == 'value'
```

### Testing Best Practices

1. **Test Categories**:
- Unit tests: Test individual components
- Integration tests: Test component interactions
- End-to-end tests: Test complete workflows
- Performance tests: Test response times and resource usage
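
As a sketch of how these categories can be kept selectable at run time, tests can be tagged with pytest markers and filtered with `pytest -m`. The marker names below are illustrative assumptions, not markers this repository necessarily defines:
```python
import pytest

# Illustrative markers; they would need to be registered under
# [tool.pytest.ini_options] "markers" to avoid unknown-mark warnings.
@pytest.mark.unit
def test_event_serialization():
    ...

@pytest.mark.integration
def test_session_records_llm_event():
    ...
```
Running `pytest -m unit` would then select only the unit tests.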

2. **Fixtures**:
Create reusable test fixtures in `conftest.py`:
```python
@pytest.fixture
def mock_llm_client():
    client = Mock()
    client.chat.completions.create.return_value = Mock()
    return client
```

3. **Test Data**:
- Store test data in `tests/data/`
- Use meaningful test data names
- Document data format and purpose
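
For example, a small helper along these lines can load fixtures from `tests/data/` (the file name and JSON format here are hypothetical; adapt to the actual data files):
```python
import json
from pathlib import Path

DATA_DIR = Path(__file__).parent / "data"  # resolves to tests/data/ from a test module

def load_json_fixture(name: str) -> dict:
    """Load and parse a JSON fixture stored under tests/data/."""
    return json.loads((DATA_DIR / name).read_text())

def test_completion_payload_shape():
    payload = load_json_fixture("chat_completion_example.json")  # hypothetical file
    assert "choices" in payload
```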

4. **VCR Cassettes**:
- Store in `tests/cassettes/`
- Sanitize sensitive information
- Update cassettes when API changes

5. **Dependency Boundaries**:
Use `tach` to check module boundaries and catch circular dependencies:
```bash
tach check
```

### CI Testing Strategy

We use Jupyter notebooks as integration tests for LLM providers. This approach:
- Tests real-world usage patterns
- Verifies end-to-end functionality
- Ensures examples stay up-to-date
- Tests against actual LLM APIs

1. **Notebook Tests**:
- Located in `examples/` directory
- Each LLM provider has example notebooks
- CI runs notebooks on PR merges to main
- Tests run against multiple Python versions

2. **Test Workflow**:
The `test-notebooks.yml` workflow:
```yaml
name: Test Notebooks
on:
  pull_request:
    paths:
      - "agentops/**"
      - "examples/**"
      - "tests/**"
```
- Runs on PR merges and manual triggers
- Sets up environment with provider API keys
- Installs AgentOps from main branch
- Executes each notebook
- Excludes specific notebooks that require manual testing
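
To illustrate the execution step, the notebooks can be run headlessly as sketched below. This is illustrative only; the actual commands live in `test-notebooks.yml`, and the excluded path shown is hypothetical:
```python
# Sketch of a notebook-execution step for CI.
import pathlib
import subprocess

EXCLUDED = {"examples/some_provider/manual_only.ipynb"}  # hypothetical exclusion

for nb in sorted(pathlib.Path("examples").rglob("*.ipynb")):
    if str(nb) in EXCLUDED:
        continue
    subprocess.run(
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", str(nb)],
        check=True,  # fail the job if any notebook raises
    )
```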

3. **Provider Coverage**:
Each provider should have notebooks demonstrating:
- Basic completion calls
- Streaming responses
- Async operations (if supported)
- Error handling
- Tool usage (if applicable)

4. **Adding Provider Tests**:
- Create a notebook in `examples/provider_name/`
- Include all provider functionality
- Add necessary secrets to GitHub Actions
- Update `exclude_notebooks` in the workflow if manual testing is needed

We maintain comprehensive testing documentation in [tests/README.md](tests/README.md). This includes:

- Test structure and organization
- How to run tests
- Using VCR.py for HTTP interaction testing
- Writing new tests
- Test dependencies and setup

For detailed testing instructions and best practices, please refer to the testing documentation.

## Adding LLM Providers

9 changes: 5 additions & 4 deletions pyproject.toml
@@ -9,7 +9,8 @@ authors = [
{ name="Alex Reibman", email="[email protected]" },
{ name="Shawn Qiu", email="[email protected]" },
{ name="Braelyn Boynton", email="[email protected]" },
{ name="Howard Gil", email="[email protected]" }
{ name="Howard Gil", email="[email protected]" },
{ name="Constantin Teodorescu", email="[email protected]"}
]
description = "Observability and DevTool Platform for AI Agents"
readme = "README.md"
@@ -51,6 +52,8 @@ langchain = [
"langchain==0.2.14; python_version >= '3.8.1'"
]

[project.scripts]
agentops = "agentops.cli:main"

[project.urls]
Homepage = "https://github.com/AgentOps-AI/agentops"
@@ -59,12 +62,10 @@ Issues = "https://github.com/AgentOps-AI/agentops/issues"
[tool.autopep8]
max_line_length = 120

[project.scripts]
agentops = "agentops.cli:main"

[tool.pytest.ini_options]
asyncio_mode = "strict"
asyncio_default_fixture_loop_scope = "function"
asyncio_default_fixture_loop_scope = "function" # WARNING: Changing this may break tests. A module-scoped loop might be faster, but is less stable.
testpaths = [
"tests",
]
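
For context on the asyncio settings above: with `asyncio_mode = "strict"` and function-scoped loops, async tests and fixtures are declared explicitly. A minimal sketch, not taken from this test suite:
```python
import asyncio
import pytest
import pytest_asyncio

@pytest_asyncio.fixture  # function-scoped by default, matching the config above
async def message_queue():
    queue: asyncio.Queue = asyncio.Queue()
    yield queue

@pytest.mark.asyncio
async def test_queue_roundtrip(message_queue):
    await message_queue.put("ping")
    assert await message_queue.get() == "ping"
```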
121 changes: 121 additions & 0 deletions tests/README.md
@@ -0,0 +1,121 @@
# Testing AgentOps

This directory contains the test suite for AgentOps. We use a comprehensive testing stack including pytest and several specialized testing tools.

## Running Tests

1. **Run All Tests**:
```bash
pytest
```

2. **Run Specific Test File**:
```bash
pytest tests/providers/test_openai_integration.py
```

3. **Run with Coverage**:
```bash
coverage run -m pytest
coverage report
```

## Writing Tests

1. **Test Structure**:
```python
import pytest
from pytest_mock import MockerFixture
from unittest.mock import Mock, patch

@pytest.mark.asyncio  # For async tests
async def test_async_function():
    ...  # Test implementation

@pytest.mark.depends(on=['test_prerequisite'])  # Declare test dependencies
def test_dependent_function():
    ...  # Test implementation
```

2. **Using Fixtures**:
```python
def test_with_mocks(llm_event_spy):
    # Use the spy to track LLM events
    pass
```

3. **Using VCR**:
```python
def test_api_call(vcr_cassette):
    # Make API calls - they will be recorded/replayed automatically
    response = client.make_api_call()
```

## Test Categories

### Core Tests
- Unit tests for core functionality
- Integration tests for SDK features
- Performance benchmarks

### Provider Tests
Tests for LLM provider integrations. See [providers/README.md](providers/README.md) for details on:
- VCR.py configuration for recording API interactions
- Provider-specific test configuration
- Recording and managing API fixtures

### Manual Tests
Located in `core_manual_tests/`:
- API server tests
- Multi-session scenarios
- Provider-specific canary tests
- Time travel debugging tests

## Test Dependencies

Required packages are included in the dev dependencies:
```bash
pip install -e ".[dev]"
```

Key testing packages:
- `pytest`: Core testing framework
- `pytest-depends`: Manage test dependencies
- `pytest-asyncio`: Test async code
- `pytest-vcr`: Record and replay HTTP interactions
- `pytest-mock`: Mocking functionality
- `pyfakefs`: Mock filesystem operations
- `requests_mock`: Mock HTTP requests

## Best Practices

1. **Recording API Fixtures**:
- Use VCR.py to record API interactions
- Fixtures are stored in `.cassettes` directories
- VCR automatically filters sensitive headers and API keys
- New recordings are summarized at the end of test runs

2. **Test Isolation**:
- Use fresh sessions for each test
- Clean up resources in test teardown
- Avoid test interdependencies
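
A sketch of one way to get per-test isolation with a fixture that always tears down its session. The `agentops.start_session` / `end_session` calls are assumptions about the SDK surface and may need adapting; client initialization is assumed to happen elsewhere (e.g. in `conftest.py`):
```python
import pytest
import agentops

@pytest.fixture
def isolated_session():
    # Fresh session per test; always end it, even if the test fails.
    session = agentops.start_session(tags=["pytest"])
    try:
        yield session
    finally:
        session.end_session("Success")
```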

3. **Async Testing**:
- Use `@pytest.mark.asyncio` for async tests
- Handle both sync and async variants
- Test streaming responses properly
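
For streaming, a test typically drains the stream and asserts on the collected chunks. A sketch using a hypothetical `async_llm_client` fixture and `stream` method (not ones this suite necessarily provides):
```python
import pytest

@pytest.mark.asyncio
async def test_streaming_completion(async_llm_client):  # hypothetical fixture
    chunks = []
    async for chunk in async_llm_client.stream("Say hello"):  # hypothetical API
        chunks.append(chunk)
    assert chunks, "expected the stream to yield at least one chunk"
```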

## VCR Configuration

The VCR setup automatically:
- Records API interactions on first run
- Replays recorded responses on subsequent runs
- Filters sensitive information (API keys, tokens)
- Ignores AgentOps API and package management calls
- Creates `.cassettes` directories as needed
- Reports new recordings in the test summary
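
If the setup follows pytest-vcr's standard hooks, the filtering described above is typically expressed through a `vcr_config` fixture in `conftest.py`. A sketch; the exact keys and header names used by this project may differ:
```python
import pytest

@pytest.fixture(scope="module")
def vcr_config():
    # Passed through to VCR; headers listed here are scrubbed from cassettes.
    return {
        "filter_headers": ["authorization", "x-api-key"],
        "record_mode": "once",  # record on first run, replay afterwards
    }
```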

To update existing cassettes:
1. Delete the relevant cassette file from the `.cassettes` directory
2. Run the tests
3. Verify the new recordings in the VCR summary
23 changes: 23 additions & 0 deletions tests/conftest.py
@@ -0,0 +1,23 @@
from typing import TYPE_CHECKING
import vcr
from vcr.record_mode import RecordMode
import pytest
import os
from collections import defaultdict

if TYPE_CHECKING:
    from pytest_mock import MockerFixture


@pytest.fixture(scope="function")
def llm_event_spy(agentops_client, mocker: "MockerFixture"):
    """Fixture that provides spies on each provider's response handling"""
    from agentops.llms.providers.anthropic import AnthropicProvider
    from agentops.llms.providers.litellm import LiteLLMProvider
    from agentops.llms.providers.openai import OpenAiProvider

    return {
        "litellm": mocker.spy(LiteLLMProvider(agentops_client), "handle_response"),
        "openai": mocker.spy(OpenAiProvider(agentops_client), "handle_response"),
        "anthropic": mocker.spy(AnthropicProvider(agentops_client), "handle_response"),
    }
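
As a usage note (illustrative only), a test that requests this fixture receives a dict of spies keyed by provider name:
```python
# Hypothetical example of consuming the fixture above.
def test_llm_event_spies_are_wired(llm_event_spy):
    assert set(llm_event_spy) == {"litellm", "openai", "anthropic"}
    assert llm_event_spy["openai"].call_count == 0  # nothing has been handled yet
```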