This repository contains the code and tests for the AI-Driven Testing project, designed to develop an LLM-based AI that can automatically generate test code for existing software through a chat-based interface.
- Project Overview
- Prerequisites
- Quick Start
- Backend Setup and Usage
- Frontend Setup
- API Reference
- Modular Plugin System
- Configuration
- Examples
- Troubleshooting
The goal of this project is to develop or customize an LLM-based (Large Language Model) AI that can automatically generate test code for existing software or documentation. The AI is controlled through a chat-based interface or a command-line interface and can be provided with information about the target software in various ways.
- 🔍 Test Code Generation: The AI can generate test code for arbitrary software using methods such as Retrieval-Augmented Generation (RAG), fine-tuning, or prompting.
- 🔄 Incremental Test Extension: The AI can recognize and expand existing test code intelligently.
- 🧪 Understanding of Test Types: The AI can distinguish between different layers and types of tests:
  - Layers: user interface, domain/business logic, persistence layer
  - Test Types: unit test, integration test, acceptance test
- 🛠️ On-Premise Operation: The solution can run fully offline, suitable for on-premise environments.
- 🐳 Docker Support: The backend can run inside a Docker container and be accessed via an API.
- 🔌 IDE Integration: The solution can be embedded into existing open-source development environments.
- 🤝 Open Source: The solution is open source, allowing for community contributions and transparent development.
- Provide the software (source code or API/documentation) either via CLI or web interface
- Select from optional features like complexity analysis, incremental test evolution, or web-assisted research
- Examine the generated test code along with any supporting insights or recommendations
- Integrate test code into your existing test suite
Before setting up the project, ensure you have the following installed:
- Node.js - Install Node.js
- Docker - Install Docker
- Conda - Install Conda
- Git (for cloning the project) - Install Git
Clone the repository and change into it:

```bash
git clone https://github.com/amosproj/amos2025ss04-ai-driven-testing.git
cd amos2025ss04-ai-driven-testing
```
For quick setup, use the provided setup script (does not work on Windows):
```bash
chmod +x setup.sh
./setup.sh
```
- Ensure Docker is running on your machine.

- Set up the backend:

  ```bash
  cd backend/
  conda env create -f environment.yml
  conda activate backend
  ```

To start both the backend and the frontend together:

- Ensure Docker is running on your machine.

- Start Docker Compose:

  ```bash
  docker compose up
  ```
The backend consists of the following files:

- `api.py` — FastAPI wrapper for HTTP endpoints
- `cli.py` — Handling of the CLI parameters
- `schemas.py` — Definition of the data structures used in the project
- `model_manager.py` — Loading of the usable models
- `module_manager.py` — Loading of the available modules
- `main.py` — Main script to run a single model
- `llm_manager.py` — Docker container management and LLM interaction
- `allowed_models.json` — Configuration for allowed language models
- `prompt.txt` — Default input prompt file
- `output-<MODEL_ID>.md` — Example output files for each model
To run a single model from the command line:

```bash
python backend/main.py
```
The script supports the following command line parameters:
- `--model` - Model index to use (integer, choices based on available models, default: 0)
- `--prompt_file` - Path to the prompt file (default: `user_message.txt`)
- `--source_code` - Path to the source code file (default: `source_code.txt`)
- `--output_file` - Path to the output file (default: `output.md`)
- `--modules` - List of module names to run (space-separated)
- `--seed` - Random seed for reproducible results (default: 42)
- `--num_ctx` - Context size for the model (default: 4096)
- `--command-order` - Enable manual module ordering (flag, default: false)
- `--timeout` - Timeout for operations (integer, in seconds)
- `--use-links` - Provide one or more web links to include in the context (space-separated URLs)
For example:

```bash
python backend/main.py --model 1 --seed 123 --num_ctx 8192 --timeout 300 --use-links https://example.com/docs
```
To send the same prompt to every allowed model in one go:

```bash
python backend/example_all_models.py
```

This script:

- Starts each model's container
- Sends the provided prompt (from `prompt.txt`)
- Saves each response into its own `output-<MODEL_ID>.md`
- Stops all containers after completion
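For orientation, the loop below reproduces this behaviour using only the documented `main.py` flags. It is a minimal sketch, not the script's actual implementation; the model count and the index-based output file names are assumptions.

```python
# Minimal sketch: run the prompt against every allowed model via main.py.
# Assumptions: eight configured models, index-based output file names.
import subprocess

NUM_MODELS = 8  # number of models currently listed under Configuration

for index in range(NUM_MODELS):
    subprocess.run(
        [
            "python", "backend/main.py",
            "--model", str(index),
            "--prompt_file", "prompt.txt",
            "--output_file", f"output-model-{index}.md",
        ],
        check=True,  # stop if any model run fails
    )
```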
- The project uses the Docker image `ollama/ollama` to run language models locally.
- The `LLMManager` class:
  - Pulls the required Docker image with progress indication
  - Selects a free port for each container
  - Waits until the container's API becomes available
  - Pulls the selected model inside the container
  - Sends user prompts to the model endpoint and writes the Markdown-formatted response
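For readers who want to see what this lifecycle looks like in code, here is a minimal sketch using the Docker SDK for Python and the Ollama HTTP API. It is not the project's `LLMManager` implementation; the timeouts, the hard-coded `tinyllama:1.1b` model, and the cleanup at the end are assumptions made for illustration.

```python
# Illustrative container lifecycle: pull image, pick a free port, wait for the
# API, pull a model, send a prompt, then clean up. Not the project's code.
import socket
import time

import docker
import requests

client = docker.from_env()
client.images.pull("ollama/ollama")  # pull the runtime image

with socket.socket() as s:           # let the OS pick a free host port
    s.bind(("", 0))
    port = s.getsockname()[1]

container = client.containers.run(
    "ollama/ollama", detach=True, ports={"11434/tcp": port}
)

base_url = f"http://localhost:{port}"
for _ in range(60):                   # wait until the container's API answers
    try:
        requests.get(base_url, timeout=1)
        break
    except requests.RequestException:
        time.sleep(1)

# pull the selected model inside the container, then send the prompt
requests.post(f"{base_url}/api/pull", json={"model": "tinyllama:1.1b", "stream": False})
reply = requests.post(
    f"{base_url}/api/generate",
    json={"model": "tinyllama:1.1b", "prompt": "Write unit tests for ...", "stream": False},
)
print(reply.json()["response"])       # Markdown-formatted answer

container.stop()
container.remove()
```

In the project itself this logic lives in `llm_manager.py`, which additionally reports progress while images and models are being pulled.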
To set up the frontend:

- Start the Docker containers:

  ```bash
  docker compose up
  ```

- Open your browser and go to http://localhost:3000/
To run the API server locally:

```bash
cd backend
uvicorn api:app --reload  # --port 8000 by default
```
| Method | Path | Purpose | Body / Query |
|---|---|---|---|
| GET | `/models` | List all allowed models + whether running | – |
| POST | `/prompt` | Ensure container is running, send prompt | `{ "model_id": "<id>", "prompt": "<text>" }` |
| POST | `/shutdown` | Stop & remove a model container | `{ "model_id": "<id>" }` |
Open the automatically generated Swagger UI at:
http://127.0.0.1:8000/docs
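The endpoints can also be exercised from a short script. This is a minimal sketch assuming the backend is reachable on port 8000 and that `tinyllama:1.1b` is among the ids returned by `/models`; the shape of the returned JSON is not assumed and is simply printed.

```python
# Smoke-test the three endpoints listed above (paths and bodies from the table).
import requests

BASE = "http://127.0.0.1:8000"
MODEL_ID = "tinyllama:1.1b"  # pick an id reported by GET /models

print(requests.get(f"{BASE}/models").json())

answer = requests.post(
    f"{BASE}/prompt",
    json={"model_id": MODEL_ID, "prompt": "Write unit tests for add_numbers."},
)
print(answer.json())

requests.post(f"{BASE}/shutdown", json={"model_id": MODEL_ID})
```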
The schemas defined in `schemas.py` provide a structured format for communication:
- PromptData: Encapsulates model metadata, prompt text, system instructions, and generation options
- ResponseData: Contains generated Markdown response, extracted code, token usage statistics, and timing metrics
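As a rough illustration of how such schemas are typically declared with Pydantic (which FastAPI builds on), here is a hedged sketch. Field names beyond those described above are assumptions; the definitions in `schemas.py` are authoritative.

```python
# Illustrative Pydantic sketch of the request/response schemas (not the real ones).
from typing import Optional

from pydantic import BaseModel


class PromptData(BaseModel):
    model_id: str                          # which model to use (model metadata)
    prompt: str                            # the prompt text
    system_instructions: Optional[str] = None
    options: dict = {}                     # generation options (seed, num_ctx, ...)


class ResponseData(BaseModel):
    markdown: str                          # generated Markdown response
    extracted_code: Optional[str] = None   # code extracted from the answer
    token_usage: dict = {}                 # token usage statistics
    timing: dict = {}                      # timing metrics
```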
This project uses a flexible module interface to add custom functionality before and after interacting with an LLM. Modules can handle tasks like logging, prompt modification, response postprocessing, and metric collection.
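The exact hook names and registration mechanism are defined in `module_manager.py`; the following sketch only illustrates the pre-/post-processing idea with hypothetical method names.

```python
# Hypothetical module: the hook names here are illustrative, not the project's real API.
import time


class TimingModule:
    """Collects a simple timing metric around the LLM call."""

    def __init__(self):
        self.started_at = 0.0

    def pre_process(self, prompt: str) -> str:
        self.started_at = time.time()
        return prompt  # a module could also rewrite the prompt here

    def post_process(self, response: str) -> str:
        print(f"[timing] LLM round trip took {time.time() - self.started_at:.1f} s")
        return response  # or collect metrics / post-process the response here
```

Modules are selected at runtime via the `--modules` flag described above.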
Models are configured in `allowed_models.json` (a minimal sketch for inspecting the file follows the list below). The following models are currently supported in the project:
- `mistral:7b-instruct-v0.3-q3_K_M` - The 7B model released by Mistral AI, updated to version 0.3.
- `qwen2.5-coder:3b-instruct-q8_0` - The latest series of code-specific Qwen models, with significant improvements in code generation, code reasoning, and code fixing.
- `phi4-mini:3.8b-q4_K_M` - Microsoft's Phi-4 Mini model, a compact language model optimized for efficiency.
- `tinyllama:1.1b` - The TinyLlama project is an open endeavor to train a compact 1.1B Llama model on 3 trillion tokens.
- `qwen3:4b-q4_K_M` - Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models.
- `openhermes:v2.5` - OpenHermes 2.5 is a 7B model fine-tuned by Teknium on Mistral with fully open datasets.
- `smollm2:360m` - SmolLM2 is a family of compact language models available in three sizes: 135M, 360M, and 1.7B parameters.
- `phi4-reasoning:14b` - Microsoft's Phi-4 Reasoning model, optimized for complex reasoning tasks.
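To check which models your checkout actually allows, you can inspect the configuration file directly. This sketch only relies on the file path named above and makes no assumption about the JSON layout, which it simply prints:

```python
# Print the model configuration as-is; the JSON layout is defined by the project.
import json
from pathlib import Path

config = json.loads(Path("backend/allowed_models.json").read_text())
print(json.dumps(config, indent=2))
```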
If your `prompt.txt` contains:
Write unit tests for the following Python function:
```python
def add_numbers(a, b):
    """
    Adds two numbers together and returns the result.

    Args:
        a (int or float): The first number.
        b (int or float): The second number.

    Returns:
        int or float: The sum of a and b.

    Examples:
        >>> add_numbers(2, 3)
        5
        >>> add_numbers(-1, 1)
        0
        >>> add_numbers(0.5, 0.5)
        1.0
    """
    return a + b
```

Your `output.md` will contain:
Here is how you can write unit tests for the `add_numbers` function using Python's built-in unittest module:
```python
import unittest

from add_numbers import add_numbers


class TestAddNumbers(unittest.TestCase):
    def test_positive_integers(self):
        self.assertEqual(add_numbers(2, 3), 5)

    def test_negative_integers(self):
        self.assertEqual(add_numbers(-1, 1), 0)

    def test_decimal(self):
        self.assertEqual(add_numbers(0.5, 0.5), 1.0)


if __name__ == "__main__":
    unittest.main()
```

This unit test covers the basic functionality with positive integers, negative integers, and decimal numbers, as shown in the examples.
- Docker not running: Ensure Docker is started before running the backend
- Port conflicts: The system automatically selects free ports for containers
- Model download failures: Check your internet connection and Docker Hub access
- Memory issues: Large models may require significant system resources
- Tokenizer download failures: May require Hugging Face authentication
- Each container is automatically stopped after completion to free up system resources
- The response is formatted as clean Markdown
- Progress indication is provided during Docker image and model pulling
For additional support:
- Check the logs for detailed error messages
- Ensure all prerequisites are properly installed
- Verify Docker container status using `docker ps`
- Check available system resources before running large models
- The script automatically pulls necessary Docker images and models if not already available
- Each container starts on a free port with automatic API endpoint management
- All models in `allowed_models.json` are supported by the Context Size Calculator
- The system is designed to work fully offline for on-premise environments