
Commit

add: publissh feats, fix: charts, naming
maisiukartyom committed Feb 24, 2025
1 parent 81a8653 commit 217fda1
Showing 14 changed files with 2,861 additions and 52 deletions.
7 changes: 7 additions & 0 deletions .github/pull_request_template.md
@@ -0,0 +1,7 @@
## Context
Bullet list of changes

## Checklist
- [ ] Self-review
- [ ] Added tests for the new functionality
- [ ] Updated the README (if needed)
38 changes: 38 additions & 0 deletions .github/workflows/run-unit-tests.yml
@@ -0,0 +1,38 @@
name: run-unit-tests

on: push

jobs:
  test:
    runs-on: ubuntu-22.04
    container: python:3.11-slim
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: 'Create env file'
        run: |
          touch .env
          echo OPENAI_API_KEY=${{ secrets.OPENAI_API_KEY }} >> .env

      - name: Install dependencies
        run: |
          apt-get update && apt-get install -y curl build-essential

      - name: Install Poetry
        run: pip install poetry

      - name: Install Python
        uses: actions/setup-python@v3
        with:
          python-version: 3.11
          cache: poetry

      - name: Install Python libraries
        run: poetry install

      - name: Run tests with coverage
        run: |
          poetry run pytest --cache-clear -vv tests \
            --cov=lamoom_cicd \
            --cov-fail-under=80 \
            --cov-report term-missing
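The workflow's env-file step can be rehearsed locally before pushing; a minimal sketch (the placeholder value is an assumption, since in CI the real `secrets.OPENAI_API_KEY` is injected):

```shell
# Recreate the .env step with a stand-in key; CI substitutes the real secret.
touch .env
echo "OPENAI_API_KEY=${OPENAI_API_KEY:-placeholder}" >> .env
# Confirm the key was written (prints the match count).
grep -c "OPENAI_API_KEY" .env
```

The coverage gate can then be exercised locally with the same `poetry run pytest ... --cov-fail-under=80` command the job runs.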
3 changes: 2 additions & 1 deletion .gitignore
@@ -3,4 +3,5 @@
__pycache__
*.py[cod]
.pytest_cache
.vscode
dist
33 changes: 18 additions & 15 deletions README.md
@@ -11,8 +11,8 @@ This tool allows you to evaluate how well your LLM responses match an ideal answ
"Blockchain is like a digital ledger that everyone can see but no one can change."
```

- **llm_response (required):**
  Your LLM's response.

- **optional_params (optional):**
A JSON-like dictionary that may include extra details for the test. It has the following structure:
@@ -32,22 +32,25 @@ This tool allows you to evaluate how well your LLM responses match an ideal answ

### 1. Manual Testing

You can manually call the `compare()` method by passing the required `ideal_answer` and `llm_response`, plus (optionally) `optional_params`. Each call automatically accumulates the test results under the provided (or default) `prompt_id` from `optional_params`.

**Example:**

```python
import os
import time

from lamoom_cicd import TestLLMResponsePipe

ideal_answer = (
    "Blockchain is like a digital notebook that everyone can see, but no one can secretly change. "
    "Imagine a shared Google Doc where every change is recorded forever, and no one can edit past entries."
)
optional_params = {
    "prompt": "Explain the concept of blockchain to someone with no technical background.",
    "prompt_id": f"test-{time.time()}"
}

lamoom_pipe = TestLLMResponsePipe(openai_key=os.environ.get("OPENAI_KEY"))
result = lamoom_pipe.compare(ideal_answer, "Your LLM response here", optional_params=optional_params)

# Print individual question details
for question in result.questions:
@@ -62,25 +65,25 @@ print(result.score.to_dict())
You can also pass multiple test cases using a CSV file. The CSV file should contain the following columns:

- **ideal_answer:** (Required) The ideal answer text.
- **llm_response:** (Required) The LLM response to compare against.
- **optional_params:** (Optional) A JSON string containing the optional parameters.

Multiple rows can be included, and you can use different `prompt_id` values to test various prompts.

**Example CSV Content:**
IMPORTANT: JSON embedded in a CSV field must escape its double quotes by doubling them (`""`)!

```csv
ideal_answer,llm_response,optional_params
"Blockchain is a secure, immutable digital ledger.","Blockchain is like a shared Google Doc that records every change.","{""prompt_id"": ""google_doc_blockchain""}"
```

**Usage Example:**

```python
csv_file_path = "test_data.csv"
lamoom_pipe = TestLLMResponsePipe(openai_key=os.environ.get("OPENAI_KEY"))
accumulated_results = lamoom_pipe.compare_from_csv("test_prompt", csv_file_path)
```
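Since hand-escaping JSON inside CSV fields is error-prone, the standard library's `csv` module can produce the doubled quotes automatically; a sketch using the columns from the example above:

```python
import csv
import io
import json

# csv.writer doubles any embedded quotes, so the JSON column needs no manual escaping.
buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL)
writer.writerow(["ideal_answer", "llm_response", "optional_params"])
writer.writerow([
    "Blockchain is a secure, immutable digital ledger.",
    "Blockchain is like a shared Google Doc that records every change.",
    json.dumps({"prompt_id": "google_doc_blockchain"}),
])
print(buffer.getvalue())
```

Writing `buffer.getvalue()` to a file yields a CSV that `compare_from_csv` can consume, with the `""prompt_id""` escaping handled for you.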

### 3. Visualizing Test Scores
@@ -90,15 +93,15 @@ After running tests (whether manually or using a CSV), the results are automatic
**Example:**

```python
lamoom_pipe.visualize_test_results()
```

This function will generate a line chart with the x-axis representing the test instance number (as integers) and the y-axis representing the score percentage. Each line on the chart corresponds to a different `prompt_id`.
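The data shaping behind such a chart can be sketched as follows (the field names on the accumulated results are illustrative assumptions, not the library's actual attributes):

```python
from collections import defaultdict

# Illustrative stand-in for accumulated results; real ones come from compare() calls.
results = [
    {"prompt_id": "simple_blockchain", "score": 80},
    {"prompt_id": "simple_blockchain", "score": 90},
    {"prompt_id": "google_doc_blockchain", "score": 70},
]

# One line per prompt_id: x = test instance number, y = score percentage.
series = defaultdict(list)
for result in results:
    points = series[result["prompt_id"]]
    points.append((len(points) + 1, result["score"]))

for prompt_id, points in series.items():
    print(prompt_id, points)
```

Each key in `series` becomes one line on the chart, with consecutive calls for the same `prompt_id` advancing the x-axis.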

## Summary

- **ideal_answer** and **llm_response** are required parameters.
- **optional_params** is optional, offering extra configuration (such as a custom prompt and a unique `prompt_id` for tests).
- You can compare responses either manually or via CSV (which supports multiple test cases).
- The tool accumulates results for each `prompt_id` across multiple calls.
- Use the visualization function to see your test scores on an easy-to-read chart.
160 changes: 160 additions & 0 deletions examples/example_ci_cd.ipynb
@@ -0,0 +1,160 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "4w_ZNZ46ddtF"
},
"outputs": [],
"source": [
"%pip install lamoom-cicd==0.1.4"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"id": "-5T7WYDOjrWD"
},
"outputs": [],
"source": [
"from lamoom_cicd import TestLLMResponsePipe"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "czcr362ntVl0"
},
"source": [
"# Initialize your `ideal_answer`, `llm_response` and `optional_params`"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"id": "Q_wAswxetVHx"
},
"outputs": [],
"source": [
"ideal_answer = \"\"\"Blockchain is like a digital notebook that everyone can see\n",
" but no one can secretly change. Imagine a shared Google Doc where every change\n",
" is recorded forever, and no one can erase or edit past entries.\n",
" Instead of one company controlling it, thousands of computers around\n",
" the world keep copies, making it nearly impossible to hack or fake.\n",
" This is why it’s used for things like Bitcoin—to keep transactions\n",
" secure and transparent without needing a bank in the middle.\"\"\"\n",
"\n",
"llm_response = \"\"\"Blockchain is like a shared digital notebook where everyone has a copy.\n",
"New records (blocks) are added in order and can’t be changed or erased.\n",
"Each block is securely locked with a code, and everyone in the network must agree\n",
"before adding new information. This makes blockchain transparent, secure, and\n",
"tamper-proof, which is why it's used for things like cryptocurrency, secure transactions,\n",
"and digital contracts.\"\"\"\n",
"\n",
"optional_params = {'prompt_id': \"blockchain_prompt\"}"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "WwOpSoELtkmr"
},
"source": [
"# Initialize `TestLLMResponsePipe`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "D-PX2WGWtkQY"
},
"outputs": [],
"source": [
"# Works with openai, azure, gemini, claude and nebius keys\n",
"lamoom_pipe = TestLLMResponsePipe(threshold=75, openai_key=\"your_key\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "wnFqAK5Ys8Cz"
},
"source": [
"# 1. Manual Testing"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "b0EdE8A_jjos"
},
"outputs": [],
"source": [
"result = lamoom_pipe.compare(ideal_answer, llm_response, optional_params=optional_params)\n",
"\n",
"print(result.score.to_dict())"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "cspAD9xotFCM"
},
"source": [
"# 2. Testing with CSV"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "E8W2tCSVtIqr"
},
"outputs": [],
"source": [
"results = lamoom_pipe.compare_from_csv(\"your_csv_file_path\")\n",
"for result in results:\n",
" print(result.score.to_dict())"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "D0qbJXRMt5EG"
},
"source": [
"# 3. Visualize your test results"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ycbh9-_iq5as"
},
"outputs": [],
"source": [
"lamoom_pipe.visualize_test_results()"
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
4 changes: 2 additions & 2 deletions lamoom_cicd/__init__.py
@@ -1,3 +1,3 @@
from .test_llm_response import TestLLMResponsePipe

__all__ = ["TestLLMResponsePipe"]
File renamed without changes.
File renamed without changes.
