
Commit

add: publissh feats, fix: charts, naming
maisiukartyom committed Feb 24, 2025
1 parent 81a8653 commit 217fda1
Showing 14 changed files with 2,861 additions and 52 deletions.
7 changes: 7 additions & 0 deletions .github/pull_request_template.md
@@ -0,0 +1,7 @@
## Context
Bullet list of changes

## Checklist
- [ ] Self-review
- [ ] Added tests for the new functionality
- [ ] Updated the README (if needed)
38 changes: 38 additions & 0 deletions .github/workflows/run-unit-tests.yml
@@ -0,0 +1,38 @@
name: run-unit-tests

on: push

jobs:
  test:
    runs-on: ubuntu-22.04
    container: python:3.11-slim
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: 'Create env file'
        run: |
          touch .env
          echo OPENAI_API_KEY=${{ secrets.OPENAI_API_KEY }} >> .env

      - name: Install dependencies
        run: |
          apt-get update && apt-get install -y curl build-essential

      - name: Install Poetry
        run: pip install poetry

      - name: Install Python
        uses: actions/setup-python@v3
        with:
          python-version: 3.11
          cache: poetry

      - name: Install Python libraries
        run: poetry install

      - name: Run tests with coverage
        run: |
          poetry run pytest --cache-clear -vv tests \
            --cov=lamoom_cicd \
            --cov-fail-under=80 \
            --cov-report term-missing
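The workflow's env-file step can be rehearsed locally before pushing; a minimal sketch (the placeholder value is an assumption, since in CI the real `secrets.OPENAI_API_KEY` is injected):

```shell
# Recreate the .env step with a stand-in key; CI substitutes the real secret.
touch .env
echo "OPENAI_API_KEY=${OPENAI_API_KEY:-placeholder}" >> .env
# Confirm the key was written (prints the match count).
grep -c "OPENAI_API_KEY" .env
```

The coverage gate can then be exercised locally with the same `poetry run pytest ... --cov-fail-under=80` command the job runs.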
3 changes: 2 additions & 1 deletion .gitignore
@@ -3,4 +3,5 @@
__pycache__
*.py[cod]
.pytest_cache
.vscode
dist
33 changes: 18 additions & 15 deletions README.md
@@ -11,8 +11,8 @@ This tool allows you to evaluate how well your LLM responses match an ideal answ
"Blockchain is like a digital ledger that everyone can see but no one can change."
```

- **llm_response (required):**
  Your LLM's response.

- **optional_params (optional):**
A JSON-like dictionary that may include extra details for the test. It has the following structure:
@@ -32,22 +32,25 @@ This tool allows you to evaluate how well your LLM responses match an ideal answ

### 1. Manual Testing

You can manually call the `compare()` method by passing the required `ideal_answer` and `llm_response`, plus (optionally) `optional_params`. Each call automatically accumulates the test results under the provided (or default) `prompt_id` from `optional_params`.

**Example:**

```python
import os
import time

from lamoom_cicd import TestLLMResponsePipe

ideal_answer = (
    "Blockchain is like a digital notebook that everyone can see, but no one can secretly change. "
    "Imagine a shared Google Doc where every change is recorded forever, and no one can edit past entries."
)
optional_params = {
    "prompt": "Explain the concept of blockchain to someone with no technical background.",
    "prompt_id": f"test-{time.time()}"
}

lamoom_pipe = TestLLMResponsePipe(openai_key=os.environ.get("OPENAI_KEY"))
result = lamoom_pipe.compare(ideal_answer, "Your LLM response here", optional_params=optional_params)

# Print individual question details
for question in result.questions:
@@ -62,25 +65,25 @@ print(result.score.to_dict())
You can also pass multiple test cases using a CSV file. The CSV file should contain the following columns:

- **ideal_answer:** (Required) The ideal answer text.
- **llm_response:** (Required) The LLM response to compare against.
- **optional_params:** (Optional) A JSON string containing the optional parameters.

Multiple rows can be included, and you can use different `prompt_id` values to test various prompts.

**Example CSV Content:**
IMPORTANT: JSON embedded in a CSV field must escape its double quotes by doubling them (`""`)!

```csv
ideal_answer,llm_response,optional_params
"Blockchain is a secure, immutable digital ledger.","Blockchain is like a shared Google Doc that records every change.","{""prompt_id"": ""google_doc_blockchain""}"
```

**Usage Example:**

```python
csv_file_path = "test_data.csv"
lamoom_pipe = TestLLMResponsePipe(openai_key=os.environ.get("OPENAI_KEY"))
accumulated_results = lamoom_pipe.compare_from_csv("test_prompt", csv_file_path)
```
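Since hand-escaping JSON inside CSV fields is error-prone, the standard library's `csv` module can produce the doubled quotes automatically; a sketch using the columns from the example above:

```python
import csv
import io
import json

# csv.writer doubles any embedded quotes, so the JSON column needs no manual escaping.
buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL)
writer.writerow(["ideal_answer", "llm_response", "optional_params"])
writer.writerow([
    "Blockchain is a secure, immutable digital ledger.",
    "Blockchain is like a shared Google Doc that records every change.",
    json.dumps({"prompt_id": "google_doc_blockchain"}),
])
print(buffer.getvalue())
```

Writing `buffer.getvalue()` to a file yields a CSV that `compare_from_csv` can consume, with the `""prompt_id""` escaping handled for you.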

### 3. Visualizing Test Scores
@@ -90,15 +93,15 @@ After running tests (whether manually or using a CSV), the results are automatic
**Example:**

```python
lamoom_pipe.visualize_test_results()
```

This function will generate a line chart with the x-axis representing the test instance number (as integers) and the y-axis representing the score percentage. Each line on the chart corresponds to a different `prompt_id`.
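The data shaping behind such a chart can be sketched as follows (the field names on the accumulated results are illustrative assumptions, not the library's actual attributes):

```python
from collections import defaultdict

# Illustrative stand-in for accumulated results; real ones come from compare() calls.
results = [
    {"prompt_id": "simple_blockchain", "score": 80},
    {"prompt_id": "simple_blockchain", "score": 90},
    {"prompt_id": "google_doc_blockchain", "score": 70},
]

# One line per prompt_id: x = test instance number, y = score percentage.
series = defaultdict(list)
for result in results:
    points = series[result["prompt_id"]]
    points.append((len(points) + 1, result["score"]))

for prompt_id, points in series.items():
    print(prompt_id, points)
```

Each key in `series` becomes one line on the chart, with consecutive calls for the same `prompt_id` advancing the x-axis.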

## Summary

- **ideal_answer** and **llm_response** are required parameters.
- **optional_params** is optional, offering extra configuration (such as a custom prompt and a unique `prompt_id` for tests).
- You can compare responses either manually or via CSV (which supports multiple test cases).
- The tool accumulates results for each `prompt_id` across multiple calls.
- Use the visualization function to see your test scores on an easy-to-read chart.
160 changes: 160 additions & 0 deletions examples/example_ci_cd.ipynb
@@ -0,0 +1,160 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "4w_ZNZ46ddtF"
},
"outputs": [],
"source": [
"%pip install lamoom-cicd==0.1.4"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"id": "-5T7WYDOjrWD"
},
"outputs": [],
"source": [
"from lamoom_cicd import TestLLMResponsePipe"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "czcr362ntVl0"
},
"source": [
"# Initialize your `ideal_answer`, `llm_response` and `optional_params`"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"id": "Q_wAswxetVHx"
},
"outputs": [],
"source": [
"ideal_answer = \"\"\"Blockchain is like a digital notebook that everyone can see\n",
" but no one can secretly change. Imagine a shared Google Doc where every change\n",
" is recorded forever, and no one can erase or edit past entries.\n",
" Instead of one company controlling it, thousands of computers around\n",
" the world keep copies, making it nearly impossible to hack or fake.\n",
" This is why it’s used for things like Bitcoin—to keep transactions\n",
" secure and transparent without needing a bank in the middle.\"\"\"\n",
"\n",
"llm_response = \"\"\"Blockchain is like a shared digital notebook where everyone has a copy.\n",
"New records (blocks) are added in order and can’t be changed or erased.\n",
"Each block is securely locked with a code, and everyone in the network must agree\n",
"before adding new information. This makes blockchain transparent, secure, and\n",
"tamper-proof, which is why it's used for things like cryptocurrency, secure transactions,\n",
"and digital contracts.\"\"\"\n",
"\n",
"optional_params = {'prompt_id': \"blockchain_prompt\"}"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "WwOpSoELtkmr"
},
"source": [
"# Initialize `TestLLMResponsePipe`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "D-PX2WGWtkQY"
},
"outputs": [],
"source": [
"# Works with openai, azure, gemini, claude and nebius keys\n",
"lamoom_pipe = TestLLMResponsePipe(threshold=75, openai_key=\"your_key\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "wnFqAK5Ys8Cz"
},
"source": [
"# 1. Manual Testing"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "b0EdE8A_jjos"
},
"outputs": [],
"source": [
"result = lamoom_pipe.compare(ideal_answer, llm_response, optional_params=optional_params)\n",
"\n",
"print(result.score.to_dict())"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "cspAD9xotFCM"
},
"source": [
"# 2. Testing with CSV"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "E8W2tCSVtIqr"
},
"outputs": [],
"source": [
"results = lamoom_pipe.compare_from_csv(\"your_csv_file_path\")\n",
"for result in results:\n",
" print(result.score.to_dict())"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "D0qbJXRMt5EG"
},
"source": [
"# 3. Visualize your test results"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ycbh9-_iq5as"
},
"outputs": [],
"source": [
"lamoom_pipe.visualize_test_results()"
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
4 changes: 2 additions & 2 deletions lamoom_cicd/__init__.py
@@ -1,3 +1,3 @@
from .test_llm_response import TestLLMResponsePipe

__all__ = ["TestLLMResponsePipe"]
File renamed without changes.
File renamed without changes.
