
Commit

feat: new readme, test results names
maisiukartyom committed Feb 26, 2025
1 parent 0d0b06b commit 9bb17e0
Showing 3 changed files with 37 additions and 54 deletions.
75 changes: 29 additions & 46 deletions README.md
# LLM Prompt Evaluation Tool

This tool lets you evaluate how well your LLM response matches your ideal answer: it generates test questions from the ideal answer and compares the answers found in the LLM response against it. You can call the comparison method directly in your code or pass a CSV file of test cases. The tool also supports visualizing your test scores on a chart, with results automatically grouped by prompt ID.

## Key Parameters

- **ideal_answer: str (required):**
  The ideal answer for your prompt, i.e. the reference your LLM response is compared against.

- **llm_response: str (required):**
  Your LLM's response.

- **optional_params: dict (optional):**
  A JSON-like dictionary with extra details for the test. For now, you can pass a `prompt_id` key to link the test results to that prompt, as in the example below.
**Example:**
```json
{
    "prompt_id": "blockchain_prompt"
}
```

## Using the Tool

### 1. Pass ideal_answer and llm_response

Call the `compare()` method with the required `ideal_answer: str` and `llm_response: str` parameters and, optionally, `optional_params: dict`. The method returns an object with the test results.

**Example:**

```python
import os

from lamoom_cicd import TestLLMResponsePipe

# Users must provide these values (it doesn't matter how they obtain them)
ideal_answer = "Your ideal answer here"            # Replace with the expected response
llm_response = "Your LLM-generated response here"  # Replace with the model's actual response

# Optional: if you have a prompt_id, you can link results to it
optional_params = {
    "prompt_id": "your_prompt_id"  # Replace with an actual prompt identifier if needed
}

lamoom_pipe = TestLLMResponsePipe(openai_key=os.environ.get("OPENAI_KEY"))

# Compare the ideal answer with the LLM's response
result = lamoom_pipe.compare(ideal_answer, llm_response, optional_params=optional_params)

# Print details for each generated test question
for question in result.questions:
    print(question.to_dict())

# Print the final test score
print(result.score.to_dict())
```
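
Each entry in `result.questions` follows the updated `Question` class in `lamoom_cicd/responses.py` (shown further down in this commit), so `question.to_dict()` should return a dictionary of roughly this shape; the values here are purely illustrative:

```python
# Illustrative shape of question.to_dict(); actual values depend on your test run.
example_question = {
    "test_question": "What is blockchain compared to in the ideal answer?",
    "llm_answer": "A shared Google Doc that records every change.",
    "ideal_answer": "A digital notebook everyone can see but no one can change.",
    "does_match_ideal_answer": True,
}
```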

### 2. Pass a CSV file

You can also pass multiple test cases using a CSV file. The CSV file should contain the following columns:

- **ideal_answer:** (Required) The ideal answer.
- **llm_response:** (Required) Your LLM response.
- **optional_params:** (Optional) A JSON string containing the optional parameters.

If multiple rows are added, multiple tests will run; you can use different `prompt_id` values to test various prompts.

**Example CSV Content:**
IMPORTANT: note how double quotes are escaped (doubled) when embedding JSON in a CSV file, as in the example below.

```csv
ideal_answer,llm_response,optional_params
"Blockchain is a secure, immutable digital ledger.","Blockchain is like a shared Google Doc that records every change.","{""prompt_id"": ""blockchain_prompt""}"
```
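
If you'd rather not escape the quotes by hand, a small sketch like the following (standard library only; the file name is just an example) writes a compatible CSV and handles the quoting for you:

```python
import csv
import json

rows = [
    {
        "ideal_answer": "Blockchain is a secure, immutable digital ledger.",
        "llm_response": "Blockchain is like a shared Google Doc that records every change.",
        "optional_params": json.dumps({"prompt_id": "blockchain_prompt"}),
    },
]

# QUOTE_ALL wraps every field in double quotes and doubles any embedded quotes,
# matching the format shown above.
with open("test_data.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(
        f,
        fieldnames=["ideal_answer", "llm_response", "optional_params"],
        quoting=csv.QUOTE_ALL,
    )
    writer.writeheader()
    writer.writerows(rows)
```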

**Usage Example:**

```python
csv_file_path = "test_data.csv"
lamoom_pipe = TestLLMResponsePipe(openai_key=os.environ.get("OPENAI_KEY"))
accumulated_results = lamoom_pipe.compare_from_csv("test_prompt", csv_file_path)
csv_file_path = "your_file.csv"
lamoom_pipe = TestLLMResponsePipe(openai_key="your_key")
accumulated_results: list = lamoom_pipe.compare_from_csv(csv_file_path)
```
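
Assuming each item in `accumulated_results` has the same shape as the object returned by `compare()` (with `questions` and a `score`), a quick summary loop might look like this:

```python
# Sketch: print the final score of every test case loaded from the CSV.
for i, result in enumerate(accumulated_results, start=1):
    print(f"Test case {i}: {result.score.to_dict()}")
```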

### 3. Visualizing Test Scores

After running tests, the results are automatically accumulated by `prompt_id`. To see a chart of your test scores, use the provided visualization function.

**Example:**

```python
lamoom_pipe.visualize_test_results()
```

This function will generate a line chart with the x-axis representing the test instance number (as integers) and the y-axis representing the score percentage. Each line on the chart corresponds to a different `prompt_id`.

Enjoy using the tool to refine and evaluate your LLM prompts!
2 changes: 1 addition & 1 deletion examples/example_ci_cd.ipynb
```json
"outputs": [],
"source": [
    "%pip install lamoom-cicd"
]
```
14 changes: 7 additions & 7 deletions lamoom_cicd/responses.py
```python
from dataclasses import dataclass

class Question:
    def __init__(self, test_question: str, llm_answer: str, ideal_answer: str, does_match_ideal_answer: bool):
        self.test_question = test_question
        self.llm_answer = llm_answer
        self.ideal_answer = ideal_answer
        self.does_match_ideal_answer = does_match_ideal_answer

    def to_dict(self):
        return {
            "test_question": self.test_question,
            "llm_answer": self.llm_answer,
            "ideal_answer": self.ideal_answer,
            "does_match_ideal_answer": self.does_match_ideal_answer
        }

class Score:
    ...  # remainder of the file is unchanged in this commit
```
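
For reference, a minimal sketch of how the renamed `Question` fields flow through `to_dict()` (the values below are made up purely for illustration):

```python
q = Question(
    test_question="What can no one do to past entries in the shared ledger?",
    llm_answer="No one can secretly change them.",
    ideal_answer="Past entries cannot be changed by anyone.",
    does_match_ideal_answer=True,
)
print(q.to_dict())
# {'test_question': ..., 'llm_answer': ..., 'ideal_answer': ..., 'does_match_ideal_answer': True}
```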
