Adding github assistant code (#340)
* Adding github assistant code

* fmt

* formatting evaluation.py

* lint

* formatting, second attempt

* removing extra spaces

* formatting index.py

* formatting file_summary.append

* formatting parse_document function

* third attempt at fixing index.py

* lint 4x

* removing extra lines

* Adding evaluation-result.txt

* adding README.md
framsouza authored Oct 21, 2024
1 parent 6ddb6a2 commit 60cc76e
Showing 6 changed files with 580 additions and 0 deletions.
28 changes: 28 additions & 0 deletions supporting-blog-content/github-assistant/README.md
# GitHub Assistant

Easily ask questions about your GitHub repository using RAG (retrieval-augmented generation) with Elasticsearch as the vector database.

### How to use this code

1. Install Required Libraries:

```bash
pip install -r requirements.txt
```

2. Set Up Environment Variables:
`GITHUB_TOKEN`, `GITHUB_OWNER`, `GITHUB_REPO`, `GITHUB_BRANCH`, `ELASTIC_CLOUD_ID`, `ELASTIC_USER`, `ELASTIC_PASSWORD`, `ELASTIC_INDEX`, `OPENAI_API_KEY`
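
A minimal sketch of how these settings are expected to be supplied, assuming a local `.env` file loaded with `python-dotenv` (as `evaluation.py` does); the presence check below is purely illustrative:

```python
import os
from dotenv import load_dotenv

# Load GITHUB_*, ELASTIC_* and OPENAI_API_KEY from a local .env file
load_dotenv(".env")

# Illustrative sanity check that every required variable is set
required = [
    "GITHUB_TOKEN", "GITHUB_OWNER", "GITHUB_REPO", "GITHUB_BRANCH",
    "ELASTIC_CLOUD_ID", "ELASTIC_USER", "ELASTIC_PASSWORD",
    "ELASTIC_INDEX", "OPENAI_API_KEY",
]
missing = [name for name in required if not os.getenv(name)]
if missing:
    raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
```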

3. Index your data and create the embeddings by running:

```bash
python index.py
```

An Elasticsearch index will be generated to house the embeddings. You can then connect to your ESS deployment and run a search query against the index; each indexed document will include a new field named `embeddings`. A quick way to verify this is sketched below.
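
A minimal sketch of such a check, assuming the official `elasticsearch` Python client and the same connection settings used for indexing (the exact document mapping depends on `index.py`):

```python
import os
from elasticsearch import Elasticsearch

# Connect to the Elastic Cloud deployment with the variables from step 2
es = Elasticsearch(
    cloud_id=os.environ["ELASTIC_CLOUD_ID"],
    basic_auth=(os.environ["ELASTIC_USER"], os.environ["ELASTIC_PASSWORD"]),
)

# Fetch one document and confirm it carries the embeddings field
resp = es.search(index=os.environ["ELASTIC_INDEX"], size=1, query={"match_all": {}})
hit = resp["hits"]["hits"][0]["_source"]
print("has embeddings:", "embeddings" in hit)
```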

4. Ask questions about your codebase by running:

```bash
python query.py
```
90 changes: 90 additions & 0 deletions supporting-blog-content/github-assistant/evaluation-result.txt
```
Number of documents loaded: 5
All available questions generated:
0. What is the purpose of chunking monitors in the updated push command as mentioned in the changelog?
1. How does the changelog describe the improvement made to the performance of the push command?
2. What new feature is added to the synthetics project when it is created via the `init` command?
3. According to the changelog, what is the file size of the CHANGELOG.md document?
4. On what date was the CHANGELOG.md file last modified?
5. What is the significance of the example lightweight monitor yaml file mentioned in the changelog?
6. How might the changes described in the changelog impact the workflow of users creating or updating monitors?
7. What is the file path where the CHANGELOG.md document is located?
8. Can you identify the issue numbers associated with the changes mentioned in the changelog?
9. What is the creation date of the CHANGELOG.md file as per the context information?
10. What type of file is the document described in the context information?
11. On what date was the CHANGELOG.md file last modified?
12. What is the file size of the CHANGELOG.md document?
13. Identify one of the bug fixes mentioned in the CHANGELOG.md file.
14. What command is referenced in the context of creating new synthetics projects?
15. How does the CHANGELOG.md file address the issue of varying NDJSON chunked response sizes?
16. What is the significance of the number #680 in the context of the document?
17. What problem is addressed by skipping the addition of empty values for locations?
18. How many bug fixes are explicitly mentioned in the provided context?
19. What is the file path of the CHANGELOG.md document?
20. What is the file path of the document being referenced in the context information?
...

Generated questions:
1. What command is referenced in relation to the bug fix in the CHANGELOG.md?
2. On what date was the CHANGELOG.md file created?
3. What is the primary purpose of the document based on the context provided?

Total number of questions generated: 3

Processing Question 1 of 3:

Evaluation Result:
+---------------------------------------------------+-------------------------------------------------+----------------------------------------------------+----------------------+----------------------+-------------------+------------------+------------------+
| Query | Response | Source | Relevancy Response | Relevancy Feedback | Relevancy Score | Faith Response | Faith Feedback |
+===================================================+=================================================+====================================================+======================+======================+===================+==================+==================+
| What command is referenced in relation to the bug | The `init` command is referenced in relation to | Bug Fixes | Pass | YES | 1 | Pass | YES |
| fix in the CHANGELOG.md? | the bug fix in the CHANGELOG.md. | | | | | | |
| | | | | | | | |
| | | - Pick the correct loader when bundling TypeScript | | | | | |
| | | or JavaScript journey files | | | | | |
| | | | | | | | |
| | | during push command #626 | | | | | |
+---------------------------------------------------+-------------------------------------------------+----------------------------------------------------+----------------------+----------------------+-------------------+------------------+------------------+

Processing Question 2 of 3:

Evaluation Result:
+-------------------------------------------------+------------------------------------------------+------------------------------+----------------------+----------------------+-------------------+------------------+------------------+
| Query | Response | Source | Relevancy Response | Relevancy Feedback | Relevancy Score | Faith Response | Faith Feedback |
+=================================================+================================================+==============================+======================+======================+===================+==================+==================+
| On what date was the CHANGELOG.md file created? | The date mentioned in the CHANGELOG.md file is | v1.0.0-beta-38 (20222-11-02) | Pass | YES | 1 | Pass | YES |
| | November 2, 2022. | | | | | | |
+-------------------------------------------------+------------------------------------------------+------------------------------+----------------------+----------------------+-------------------+------------------+------------------+

Processing Question 3 of 3:

Evaluation Result:
+---------------------------------------------------+---------------------------------------------------+------------------------------+----------------------+----------------------+-------------------+------------------+------------------+
| Query | Response | Source | Relevancy Response | Relevancy Feedback | Relevancy Score | Faith Response | Faith Feedback |
+===================================================+===================================================+==============================+======================+======================+===================+==================+==================+
| What is the primary purpose of the document based | The primary purpose of the document is to provide | v1.0.0-beta-38 (20222-11-02) | Pass | YES | 1 | Pass | YES |
| on the context provided? | a changelog detailing the features and | | | | | | |
| | improvements made in version 1.0.0-beta-38 of a | | | | | | |
| | software project. It highlights specific | | | | | | |
| | enhancements such as improved validation for | | | | | | |
| | monitor schedules and an enhanced push command | | | | | | |
| | experience. | | | | | | |
+---------------------------------------------------+---------------------------------------------------+------------------------------+----------------------+----------------------+-------------------+------------------+------------------+
```
197 changes: 197 additions & 0 deletions supporting-blog-content/github-assistant/evaluation.py
import logging
import sys
import os
import pandas as pd
from dotenv import load_dotenv
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Response
from llama_index.core.evaluation import (
DatasetGenerator,
RelevancyEvaluator,
FaithfulnessEvaluator,
EvaluationResult,
)
from llama_index.llms.openai import OpenAI
from tabulate import tabulate
import textwrap
import argparse
import traceback
from httpx import ReadTimeout

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

parser = argparse.ArgumentParser(
    description="Process documents and questions for evaluation."
)
parser.add_argument(
    "--num_documents",
    type=int,
    default=None,
    help="Number of documents to process (default: all)",
)
parser.add_argument(
    "--skip_documents",
    type=int,
    default=0,
    help="Number of documents to skip at the beginning (default: 0)",
)
parser.add_argument(
    "--num_questions",
    type=int,
    default=None,
    help="Number of questions to process (default: all)",
)
parser.add_argument(
    "--skip_questions",
    type=int,
    default=0,
    help="Number of questions to skip at the beginning (default: 0)",
)
parser.add_argument(
    "--process_last_questions",
    action="store_true",
    help="Process last N questions instead of first N",
)
args = parser.parse_args()
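
# Illustrative invocations (flag names as defined above):
#   python evaluation.py --num_documents 5 --num_questions 3
#   python evaluation.py --skip_documents 2 --num_questions 10 --process_last_questions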

load_dotenv(".env")

reader = SimpleDirectoryReader("/tmp/elastic/production-readiness-review")
documents = reader.load_data()
print(f"First document: {documents[0].text}")
print(f"Second document: {documents[1].text}")
print(f"Third document: {documents[2].text}")


if args.skip_documents > 0:
    documents = documents[args.skip_documents :]

if args.num_documents is not None:
    documents = documents[: args.num_documents]

print(f"Number of documents loaded: {len(documents)}")

llm = OpenAI(model="gpt-4o", request_timeout=120)

data_generator = DatasetGenerator.from_documents(documents, llm=llm)

try:
    eval_questions = data_generator.generate_questions_from_nodes()
    if isinstance(eval_questions, str):
        eval_questions_list = eval_questions.strip().split("\n")
    else:
        eval_questions_list = eval_questions
    eval_questions_list = [q for q in eval_questions_list if q.strip()]

    # Keep the full list of generated questions before any skip/limit filtering
    all_questions = list(eval_questions_list)

    if args.skip_questions > 0:
        eval_questions_list = eval_questions_list[args.skip_questions :]

    if args.num_questions is not None:
        if args.process_last_questions:
            eval_questions_list = eval_questions_list[-args.num_questions :]
        else:
            eval_questions_list = eval_questions_list[: args.num_questions]

    print("\nAll available questions generated:")
    for idx, q in enumerate(all_questions):
        print(f"{idx}. {q}")

    print("\nGenerated questions:")
    for idx, q in enumerate(eval_questions_list, start=1):
        print(f"{idx}. {q}")
except ReadTimeout:
    print(
        "Request to OpenAI timed out during question generation. "
        "Please check the service or increase the timeout duration."
    )
    traceback.print_exc()
    sys.exit(1)
except Exception as e:
    print(f"An error occurred while generating questions: {e}")
    traceback.print_exc()
    sys.exit(1)

print(f"\nTotal number of questions generated: {len(eval_questions_list)}")

evaluator_relevancy = RelevancyEvaluator(llm=llm)
evaluator_faith = FaithfulnessEvaluator(llm=llm)

vector_index = VectorStoreIndex.from_documents(documents)
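# Note: the from_documents() call above embeds the documents with LlamaIndex's
# default embedding model (an OpenAI embedding model unless overridden), so
# OPENAI_API_KEY must also be set for this step.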


def display_eval_df(
    query: str,
    response: Response,
    eval_result_relevancy: EvaluationResult,
    eval_result_faith: EvaluationResult,
) -> None:
    relevancy_feedback = getattr(eval_result_relevancy, "feedback", "")
    relevancy_passing = getattr(eval_result_relevancy, "passing", False)
    relevancy_passing_str = "Pass" if relevancy_passing else "Fail"

    relevancy_score = 1.0 if relevancy_passing else 0.0

    faithfulness_feedback = getattr(eval_result_faith, "feedback", "")
    faithfulness_passing_bool = getattr(eval_result_faith, "passing", False)
    faithfulness_passing = "Pass" if faithfulness_passing_bool else "Fail"

    def wrap_text(text, width=50):
        if text is None:
            return ""
        text = str(text)
        text = text.replace("\r", "")
        lines = text.split("\n")
        wrapped_lines = []
        for line in lines:
            wrapped_lines.extend(textwrap.wrap(line, width=width))
            wrapped_lines.append("")
        return "\n".join(wrapped_lines)

    if response.source_nodes:
        source_content = wrap_text(response.source_nodes[0].node.get_content())
    else:
        source_content = ""

    eval_data = {
        "Query": wrap_text(query),
        "Response": wrap_text(str(response)),
        "Source": source_content,
        "Relevancy Response": relevancy_passing_str,
        "Relevancy Feedback": wrap_text(relevancy_feedback),
        "Relevancy Score": wrap_text(str(relevancy_score)),
        "Faith Response": faithfulness_passing,
        "Faith Feedback": wrap_text(faithfulness_feedback),
    }

    eval_df = pd.DataFrame([eval_data])

    print("\nEvaluation Result:")
    print(
        tabulate(
            eval_df, headers="keys", tablefmt="grid", showindex=False, stralign="left"
        )
    )


query_engine = vector_index.as_query_engine(llm=llm)

total_questions = len(eval_questions_list)
for idx, question in enumerate(eval_questions_list, start=1):
    try:
        response_vector = query_engine.query(question)
        eval_result_relevancy = evaluator_relevancy.evaluate_response(
            query=question, response=response_vector
        )
        eval_result_faith = evaluator_faith.evaluate_response(response=response_vector)

        print(f"\nProcessing Question {idx} of {total_questions}:")
        display_eval_df(
            question, response_vector, eval_result_relevancy, eval_result_faith
        )
    except ReadTimeout:
        print(f"Request to OpenAI timed out while processing question {idx}.")
        traceback.print_exc()
        continue
    except Exception as e:
        print(f"An error occurred while processing question {idx}: {e}")
        traceback.print_exc()
        continue