Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added code for JSON oriented model approach #4

Merged
merged 9 commits into from
Jun 24, 2024
Merged

Added code for JSON oriented model approach #4

merged 9 commits into from
Jun 24, 2024

Conversation

deepnayak
Copy link
Collaborator

@deepnayak deepnayak commented Jun 17, 2024

Summary by Sourcery

This pull request introduces a JSON-oriented model approach by adding a new script for generating query datasets, a new Flask application for handling user queries, and enhancing the existing index and query engine functionalities. It also removes obsolete files to clean up the codebase.

  • New Features:
    • Introduced a new script generate_query_dataset.py to generate sample queries and their corresponding JSON outputs and API queries.
    • Added a new Flask application in src/app.py to handle user queries and interact with the query engine.
  • Enhancements:
    • Updated the build_index function to use a simplified directory name and a different method for creating the index.
    • Modified the load_index function to load queries from a JSON file and use a question store for query retrieval.
    • Enhanced the GoaTAPIQueryEngine class to include a question_store and updated the custom_query method to format the context string using JSON data from the question store.
  • Chores:
    • Removed obsolete files: query_reformulation.py, rich_queries/queryV1.json, app.py, and prompt.py.

Copy link

sourcery-ai bot commented Jun 17, 2024

Reviewer's Guide by Sourcery

This pull request introduces a JSON-oriented model approach for handling queries. The main changes include updating the embedding and LLM models, refactoring the indexing and query engine to use JSON-based queries, and adding new scripts and a Flask application for generating and handling queries. Obsolete files related to the old query reformulation approach have been removed.

File-Level Changes

Files Changes
index.py
query_engine.py
Refactored the indexing and query engine to use JSON-based queries and updated the LLM model.
src/scripts/generate_query_dataset.py
src/app.py
Added new scripts and Flask application for generating and handling queries.
query_reformulation.py
rich_queries/queryV1.json
app.py
prompt.py
Removed obsolete files related to the old query reformulation approach.

Tips
  • Trigger a new Sourcery review by commenting @sourcery-ai review on the pull request.
  • You can change your review settings at any time by accessing your dashboard:
    • Enable or disable the Sourcery-generated pull request summary or reviewer's guide;
    • Change the review language;
  • You can always contact us if you have any questions or feedback.

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @deepnayak - I've reviewed your changes and they look great!

Here's what I looked at during the review
  • 🟡 General issues: 2 issues found
  • 🟡 Security: 1 issue found
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment to tell me if it was helpful.

@@ -0,0 +1,136 @@
import json
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: New script for generating query dataset.

Consider adding docstrings to the functions in this script to improve readability and maintainability.

src/app.py Show resolved Hide resolved
@@ -0,0 +1,32 @@
from llama_index.core import PromptTemplate
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: New prompt template for query parsing.

Ensure that the prompt template is comprehensive enough to handle a wide range of queries and edge cases.

Suggested change
from llama_index.core import PromptTemplate
from llama_index.core import PromptTemplate
prompt_template = PromptTemplate(
template="Please provide a detailed response to the following query: {query}. Ensure your response covers all possible edge cases and scenarios.",
variables=["query"]
)

@deepnayak deepnayak requested a review from rjchallis June 18, 2024 17:25
@rjchallis
Copy link
Contributor

I've not started the review yet as when I try to run the code using the instructions in INSTALL.md (from the src directory) I get an error in the browser and httpx.HTTPStatusError: Client error '404 Not Found' for url 'http://127.0.0.1:11434/api/generate' on the server. Swapping to the main branch with the same ollama server running the code works so I'm not clear why the 404. Any thoughts @deepnayak?

@rjchallis
Copy link
Contributor

I'm not clear why the 404

no problem, I see the model is now codellama

@@ -20,7 +20,7 @@ jobs:
- name: flake8 Lint
uses: TrueBrain/actions-flake8@v2
with:
ignore: E203,E701
ignore: E203,E701,W503
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think W504 and go in this list too. I hadn't spotted the difference between ignore and extend-ignore so should list both W503 and W504 to match setup.cfg

src/app.py Outdated

query_string = " AND ".join(params)
return (
base_url + endpoint + "query=" + urllib.parse.quote_plus(query_string) + suffix
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for API spaces need to be escaped as %20, not +

src/app.py Outdated
)

if json_output["intent"] == "count":
endpoint = "count?"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for now this could be endpoint = api/v2/count? to generate valid links

var messageElement = $('<div class="message ' + sender + '-message"></div>').text(message);
var messageElement;
if (sender == 'bot' && message != 'Bot is typing...' && message != 'Sorry, something went wrong.')
messageElement = $('<div class="message ' + sender + '-message"><a href=' + message + ' target="_blank">GoaT Link!</a></div>');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should include a JSON representation for debugging

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good for now - for next iteration will want to include:

  • plurals
  • name class (common/ scientific)

params = []

if "taxon" in json_output:
params.append(f"tax_tree(* {json_output['taxon']})")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fine for now - for next iteration use tax_name search to replace user tax name with taxon_id

@deepnayak deepnayak requested a review from rjchallis June 23, 2024 06:44
Copy link
Contributor

@rjchallis rjchallis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@rjchallis rjchallis merged commit 7f2131f into main Jun 24, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants