-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added code for JSON oriented model approach #4
Conversation
Reviewer's Guide by SourceryThis pull request introduces a JSON-oriented model approach for handling queries. The main changes include updating the embedding and LLM models, refactoring the indexing and query engine to use JSON-based queries, and adding new scripts and a Flask application for generating and handling queries. Obsolete files related to the old query reformulation approach have been removed. File-Level Changes
Tips
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @deepnayak - I've reviewed your changes and they look great!
Here's what I looked at during the review
- 🟡 General issues: 2 issues found
- 🟡 Security: 1 issue found
- 🟢 Testing: all looks good
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment to tell me if it was helpful.
@@ -0,0 +1,136 @@ | |||
import json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: New script for generating query dataset.
Consider adding docstrings to the functions in this script to improve readability and maintainability.
@@ -0,0 +1,32 @@ | |||
from llama_index.core import PromptTemplate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: New prompt template for query parsing.
Ensure that the prompt template is comprehensive enough to handle a wide range of queries and edge cases.
from llama_index.core import PromptTemplate | |
from llama_index.core import PromptTemplate | |
prompt_template = PromptTemplate( | |
template="Please provide a detailed response to the following query: {query}. Ensure your response covers all possible edge cases and scenarios.", | |
variables=["query"] | |
) |
Refined logic for time based queries
I've not started the review yet as when I try to run the code using the instructions in INSTALL.md (from the src directory) I get an error in the browser and |
no problem, I see the model is now codellama |
.github/workflows/flake8.yml
Outdated
@@ -20,7 +20,7 @@ jobs: | |||
- name: flake8 Lint | |||
uses: TrueBrain/actions-flake8@v2 | |||
with: | |||
ignore: E203,E701 | |||
ignore: E203,E701,W503 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think W504
and go in this list too. I hadn't spotted the difference between ignore
and extend-ignore
so should list both W503
and W504
to match setup.cfg
src/app.py
Outdated
|
||
query_string = " AND ".join(params) | ||
return ( | ||
base_url + endpoint + "query=" + urllib.parse.quote_plus(query_string) + suffix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for API spaces need to be escaped as %20
, not +
src/app.py
Outdated
) | ||
|
||
if json_output["intent"] == "count": | ||
endpoint = "count?" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for now this could be endpoint = api/v2/count?
to generate valid links
src/templates/chat.html
Outdated
var messageElement = $('<div class="message ' + sender + '-message"></div>').text(message); | ||
var messageElement; | ||
if (sender == 'bot' && message != 'Bot is typing...' && message != 'Sorry, something went wrong.') | ||
messageElement = $('<div class="message ' + sender + '-message"><a href=' + message + ' target="_blank">GoaT Link!</a></div>'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should include a JSON representation for debugging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good for now - for next iteration will want to include:
- plurals
- name class (common/ scientific)
params = [] | ||
|
||
if "taxon" in json_output: | ||
params.append(f"tax_tree(* {json_output['taxon']})") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fine for now - for next iteration use tax_name search to replace user tax name with taxon_id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
Summary by Sourcery
This pull request introduces a JSON-oriented model approach by adding a new script for generating query datasets, a new Flask application for handling user queries, and enhancing the existing index and query engine functionalities. It also removes obsolete files to clean up the codebase.
generate_query_dataset.py
to generate sample queries and their corresponding JSON outputs and API queries.src/app.py
to handle user queries and interact with the query engine.build_index
function to use a simplified directory name and a different method for creating the index.load_index
function to load queries from a JSON file and use a question store for query retrieval.GoaTAPIQueryEngine
class to include aquestion_store
and updated thecustom_query
method to format the context string using JSON data from the question store.query_reformulation.py
,rich_queries/queryV1.json
,app.py
, andprompt.py
.