-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Freeform SQL execution with similarity indexes #24
Conversation
Trivy scanning results. |
fe3ed7c
to
6ebd9f2
Compare
Code Coverage Summary
Diff against main
Results for commit: e25280c Minimum allowed coverage is ♻️ This comment has been updated with latest results |
@@ -91,22 +154,28 @@ async def ask( | |||
|
|||
async def _generate_sql( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ludwiktrammer wdyt about introducing strategy pattern here?
Initially we would have 2 strategies:
- in case there is no similarity indexes defined we use prompt without "parameters" logic (to reduce token usage)
- with indexes defined we use parameter binding
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, although to be honest I like the decoupling of user-provided values and the SQL code as a general idea, even without similarity indexes. It seems to me that it makes SQL injections less likely (although still possible) and gives us more visibility.
if "name" not in data or not isinstance(data["name"], str): | ||
raise ValueError("Parameter name should be a string") | ||
|
||
if "value" not in data or not isinstance(data["value"], str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why value must be a string? JSON supports integers and for some questions GPT returned it which was causing Exceptions. Is it expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I was thinking about the fact that only string values can be passed to similarity indexes, but the LLM is not wrong to use placeholders for values of other types here. I will fix this.
@@ -91,22 +156,28 @@ async def ask( | |||
|
|||
async def _generate_sql( | |||
self, query: str, conversation: PromptTemplate, llm_client: LLMClient, event_tracker: EventTracker | |||
) -> Tuple[str, PromptTemplate]: | |||
) -> Tuple[str, List[SQLParameterOption], PromptTemplate]: | |||
response = await llm_client.text_generation( | |||
template=conversation, | |||
fmt={"tables": self._get_tables_context(), "dialect": self._engine.dialect.name, "question": query}, | |||
event_tracker=event_tracker, | |||
) | |||
|
|||
conversation = conversation.add_assistant_message(response) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somehow AzureOpen AI still returns this json {}
format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean the ```json
marking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
@@ -91,22 +156,28 @@ async def ask( | |||
|
|||
async def _generate_sql( | |||
self, query: str, conversation: PromptTemplate, llm_client: LLMClient, event_tracker: EventTracker | |||
) -> Tuple[str, PromptTemplate]: | |||
) -> Tuple[str, List[SQLParameterOption], PromptTemplate]: | |||
response = await llm_client.text_generation( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All {
and }
characters should be escaped. If they are not prompt_builder fails in the case of retried query as this JSON response is appended to the conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does the prompt builder fail? Generally, I believe that strings should be escaped when they are put into medium that needs the escaping, before that it should live in its canonical non-escaped form. So most likely it's prompt builder itself that should do the escaping - although its hard to tell, since I haven't seen the exact error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exact message is just what I pasted below. It fails on the string.format() as format
function expects placeholder to be between {} It interprets JSON as a big placeholder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this may be tricky to implement it inside prompt builder as it is not awere of what is placeholder and what is JSON
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the problem happening in db-ally's code or your code? Generally speaking prompt consists of two parts: prompt template (which is static) and dynamic values that are injected into placeholders defined in the template. The json is a dynamic value. The root of the problem seems to be that it is added as part of the template itself, and not as a value injected into the template?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the db-ally code. I just copied it to the notebook to showcase the problem clearly. It occurs because in _generate_sql
method in 166
line we do add_assistant_message
and it is just string with JSON, this causes this exception.
@@ -65,15 +130,15 @@ async def ask( | |||
# We want to catch all exceptions to retry the process. | |||
# pylint: disable=broad-except | |||
try: | |||
sql, conversation = await self._generate_sql(query, conversation, llm_client, event_tracker) | |||
sql, parameters, conversation = await self._generate_sql(query, conversation, llm_client, event_tracker) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an output we have parameterized sql query without true values, is this behaviour expected? Should the user take care of it by themselves?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by "true values"? The user-provided values should be present under "value" for each parameter object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right so at the pointed place the sql
variable will look like
SELECT * FROM table WHERE X = :placeholder
From this point we do not modify sql
variable to fill these :placeholder. If someone runs it in dry_run
mode we do not even provide parameters
. So if someone want to execute this query later, how can they do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I get it now. Here, as an output of _generate_sql
we definitely want the parameterized SQL but the ViewExecutionResult
returned by ask
should probably include the parameters, in one way or another.
No description provided.