Clarification about workflow #15

Open
rragundez opened this issue Jul 16, 2024 · 5 comments
Assignees
Labels
question Further information is requested

Comments

@rragundez

I see that the CLI takes a user_database parameter. Just to clarify:

  • Does this refer to the Vector Database in the Architecture diagram, and not to the SQL database?

Are the queries based on a single table, or can they also relate several tables by means of joins?

If I understand correctly, the LLM prompt built before hitting the SQL Generator combines many elements, such as the query plus some context about the tables. Can you share an example of that prompt? I guess it needs to include a lot of information in raw format (not vector format).

@orpzs orpzs self-assigned this Jul 16, 2024
@orpzs
Collaborator

orpzs commented Jul 16, 2024

@rragundez The user_database parameter refers to the schema/dataset against which you want to ask questions. It is used to filter the rows of the vector embedding tables. In other words, it decides which source in the vector embedding tables you are talking to.

Queries can be asked on multiple tables if they are all in the same schema/dataset.
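As a rough illustration of the filtering described above, here is a minimal Python sketch. It is not the project's code: the row shape (`TableEmbedding`), the function names, and the in-memory list are all assumptions made for the example, and the real solution stores embeddings in a vector store rather than in Python lists. The idea is only that rows are first restricted to the chosen user_database and then ranked by similarity to the question.

```python
import math
from dataclasses import dataclass


@dataclass
class TableEmbedding:
    # Hypothetical row shape; the real embedding table columns may differ.
    schema: str             # schema/dataset the table belongs to
    table: str              # table name
    embedding: list[float]  # embedding of the table description


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def candidate_tables(rows, user_database, question_embedding, top_k=2):
    """Keep only rows from `user_database`, then rank them by similarity."""
    scoped = [r for r in rows if r.schema == user_database]
    scoped.sort(key=lambda r: cosine(r.embedding, question_embedding), reverse=True)
    return [r.table for r in scoped[:top_k]]


rows = [
    TableEmbedding("sales", "orders", [1.0, 0.0]),
    TableEmbedding("sales", "customers", [0.6, 0.8]),
    TableEmbedding("hr", "employees", [1.0, 0.0]),
]

# Tables from "hr" are never considered when user_database is "sales".
print(candidate_tables(rows, "sales", [1.0, 0.0]))
```

Because every candidate comes from the same schema/dataset, the tables that reach the SQL generation step can be joined with each other.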

We will soon release v2 of this solution with multi-turn support. In v2 the prompts are pulled out into a separate file where you can view them. For now, to understand the prompts you have to go through the .py files in the /agents and /dbconnectors folders.

@rragundez
Author

If I understand you correctly, a caveat is that it cannot relate tables that are in different BigQuery datasets, but if they are in the same BigQuery dataset it can relate them by means of joins in the generated SQL query?

Can you point me to the relevant prompt py file before hitting the SQL generator? Thanks.

@orpzs
Collaborator

orpzs commented Jul 24, 2024

Yes, for the main branch your statement holds true. If you have a use case that requires joining tables from different BigQuery datasets, please consider using the v2-draft branch. A word of caution: the v2 version has more features, such as multi-turn support and dynamic reading of prompts from YAML files. Please go through the README.

For the SQL generation code, please have a look at /agents/BuildSQLAgent.py.
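To make the main-branch limitation concrete, the sketch below checks whether a generated query stays within one BigQuery dataset. It is an illustrative helper, not part of the repository: the function names and the example table names are hypothetical, and it assumes table references are written fully qualified inside backticks (project.dataset.table).

```python
import re

# Matches fully qualified BigQuery names written as `project.dataset.table`.
_TABLE_REF = re.compile(r"`([\w-]+)\.(\w+)\.(\w+)`")


def datasets_in_query(sql):
    """Return the set of project.dataset prefixes referenced in the query."""
    return {f"{project}.{dataset}" for project, dataset, _ in _TABLE_REF.findall(sql)}


def is_single_dataset(sql):
    """True when all backtick-quoted table references share one dataset."""
    return len(datasets_in_query(sql)) <= 1


# Hypothetical queries: the first joins within one dataset, the second across two.
same = "SELECT o.id FROM `p.shop.orders` o JOIN `p.shop.users` u ON o.user_id = u.id"
cross = "SELECT o.id FROM `p.shop.orders` o JOIN `p.crm.users` u ON o.user_id = u.id"

print(is_single_dataset(same))
print(is_single_dataset(cross))
```

A join like the first query is the kind the main branch can produce; a join like the second is the case for which the v2-draft branch is suggested.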

@orpzs orpzs added the question Further information is requested label Jul 24, 2024
@hnegi01

hnegi01 commented Aug 12, 2024

Hello,

Do you think we should generate and add entries to knowgoodsql whenever a query is generated successfully?
Sometimes the same question produces a successful response and sometimes it gives errors. I was testing with the BigQuery public dataset sample_ecom.

My concern is that if it generates a wrong query (logically wrong, not a syntax error), it will return wrong SQL results, and if we store that query we will always reuse the same wrong SQL. Maybe a flag needs to be added to knowgoodsql so that an entry is labeled through human intervention and only used once it is marked as correct?

Thanks
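The flag idea above could be sketched roughly as follows. This is a hypothetical in-memory model, not the project's storage: the class names, the `verified` field, and the exact-match lookup on a normalized question are all assumptions (the real solution looks up similar questions through embeddings).

```python
from dataclasses import dataclass, field


@dataclass
class KnownGoodSQL:
    question: str
    sql: str
    verified: bool = False  # set to True only after a human confirms the result


@dataclass
class KnownGoodStore:
    entries: dict = field(default_factory=dict)

    @staticmethod
    def _key(question):
        # Simplification: exact match on a normalized question string.
        return question.strip().lower()

    def add(self, question, sql):
        """Store a generated query as unverified."""
        self.entries[self._key(question)] = KnownGoodSQL(question, sql)

    def approve(self, question):
        """Mark a stored query as human-approved."""
        self.entries[self._key(question)].verified = True

    def lookup(self, question):
        """Return stored SQL only if it has been approved, else None."""
        entry = self.entries.get(self._key(question))
        return entry.sql if entry and entry.verified else None


store = KnownGoodStore()
store.add("How many orders?", "SELECT COUNT(*) FROM orders")
print(store.lookup("How many orders?"))  # unverified entries are not reused
store.approve("How many orders?")
print(store.lookup("how many orders? "))
```

With this shape, a query that ran without errors but returned wrong results is never reused unless someone explicitly approves it.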

@orpzs
Collaborator

orpzs commented Aug 19, 2024

Yes, this functionality is already provided in our demo UI. There is a thumbs-up button that inserts the question and its SQL back into the SQL embedding table. Please have a look at the Backend APIs section, where you can see the /embed_sql URI.

3 participants