Clarification about workflow #15

rragundez · 2024-07-16T07:04:58Z

I see that the CLI receives user_database as a parameter. Just to clarify:

Does this refer to the Vector Database in the architecture diagram and not to the SQL database?

Are the queries based on a single table, or can they also relate several tables by means of joins?

If I understand correctly, the LLM prompt built before hitting the SQL Generator combines many elements, such as the query plus some context about the tables. Can you share an example of that prompt? I guess it would need to include a lot of information in raw format (not vector format).
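To make the question concrete, here is a minimal sketch of the kind of prompt assembly being asked about. It is not taken from the repository: `build_sql_prompt`, the dictionary layout, and the example tables are all hypothetical, and the real prompts may be structured quite differently.

```python
# Hypothetical sketch: names and structure are assumptions, not the project's code.
def build_sql_prompt(question: str, table_context: list[dict]) -> str:
    """Assemble a plain-text prompt from a question and raw table metadata."""
    lines = ["You write SQL for the tables described below.", ""]
    for table in table_context:
        # Table and column details go into the prompt as raw text, not as vectors.
        cols = ", ".join(f"{c['name']} ({c['type']})" for c in table["columns"])
        lines.append(f"Table {table['name']}: {table['description']}")
        lines.append(f"  Columns: {cols}")
    lines += ["", f"Question: {question}", "SQL:"]
    return "\n".join(lines)


example_context = [
    {
        "name": "orders",
        "description": "One row per customer order",
        "columns": [
            {"name": "order_id", "type": "INT64"},
            {"name": "user_id", "type": "INT64"},
        ],
    },
    {
        "name": "users",
        "description": "One row per registered user",
        "columns": [{"name": "id", "type": "INT64"}],
    },
]
prompt = build_sql_prompt("How many orders did each user place?", example_context)
```

The point of the sketch is only that the embeddings would be used to pick the tables, while the text sent to the LLM carries the table descriptions and column lists in raw form.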


orpzs · 2024-07-16T10:16:47Z

@rragundez The user_database refers to the schema/dataset against which you ask questions. It is used to filter the rows in the vector embedding tables, which determines which source you are querying.

Queries can be asked on multiple tables if they are all in the same schema/dataset.
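As a rough illustration of the filtering described above, the sketch below keeps only the embedding rows that belong to one user_database and then ranks them against the question. Everything here is an assumption made for illustration: `TableEmbedding`, `candidate_tables`, the toy two-dimensional vectors, and the dot-product ranking are hypothetical and are not the project's actual retrieval code.

```python
from dataclasses import dataclass


@dataclass
class TableEmbedding:
    """Hypothetical row of a table-embedding store."""
    user_database: str       # schema/dataset the table belongs to
    table_name: str
    embedding: list[float]   # toy vector; real embeddings are much longer


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def candidate_tables(rows, user_database, question_embedding, top_k=2):
    """Filter rows to one schema/dataset, then rank by similarity to the question."""
    in_scope = [r for r in rows if r.user_database == user_database]
    ranked = sorted(
        in_scope,
        key=lambda r: dot(r.embedding, question_embedding),
        reverse=True,
    )
    return [r.table_name for r in ranked[:top_k]]


rows = [
    TableEmbedding("sales", "orders", [0.9, 0.1]),
    TableEmbedding("sales", "customers", [0.6, 0.4]),
    TableEmbedding("hr", "employees", [0.95, 0.05]),
]
result = candidate_tables(rows, "sales", [1.0, 0.0])
```

In this toy data the `employees` row scores highest on similarity, but it is never considered because it sits in a different schema/dataset. Several tables from the same schema can be returned together, which is what allows joins between them.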

We will soon release v2 of this solution with multi-turn support. Its prompts are pulled out into a separate file where you can view them. For now, to understand the prompt you have to go through each .py file in the /agents and /dbconnectors folders.

rragundez · 2024-07-16T10:26:32Z

If I understand you correctly, the caveat is that it cannot relate tables that are in different BigQuery datasets, but if they are in the same BigQuery dataset it can relate them by means of joins in the generated SQL query?

Can you point me to the relevant prompt .py file used before hitting the SQL generator? Thanks.

orpzs · 2024-07-24T07:23:36Z

Yes, for the main branch your statement holds true. If you have a use case that requires joining tables from different BigQuery datasets, please consider using the v2-draft branch. Word of caution: the v2 version has more features, such as multi-turn support and dynamic YAML-based reading of prompts. Please go through the README.

For the SQL generation code, please have a look at /agents/BuildSQLAgent.py.

hnegi01 · 2024-08-12T21:52:03Z

Hello,

Do you think we should generate and add content to knowgoodsql whenever a query is generated successfully?
Sometimes the response for the same question is generated successfully and sometimes it gives errors. I was testing with the BigQuery public dataset sample_ecom.

My concern is that if it generates a wrong query (wrong in logic, not syntax), it will return wrong SQL results, and if we store that query we will always reuse the same SQL. Maybe a flag should be added to knowgoodsql so that a human can label each entry, and an entry is only used when it is marked as correct?

Thanks

orpzs · 2024-08-19T07:15:07Z

Yes, this functionality is already provided in our demo UI. There is a thumbs-up button that inserts the question and SQL back into the SQL embedding table. Please have a look at the Backend APIs section, where you can see the URI /embed_sql.

orpzs self-assigned this Jul 16, 2024

orpzs added the question Further information is requested label Jul 24, 2024
