Skip to content

Latest commit

 

History

History
45 lines (31 loc) · 1.78 KB

README.md

File metadata and controls

45 lines (31 loc) · 1.78 KB

Example Runner

Setup

Make sure to download and unzip the Spider dataset from the Spider website to th local_data directory. This is used for the topic generation example. Make sure to download the KaggleDBQA dataset from the KaggleDBQA repository and place it in the local_data directory.

This project uses Poetry for dependency management. This project was written using Python 3.12.3. Ensure that you have installed the necessary packages for SynQL, and activate the environment. You can do this using poetry:

cd synql

poetry install

poetry shell

cd ..

cd runner

Make sure you have set your .env file at the root of the project with the following variables:

OPENAI_API_KEY=<Your API Key Here>

Topic Generation

To generate topics for a given dataset, you can use the topic.py script. We have provided an example script that uses the Spider dataset.

python topic.py --config configs/topic_example.json

This will generate topics for the Spider dataset and save them to the local_data directory.

Joint Generation

We have two examples of joint generation: one for batch generation and the other for real-time generation. Batch generation is useful for generating a large number of examples at once at a discounted inference cost (see here), while real-time generation is useful for generating examples on-the-fly.

Real-Time Generation

python joint.py --config configs/joint_example.json

Batch Generation

python batch.py --config configs/batch_example.json