Download and unzip the Spider dataset from the Spider website into the local_data directory; it is used for the topic generation example. Then download the KaggleDBQA dataset from the KaggleDBQA repository and place it in the local_data directory as well.
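Before running anything, it can help to confirm that both datasets actually landed in local_data. The following is a minimal sketch, not part of SynQL: the function name missing_datasets and the folder names "spider" and "kaggledbqa" are assumptions made for illustration, so adjust them to match whatever the unzipped archives are called on your machine.

```python
from pathlib import Path


def missing_datasets(data_dir, expected):
    """Return the expected sub-directory names that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    # "spider" and "kaggledbqa" are assumed folder names, not confirmed by the
    # project; rename them to match your local_data layout.
    missing = missing_datasets("local_data", ["spider", "kaggledbqa"])
    if missing:
        print("Missing from local_data:", ", ".join(missing))
    else:
        print("Both datasets found in local_data.")
```

Run it from the directory that contains local_data; an empty result means both folders are in place.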
This project uses Poetry for dependency management and was written using Python 3.12.3. Install the packages that SynQL needs and activate the environment with Poetry, then change into the runner directory:
cd synql
poetry install
poetry shell
cd ..
cd runner
Make sure the .env file at the root of the project sets the following variable:
OPENAI_API_KEY=<Your API Key Here>
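A quick way to catch a missing or placeholder key before starting a generation run is to read the .env file yourself. This is a minimal sketch under stated assumptions: read_env_file and key_looks_set are hypothetical helpers rather than SynQL functions, the parser handles only simple KEY=VALUE lines, and the ../.env path assumes you are running from the runner directory. How SynQL itself loads the file is not shown here.

```python
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for line in env_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def key_looks_set(value):
    """True if the value is non-empty and is not a <placeholder>."""
    value = value or ""
    return bool(value) and not (value.startswith("<") and value.endswith(">"))


if __name__ == "__main__":
    # Assumed location: the project root, one level above runner/.
    env = read_env_file(Path("..") / ".env")
    print("OPENAI_API_KEY set:", key_looks_set(env.get("OPENAI_API_KEY")))
```

The check only confirms that a value is present; it does not verify that the key is valid with the API.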
To generate topics for a given dataset, use the topic.py script. The example below uses the Spider dataset:
python topic.py --config configs/topic_example.json
This generates topics for the Spider dataset and saves them to the local_data directory.
We provide two examples of joint generation: real-time generation (joint.py) and batch generation (batch.py). Batch generation is useful for generating a large number of examples at once at a discounted inference cost (see here), while real-time generation is useful for generating examples on the fly.
python joint.py --config configs/joint_example.json
python batch.py --config configs/batch_example.json