
LLM_Text2SQL

Performance evaluation of LLM models on Text-to-SQL.

Below are the LLM models tested on the SPIDER dataset for the Text-to-SQL task.

Model Name - Parameter size

  1. Mistral - 7B
  2. LLaMA 2 - 7B
  3. WizardLM - 7B
  4. Flan-T5 - 11B
  5. PaLM - 540B

HOW TO RUN THE LLMs

All the open-source models were run locally using the llama-cpp-python module. The GGUF files for the open-source models were downloaded from Hugging Face repositories. For the PaLM model, the PaLM API was used to send requests and receive the query results.
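
For the PaLM path, the request looks roughly like the following minimal sketch (the google.generativeai call, the model name, and the decoding settings are assumptions for illustration, not the exact code in main_Palm.py):

  import google.generativeai as palm

  palm.configure(api_key="YOUR_PALM_API_KEY")  # assumed: key supplied by the user

  def palm_text2sql(prompt):
      # text-bison-001 was the text-completion model exposed by the PaLM API;
      # the repository may use a different model or parameters.
      response = palm.generate_text(
          model="models/text-bison-001",
          prompt=prompt,
          temperature=0.0,
      )
      return response.result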

Once the GGUF files are downloaded, place them in a directory named models. The test set used here is the dev set from the SPIDER dataset, located at "spider/dev.json". The spider directory must also contain the databases, which are used to provide the DB schema along with the user query.
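
For reference, a minimal sketch of loading the dev set (the field names db_id, question, and query follow the standard SPIDER format; treating them as what the scripts rely on is an assumption):

  import json

  # Each SPIDER dev example carries the natural-language question, the gold
  # SQL query, and the id of the database it targets.
  with open("spider/dev.json") as f:
      dev_set = json.load(f)

  for example in dev_set[:3]:
      print(example["db_id"], "|", example["question"])
      print("  gold:", example["query"])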

For PaLM testing, run the following command:

  • python main_Palm.py --test "test_file" --schema 1

The --schema 1 option queries the database for its schema and appends it to the model prompt as additional information. Set it to 0 to omit this information.
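
As an illustration of what the schema flag involves, here is a minimal sketch that pulls the CREATE TABLE statements out of a SPIDER SQLite database and prepends them to the prompt (the spider/database/<db_id>/<db_id>.sqlite layout and the prompt wording are assumptions, not the exact logic of main_Palm.py or main.py):

  import sqlite3

  def get_schema(db_id, spider_dir="spider"):
      # The SPIDER release ships one SQLite file per database; the CREATE TABLE
      # statements stored in sqlite_master serve as a textual schema.
      db_path = f"{spider_dir}/database/{db_id}/{db_id}.sqlite"
      conn = sqlite3.connect(db_path)
      rows = conn.execute(
          "SELECT sql FROM sqlite_master WHERE type='table' AND sql IS NOT NULL"
      ).fetchall()
      conn.close()
      return "\n".join(sql for (sql,) in rows)

  def build_prompt(question, db_id, with_schema=True):
      # With --schema 1 the schema text is prepended; with --schema 0 only the
      # question is sent.
      prompt = ""
      if with_schema:
          prompt += "Database schema:\n" + get_schema(db_id) + "\n\n"
      prompt += f"Translate the question into a SQL query.\nQuestion: {question}\nSQL:"
      return prompt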

For the other GGUF models, the Python script internally uses llama-cpp-python to run inference locally (a rough sketch follows the command below).

  • python main.py --model_name "" --test "" --schema ""
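
The local inference step is roughly along the lines of the following llama-cpp-python sketch (the model file name and generation parameters are assumptions, not the exact settings in main.py):

  from llama_cpp import Llama

  # Load a local GGUF model from the models directory (file name is an example).
  llm = Llama(model_path="models/mistral-7b-instruct.Q4_K_M.gguf", n_ctx=2048)

  def generate_sql(prompt):
      # Low-temperature decoding; stop at a blank line so only the SQL is kept.
      out = llm(prompt, max_tokens=256, temperature=0.0, stop=["\n\n"])
      return out["choices"][0]["text"].strip()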

Create a folder named "results" to store the inference outputs. The result file is in CSV format and contains:

  • The question queried
  • Gold query
  • Predicted query

The file name will be "model_name/with_schema" if the schema bit is 1; otherwise it will be "model_name/without_schema".
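
For illustration, a minimal sketch of writing such a result file (the column headers and the exact file-name convention are assumptions):

  import csv, os

  os.makedirs("results", exist_ok=True)

  def save_results(model_name, with_schema, rows):
      # rows: list of (question, gold_query, predicted_query) tuples
      suffix = "with_schema" if with_schema else "without_schema"
      path = f"results/{model_name}_{suffix}.csv"
      with open(path, "w", newline="") as f:
          writer = csv.writer(f)
          writer.writerow(["question", "gold_query", "predicted_query"])
          writer.writerows(rows)
      return path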

The SPIDER dataset provides an evaluator to test the accuracy of the predictions. To run it, use the following command:

  • python3 evaluators/evaluation.py --gold "Gold_file" --pred "Pred_file" --etype all

The Gold_file must contain only the gold queries, with each query on its own line. The Pred_file must contain only the predicted queries, with each query on its own line.
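
A minimal sketch of deriving the two files from a result CSV (the column names gold_query and predicted_query match the assumptions used in the sketch above):

  import csv

  def make_eval_files(result_csv, gold_path="Gold_file", pred_path="Pred_file"):
      # One query per line, in the same order, as the evaluator expects.
      with open(result_csv) as f:
          rows = list(csv.DictReader(f))
      with open(gold_path, "w") as g, open(pred_path, "w") as p:
          for row in rows:
              g.write(row["gold_query"].strip() + "\n")
              p.write(row["predicted_query"].strip() + "\n")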

Results on Zero-shot performance of models

[Charts: exact_without, exact_with, exec_without, exec_with]

[Charts: component_without, component_with]

[Charts: sim_without, sim_with]

CONSISTENCY METRIC

MISTRAL 7B

[Chart: consistency_mistral]

LLaMA 2 7B

[Chart: consistency_PaLM]

Description of each model

1. Mistral LLM

HIGHLIGHTS

  • Uses grouped-query attention
    • Speeds up model inference
    • Reduces memory requirements during decoding
  • Uses sliding-window attention
    • Handles longer sequences at a reduced computational cost

CAPABILITIES

  • Code generation
  • Reasoning
  • Mathematics

LIMITATIONS

  • Prone to hallucination
  • Prone to prompt injection
  • Limited knowledge due to its small parameter count

2. WizardLM LLM

HIGHLIGHTS

  • A LLaMA LLM fine-tuned using the Evol-Instruct method
    • Trained with fully evolved instructions
  • Optimized to follow highly complex instructions
  • Outperforms Vicuna and Alpaca

CAPABILITIES

  • Instruction following
  • Code generation

LIMITATIONS

  • Prone to hallucination
  • Limited knowledge due to its small parameter count

3. LLaMA 2 LLM by Meta

HIGHLIGHTS

  • Llama 2 is pre-trained on publicly available online data (2 trillion tokens).
  • Iteratively refined using reinforcement learning from human feedback (RLHF), which includes rejection sampling and proximal policy optimization (PPO)
  • Among the open-source models most competitive with ChatGPT, Anthropic's Claude, and PaLM on general NLP tasks

CAPABILITIES

  • Applicable to many different use cases, for example:
    • Code generation
    • Sentence completion
    • Summarization
    • Sentiment analysis

LIMITATIONS

  • Prone to hallucination
  • Inappropriate content (if not used responsibly)
  • Potential for bias

4. Flan-T5 LLM

HIGHLIGHTS

  • Enhanced T5: Builds upon the powerful T5 model with further fine-tuning
  • Multi-task learning: Trained on diverse tasks, making it versatile for various NLP applications.
  • Five sizes: small, base, large, XL, and XXL for different performance and resource requirements.
  • Open-sourced: Accessible through Hugging Face and can be fine-tuned for specific tasks.

CAPABILITIES

  • Text summarization
  • Question answering
  • Text generation
  • Language Translation

LIMITATIONS

  • Potential for bias
  • Inappropriate content (if not used responsibly)
  • Requires significant computational resources for training and inference

5. PaLM

HIGHLIGHTS

  • Massive parameter size: advanced reasoning and understanding capabilities.
  • Multi-task learning: Trained on a diverse set of tasks
  • Improved zero-shot and few-shot learning.
  • Handles multiple languages with fluency and accuracy.

CAPABILITIES

  • Advanced reasoning tasks: Solves complex problems, comprehends riddles
  • Question answering
  • Natural language generation: creative text formats like poems, scripts, emails
  • Code understanding and generation: Analyzes existing code, generates new code snippets, and helps with code completion.

LIMITATIONS

  • Potential for bias: Trained on a massive dataset that may contain inherent biases, reflected in its outputs.
  • Ethical considerations: Can generate inappropriate content if not used responsibly.
  • Demands significant computational resources for training and inference.
