Skip to content

neatnettech/ollama_chatgpt_private

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ChatGPT Data Preparation and Fine-Tuning Pipeline

Python MIT License Hugging Face

A project to extract, preprocess, and fine-tune datasets for large language models like CodeQwen using ChatGPT data.


Project Structure

.
├── Makefile                       # Automates tasks like data extraction and export
├── README.md                      # Project documentation
├── chatgpt_export/                # Directory containing exported ChatGPT data
│   ├── conversations/             # Individual conversations in plain text
│   ├── conversations.json         # Exported conversations
│   ├── prepared_data/             # Processed data for fine-tuning
│   └── ...                        # Other exported metadata
├── notebooks/                     # Jupyter notebooks for fine-tuning
│   └── fine_tune_codeqwen.ipynb   # Notebook for fine-tuning CodeQwen
└── scripts/                       # Python scripts for data preparation
    ├── export_to_conversations.py # Converts extracted data into conversational format
    └── extract_prompts.py         # Extracts prompts from ChatGPT data

Getting Started

Prerequisites

  • Python 3.8 or higher
  • Required Python packages (install via requirements.txt):
    pip install -r requirements.txt
  • Exported data from ChatGPT, or any other training data

How to Use

  1. Extract Prompts: Run the extract_prompts.py script to extract prompts from ChatGPT-exported data:

    make extract
  2. Export to Conversations: Format the extracted prompts into a conversational format:

    make export
  3. Fine-Tune: Use the formatted data to fine-tune a large language model. Refer to notebooks/fine_tune_codeqwen.ipynb for detailed steps.

  4. Clean Up: Remove intermediate and output files:

    make clean

License

This project is licensed under the MIT License.


Contributing

We welcome contributions! Feel free to open an issue or submit a pull request.

Once I get my GPU, this will accelerate !


Acknowledgments

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published