Welcome to the Wand University repository, a platform for advancing AI research through dynamic debate and iterative knowledge refinement. The system combines structured debate with adaptive paper analysis to enhance the capabilities of our AI agents. The core idea is that agents engage in debates to refine their understanding and generate high-quality training data, creating a self-improving cycle of knowledge acquisition.
This repository provides a robust set of tools for:
- 🔍 Dynamic arXiv Paper Exploration: Agents initiate research with broad queries that dynamically evolve based on debate outcomes and knowledge gaps. This ensures exploration of diverse but related research, creating a comprehensive knowledge base.
- 💭 Structured Debate Generation: We utilize constrained generation templates to extract key arguments from research papers, fostering a structured debate format.
- 🎲 High-Temperature Sampling with `min_p`: Creative exploration is encouraged using high-temperature sampling with `min_p`, resulting in diverse viewpoints and innovative arguments within debates (see the sampling sketch after this list).
- 📊 Multi-Perspective Argument Synthesis: The system synthesizes multiple perspectives through structured evaluations and then refines and preserves arguments, leading to a cohesive and well-rounded understanding.
- 🔄 Iterative Knowledge Refinement: Through debate cycles, the system refines its knowledge, adapting search terms based on emerging research directions, and preserving these insights in structured debate records.
- 🚀 LoRA Fine-tuning: The acquired data can be used to fine-tune an LLM with LoRA, and the same pipeline supports fine-tuning on other datasets such as EQ-Bench and GPQA.
- 🧪 Evaluation: We have integrated the `lm-evaluation-harness` to evaluate our models after fine-tuning and track their performance.
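As a concrete illustration of the high-temperature sampling with `min_p` mentioned above, the snippet below is a minimal sketch using vLLM's offline API; the model name, temperature, and `min_p` values are placeholders, not the settings used in `wand_university.py`.

```python
# Minimal sketch: high-temperature sampling with min_p via vLLM's offline API.
# The model name and parameter values below are illustrative placeholders.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model

# A high temperature encourages diverse viewpoints; min_p prunes tokens whose
# probability falls below min_p times that of the most likely token, which
# keeps the sampled arguments coherent despite the high temperature.
params = SamplingParams(temperature=1.2, min_p=0.1, max_tokens=512)

prompts = [
    "Argue FOR the claim made in the paper's abstract.",
    "Argue AGAINST the claim made in the paper's abstract.",
]
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```

The overall knowledge-acquisition workflow proceeds through the following steps: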
- Paper Discovery:
  - The process begins with a broad arXiv query based on the current research focus.
  - The system fetches relevant papers and selects one for in-depth analysis.
  - Subsequent queries adapt based on debate outcomes and identified knowledge gaps to ensure diverse yet relevant exploration.
- Structured Analysis:
  - The selected paper is processed, and key arguments and concepts are extracted using constrained generation templates.
  - These extractions are transformed into general knowledge question-answer pairs to facilitate knowledge generalization and broader understanding.
- Debate Generation:
  - Using high-temperature sampling with `min_p`, the system generates diverse, opposing viewpoints for each question-answer pair.
  - This approach encourages creative exploration and deepens understanding through debate.
- Argument Synthesis:
  - Generated arguments are evaluated for accuracy, clarity, and generalizability by an evaluation agent that provides supporting and critical feedback.
  - The system determines which QA pairs to keep or discard based on this evaluation process.
- Query Evolution:
  - The evaluation agent also proposes a next search query based on knowledge gaps and promising research directions found in the paper and arguments, steering the research process.
- Knowledge Integration:
  - Accepted question-answer pairs are stored in a structured format for future fine-tuning of language models and are augmented with supporting and challenging arguments, facilitating a transparent and iterative knowledge-building process.
- LoRA Fine-tuning:
  - A LoRA (Low-Rank Adaptation) fine-tuning process is initiated, either using our custom-generated data or using datasets like EQ-Bench and GPQA.
  - This fine-tuning enhances the model's ability to understand and apply the acquired knowledge.
- Evaluation:
  - We then evaluate our fine-tuned models with the `lm-evaluation-harness` in order to track our model's performance on datasets of interest, including GPQA.
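As an illustration of this evaluation step, the snippet below is a minimal sketch using the lm-evaluation-harness Python API; the model path, model arguments, and task names are assumptions and should be checked against the harness's task list.

```python
# Sketch: evaluating a fine-tuned model with the lm-evaluation-harness.
# The model path and the task names are assumptions; consult the harness's
# task list for the exact identifiers of the GPQA and EQ-Bench tasks.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=./finetuned-model,dtype=bfloat16",  # placeholder path
    tasks=["gpqa_main_zeroshot", "eq_bench"],                  # assumed task names
    batch_size=8,
)
print(results["results"])
```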
We use Weights & Biases to monitor the debate and research process. Key metrics tracked include:
- Argument Diversity Metrics: Assessing the range of perspectives generated during debates.
- Knowledge Evolution Patterns: Observing how the system refines its understanding over time.
- Search Query Effectiveness: Evaluating the relevance and diversity of papers retrieved by evolving queries.
- Debate Quality Assessment: Monitoring the coherence, accuracy, and depth of arguments generated.
- Evaluation Metrics: Tracking metrics like `eq_bench_score` and `percent_parseable` on the EQ-Bench dataset, and tracking loss on GPQA.
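A minimal sketch of how such metrics might be logged to Weights & Biases is shown below; the project name and all numeric values are placeholders (`eq_bench_score` and `percent_parseable` are the metric names mentioned above).

```python
# Sketch: logging debate and evaluation metrics to Weights & Biases.
# The project name and numeric values are placeholders.
import wandb

run = wandb.init(project="wand-university")  # placeholder project name

for cycle in range(3):  # placeholder: three debate/research cycles
    wandb.log({
        "cycle": cycle,
        "argument_diversity": 0.72,   # placeholder diversity score
        "accepted_qa_pairs": 12,      # placeholder count of kept QA pairs
        "eq_bench_score": 61.5,       # placeholder EQ-Bench score
        "percent_parseable": 98.0,    # placeholder parseable percentage
    })

run.finish()
```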
The repository is structured as follows:
- `wand_university.py`: The primary script containing the core logic for dynamic paper analysis, structured debate, and knowledge synthesis. It is the starting point for running the Wand University knowledge acquisition system. It has the following main functionalities:
  - `load_arxiv_papers`: Fetches papers from arXiv based on a given query.
  - `generate_qa`: Creates question-answer pairs based on the content of a research paper, with a specified system prompt to guide the generation process.
  - `evaluate_qa`: Evaluates the generated question-answer pairs, providing arguments for and against their inclusion, and recommending a next search query.
  - `archive_enchanted_dialogues`: Archives the accepted QA pairs along with the supporting and challenging arguments into a structured CSV file.
  - `synthesize_and_evaluate_knowledge`: Manages the overall process of knowledge synthesis and evaluation, orchestrating the debate cycles.
  - Main loop: Sets up WandB tracking, initializes the language model (vLLM), iterates through research cycles, saves the knowledge, and initiates a LoRA fine-tuning and evaluation process.
- `wand_test_eqbench.py`: Script for LoRA fine-tuning on the EQ-Bench dataset, as well as performing a final evaluation of the trained model.
- `wand_test_gpqa.py`: Script for LoRA fine-tuning on the GPQA dataset, as well as performing a final evaluation of the trained model.
- `wand_university_training_grimoire_test.csv`: CSV archive where the validated knowledge exchanges are stored.
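As an illustration of the archiving step, the snippet below sketches how accepted QA pairs and their arguments might be written to a CSV file; the column names and output file name are assumptions, not the actual schema of `wand_university_training_grimoire_test.csv`.

```python
# Sketch: archiving accepted QA pairs with supporting and challenging arguments.
# The column names and output file name are assumptions, not the actual schema
# used by archive_enchanted_dialogues.
import csv

accepted_pairs = [
    {
        "question": "What problem does min_p sampling address?",
        "answer": "It prunes low-probability tokens relative to the top token, "
                  "balancing diversity and coherence at high temperatures.",
        "supporting_argument": "Keeps creative debate outputs on-topic.",
        "challenging_argument": "A fixed threshold may prune rare but valid tokens.",
    },
]

with open("debate_archive.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=list(accepted_pairs[0].keys()))
    writer.writeheader()
    writer.writerows(accepted_pairs)
```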
Important: This system requires significant computational resources:
- Minimum 4x NVIDIA A100 GPUs: For parallel debate simulation using `vLLM` (see the sketch after this list).
- Additional GPU(s): For paper processing, LoRA fine-tuning, and evaluation.
- vLLM Server: Must be launched separately to facilitate efficient language model inference.
- lm-evaluation-harness: This framework is used for the evaluation of the fine-tuned models.
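For the multi-GPU requirement above, the snippet below is a minimal sketch of spreading a model across four GPUs with vLLM's in-process API; the separately launched vLLM server exposes an equivalent `--tensor-parallel-size` option. The model name is a placeholder.

```python
# Sketch: spreading the debate model across 4 GPUs with vLLM tensor parallelism.
# The model name is a placeholder; tensor_parallel_size matches the 4x A100
# guidance above. The standalone vLLM server takes an equivalent
# --tensor-parallel-size flag.
from vllm import LLM

llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # placeholder model
    tensor_parallel_size=4,
)
```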
- Clone the Repository:
  - `git clone https://github.com/ai-wand/wand-university.git`
  - `cd wand-university`
- Install Dependencies:
  - `pip install -r requirements.txt`
  - Note that this repository also assumes the existence of a configured `lm-evaluation-harness` setup. The requirements can be found in the root of that repository at `lm-evaluation-harness/requirements.txt`.
- Set Up vLLM Server:
  - Ensure the `vLLM` server is running and accessible.
  - It should be launched with enough GPUs specified to meet your requirements.
  - You can look at the `vllm` documentation for more details on how to set this up.
- Configure WandB:
  - Set up your Weights & Biases account and ensure that the API key is configured in your environment.
- Run the Wand University System:
  - `python wand_university.py`
  - This launches the system, which iteratively searches, debates, and refines knowledge, then initiates the fine-tuning process, and finally evaluates the fine-tuned model.
- Run the Standalone Fine-tuning and Evaluation Scripts (Optional):
  - You can optionally run these scripts separately for LoRA fine-tuning on EQ-Bench and GPQA:
  - `python wand_test_eqbench.py`
  - `python wand_test_gpqa.py`
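If you want the fine-tuning scripts to consume the generated archive (or your own data), loading a CSV with the Hugging Face `datasets` library might look like the sketch below; the file name is the archive listed in the repository structure above, and no particular column schema is assumed.

```python
# Sketch: loading the generated CSV archive as a fine-tuning dataset.
# The file name comes from the repository structure above; inspect the columns
# before wiring the data into the fine-tuning scripts.
from datasets import load_dataset

dataset = load_dataset(
    "csv",
    data_files="wand_university_training_grimoire_test.csv",
    split="train",
)
print(dataset.column_names)
print(dataset[0])
```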
- Initial Research Focus: The system starts with the initial research focus defined in `wand_university.py`. Modify this to match your initial area of research.
- Monitoring: Monitor the progress and results via Weights & Biases dashboards. You can check the logs to see the evolution of the search queries, the debate cycles, and evaluation metrics.
- Customization: Adjust parameters, such as sampling temperature, `min_p`, and debate rounds, in `wand_university.py` to experiment with different configurations.
- Dataset Focus: The fine-tuning scripts `wand_test_eqbench.py` and `wand_test_gpqa.py` are set up to fine-tune on the respective datasets. If you want to use your own dataset, change the code accordingly.
- LoRA Rank: To change the LoRA rank or alpha, modify the `lora_r` and `lora_alpha` variables in the fine-tuning scripts.
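For the LoRA rank and alpha note above, the snippet below is a minimal sketch of such a configuration with `peft`; the base model, target modules, and dropout are placeholders rather than the values used in the fine-tuning scripts.

```python
# Sketch: a LoRA configuration with peft. r and lora_alpha correspond to the
# lora_r / lora_alpha variables mentioned above; the base model, target
# modules, and dropout are illustrative placeholders.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

lora_config = LoraConfig(
    r=16,                                 # lora_r: rank of the low-rank update
    lora_alpha=32,                        # lora_alpha: scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # placeholder attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
```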
Contributions are welcome! Please submit a pull request with your enhancements.
This project uses:
- `vLLM`: For efficient language model inference.
- `lm-format-enforcer`: For structured output generation.
- `transformers`: For model loading and training.
- `lm_eval`: For model evaluation.
- `llama-index`: For interacting with arXiv papers.
- `peft`: For Parameter-Efficient Fine-Tuning (LoRA).
- Weights & Biases: For experiment tracking.
Let us know if you have any questions!