
ZeroSearch: Incentivize the Search Capability of LLMs without Searching


Hao Sun, Zile Qiao, Jiayan Guo, Xuanbo Fan, Yingyan Hou
Yong Jiang, Pengjun Xie, Yan Zhang, Fei Huang, Jingren Zhou
Tongyi Lab, Alibaba Group

🔥 News

  • [2025.05.17] Released a new version of simulation LLMs and policy models.
  • [2025.05.17] Released the simulation tuning dataset.
  • [2025.05.17] Added support for three RL algorithms: REINFORCE, GRPO, and PPO.
  • [2025.05.08] Released the initial codebase and paper.

📌 Introduction

  • We propose ZeroSearch, a novel reinforcement learning framework that incentivizes the search capability of LLMs through simulated searches during training, removing the need to interact with a real search engine.
  • Through supervised fine-tuning, we transform an LLM into a retrieval module capable of generating both relevant and noisy documents in response to a query (a minimal sketch of this idea follows this list). We further introduce a curriculum rollout mechanism that progressively elicits the model’s reasoning ability by exposing it to increasingly challenging retrieval scenarios.
  • We conduct extensive experiments on both in-domain and out-of-domain datasets. Results show that ZeroSearch outperforms real search engine-based models while incurring zero API cost. Moreover, it generalizes well across both base and instruction-tuned LLMs of various sizes and supports different reinforcement learning algorithms.
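
The following is a minimal sketch of the simulated-retrieval idea described above: the simulation LLM is prompted to act as a search engine and produce either relevant or noisy documents for a query. The function name and prompt wording are illustrative assumptions, not the repository's actual template.

# Illustrative sketch only: prompt a simulation LLM to act as a search engine.
# The prompt wording is an assumption, not the repository's actual template.
def build_simulation_prompt(query: str, noisy: bool) -> str:
    style = "noisy and misleading" if noisy else "relevant and helpful"
    return (
        "You are a search engine. For the query below, write five short "
        f"documents that are {style}, as a web search might return.\n\n"
        f"Query: {query}"
    )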

🛠 Dependencies

conda create -n zerosearch python=3.9
conda activate zerosearch
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
pip install vllm==0.6.3
pip install wandb
pip install serpapi

# verl (run from the repository root)
pip install -e .

# flash attention 2
pip3 install flash-attn --no-build-isolation

# sglang
# If you hit package conflicts installing sglang in this environment, create a separate environment for it.
pip install sglang[all]
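
After installing, a quick import check helps confirm the environment is usable (a minimal sketch; drop sglang from the imports if you installed it in a separate environment):

# Sanity-check the installation: all imports should succeed,
# and CUDA should be visible on a GPU machine.
import torch
import vllm
import sglang

print(torch.__version__)          # expect 2.4.0
print(torch.cuda.is_available())  # expect True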

📖 Quick Start

(1) Download the training dataset.

huggingface-cli download --repo-type dataset --resume-download sunhaonlp/ZeroSearch_dataset --local-dir ZeroSearch_dataset

# (Optional) Download the Simulation Tuning dataset, required only if you want to train your own simulation LLMs
huggingface-cli download --repo-type dataset --resume-download sunhaonlp/SimulationTuning_dataset --local-dir SimulationTuning_dataset
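
Once downloaded, the training data can be inspected with the datasets library. A minimal sketch, assuming verl-style parquet splits; the exact filenames inside ZeroSearch_dataset are an assumption, so adjust them to what the download actually contains:

from datasets import load_dataset

# Assumed filename; check the downloaded directory for the actual split files.
ds = load_dataset("parquet", data_files={"train": "ZeroSearch_dataset/train.parquet"})
print(ds["train"][0])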

(2) Download the simulation LLMs.

# Simulation LLMs are available in different parameter sizes. Choose the one that best suits your needs.
# The 14B version is recommended for its stable and reliable simulation performance.
huggingface-cli download --resume-download sunhaonlp/SearchSimulation_3B_V2 --local-dir SearchSimulation_3B

huggingface-cli download --resume-download sunhaonlp/SearchSimulation_7B_V2 --local-dir SearchSimulation_7B

huggingface-cli download --resume-download sunhaonlp/SearchSimulation_14B_V2 --local-dir SearchSimulation_14B

(3) Launch a local simulation server.

# Prompt-based simulation
python -m sglang.launch_server --model-path Qwen2.5-14B-Instruct --host 0.0.0.0 --tp 2 --dp 2 --port 6001

# Fine-tuning-based simulation
python -m sglang.launch_server --model-path SearchSimulation_14B --host 0.0.0.0 --tp 2 --dp 2 --port 6001
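
sglang's launch_server exposes an OpenAI-compatible API, so the server can be smoke-tested before starting RL training. A minimal sketch; the query is illustrative, and the model field should match the --model-path used above:

import requests

# Smoke-test the simulation server via its OpenAI-compatible endpoint.
resp = requests.post(
    "http://localhost:6001/v1/chat/completions",
    json={
        "model": "SearchSimulation_14B",  # match the --model-path used above
        "messages": [{"role": "user", "content": "Who wrote Hamlet?"}],
        "max_tokens": 128,
    },
)
print(resp.json()["choices"][0]["message"]["content"])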

(4) Conduct RL training with Qwen2.5-3B.

# Activate the conda environment
conda activate zerosearch

# Set your Google Search API key
export SER_API_KEY=your_api_key

# You can run REINFORCE, GRPO or PPO training using the scripts below. We recommend REINFORCE for its greater training stability.
# The START_THRESHOLD and END_THRESHOLD parameters define the initial and final difficulty levels of the training tasks. Adjusting these values can help optimize model performance; a sketch of this curriculum schedule follows the commands below.

## Prompt-based simulation
bash train_reinforce.sh NUM_GPUS_PER_NODE 4 MODEL_PATH Qwen2.5-3B DATA_PATH ZeroSearch_dataset TOTAL_STEPS 203 IP localhost SEARCH_MODE simulate_prompt SIMULATION_LLM Qwen2.5-14B-Instruct START_THRESHOLD 0 END_THRESHOLD 0.5
bash train_grpo.sh NUM_GPUS_PER_NODE 4 MODEL_PATH Qwen2.5-3B DATA_PATH ZeroSearch_dataset TOTAL_STEPS 203 IP localhost SEARCH_MODE simulate_prompt SIMULATION_LLM Qwen2.5-14B-Instruct START_THRESHOLD 0 END_THRESHOLD 0.5
bash train_ppo.sh NUM_GPUS_PER_NODE 4 MODEL_PATH Qwen2.5-3B DATA_PATH ZeroSearch_dataset TOTAL_STEPS 203 IP localhost SEARCH_MODE simulate_prompt SIMULATION_LLM Qwen2.5-14B-Instruct START_THRESHOLD 0 END_THRESHOLD 0.5

## Fine-tuning-based simulation
bash train_reinforce.sh NUM_GPUS_PER_NODE 4 MODEL_PATH Qwen2.5-3B DATA_PATH ZeroSearch_dataset TOTAL_STEPS 203 IP localhost SEARCH_MODE simulate_sft SIMULATION_LLM SearchSimulation_14B START_THRESHOLD 0 END_THRESHOLD 0.5
bash train_grpo.sh NUM_GPUS_PER_NODE 4 MODEL_PATH Qwen2.5-3B DATA_PATH ZeroSearch_dataset TOTAL_STEPS 203 IP localhost SEARCH_MODE simulate_sft SIMULATION_LLM SearchSimulation_14B START_THRESHOLD 0 END_THRESHOLD 0.5
bash train_ppo.sh NUM_GPUS_PER_NODE 4 MODEL_PATH Qwen2.5-3B DATA_PATH ZeroSearch_dataset TOTAL_STEPS 203 IP localhost SEARCH_MODE simulate_sft SIMULATION_LLM SearchSimulation_14B START_THRESHOLD 0 END_THRESHOLD 0.5
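
For intuition about the curriculum parameters, here is how a per-step difficulty threshold could be interpolated from START_THRESHOLD to END_THRESHOLD. A linear ramp is an illustrative assumption; the repository's actual schedule may differ:

def curriculum_threshold(step: int, total_steps: int,
                         start: float = 0.0, end: float = 0.5) -> float:
    # Linearly ramp the difficulty (e.g., the share of noisy documents)
    # from START_THRESHOLD to END_THRESHOLD over training. The linear
    # shape is an assumption for illustration, not the repo's exact schedule.
    frac = min(max(step / max(total_steps, 1), 0.0), 1.0)
    return start + (end - start) * frac

print(curriculum_threshold(101, 203))  # roughly 0.25 halfway through training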

💡 Performance

📊 Main Results

📊 Comparison of ZeroSearch with a Real Search Engine

📊 Choice of Simulation LLMs

📊 Case Study

🙏 Acknowledgements

This work builds on Search-R1, veRL, and RAGEN. We sincerely thank the authors of these projects for their valuable contributions to the open-source community.

📧 Contact

If you have any questions, feel free to reach out to me via email: [email protected]

🚩 Citation

If you find this work helpful, please cite it as:

@article{sun2025zerosearch,
  title={ZeroSearch: Incentivize the Search Capability of LLMs without Searching},
  author={Sun, Hao and Qiao, Zile and Guo, Jiayan and Fan, Xuanbo and Hou, Yingyan and Jiang, Yong and Xie, Pengjun and Huang, Fei and Zhang, Yan},
  journal={arXiv preprint arXiv:2505.04588},
  year={2025}
}
