LOLA is a massively multilingual large language model trained on more than 160 languages using a sparse Mixture-of-Experts Transformer architecture. Evaluation results show competitive performance in natural language generation and understanding tasks. As an open-source model, LOLA promotes reproducibility and serves as a robust foundation for future research.
The final model weights, trained using the Megatron-DeepSpeed framework, are available at: https://files.dice-research.org/projects/LOLA/large/global_step296000/
Additional information about the model, along with its HuggingFace 🤗 implementation, can be found at: https://huggingface.co/dice-research/lola_v1
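For quick experimentation, the model can be loaded directly through the 🤗 `transformers` library. The snippet below is a minimal sketch, assuming that `trust_remote_code=True` is required for LOLA's custom MoE implementation (see the model card for the exact loading instructions); the prompt and generation settings are purely illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "dice-research/lola_v1"

# Assumption: LOLA ships custom modeling code on the Hub, so remote code
# must be trusted when loading through transformers.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "The first rule of a good multilingual model is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```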
Note: This repository is a detached fork of https://github.com/microsoft/Megatron-DeepSpeed. It contains the training source code for LOLA, which is located primarily in lola_ws/. Some implementations from the original source have been modified in this fork for our use case.
The original README.md can be found here: archive/README.md
This repository contains various utilities and implementations that can be used in the context of LOLA or adapted for similar projects. The key functionalities provided by this repository are listed below:
You can find the scripts for fine-tuning the model in the lola_ws/fine-tune directory. We recommend using the PEFT-based implementations. An example that fine-tunes the model on the Alpaca instruction format using LoRA adapters on top of our model can be found here: lola_ws/fine-tune/alpaca_instructions/lora_peft/. To fine-tune LoRA adapters on your own custom task, please refer to the example script provided here: lola_ws/fine-tune/custom. These scripts can easily be adapted to other (decoder-only) models or datasets; a rough sketch of the general setup is shown below.
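The sketch below shows what a PEFT-based LoRA fine-tuning setup for a decoder-only model like LOLA typically looks like. It is not the repository's implementation: the `target_modules`, hyperparameters, and the toy `text` dataset are assumptions that must be adapted to LOLA's actual module names and to your task.

```python
from datasets import load_dataset
from peft import LoraConfig, TaskType, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_id = "dice-research/lola_v1"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# GPT-style tokenizers usually have no pad token; reuse EOS for padding.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Attach LoRA adapters; target_modules are placeholders and must be matched
# to the attention/projection module names of the underlying model.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumption: adjust for LOLA
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Toy corpus for illustration; replace with your instruction-formatted data.
dataset = load_dataset("text", data_files={"train": "train.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="lola-lora",
        per_device_train_batch_size=1,
        num_train_epochs=1,
        learning_rate=2e-4,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("lola-lora-adapter")  # saves only the adapter weights
```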
To conduct your own analysis of the LOLA MoE routing, you can reuse the scripts in lola_ws/moe_analysis.
Note: Some scripts are configured for a specific SLURM-based computing cluster, such as noctua2. Feel free to modify them for your own use case.
You can pretrain a similar model from scratch or continue training the LOLA model using the script: lola_ws/gpt/run-gpt3-moe-pretrain.sh.
To prepare the CulturaX dataset for pretraining, refer to this README: lola_ws/README.md.
If you plan to train your own model using frameworks like Megatron or Megatron-DeepSpeed, the scripts in lola_ws/ can be especially useful. For preprocessing large datasets, we include a distributed implementation inspired by Megatron-LM/issues/492. This approach significantly improves efficiency on computing clusters with ample CPU resources.
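The idea behind distributed preprocessing is to split the raw corpus into shards that independent workers (for example, one SLURM array task per shard) tokenize and binarize in parallel. The sketch below illustrates only the sharding step, under the assumption of a JSONL corpus; it is not the repository's actual script, and the resulting shards would still be passed to the regular Megatron preprocessing tooling.

```python
"""Shard a large JSONL corpus so each shard can be preprocessed by an
independent job (e.g. one SLURM array task per shard). Illustrative sketch,
not the repository's implementation."""
import argparse
from itertools import cycle
from pathlib import Path


def shard_jsonl(input_path: str, output_dir: str, num_shards: int) -> None:
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # One output file per shard; lines are distributed round-robin.
    shards = [open(out_dir / f"shard_{i:04d}.jsonl", "w", encoding="utf-8")
              for i in range(num_shards)]
    try:
        with open(input_path, "r", encoding="utf-8") as src:
            for line, shard in zip(src, cycle(shards)):
                shard.write(line)
    finally:
        for f in shards:
            f.close()


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", required=True)
    parser.add_argument("--output-dir", required=True)
    parser.add_argument("--num-shards", type=int, default=64)
    args = parser.parse_args()
    shard_jsonl(args.input, args.output_dir, args.num_shards)
```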
If you use this code or data in your research, please cite our work:
@inproceedings{srivastava-etal-2025-lola,
  author    = {Nikit Srivastava and Denis Kuchelev and Tatiana Moteu Ngoli and Kshitij Shetty and Michael Röder and Hamada Zahera and Diego Moussallem and Axel-Cyrille Ngonga Ngomo},
  title     = {{LOLA} -- An Open-Source Massively Multilingual Large Language Model},
  booktitle = {Proceedings of the 31st International Conference on Computational Linguistics},
  editor    = {Owen Rambow and Leo Wanner and Marianna Apidianaki and Hend Al-Khalifa and Barbara Di Eugenio and Steven Schockaert},
  month     = jan,
  year      = {2025},
  address   = {Abu Dhabi, UAE},
  publisher = {Association for Computational Linguistics},
  pages     = {6420--6446},
  url       = {https://aclanthology.org/2025.coling-main.428/},
  note      = {arXiv:2409.11272 [cs.CL]},
}