
LLM-Merging: Building LLMs Efficiently through Merging

This repository contains our submission for the LLM-Merging competition.

Training high-performing large language models (LLMs) from scratch is notoriously expensive and challenging. The LLM Merging Competition (a NeurIPS 2024 challenge), "Building LLMs Efficiently through Merging", promotes research into model merging techniques, where pretrained LLMs that have been fine-tuned for specific tasks are combined into a single model that performs well across a wide variety of skills, such as reasoning, coding, math, chat, and tool use. Our submission associates each fine-tuned LLM with a "task vector" relative to a common base LLM; these task vectors are derived from the LoRA (Low-Rank Adaptation) weights of the fine-tuned models. We compute the geometric median of the task vectors in their high-dimensional parameter space using Weiszfeld's iterative algorithm and add the result to the base LLM's weights, effectively merging the models to generalize their capabilities and achieve state-of-the-art results on benchmark tests.
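As a rough illustration of the merging step, the sketch below applies Weiszfeld's iterative re-weighting to a set of flattened task vectors and adds the resulting geometric median back onto the base weights. This is a minimal example, not the exact code in this repository: the function name `weiszfeld_geometric_median`, the use of flattened tensors, and the iteration/tolerance parameters are illustrative assumptions.

```python
import torch

def weiszfeld_geometric_median(task_vectors, num_iters=100, eps=1e-8):
    """Approximate the geometric median of flattened task vectors
    with Weiszfeld's iterative algorithm (illustrative sketch)."""
    points = torch.stack(task_vectors)            # (num_models, dim)
    median = points.mean(dim=0)                   # start from the arithmetic mean
    for _ in range(num_iters):
        # Distance from the current estimate to each task vector,
        # clamped to avoid division by zero at a data point.
        dists = torch.norm(points - median, dim=1).clamp_min(eps)
        weights = 1.0 / dists                     # inverse-distance weights
        new_median = (weights[:, None] * points).sum(dim=0) / weights.sum()
        if torch.norm(new_median - median) < eps: # converged
            median = new_median
            break
        median = new_median
    return median

# Hypothetical usage: each task vector is a fine-tuned model's (LoRA-derived)
# weight delta relative to the base model, flattened into one tensor.
# task_vectors = [ft_flat - base_flat for ft_flat in finetuned_flats]
# merged_flat = base_flat + weiszfeld_geometric_median(task_vectors)
```

The geometric median is used instead of a plain average because it is more robust to outlier task vectors; the arithmetic-mean initialization above is a common starting point for Weiszfeld's iteration.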
