The DARE-TIES experiment. #411
Comments
If DARE-TIES gives dramatically different results each time, maybe I don't understand it correctly, but that sounds less like a good thing and more like a bad thing.
This all depends... in my first case it was bad, because I deleted the source and found out the hard way... and it was a great version. One of the open questions is: does this apply to other archs too? Llama2? 3? 3.1? ... A more interesting method or change may be pruning controls for DARE-TIES, which would limit the range.
Thanks for sharing your results here! DARE-TIES does have a randomized element, yeah - it's part of the algorithm by design. If you want more reproducible merges you can set a random seed by passing ...
Thank you; that was one of the questions I had. Thanks again... I think there is so much untapped potential in mergekit yet to be discovered.
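For readers wondering where the randomness mentioned above actually comes from: DARE keeps each fine-tuned delta with probability `density` and rescales the survivors by `1/density`, and that keep/drop mask is sampled fresh on every run. The sketch below is purely illustrative, not mergekit's actual code; the function name and signature are made up:

```python
import torch

def dare_sparsify(base, finetuned, density, generator=None):
    """Illustrative DARE step: randomly drop deltas, rescale the survivors.

    density: fraction of deltas kept (the `density` parameter in the configs
    below); each delta is dropped with probability 1 - density and the kept
    ones are scaled by 1/density so the expected update is unchanged.
    """
    delta = finetuned - base                       # task vector
    keep = torch.bernoulli(torch.full_like(delta, density), generator=generator)
    return base + keep * delta / density           # all run-to-run variance lives in `keep`

# Seeding the generator makes the mask (and thus the merge) reproducible:
gen = torch.Generator().manual_seed(42)
merged_tensor = dare_sparsify(torch.zeros(4), torch.randn(4), density=0.8, generator=gen)
```

In the full dare_ties method the sparsified deltas from each model are then sign-resolved and summed TIES-style onto the base model; the variance described later in this thread comes entirely from that Bernoulli mask.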
I just wanted to pass on some "lab" results using dare-ties and mistral nemo.
I created a triple dare-ties merge of 3 pass-through "instruct/fine" models.
Each instruct/fine tune uses the same merge format:
```yaml
slices:
  - sources:
      - model: ...   # model paths were stripped when the post was captured; "..." marks the gaps
        layer_range: [0, 14]
  - sources:
      - model: ...
        layer_range: [8, 24]
        parameters:
          scale:
            - filter: o_proj
              value: 1
            - filter: down_proj
              value: 1
            - value: 1
  - sources:
      - model: ...
        layer_range: [14, 22]
        parameters:
          scale:
            - filter: o_proj
              value: .5
            - filter: down_proj
              value: .5
            - value: 1
  - sources:
      - model: ...
        layer_range: [22, 31]
        parameters:
          scale:
            - filter: o_proj
              value: .75
            - filter: down_proj
              value: .75
            - value: 1
  - sources:
      - model: ...
        layer_range: [24, 40]
        parameters:
          scale:
            - filter: o_proj
              value: 1
            - filter: down_proj
              value: 1
            - value: 1
merge_method: passthrough
dtype: bfloat16
```
THE DARE-TIES:
```yaml
models:
  - model: ...   # model paths were stripped when the post was captured; "..." marks the gaps
    parameters:
      weight: .6
      density: .8
  - model: ...
    parameters:
      weight: .38
      density: .6
merge_method: dare_ties
tokenizer_source: union
base_model: E:/MN-Rocinante-12B-v1.1-Instruct
dtype: bfloat16
```
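For reference, configs like these can be run with the mergekit CLI (`mergekit-yaml config.yml ./output-dir`) or from Python. Below is a rough sketch using mergekit's Python API as documented in its README; option names may differ slightly between versions, and the paths are placeholders:

```python
import torch
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

CONFIG_YML = "./dare-ties-config.yml"   # placeholder: path to the YAML above
OUTPUT_PATH = "./merged-model"          # placeholder: output directory

# Load and validate the merge configuration.
with open(CONFIG_YML, "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

# Execute the merge and write the result (weights + tokenizer) to OUTPUT_PATH.
run_merge(
    merge_config,
    out_path=OUTPUT_PATH,
    options=MergeOptions(
        cuda=torch.cuda.is_available(),
        copy_tokenizer=True,
        lazy_unpickle=False,
        low_cpu_memory=False,
    ),
)
```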
What is interesting here is that EACH TIME I run the "dare-ties" merge it creates a slightly different or VERY DIFFERENT model, despite no changes in the models or the settings.
This shows up in PPL and "real world" tests.
PPL ranged from 7.7327 to 7.8024 ... and that is over just 10 generations.
Real-world testing of the "core" changes -> wow.
Attribute, scale, word choice, sentence structure... changes across the board.
I am not sure if this is a mistral nemo artifact or not.
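The post does not say how the PPL numbers were measured. For anyone wanting to reproduce the comparison across repeated merge runs, here is one common way to estimate perplexity with transformers; the eval text, window size, and dtype here are arbitrary choices for illustration, not what the author used:

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model_path: str, text: str, window: int = 2048) -> float:
    """Rough perplexity estimate over non-overlapping windows of `text`."""
    tok = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(
        model_path, torch_dtype=torch.bfloat16, device_map="auto"
    ).eval()
    ids = tok(text, return_tensors="pt").input_ids
    total_nll, total_tokens = 0.0, 0
    for start in range(0, ids.size(1), window):
        chunk = ids[:, start : start + window].to(model.device)
        if chunk.size(1) < 2:
            break
        with torch.no_grad():
            loss = model(chunk, labels=chunk).loss  # mean NLL per predicted token
        total_nll += loss.item() * (chunk.size(1) - 1)
        total_tokens += chunk.size(1) - 1
    return math.exp(total_nll / total_tokens)

# Compare several merged checkpoints on the same held-out text:
# for path in ["./merge-run-1", "./merge-run-2"]:
#     print(path, perplexity(path, open("eval.txt").read()))
```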
From these 10, I did some merging using breadcrumbs; wow.
That's all I can say.
When everything is F32 ... they shine even brighter.
With enough generations + merging of the "best DNA", you could create truly legendary model(s).
Just saying - job well done and then some!!!
NOTE: Models for "fine/instruct" and "DARE-TIES" supermerges are posted at my repo.