The DARE-TIES experiment. #411

Open
David-AU-github opened this issue Aug 29, 2024 · 4 comments

@David-AU-github

David-AU-github commented Aug 29, 2024

I just wanted to pass on some "lab" results using dare-ties and mistral nemo.

I created a triple dare-ties merge of 3 pass-through "instruct/fine" models.

Each instruct/fine tune uses the same merge format:

slices:
  - sources:
      - model: g:/11b/Mistral-Nemo-Instruct-2407-12B
        layer_range: [0, 14]
  - sources:
      - model: G:/11B/Rocinante-12B-v1.1
        layer_range: [8, 24]
        parameters:
          scale:
            - filter: o_proj
              value: 1
            - filter: down_proj
              value: 1
            - value: 1
  - sources:
      - model: g:/11b/Mistral-Nemo-Instruct-2407-12B
        layer_range: [14, 22]
        parameters:
          scale:
            - filter: o_proj
              value: .5
            - filter: down_proj
              value: .5
            - value: 1
  - sources:
      - model: g:/11b/Mistral-Nemo-Instruct-2407-12B
        layer_range: [22, 31]
        parameters:
          scale:
            - filter: o_proj
              value: .75
            - filter: down_proj
              value: .75
            - value: 1
  - sources:
      - model: G:/11B/Rocinante-12B-v1.1
        layer_range: [24, 40]
        parameters:
          scale:
            - filter: o_proj
              value: 1
            - filter: down_proj
              value: 1
            - value: 1
merge_method: passthrough
dtype: bfloat16
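
As a rough check on what this passthrough recipe stacks up, the layer ranges above can be tallied in a few lines of Python. This is only a sketch: it assumes each `layer_range` is half-open (start included, end excluded), and the `slices` list is just a hand-copied restatement of the config, not anything read by mergekit.

```python
# Hand-copied from the passthrough config above: (model, start, end).
# Assumption: layer_range is half-open, i.e. [start, end).
slices = [
    ("Mistral-Nemo-Instruct-2407-12B", 0, 14),
    ("Rocinante-12B-v1.1", 8, 24),
    ("Mistral-Nemo-Instruct-2407-12B", 14, 22),
    ("Mistral-Nemo-Instruct-2407-12B", 22, 31),
    ("Rocinante-12B-v1.1", 24, 40),
]

# Number of layers contributed by each slice, in stacking order.
layers_per_slice = [end - start for _, start, end in slices]

# Total depth of the stacked model under the half-open assumption.
total_layers = sum(layers_per_slice)

for (model, start, end), count in zip(slices, layers_per_slice):
    print(f"{model} [{start}, {end}): {count} layers")
print("total layers:", total_layers)
```

Under that assumption the five slices contribute 14, 16, 8, 9 and 16 layers, so the stacked model is deeper than either source model.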

THE DARE-TIES:

models:
  - model: E:/MN-Rocinante-12B-v1.1-Instruct
  - model: E:/MN-magnum-v2.5-12b-kto-Instruct
    parameters:
      weight: .6
      density: .8
  - model: E:/MN-12B-Celeste-V1.9-Instruct
    parameters:
      weight: .38
      density: .6
merge_method: dare_ties
tokenizer_source: union
base_model: E:/MN-Rocinante-12B-v1.1-Instruct
dtype: bfloat16

What is interesting here is that EACH TIME I run the "dare-ties" merge, it creates a slightly different or VERY DIFFERENT model, despite no changes to the models or the settings.

This shows up in PPL and "real world" tests.
PPL range of 7.7327 to 7.8024 ... and that is on just 10 generations.
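
To put that range in perspective, here is a small sketch of the arithmetic, using only the two PPL figures quoted above (the variable names are just illustrative):

```python
# The lowest and highest perplexity reported across the 10 merges.
ppl_low = 7.7327
ppl_high = 7.8024

# Absolute gap between the best and worst run.
spread = ppl_high - ppl_low

# The same gap as a percentage of the lowest perplexity.
relative_pct = spread / ppl_low * 100

print(f"absolute spread: {spread:.4f}")
print(f"relative spread: {relative_pct:.2f}%")
```

The spread works out to a bit under 1% of the lowest value.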

In real-world testing, the "core" changes -> wow.
Attributes, scale, word choice, sentence structure ... changes across the board.

I am not sure if this is a mistral nemo artifact or not.

From these 10, I merged some of them using breadcrumbs; wow.
That's all I can say.

When everything is F32 ... they shine even brighter.

Enough generations + merging of the "best DNA" could create truly legendary model(s).

Just saying - job well done and then some!!!

NOTE: Models for "fine/instruct" and "DARE-TIES" supermerges are posted at my repo.

@CasualDev242

If DARE-Ties gives dramatically different results each time, maybe I don't understand it correctly, but that sounds less like a good thing and more like a bad thing.

@David-AU-github
Author

> If DARE-Ties gives dramatically different results each time, maybe I don't understand it correctly, but that sounds less like a good thing and more like a bad thing.

This all depends... in my first case it was bad, because I deleted the source and found out the hard way... and it was a great version.
That being said, in creating 10+ versions, the "DNA" of each model can be mapped, and these can be combined to create stronger models with specific attributes while reducing the negative ones.

One of the open questions is: Does this apply to other archs too? Llama2? 3? 3.1? ...
And some of the other mergekit methods involve this same type of "random pruning" too.
I mapped these out after looking at the programming code to verify operations.

A more interesting method or change may be pruning controls for DARE-TIES, which limit the range.

@cg123
Collaborator

cg123 commented Aug 31, 2024

Thanks for sharing your results here!

DARE-TIES does have a randomized element, yeah - it's part of the algorithm by design. If you want more reproducible merges you can set a random seed by passing --random-seed <N> on the command line. I usually do when I'm iterating on a recipe that involves DARE.
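
For anyone following along, here is a minimal sketch of where the randomness comes from. It is not mergekit's code: `dare_prune` is a hypothetical helper that applies the DARE idea (randomly drop entries of a task delta, then rescale the survivors by 1/density) to a plain Python list, with a seed standing in for what `--random-seed` pins down.

```python
import random


def dare_prune(delta, density, seed=None):
    """Hypothetical illustration of DARE-style drop-and-rescale.

    Each entry of `delta` is kept with probability `density` and scaled by
    1/density; otherwise it is zeroed. With seed=None the mask differs from
    run to run; with a fixed seed it is repeatable.
    """
    rng = random.Random(seed)
    return [d / density if rng.random() < density else 0.0 for d in delta]


# Toy "fine-tune minus base" differences (made-up numbers).
delta = [0.10, -0.20, 0.05, 0.30, -0.15, 0.25, -0.05, 0.40]

# Unseeded: the surviving entries can change on every call.
unseeded_a = dare_prune(delta, density=0.6)
unseeded_b = dare_prune(delta, density=0.6)

# Seeded: the same mask, and therefore the same result, every time.
seeded_a = dare_prune(delta, density=0.6, seed=42)
seeded_b = dare_prune(delta, density=0.6, seed=42)

print("unseeded run A:", unseeded_a)
print("unseeded run B:", unseeded_b)
print("seeded runs identical:", seeded_a == seeded_b)  # True
```

On the command line the equivalent is adding `--random-seed <N>` to the merge invocation, as described above.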

@David-AU-github
Author

> Thanks for sharing your results here!
>
> DARE-TIES does have a randomized element, yeah - it's part of the algorithm by design. If you want more reproducible merges you can set a random seed by passing --random-seed <N> on the command line. I usually do when I'm iterating on a recipe that involves DARE.

Thank you; that was one of the questions I had. Thanks again ... I think there is so much untapped potential in mergekit yet to be discovered.
