-
Wow! Unbelievable!
I will try this method soon!
-
#121 is a minimal fix.
-
Thanks for the implementation! Any thoughts on its potential effectiveness as a merge method compared to the standard options? I'll definitely do my own testing, but I'd appreciate any quick thoughts on the idea in general from your perspective. Thx!
-
import torch

def dare_weights(delta, p):
    """Randomly drop entries of delta with probability p and rescale the rest (DARE)."""
    # delta is the weight difference computed by the caller, e.g. tensor2 - tensor1.
    # Sample the drop mask m^t ~ Bernoulli(p): 1 means "drop this entry".
    # (torch.bernoulli is much faster than np.random.binomial here.)
    m = torch.bernoulli(torch.full_like(input=delta.float(), fill_value=p)).to(delta.dtype)
    # Zero the dropped entries to get δ̃^t = (1 - m^t) * δ^t.
    delta_tilde = (1 - m) * delta
    # Rescale the surviving entries by 1 / (1 - p) to get δ̂^t.
    delta_hat = delta_tilde / (1 - p)
    return delta_hat

DARE is similar to Add-Difference (
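
As a quick sanity check of the drop-and-rescale idea, here is a minimal plain-Python sketch (no PyTorch, so it runs anywhere). It assumes `p` is the drop rate, as in the DARE paper, and that the rescaled delta is added back to the base weights in an Add-Difference style. The helper names `drop_and_rescale` and `add_difference` are hypothetical and are not part of Model Mixer.

```python
import random


def drop_and_rescale(delta, p, rng):
    """Zero each delta entry with probability p, scale survivors by 1 / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else d * scale for d in delta]


def add_difference(base, delta_hat, alpha):
    """Add the rescaled delta back onto the base weights: base + alpha * delta_hat."""
    return [b + alpha * d for b, d in zip(base, delta_hat)]


rng = random.Random(0)      # fixed seed so the run is repeatable
base = [0.0] * 100_000      # stand-in for base-model weights
delta = [1.0] * 100_000     # stand-in for (fine-tuned - base) weights

delta_hat = drop_and_rescale(delta, p=0.5, rng=rng)
merged = add_difference(base, delta_hat, alpha=1.0)

# Each entry is either dropped (0.0) or doubled (2.0),
# so the average of the merged weights stays close to base + delta.
mean_merged = sum(merged) / len(merged)
print(sorted(set(delta_hat)), round(mean_merged, 2))
```

The point of the rescale is that the merged weights match `base + alpha * delta` on average, even though about half of the individual delta entries are zeroed at `p=0.5`.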
-
Edit: My A1111 install seems to work w/ the current commit/DARE w/o issue, so the problem is probably something w/ Forge or my installation of it.

DARE doesn't seem to work for me (at least in Forge?), and I've gotten some errors that seem to cause instability for the rest of the session. It's late here so I can't write up a full outline to reproduce/etc, but errors I've seen often are: File does not contain tensor alphas_cumprod
-
@wkpark FYI, there were commits to Forge this morning that addressed "alphas_cumprod": lllyasviel/stable-diffusion-webui-forge@b9705c5 lllyasviel/stable-diffusion-webui-forge@72139b0 I assume they may fix what you had to fix the other day.
-
Fixed again by commit a22a29b (fixes a mistake in the last commit a22a29b).
-
import torch

def dare_merge(tensor0, tensor1, alpha, p):
    """Merge tensor1 into tensor0 with DARE: drop delta entries with probability p, rescale the rest."""
    # Sample the drop mask m^t ~ Bernoulli(p): 1 means "drop this entry".
    # (torch.bernoulli is much faster than np.random.binomial here.)
    m = torch.bernoulli(torch.full_like(input=tensor0.float(), fill_value=p))
    # With delta = tensor1 - tensor0 and delta_hat = (1 - m) * delta / (1 - p):
    #   result = tensor0 + alpha * delta_hat
    #          = tensor0 + (tensor1 - tensor0) * alpha * (1 - m) / (1 - p)
    # which is a lerp between tensor0 and tensor1 with a per-element weight.
    weight = alpha * (1 - m) / (1 - p)
    return torch.lerp(tensor0.float(), tensor1.float(), weight).to(tensor0.dtype)

DARE is similar to Add-Difference (
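
The step from the add form to the `torch.lerp` form can be written out explicitly. This is a sketch under two assumptions: $k$ denotes a keep mask (1 where the delta entry survives) with $\mathbb{E}[k] = 1 - p$, and `torch.lerp(input, end, weight)` computes `input + weight * (end - input)` element-wise, as its documentation states. The symbols $k$, $w$ and $\delta$ are introduced here for illustration.

```latex
\begin{aligned}
\delta &= \theta_1 - \theta_0, \qquad
\hat{\delta} = \frac{k \odot \delta}{1 - p}, \qquad
w = \alpha \, \frac{k}{1 - p} \\[4pt]
\operatorname{lerp}(\theta_0, \theta_1, w)
  &= \theta_0 + w \odot (\theta_1 - \theta_0)
   = \theta_0 + \alpha \, \hat{\delta} \\[4pt]
\mathbb{E}\!\left[\operatorname{lerp}(\theta_0, \theta_1, w)\right]
  &= \theta_0 + \alpha \, \frac{\mathbb{E}[k]}{1 - p} \odot \delta
   = \theta_0 + \alpha \, \delta
\end{aligned}
```

So folding the mask and the $1/(1-p)$ rescale into the interpolation weight gives the same result as building $\hat{\delta}$ first and adding it, and on average it reduces to a plain weighted difference merge. The last line only holds if the mask keeps entries with probability $1 - p$.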
-
@wkpark - Off-topic, but I suggested you as someone who might be able to assist w/ a "general" developer question regarding model loading in A1111/Forge. A developer at ByteDance, who created and shared "Res-Adapter" (https://github.com/bytedance/res-adapter), is seeking information on how loading of their models for that tool would be achieved in the WebUI. They've worked out a ComfyUI node for their tool, but there isn't an A1111/Forge implementation and I think they're trying to see how to do that.

Thread w/ their request for help where I mentioned you: lllyasviel/stable-diffusion-webui-forge#497 (comment)

Their tool is very interesting: it allows for using v1.5 at higher resolutions (a la Kohya's Deep Shrink) and also allows for using SDXL below the resolutions usually known as its lower limit (i.e. 512x512).

Repo: https://github.com/bytedance/res-adapter
The user who is seeking help (replied/asked me on that thread) is: @jiaxiangc https://github.com/jiaxiangc
-
@wkpark are you aware of the model merging method nicknamed "Super Mario", described in a paper called "Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch"?
There's an implementation of it here that I've been playing with tonight: https://github.com/martyn/safetensors-merge-supermario & the main repo is here: https://github.com/yule-BUAA/MergeLM
Is this method already incorporated in "Model Mixer", or if not, do you have any thoughts about it? There are many features in your extension that are way over my head, so forgive me if this "Super Mario" merge method is something that is already possible w/ your app.
Thx for an ELI5