Did you try sequence-wise concatenation in self-attention? #2

Open
lucasgblu opened this issue Dec 27, 2024 · 0 comments

@lucasgblu
First of all, OmniEdit is a great paper and I really enjoyed reading it.

After going through the paper, I have a question regarding the architecture of EditNet. In your comparison of the three variants—EditNet, ControlNet, and InstructPix2Pix—did you explore the possibility of sequence-wise concatenation in self-attention?

More specifically, by concatenating the noisy latent tokens, text tokens, and condition image tokens in a sequence-wise (or token-wise) manner, it should still be feasible to inject the conditioning information into the base model. This approach would allow self-attention to be computed only once rather than twice. From my perspective, EditNet performs incremental training by adding residual information to the image and text streams, whereas sequence-wise concatenation lets the original text and image tokens share information directly with the condition. Concretely, for each query the attention weights over the text and image keys originally sum to 1; after the condition tokens are concatenated, the weights over the text and image keys sum to less than 1, with the remaining mass going to the condition tokens.
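To make the idea concrete, here is a minimal single-head PyTorch sketch of what I mean by joint attention over the concatenated sequence. This is just an illustration, not code from the paper: the projection layers `w_qkv`/`w_out` and the token shapes are placeholders I made up.

```python
import torch
import torch.nn.functional as F

def joint_self_attention(latent_tokens, text_tokens, cond_tokens, w_qkv, w_out):
    """One self-attention pass over the concatenated sequence
    [noisy latents; text; condition image]. All inputs: (batch, seq, dim)."""
    x = torch.cat([latent_tokens, text_tokens, cond_tokens], dim=1)
    q, k, v = w_qkv(x).chunk(3, dim=-1)
    # A single softmax over the full sequence: for each query, attention
    # mass is shared among latent, text, and condition keys, so the weight
    # that previously fell entirely on latent+text keys now sums to < 1.
    attn = F.scaled_dot_product_attention(q, k, v)
    out = w_out(attn)
    # Split back into the three streams.
    n_lat, n_txt = latent_tokens.shape[1], text_tokens.shape[1]
    return out[:, :n_lat], out[:, n_lat:n_lat + n_txt], out[:, n_lat + n_txt:]

# Example with made-up shapes:
dim = 64
w_qkv = torch.nn.Linear(dim, 3 * dim)
w_out = torch.nn.Linear(dim, dim)
lat = torch.randn(2, 256, dim)    # noisy image latent tokens
txt = torch.randn(2, 77, dim)     # text tokens
cond = torch.randn(2, 256, dim)   # condition image tokens
lat_out, txt_out, cond_out = joint_self_attention(lat, txt, cond, w_qkv, w_out)
```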

I am curious to know if you have tried the method I proposed. If so, could you share any insights or results from your comparisons?

Thanks!
