
On the semantic features of the LLM in distillation #5

Open
wddwzwhhxx opened this issue Sep 7, 2023 · 1 comment

Comments

@wddwzwhhxx

First of all, thank you for your paper! I found it very enlightening.
I have one point of confusion about extracting the semantic features from the llama model, as mentioned in the paper. You referred to these two lines:
for layer in self.layers:
    h = layer(h, start_pos, freqs_cis, mask)
I used your example input "a colorful animal with big eyes on a blue background", but when I print the shape of the final output after the 40th layer, h has shape [1, 12, 5120], so clearly every word gets its own [5120]-dimensional vector. Yet the features in your sur_data_small are a single [5120] vector. Why is that? Which position of the LLM's hidden states should I take as the semantic feature?

Looking forward to your reply.

@zhongshsh
Contributor

Thanks for your interest! We describe how the semantic features are processed in the "Knowledge from LLM" section of the paper:
[screenshot of the "Knowledge from LLM" section of the paper]

As explained in "Knowledge from LLM", we take the mean over the token dimension so that the tokens are aligned. In other words, you only need to apply h.mean(1) to obtain the semantic representation used in the SUR-adapter distillation process.
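A minimal sketch of this pooling step, using numpy in place of torch (numpy's `h.mean(axis=1)` behaves the same as torch's `h.mean(1)`; the shapes below are the ones reported in this thread, and the random array is only a stand-in for the real layer-40 hidden states):

```python
import numpy as np

# Stand-in for the final-layer llama hidden states from the question:
# batch of 1 prompt, 12 tokens, hidden size 5120.
h = np.random.randn(1, 12, 5120).astype(np.float32)

# Mean-pool over the token dimension (axis 1), mirroring torch's h.mean(1):
# each prompt collapses to a single fixed-length semantic vector.
pooled = h.mean(axis=1)

print(pooled.shape)  # (1, 5120) -- one [5120] vector per prompt, as in sur_data_small
```

This is why the stored features have shape [5120] rather than [12, 5120]: averaging removes the variable token dimension, so prompts of different lengths all map to vectors of the same size.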
