The V2 paper directly compares MLA only with MHA. Table 8 compares dense models using MHA, GQA, and MQA, but does not include MLA. Table 9 compares MoE models using MHA and MLA, but does not include MQA or GQA.
MLA seems similar to MQA in essence to me. Are there any apples-to-apples ablation studies comparing MLA with MQA?
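To make the similarity concrete: both MQA and MLA cache a small quantity per token that is shared across all query heads. Here is a rough sketch of the per-token, per-layer KV cache size under each scheme, using the formulas as I read them from the V2 paper. The function name and the default dimensions are my own assumptions for illustration (the GQA group count in particular is arbitrary), so treat the numbers as indicative only.

```python
def kv_cache_elems_per_token(scheme, n_heads=128, head_dim=128,
                             n_groups=8, d_c=512, d_rope=64):
    """Cached elements per token per layer (hypothetical helper).

    All defaults are illustrative assumptions, not values taken from
    any released config.
    """
    if scheme == "MHA":
        # One key and one value vector per head.
        return 2 * n_heads * head_dim
    if scheme == "GQA":
        # One key and one value vector per group of query heads.
        return 2 * n_groups * head_dim
    if scheme == "MQA":
        # A single key and a single value vector shared by all heads.
        return 2 * head_dim
    if scheme == "MLA":
        # One compressed KV latent plus one decoupled RoPE key,
        # both shared by all heads.
        return d_c + d_rope
    raise ValueError(f"unknown scheme: {scheme}")


for scheme in ("MHA", "GQA", "MQA", "MLA"):
    print(scheme, kv_cache_elems_per_token(scheme))
```

Under these assumed dimensions, MLA's cache sits between MQA and a small-group GQA, which is why a direct MLA vs. MQA comparison at matched cache size would be informative.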