Dear maintainers,
Thank you for your valuable arena. I am currently researching LLM evaluation methods and got stuck on a question about the Bradley-Terry (BT) model.
From multiple sources (including your paper), I understand that BT ratings are obtained by maximizing the BT likelihood. However, the code fits a logistic regression on a kind of "one-hot" matrix, where model_a's column is +1 and model_b's column is -1, and the target is 1 if model_a wins and 0 if model_b wins. Let's ignore the answer-length control for simplicity; even so, I cannot see why this is equivalent to the BT model.
Could you please explain this, or point me to sources where I could find the derivation?
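To make the "one-hot" design matrix described above concrete, here is a minimal numpy sketch on made-up toy battles (the `battles` list and the `theta` values are hypothetical, not arena data). It uses entries of ±1; the code in this issue uses ±`math.log(BASE)` instead, which only rescales the fitted coefficients. The point it shows: each row dotted with a rating vector gives a rating difference, and the sigmoid of that difference equals the BT probability with strengths `exp(theta)`.

```python
import numpy as np

# Hypothetical toy data: (index of model_a, index of model_b, 1 if model_a won else 0).
battles = [(0, 1, 1), (1, 2, 0), (0, 2, 1), (2, 1, 1)]
num_models = 3

# One row per battle: +1 in model_a's column, -1 in model_b's column.
X = np.zeros((len(battles), num_models))
y = np.zeros(len(battles))
for row, (a, b, a_won) in enumerate(battles):
    X[row, a] = 1.0
    X[row, b] = -1.0
    y[row] = a_won

# Hypothetical ratings. X @ theta yields theta[a] - theta[b] for every battle.
theta = np.array([0.5, -0.2, 1.0])
diffs = X @ theta

# Logistic-regression probability that model_a wins each battle.
probs = 1.0 / (1.0 + np.exp(-diffs))

# Bradley-Terry probability with strengths pi_i = exp(theta_i).
strengths = np.exp(theta)
bt_probs = np.array([strengths[a] / (strengths[a] + strengths[b]) for a, b, _ in battles])

print(diffs)
print(np.allclose(probs, bt_probs))
```

Since the two probability vectors coincide for any `theta`, the per-battle likelihoods coincide too, so maximizing one maximizes the other.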
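import math

import numpy as np
import pandas as pd


def compute_elo_mle_with_tie(
    df, SCALE=400, BASE=10, INIT_RATING=1000, sample_weight=None
):
    from sklearn.linear_model import LogisticRegression

    # Count outcomes for every (model_a, model_b) pairing.
    ptbl_a_win = pd.pivot_table(
        df[df["winner"] == "model_a"],
        index="model_a",
        columns="model_b",
        aggfunc="size",
        fill_value=0,
    )
    ptbl_tie = pd.pivot_table(
        df[df["winner"].isin(["tie", "tie (bothbad)"])],
        index="model_a",
        columns="model_b",
        aggfunc="size",
        fill_value=0,
    )
    # Ties are symmetric, so count them in both directions.
    ptbl_tie = ptbl_tie + ptbl_tie.T
    ptbl_b_win = pd.pivot_table(
        df[df["winner"] == "model_b"],
        index="model_a",
        columns="model_b",
        aggfunc="size",
        fill_value=0,
    )
    # ptbl_win[i, j] = 2 * (wins of i over j) + (ties between i and j):
    # every count is doubled, so each tie acts as half a win for each side.
    ptbl_win = ptbl_a_win * 2 + ptbl_b_win.T * 2 + ptbl_tie

    models = pd.Series(np.arange(len(ptbl_win.index)), index=ptbl_win.index)

    p = len(models)
    # At most two rows per ordered pair of distinct models.
    X = np.zeros([p * (p - 1) * 2, p])
    Y = np.zeros(p * (p - 1) * 2)

    cur_row = 0
    sample_weights = []
    for m_a in ptbl_win.index:
        for m_b in ptbl_win.columns:
            if m_a == m_b:
                continue
            # Skip NaN entries: label-aligned addition of pivot tables yields NaN
            # where a model label is missing from one of the tables.
            if math.isnan(ptbl_win.loc[m_a, m_b]) or math.isnan(ptbl_win.loc[m_b, m_a]):
                continue
            # Entries are +/- ln(BASE), so coefficients come out on a log-BASE scale.
            # Row labelled 1 ("m_a wins"), weighted by m_a's (tie-adjusted) wins over m_b.
            X[cur_row, models[m_a]] = +math.log(BASE)
            X[cur_row, models[m_b]] = -math.log(BASE)
            Y[cur_row] = 1.0
            sample_weights.append(ptbl_win.loc[m_a, m_b])

            # Same features labelled 0 ("m_b wins"), weighted by m_b's wins over m_a.
            X[cur_row + 1, models[m_a]] = math.log(BASE)
            X[cur_row + 1, models[m_b]] = -math.log(BASE)
            Y[cur_row + 1] = 0.0
            sample_weights.append(ptbl_win.loc[m_b, m_a])
            cur_row += 2
    X = X[:cur_row]
    Y = Y[:cur_row]

    # No intercept: BT probabilities depend only on rating differences.
    lr = LogisticRegression(fit_intercept=False, penalty=None)
    lr.fit(X, Y, sample_weight=sample_weights)
    elo_scores = SCALE * lr.coef_[0] + INIT_RATING
    # Ratings are only defined up to a shift; anchor mixtral-8x7b-instruct-v0.1 at 1114.
    if "mixtral-8x7b-instruct-v0.1" in models.index:
        elo_scores += 1114 - elo_scores[models["mixtral-8x7b-instruct-v0.1"]]
    return pd.Series(elo_scores, index=models.index).sort_values(ascending=False)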
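As a sanity check on the equivalence, here is a self-contained numpy sketch (no sklearn) on a made-up 3×3 win-count matrix; `wins` is hypothetical data. It estimates BT strengths twice: once with the classic minorization-maximization (MM) update that maximizes the BT likelihood directly, and once with Newton's method on a weighted logistic regression built with the same two-rows-per-ordered-pair encoding as `compute_elo_mle_with_tie`. Newton's method stands in for sklearn's solver here; this is an illustrative assumption, not the arena's implementation.

```python
import numpy as np

# Hypothetical aggregated results: wins[i, j] = times model i beat model j.
wins = np.array([
    [0, 7, 9],
    [3, 0, 6],
    [1, 4, 0],
], dtype=float)


def bt_mm(wins, iters=2000):
    """Direct Bradley-Terry MLE via the MM update pi_i <- W_i / sum_j n_ij / (pi_i + pi_j)."""
    total = wins + wins.T                  # games played between each pair
    pi = np.ones(wins.shape[0])
    for _ in range(iters):
        denom = (total / (pi[:, None] + pi[None, :])).sum(axis=1)
        pi = wins.sum(axis=1) / denom
        pi /= np.exp(np.log(pi).mean())    # fix the free scale (geometric mean 1)
    return np.log(pi)                      # theta_i = log(pi_i), mean 0


def bt_logistic(wins, iters=50):
    """Same MLE via weighted logistic regression on +1/-1 rows, fitted by Newton's method."""
    n = wins.shape[0]
    rows, ys, ws = [], [], []
    for a in range(n):
        for b in range(n):
            if a == b:
                continue
            x = np.zeros(n)
            x[a], x[b] = 1.0, -1.0
            rows += [x, x]                   # same features twice...
            ys += [1.0, 0.0]                 # ...labelled "a wins", then "b wins"
            ws += [wins[a, b], wins[b, a]]   # weighted by the observed counts
    X, y, w = np.array(rows), np.array(ys), np.array(ws)
    theta = np.zeros(n)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ theta))
        grad = X.T @ (w * (y - p))
        hess = X.T @ (X * (w * p * (1 - p))[:, None])
        # hess is singular (ratings are identified only up to a shift), so take
        # the minimum-norm least-squares Newton step instead of inverting it.
        theta += np.linalg.lstsq(hess, grad, rcond=None)[0]
    return theta - theta.mean()


theta_mm = bt_mm(wins)
theta_lr = bt_logistic(wins)
print(np.round(theta_mm, 4))
print(np.round(theta_lr, 4))

# Map to the Elo-like scale of the function above: R = SCALE / ln(BASE) * theta + INIT_RATING.
elo = 400 / np.log(10) * theta_lr + 1000
print(np.round(elo, 1))
```

The two vectors should agree up to numerical tolerance because both procedures maximize the same likelihood; centering removes the free additive constant, which the function above fixes instead by anchoring one model's rating. To mirror that function's tie handling, one would pass `2 * wins + ties` (with ties counted in both directions), which weights every tie as half a win for each side.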
@VityaVitalich I wrote a blog post which includes an explanation of this. The idea is that if you reparameterize the Bradley-Terry strength parameters exponentially (strength = e^rating), the win probability can be expressed as the sigmoid of the difference in ratings. Then, if you construct the X matrix so that each row has only two non-zero entries, a 1 and a -1 at the two competitors' indices, the dot product of that row with the parameter vector (the ratings) simply produces the difference between the two selected ratings. A logistic regression without an intercept on such rows therefore maximizes exactly the BT likelihood.
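Spelling out that argument as a short derivation (the symbols π_i, θ_i, x_k, y_k, β_i and R_i are notation introduced here for illustration, with battle k between models a_k and b_k and y_k = 1 if a_k won): the BT log-likelihood is, term for term, the log-likelihood of a logistic regression with no intercept whose design rows are the ±1 vectors. The last two lines translate the ±ln(BASE) scaling used in `compute_elo_mle_with_tie` into the familiar Elo curve.

```latex
% Bradley-Terry with exponential reparameterization \pi_i = e^{\theta_i}
P(a \succ b) = \frac{\pi_a}{\pi_a + \pi_b}
             = \frac{e^{\theta_a}}{e^{\theta_a} + e^{\theta_b}}
             = \frac{1}{1 + e^{-(\theta_a - \theta_b)}}
             = \sigma(\theta_a - \theta_b)

% Design row for battle k: x_k = e_{a_k} - e_{b_k}, hence x_k^\top \theta = \theta_{a_k} - \theta_{b_k}
\ell(\theta) = \sum_k \Big[ y_k \log \sigma(x_k^\top \theta)
             + (1 - y_k) \log\big(1 - \sigma(x_k^\top \theta)\big) \Big]

% Code's scaling: entries \pm\ln(\mathrm{BASE}), coefficients \beta_i, ratings R_i
x_k^\top \beta = \ln(\mathrm{BASE})\,(\beta_{a_k} - \beta_{b_k})
\;\Rightarrow\; \theta_i = \ln(\mathrm{BASE})\,\beta_i,
\qquad R_i = \mathrm{SCALE}\,\beta_i + \mathrm{INIT\_RATING}

P(a \succ b) = \frac{1}{1 + \mathrm{BASE}^{-(R_a - R_b)/\mathrm{SCALE}}}
             = \frac{1}{1 + 10^{-(R_a - R_b)/400}} \quad (\mathrm{BASE} = 10,\ \mathrm{SCALE} = 400)
```

Because the likelihood involves only differences θ_a − θ_b, the ratings are identified only up to an additive constant. That is why the intercept is disabled (it would have nothing to model) and why the function shifts all scores to anchor one model at a fixed value.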