Add linear layers to any Causal LM to get rewards

Motivation

As mentioned in our development roadmap (#1487):

Support generalized reward API (adding linear layers to any Causal LM to get the reward) as required by the OpenRLHF team.
https://github.com/OpenRLHF/OpenRLHF

We formalize this requirement in this issue and invite @M0gician to contribute with us.
Feature Requests
1. Add linear layers to any Causal LM to get rewards.
Add a linear layer at the end, assign a specific token (like the final eos in the prompt), and use its logits as the reward.
Add a linear layer after a specific value head name at any layer, and use its logits as the reward.
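A minimal sketch of what such a value head could look like on top of a HuggingFace-style Causal LM. The wrapper class, the model name, and the last-eos pooling below are illustrative assumptions, not SGLang's actual interface:

```python
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer


class CausalLMWithRewardHead(nn.Module):
    """Illustrative wrapper: any Causal LM plus one scalar linear layer ("value head")."""

    def __init__(self, model_name: str):
        super().__init__()
        self.backbone = AutoModelForCausalLM.from_pretrained(model_name)
        hidden_size = self.backbone.config.hidden_size
        # The extra linear layer that maps a hidden state to a scalar reward.
        # In practice these weights would be loaded from a trained reward model checkpoint.
        self.value_head = nn.Linear(hidden_size, 1, bias=False)

    @torch.no_grad()
    def forward(self, input_ids, attention_mask):
        out = self.backbone(
            input_ids=input_ids,
            attention_mask=attention_mask,
            output_hidden_states=True,
        )
        hidden = out.hidden_states[-1]              # (batch, seq_len, hidden)
        # Pool at the last non-padding token, e.g. the final eos of the prompt
        # (assumes right padding).
        last_idx = attention_mask.sum(dim=1) - 1    # (batch,)
        pooled = hidden[torch.arange(hidden.size(0)), last_idx]
        return self.value_head(pooled).squeeze(-1)  # (batch,) scalar rewards


# Example usage; the model name is just a placeholder.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
model = CausalLMWithRewardHead("Qwen/Qwen2.5-0.5B-Instruct")
batch = tokenizer(["How do I sort a list in Python?"], return_tensors="pt", padding=True)
rewards = model(batch["input_ids"], batch["attention_mask"])
```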
2. Add --task parameter.
Get rewards/embeddings from any Causal LM by adding a parameter like --task embedding.
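For the embedding task, the sketch above reduces to returning the pooled hidden state directly instead of projecting it through a value head. Again, last-token pooling is only one possible (assumed) strategy, not a description of what SGLang would do:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


@torch.no_grad()
def embed(model, tokenizer, texts):
    """Return one embedding per text from a Causal LM (last-token pooling, assumed)."""
    batch = tokenizer(texts, return_tensors="pt", padding=True)
    out = model(**batch, output_hidden_states=True)
    hidden = out.hidden_states[-1]                         # (batch, seq_len, hidden)
    last_idx = batch["attention_mask"].sum(dim=1) - 1      # last non-pad position
    return hidden[torch.arange(hidden.size(0)), last_idx]  # (batch, hidden)


tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"  # the last-token indexing above assumes right padding
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
embeddings = embed(model, tokenizer, ["hello world", "reward modeling"])
```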
3. Better Accuracy.
Many users may have noticed that the reward results of SGLang's current API show a discrepancy (around 3/1000) compared to those obtained from training engines like DeepSpeed or LLaMA-Factory. This discrepancy is not due to an issue with our framework implementation; in fact, this problem exists in all current inference engines:
The kernel fusion in inference engines differs significantly from that in training engines. When the batch size varies, inference requests are dispatched to different kernels, and numerical errors accumulate layer by layer. By the time they reach the logits layer, these errors become noticeable. This issue has existed since the BERT era: precision differences between training and inference engines are unavoidable.
As a result, in RLHF, inference engines are primarily used to accelerate sampling, while reward and embeddings still rely on training scripts. It may take several months for our team to address this issue properly.
We will add a log message about this issue in our Engine and note it in our documentation. Even though the reward may be slightly inaccurate, we provide a general reward interface in the hope that community users can design more robust RL algorithms that work well in this scenario.
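Until then, a practical expectation when comparing the two stacks is agreement within a relative tolerance around that 3/1000 figure, rather than bitwise equality. A small illustration with made-up numbers:

```python
import torch

# Made-up example values: the same prompts scored by a training engine and an inference engine.
rewards_train = torch.tensor([1.2297, -0.4533, 0.0871])
rewards_infer = torch.tensor([1.2312, -0.4521, 0.0873])

# A relative tolerance on the order of 3e-3 mirrors the discrepancy described above;
# exact bitwise agreement between training and inference kernels is not a realistic target.
assert torch.allclose(rewards_infer, rewards_train, rtol=3e-3, atol=1e-4)
```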
Related resources
The reward forward function in OpenRLHF.
HuggingFace LlamaForSequence and AutoLinearXXX.