issues Search Results · repo:deepspeedai/DeepSpeedExamples language:Python
548 results
DeepSpeed-FastGen support ascend npu, deepseek-r1-distilled-qwen2.5-32b?
RyanOvO
- 3
- Opened on Mar 4
- #960
critic_loss:
def critic_loss_fn(self, values, old_values, returns, mask):
    # value loss
    values_clipped = torch.clamp(
        values,
        old_values - self.cliprange_value,
        old_values + self.cliprange_value,
    )
    vf_loss1 ...
Morizhaoyang
- 1
- Opened on Feb 7
- #956
I’m excited about the recent introduction of Domino and its impressive TP optimization. While using
deepspeed-domino to better overlap communication and computation in TP, I found that Domino uses forward_backward_no_pipelining() ...
XZQshiyu
- Opened on Jan 9
- #950
Hello! I encountered some errors when running
https://github.com/microsoft/DeepSpeedExamples/blob/master/training/DeepSpeed-Domino/pretrain_gpt3_2.7b.sh and here is
the error information:
[rank1]:[W102 ...
ZhiyiHu1999
- 1
- Opened on Jan 2
- #948
When I try to run Stage 3 PPO finetuning for the qwen 2 0.5B model, I get the following error:
Assertion `srcIndex < srcSelectDimSize` failed, which seems to be an issue with the input dataset sequence length?
I have ...
boqiny
- 1
- Opened on Dec 18, 2024
- #946
Hi, thank you for the amazing demo and doc! I have a question regarding this section in zero-inference. It is mentioned
that "Thus, our current implementation computes attention scores on CPU." May I ask ...
yuzhenmao
- Opened on Dec 15, 2024
- #944
Hi, I am using the latest huggingface transformers (version==4.48.0.dev0). When I tried to run the demo from here, I
have this error: AttributeError: 'LlamaForCausalLM' object has no attribute 'set_kv_cache_offload' ...
yuzhenmao
- 2
- Opened on Dec 15, 2024
- #943
Issue: In the original code: e2e_rlhf.py line 68
parser.add_argument(
    "--reward-model",
    type=lambda x: x.replace("facebook/opt-", ""),
    default="350m",
    choices=("350m"),
    help="Which facebook/opt-* ...
ChenDaiwei-99
- Opened on Dec 10, 2024
- #941
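A side note on the snippet quoted in #941, offered as a sketch and not as a summary of the truncated issue text: in Python, parentheses around a single string do not create a tuple, so the `choices` value shown there is a plain string and a membership test against it becomes a substring check. The helper `is_allowed` below is hypothetical and exists only to illustrate the difference; it is not part of e2e_rlhf.py or argparse.

```python
# Parentheses alone do not make a tuple; the trailing comma does.
choices_as_written = ("350m")   # a plain str
choices_as_tuple = ("350m",)    # a one-element tuple


def is_allowed(value, choices):
    """Hypothetical helper: a plain membership test, for illustration only."""
    return value in choices


print(type(choices_as_written).__name__)
print(type(choices_as_tuple).__name__)
# Against a str, "in" is a substring test, so a partial value passes.
print(is_allowed("350", choices_as_written))
# Against a tuple, "in" compares whole elements, so the partial value fails.
print(is_allowed("350", choices_as_tuple))
```

Whether this matters in practice depends on how the consuming code checks membership, so treat it as something to verify rather than a confirmed cause of the reported problem.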
Following the DeepSpeed-Domino guide, I ran into the following issue when running bash pretrain_gpt3_2.7b.sh:
[rank0]: return cdb.all_reduce(tensor, op, group, async_op)
[rank0]: ^^^^^^^^^^^^^^ ...
lucifer1004
- 2
- Opened on Nov 29, 2024
- #940
