Error when running ppo_training.py on multiple GPUs #420

cqray1990 · 2024-09-03T09:30:02Z

if self.is_encoder_decoder:
    input_ids = input_kwargs["decoder_input_ids"]
    attention_mask = input_kwargs["decoder_attention_mask"]
else:
    input_ids = input_kwargs["input_ids"]
    attention_mask = input_kwargs["attention_mask"]

logprobs = logprobs_from_logits(logits[:, :-1, :], input_ids[:, 1:])

/lib/python3.10/site-packages/trl/core.py", line 139, in logprobs_from_logits
logpy = torch.gather(logp, 2, labels.unsqueeze(2)).squeeze(-1)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:7 and cuda:0! (when checking argument for argument index in method wrapper_CUDA_gather)

The model's device_map is set to auto:
model = AutoModelForCausalLMWithValueHead.from_pretrained(
    args.model_name_or_path,
    config=config,
    torch_dtype=torch_dtype,
    load_in_4bit=args.load_in_4bit,
    load_in_8bit=args.load_in_8bit,
    device_map=args.device_map,
    trust_remote_code=args.trust_remote_code,
    peft_config=peft_config if args.use_peft else None,
)

The reward model is placed on cuda:0 by default:
reward_model = AutoModelForSequenceClassification.from_pretrained(
    args.reward_model_name_or_path,
    config=reward_config,
    load_in_8bit=args.load_in_8bit,
    trust_remote_code=args.trust_remote_code,
)
reward_model.to(device)

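With the model's device_map set to auto and the reward model on cuda:0 by default, the tensors end up on different GPUs during the computation. Have you not run into this? It seems bound to happen with multiple GPUs.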
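
For anyone who wants to see the mismatch in isolation: the traceback shows torch.gather receiving log-probabilities on one device (cuda:7) and label indices on another (cuda:0). Below is a minimal sketch of one possible workaround, which moves the labels onto the logits' device before gathering. The function name logprobs_from_logits_same_device is hypothetical; it is not trl's own code, and the toy shapes are assumptions chosen so the example also runs on CPU.

```python
import torch
import torch.nn.functional as F


def logprobs_from_logits_same_device(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Per-token log-probabilities of `labels` under `logits`.

    Hypothetical variant of the gather step from the traceback: the labels
    are moved to the device of the logits first, so torch.gather never sees
    tensors on two different devices.
    """
    logp = F.log_softmax(logits, dim=2)
    labels = labels.to(logp.device)  # no-op when both are already on the same device
    return torch.gather(logp, 2, labels.unsqueeze(2)).squeeze(-1)


# Toy tensors (assumed shapes): batch of 2, sequence length 5, vocabulary of 11.
logits = torch.randn(2, 5, 11)
input_ids = torch.randint(0, 11, (2, 5))

# Same shifted slicing as in the snippet above.
logprobs = logprobs_from_logits_same_device(logits[:, :-1, :], input_ids[:, 1:])
```

This only addresses the gather call from the traceback; other places in the training loop may still mix devices.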


shibing624 · 2024-09-04T07:36:33Z

Put the reward model on cuda:0 and the PPO model on cuda:1.
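
A rough sketch of how that split could be expressed, assuming the PPO model is pinned to a single GPU instead of being sharded with device_map set to auto. The helper split_devices is hypothetical and not part of ppo_training.py; the {"": 1} form is the transformers/accelerate convention for placing the whole model on one device.

```python
from typing import Dict, Tuple


def split_devices(num_gpus: int) -> Tuple[Dict[str, int], str]:
    """Return (device_map for the PPO model, device string for the reward model).

    Hypothetical helper: the PPO model goes entirely on GPU 1 and the reward
    model on GPU 0, so neither model is spread across several devices.
    """
    if num_gpus < 2:
        raise ValueError("this split needs at least two GPUs")
    policy_device_map = {"": 1}  # empty key = the whole model, value = GPU index
    reward_device = "cuda:0"
    return policy_device_map, reward_device


policy_device_map, reward_device = split_devices(num_gpus=8)
```

The first value would be passed as device_map= to AutoModelForCausalLMWithValueHead.from_pretrained, and the second to reward_model.to(...). Whether ppo_training.py needs further changes for this split is not stated in the thread.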

cqray1990 added the bug label on Sep 3, 2024
