Hi, I want to report an issue observed when running inference for a classification task.
Description
When running inference (with either do_eval=True or do_predict=True), the results differ between distributed mode (multiple GPUs) and single-GPU mode.
When doing evaluation, the data is prepared sequentially in batches using SequentialSampler, BatchSampler, and DistributedBatchSampler (DeBERTa/DeBERTa/apps/run.py, line 172 in 4d7fe0b) and then sent to the GPUs. Once the logits are computed, there is a step that gathers the results across devices, merge_distributed (DeBERTa/DeBERTa/apps/run.py, line 228 in 4d7fe0b).
After this step, the order of the data instances no longer matches the order in the original input file (dev.tsv or test.tsv) when running in distributed mode with multiple GPUs, which results in a different accuracy.
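To make the effect concrete, here is a toy sketch in plain Python (not the actual DistributedBatchSampler or merge_distributed code; the round-robin batch assignment is only an assumption for illustration) of how splitting sequential batches across ranks and then concatenating the gathered per-rank outputs can scramble the original file order:

```python
# Toy illustration of the reordering: sequential batches are dealt out to
# ranks, then the gathered outputs are concatenated rank by rank.
world_size = 3             # e.g. CUDA_VISIBLE_DEVICES=7,5,6
batch_size = 4             # --eval_batch_size 4
dataset = list(range(20))  # stand-ins for the rows of dev.tsv, in file order

# Sequential batching, as SequentialSampler + BatchSampler would produce.
batches = [dataset[i:i + batch_size] for i in range(0, len(dataset), batch_size)]

# Assume batches are assigned to ranks round-robin, so each GPU sees an
# interleaved subset of the data.
per_rank = {rank: [b for i, b in enumerate(batches) if i % world_size == rank]
            for rank in range(world_size)}

# Concatenating the gathered outputs rank by rank (rank 0 first, then rank 1,
# and so on) produces this order:
merged = [x for rank in range(world_size) for batch in per_rank[rank] for x in batch]
print(merged)
# [0, 1, 2, 3, 12, 13, 14, 15, 4, 5, 6, 7, 16, 17, 18, 19, 8, 9, 10, 11]
```

The merged order no longer matches the file order, so the gathered predictions no longer line up with the row order of dev.tsv.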
Steps to reproduce
Use cola.sh to run evaluation with --init_model deberta-v3-large.
Set the other parameters: --do_eval, --eval_batch_size 4, using 3 GPUs (CUDA_VISIBLE_DEVICES=7,5,6) or 1 GPU (CUDA_VISIBLE_DEVICES=7).
Check the order of the instances after the logit calculation is done and the results are collected from all GPUs. For example, print the first 10 instances in predicts and labels after the merge_distributed step mentioned above, as in the sketch below.
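A minimal sketch of this check, assuming predicts and labels are torch tensors returned by merge_distributed (the names predicts and labels are the ones used above; the dummy tensors below only stand in for the real gathered outputs):

```python
import torch

# Placeholders for the tensors gathered by merge_distributed (assumption:
# logits of shape [N, 2] for CoLA and integer class labels of shape [N]).
predicts = torch.randn(100, 2)
labels = torch.randint(0, 2, (100,))

# Print the first 10 instances and compare them against the row order of
# dev.tsv / test.tsv to spot any reordering or duplication.
print(predicts[:10].cpu().numpy())
print(labels[:10].cpu().numpy())
```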
1 gpu case - as expected according to the input file:
[-0.05667 -0.1713 ]
[-0.0438 -0.1727 ]
[-0.0396 -0.1794 ]
[-0.03604 -0.1823 ]
[-0.0433 -0.1809 ]
[-0.01921 -0.1947 ]
[-0.04788 -0.1741 ]
[-0.05774 -0.1755 ]
[-0.05173 -0.1692 ]]
3 gpu case:
[-0.05667 -0.1713 ]
[-0.0438 -0.1727 ]
[-0.0396 -0.1794 ]
[-0.04428 -0.167 ]
[-0.04428 -0.167 ]
[-0.03604 -0.1823 ]
[-0.0433 -0.1809 ]
[-0.01921 -0.1947 ]
[-0.04788 -0.1741 ]]
Additional information
My system setup is: