
Commit

Update index.html
JeiminJeon authored Jul 15, 2024
1 parent cd11f55 commit 66e40a1
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion LBT/index.html
@@ -55,7 +55,7 @@ <h3><a href="https://eccv.ecva.net/Conferences/2024">ECCV 2024</a></h3>
<div class="col-sm-12"><h3>Results</h3></div>
<div class="col-sm-12 image"><img src="images/results.png" style="width: 100%;"></div>
<div class="col-sm-12 caption">Quantitative comparison of gradient quantization methods on image classification. We report results on the validation split of ImageNet in terms of a top-1 accuracy. W/A/G: Bit-precision of weights/activations/gradients; FP: Results obtained by full-precision models; $\dagger$: Results reproduced by ourselves. Numbers in bold and parentheses are the best result and accuracy improvements or degradations, w.r.t full-precision models, respectively.</div>
<div class="col-sm-12 content">From these tables, we observe four things: 1) Our method outperforms other FXP training methods by a significant margin in terms of a top-1 accuracy, regardless of datasets, network architectures, and bit-widths. The accuracy of DSGC is slightly better than ours for the 8/8/8-bit setting only on the ResNet-50 architecture. Nevertheless, ours shows a lower accuracy drop w.r.t the full-precision model. Note that the full-precision model in DSGC also shows a higher accuracy, possibly due to different training settings for, e.g., the number of epochs and learning rate scheduling. 2) We can see that the accuracy drop of DSGC becomes severe as bit-widths decrease. A plausible reason is that reducing the bit-width increases the quantization error for entire gradients, and the quantization interval of DSGC becomes narrower in order for keeping a small error for entire gradients. It incurs a significant quantization error for large gradients, and the performance in turn degrades drastically. Compared to DSGC, our method provides better results consistently, confirming once more that lowering the quantization error for large gradients is important in the FXP training. 3) Our method shows better results compared to the state of the art, including DSGC and IQB, in particularly low-bit settings (i.e., 6/6/6, 5/5/5, and 4/4/4-bit settings). For example, our method performs better than IQB employing a piecewise FXP format for gradient quantization, when training ResNet-18 and -34 in 4/4/4 and 5/5/5-bit settings, and obtains the superior results over the baseline when training in 4/4/4 and 5/5/5-bit settings. This suggests that maintaining a small error for large gradients is effective to improve the quantization performance in the low-bit settings. 4) We can clearly observe that ours gives better results than the baselines with various architectures consistently, especially in the 4/4/4 and 5/5/5-bit settings. This indicates that maintaining a small quantization error for large gradients, regardless of the layers or training iterations, is significant in the FXP training. </div>
<div class="col-sm-12 content">From the table, we observe four things: 1) Our method outperforms other FXP training methods by a significant margin in terms of a top-1 accuracy, regardless of network architectures, and bit-widths. The accuracy of DSGC is slightly better than ours for the 8/8/8-bit setting only on the ResNet-50 architecture. Nevertheless, ours shows a lower accuracy drop w.r.t the full-precision model. Note that the full-precision model in DSGC also shows a higher accuracy, possibly due to different training settings for, e.g., the number of epochs and learning rate scheduling. 2) We can see that the accuracy drop of DSGC becomes severe as bit-widths decrease. A plausible reason is that reducing the bit-width increases the quantization error for entire gradients, and the quantization interval of DSGC becomes narrower in order for keeping a small error for entire gradients. It incurs a significant quantization error for large gradients, and the performance in turn degrades drastically. Compared to DSGC, our method provides better results consistently, confirming once more that lowering the quantization error for large gradients is important in the FXP training. 3) Our method shows better results compared to the state of the art, including DSGC and IQB, in particularly low-bit settings (i.e., 6/6/6, 5/5/5, and 4/4/4-bit settings). For example, our method performs better than IQB employing a piecewise FXP format for gradient quantization, when training ResNet-18 and -34 in 4/4/4 and 5/5/5-bit settings, and obtains the superior results over the baseline when training in 4/4/4 and 5/5/5-bit settings. This suggests that maintaining a small error for large gradients is effective to improve the quantization performance in the low-bit settings. 4) We can clearly observe that ours gives better results than the baselines with various architectures consistently, especially in the 4/4/4 and 5/5/5-bit settings. This indicates that maintaining a small quantization error for large gradients, regardless of the layers or training iterations, is significant in the FXP training. </div>
</div>

<div class="row paper">
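The trade-off described in point 2 of the changed paragraph above can be sanity-checked with a small numerical sketch. The Python snippet below is illustrative only and is not part of this repository or the paper's code: it uses a generic symmetric uniform quantizer (an assumption, not DSGC's or the proposed method's actual scheme) on synthetic heavy-tailed "gradients", and shows that a narrow clipping interval keeps the error small for the many small gradients but incurs a large error on the few large ones, with the gap widening as the bit-width drops.

import numpy as np

rng = np.random.default_rng(0)

def uniform_quantize(x, bits, clip):
    # Generic symmetric uniform quantizer over [-clip, clip] with 2**bits levels.
    # This is a textbook quantizer for illustration, not the exact scheme of any
    # method compared in the table above.
    n = 2 ** (bits - 1)              # half the number of quantization levels
    step = clip / n                  # quantization step size
    q = np.clip(np.round(x / step), -n, n - 1)
    return q * step

# Synthetic heavy-tailed "gradients": mostly tiny values plus a few large outliers.
grads = np.concatenate([
    rng.normal(0.0, 1e-3, 100_000),  # bulk of small gradients
    rng.normal(0.0, 1e-1, 100),      # rare large-magnitude gradients
])
large = np.abs(grads) > 1e-2         # mask for the large gradients

for bits in (8, 6, 4):
    for clip in (3e-3, 3e-1):        # narrow vs. wide quantization interval (arbitrary values)
        err = np.abs(uniform_quantize(grads, bits, clip) - grads)
        print(f"bits={bits} clip={clip:.0e}  "
              f"mean err (small)={err[~large].mean():.2e}  "
              f"mean err (large)={err[large].mean():.2e}")

Running this, the narrow interval keeps the error on small gradients tiny but clips the large gradients, so their error is roughly their own magnitude; widening the interval rescues the large gradients but inflates the error on the bulk of small ones, and the inflation grows as the bit-width shrinks. This mirrors the argument that a quantizer tuned to the entire gradient distribution sacrifices the large gradients at low bit-widths.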
