# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: Zoom Is What You Need
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Mohammad Reza
    family-names: Taesiri
    email: [email protected]
    affiliation: University of Alberta
  - given-names: Giang
    family-names: Nguyen
    email: [email protected]
    affiliation: Auburn University
  - given-names: Sarra
    family-names: Habchi
    email: [email protected]
    affiliation: Ubisoft
  - given-names: Cor-Paul
    family-names: Bezemer
    email: [email protected]
    affiliation: University of Alberta
  - given-names: Anh
    family-names: Nguyen
    email: [email protected]
    affiliation: Auburn University
repository-code: 'https://github.com/taesiri/ZoomIsAllYouNeed'
url: 'https://taesiri.github.io/ZoomIsAllYouNeed/'
abstract: >-
  Image classifiers are information-discarding machines, by
  design. Yet, how these models discard information remains
  mysterious. We hypothesize that one way for image
  classifiers to reach high accuracy is to first learn to
  zoom to the most discriminative region in the image and
  then extract features from there to predict image labels.
  We study six popular networks ranging from AlexNet to
  CLIP, and we show that proper framing of the input image
  can lead to the correct classification of 98.91% of
  ImageNet images. Furthermore, we explore the potential and
  limits of zoom transforms in image classification and
  uncover positional biases in various datasets, especially
  a strong center bias in two popular datasets: ImageNet-A
  and ObjectNet. Finally, leveraging our insights into the
  potential of zoom, we propose a state-of-the-art test-time
  augmentation (TTA) technique that improves classification
  accuracy by forcing models to explicitly perform zoom-in
  operations before making predictions. Our method is more
  interpretable, accurate, and faster than MEMO, a
  state-of-the-art TTA method. Additionally, we propose
  ImageNet-Hard, a new benchmark where zooming in alone
  often does not help state-of-the-art models better label
  images.
keywords:
  - Zoom
  - Representation Learning
  - ImageNet-Hard
  - Robustness
license: MIT