CVPR2022.txt
Dual Cross-Attention Learning for Fine-Grained Visual Categorization and Object Re-Identification
SimAN: Exploring Self-Supervised Representation Learning of Scene Text via Similarity-Aware Normalization
GASP, a Generalized Framework for Agglomerative Clustering of Signed Graphs and Its Application to Instance Segmentation
Estimating Example Difficulty Using Variance of Gradients
One Loss for Quantization: Deep Hashing With Discrete Wasserstein Distributional Matching
Pixel Screening Based Intermediate Correction for Blind Deblurring
Weakly Supervised Semantic Segmentation by Pixel-to-Prototype Contrast
Controllable Animation of Fluid Elements in Still Images
Holocurtains: Programming Light Curtains via Binary Holography
Recurrent Dynamic Embedding for Video Object Segmentation
Deep Hierarchical Semantic Segmentation
f-SfT: Shape-From-Template With a Physics-Based Deformation Model
Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism
DATA: Domain-Aware and Task-Aware Self-Supervised Learning
TWIST: Two-Way Inter-Label Self-Training for Semi-Supervised 3D Instance Segmentation
Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection From Point Clouds
Learning Adaptive Warping for Real-World Rolling Shutter Correction
Siamese Contrastive Embedding Network for Compositional Zero-Shot Learning
Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions
RIM-Net: Recursive Implicit Fields for Unsupervised Learning of Hierarchical Shape Structures
Do Learned Representations Respect Causal Relationships?
ZebraPose: Coarse To Fine Surface Encoding for 6DoF Object Pose Estimation
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
Learning To Affiliate: Mutual Centralized Learning for Few-Shot Classification
CAPRI-Net: Learning Compact CAD Shapes With Adaptive Primitive Assembly
ATPFL: Automatic Trajectory Prediction Model Design Under Federated Learning Framework
Revisiting Learnable Affines for Batch Norm in Few-Shot Transfer Learning
Bridging the Gap Between Classification and Localization for Weakly Supervised Object Localization
Multi-Class Token Transformer for Weakly Supervised Semantic Segmentation
3D Moments From Near-Duplicate Photos
Exact Feature Distribution Matching for Arbitrary Style Transfer and Domain Generalization
Blind2Unblind: Self-Supervised Image Denoising With Visible Blind Spots
Balanced and Hierarchical Relation Learning for One-Shot Object Detection
End-to-End Generative Pretraining for Multimodal Video Captioning
Delving Deep Into the Generalization of Vision Transformers Under Distribution Shifts
NICE-SLAM: Neural Implicit Scalable Encoding for SLAM
HyperDet3D: Learning a Scene-Conditioned 3D Object Detector
Stochastic Trajectory Prediction via Motion Indeterminacy Diffusion
CLRNet: Cross Layer Refinement Network for Lane Detection
Cross-Modal Map Learning for Vision and Language Navigation
Motion-Aware Contrastive Video Representation Learning via Foreground-Background Merging
Incremental Transformer Structure Enhanced Image Inpainting With Masking Positional Encoding
Pointly-Supervised Instance Segmentation
Cross-Modal Clinical Graph Transformer for Ophthalmic Report Generation
Human-Object Interaction Detection via Disentangled Transformer
DINE: Domain Adaptation From Single and Multiple Black-Box Predictors
LGT-Net: Indoor Panoramic Room Layout Estimation With Geometry-Aware Transformer Network
CRIS: CLIP-Driven Referring Image Segmentation
Multi-View Mesh Reconstruction With Neural Deferred Shading
CVF-SID: Cyclic Multi-Variate Function for Self-Supervised Image Denoising by Disentangling Noise From Image
Infrared Invisible Clothing: Hiding From Infrared Detectors at Multiple Angles in Real World
Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation
FaceFormer: Speech-Driven 3D Facial Animation With Transformers
Exploring Patch-Wise Semantic Relation for Contrastive Learning in Image-to-Image Translation Tasks
High-Resolution Face Swapping via Latent Semantics Disentanglement
Searching the Deployable Convolution Neural Networks for GPUs
Sparse Local Patch Transformer for Robust Face Alignment and Landmarks Inherent Relation Learning
DeepFake Disrupter: The Detector of DeepFake Is My Friend
Rotationally Equivariant 3D Object Detection
Accelerating DETR Convergence via Semantic-Aligned Matching
Long-Short Temporal Contrastive Learning of Video Transformers
Vision Transformer With Deformable Attention
Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision-Language Architecture
Deep Vanishing Point Detection: Geometric Priors Make Dataset Variations Vanish
RM-Depth: Unsupervised Learning of Recurrent Monocular Depth in Dynamic Scenes
LiT: Zero-Shot Transfer With Locked-Image Text Tuning
Cloning Outfits From Real-World Images to 3D Characters for Generalizable Person Re-Identification
GeoNeRF: Generalizing NeRF With Geometry Priors
ABPN: Adaptive Blend Pyramid Network for Real-Time Local Retouching of Ultra High-Resolution Photo
PhoCaL: A Multi-Modal Dataset for Category-Level Object Pose Estimation With Photometrically Challenging Objects
Neural Compression-Based Feature Learning for Video Restoration
Expanding Low-Density Latent Regions for Open-Set Object Detection
Drop the GAN: In Defense of Patches Nearest Neighbors As Single Image Generative Models
Uformer: A General U-Shaped Transformer for Image Restoration
Exploring Dual-Task Correlation for Pose Guided Person Image Generation
Portrait Eyeglasses and Shadow Removal by Leveraging 3D Synthetic Data
Neural Rays for Occlusion-Aware Image-Based Rendering
Modeling 3D Layout for Group Re-Identification
Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity
SIOD: Single Instance Annotated per Category per Image for Object Detection
Toward Fast, Flexible, and Robust Low-Light Image Enhancement
Online Learning of Reusable Abstract Models for Object Goal Navigation
Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos
SimMatch: Semi-Supervised Learning With Similarity Matching
OrphicX: A Causality-Inspired Latent Variable Model for Interpreting Graph Neural Networks
HandOccNet: Occlusion-Robust 3D Hand Mesh Estimation Network
EfficientNeRF: Efficient Neural Radiance Fields
Quantifying Societal Bias Amplification in Image Captioning
Modular Action Concept Grounding in Semantic Video Prediction
StyleSwin: Transformer-Based GAN for High-Resolution Image Generation
Reinforced Structured State-Evolution for Vision-Language Navigation
Sub-Word Level Lip Reading With Visual Attention
Weakly Supervised High-Fidelity Clothing Model Generation
Highly-Efficient Incomplete Large-Scale Multi-View Clustering With Consensus Bipartite Graph
Towards Principled Disentanglement for Domain Generalization
Discrete Cosine Transform Network for Guided Depth Map Super-Resolution
Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing
CoSSL: Co-Learning of Representation and Classifier for Imbalanced Semi-Supervised Learning
Discovering Objects That Can Move
Knowledge Mining With Scene Text for Fine-Grained Recognition
Self-Supervised Learning of Object Parts for Semantic Segmentation
Iterative Corresponding Geometry: Fusing Region and Depth for Highly Efficient 3D Tracking of Textureless Objects
Single-Photon Structured Light
Deblurring via Stochastic Refinement
3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds
TransGeo: Transformer Is All You Need for Cross-View Image Geo-Localization
R(Det)²: Randomized Decision Routing for Object Detection
Abandoning the Bayer-Filter To See in the Dark
SASIC: Stereo Image Compression With Latent Shifts and Stereo Attention
Exploiting Temporal Relations on Radar Perception for Autonomous Driving
Multi-Instance Point Cloud Registration by Efficient Correspondence Clustering
Contrastive Boundary Learning for Point Cloud Segmentation
Details or Artifacts: A Locally Discriminative Learning Approach to Realistic Image Super-Resolution
CVNet: Contour Vibration Network for Building Extraction
Hyperbolic Image Segmentation
Forward Compatible Training for Large-Scale Embedding Retrieval Systems
Everything at Once - Multi-Modal Fusion Transformer for Video Retrieval
Swin Transformer V2: Scaling Up Capacity and Resolution
Neural Template: Topology-Aware Reconstruction and Disentangled Generation of 3D Meshes
DEFEAT: Deep Hidden Feature Backdoor Attacks by Imperceptible Perturbation and Latent Representation Constraints
Projective Manifold Gradient Layer for Deep Rotation Regression
CLIMS: Cross Language Image Matching for Weakly Supervised Semantic Segmentation
Learning To Refactor Action and Co-Occurrence Features for Temporal Action Localization
It's Time for Artistic Correspondence in Music and Video
Mixed Differential Privacy in Computer Vision
AdaFace: Quality Adaptive Margin for Face Recognition
Learning Soft Estimator of Keypoint Scale and Orientation With Probabilistic Covariant Loss
DN-DETR: Accelerate DETR Training by Introducing Query DeNoising
HCSC: Hierarchical Contrastive Selective Coding
TransRank: Self-Supervised Video Representation Learning via Ranking-Based Transformation Recognition
KeyTr: Keypoint Transporter for 3D Reconstruction of Deformable Objects in Videos
Invariant Grounding for Video Question Answering
Prompt Distribution Learning
RAGO: Recurrent Graph Optimizer for Multiple Rotation Averaging
Arch-Graph: Acyclic Architecture Relation Predictor for Task-Transferable Neural Architecture Search
On Aliased Resizing and Surprising Subtleties in GAN Evaluation
Lepard: Learning Partial Point Cloud Matching in Rigid and Deformable Scenes
Virtual Elastic Objects
DiSparse: Disentangled Sparsification for Multitask Model Compression
Pushing the Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make a Difference
Opening Up Open World Tracking
Towards Efficient and Scalable Sharpness-Aware Minimization
VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention
Rethinking Deep Face Restoration
OSSO: Obtaining Skeletal Shape From Outside
Temporal Alignment Networks for Long-Term Video
Few-Shot Head Swapping in the Wild
A Study on the Distribution of Social Biases in Self-Supervised Learning Visual Models
LAR-SR: A Local Autoregressive Model for Image Super-Resolution
Bayesian Invariant Risk Minimization
Democracy Does Matter: Comprehensive Feature Mining for Co-Salient Object Detection
Alleviating Semantics Distortion in Unsupervised Low-Level Image-to-Image Translation via Structure Consistency Constraint
Doodle It Yourself: Class Incremental Learning by Drawing a Few Sketches
Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes
ICON: Implicit Clothed Humans Obtained From Normals
Comparing Correspondences: Video Prediction With Correspondence-Wise Losses
Uni-Perceiver: Pre-Training Unified Architecture for Generic Perception for Zero-Shot and Few-Shot Tasks
The Auto Arborist Dataset: A Large-Scale Benchmark for Multiview Urban Forest Monitoring Under Domain Shift
On the Instability of Relative Pose Estimation and RANSAC's Role
Shape From Polarization for Complex Scenes in the Wild
Real-Time, Accurate, and Consistent Video Semantic Segmentation via Unsupervised Adaptation and Cross-Unit Deployment on Mobile Device
SNUG: Self-Supervised Neural Dynamic Garments
Towards Fewer Annotations: Active Learning via Region Impurity and Prediction Uncertainty for Domain Adaptive Semantic Segmentation
Glass Segmentation Using Intensity and Spectral Polarization Cues
CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding
Few Shot Generative Model Adaption via Relaxed Spatial Structural Alignment
Target-Relevant Knowledge Preservation for Multi-Source Domain Adaptive Object Detection
Pyramid Grafting Network for One-Stage High Resolution Saliency Detection
A Style-Aware Discriminator for Controllable Image Translation
Non-Iterative Recovery From Nonlinear Observations Using Generative Models
Incremental Cross-View Mutual Distillation for Self-Supervised Medical CT Synthesis
Enhancing Adversarial Training With Second-Order Statistics of Weights
Partially Does It: Towards Scene-Level FG-SBIR With Partial Input
Dual Temperature Helps Contrastive Learning Without Many Negative Samples: Towards Understanding and Simplifying MoCo
Moving Window Regression: A Novel Approach to Ordinal Regression
UniCoRN: A Unified Conditional Image Repainting Network
Forecasting Characteristic 3D Poses of Human Actions
ACPL: Anti-Curriculum Pseudo-Labelling for Semi-Supervised Medical Image Classification
Learning to Deblur Using Light Field Generated and Real Defocus Images
Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection
Safe Self-Refinement for Transformer-Based Domain Adaptation
Density-Preserving Deep Point Cloud Compression
StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions
Which Model To Transfer? Finding the Needle in the Growing Haystack
Fast and Unsupervised Action Boundary Detection for Action Segmentation
Class-Incremental Learning With Strong Pre-Trained Models
Robust Optimization As Data Augmentation for Large-Scale Graphs
Robust Structured Declarative Classifiers for 3D Point Clouds: Defending Adversarial Attacks With Implicit Gradients
PhotoScene: Photorealistic Material and Lighting Transfer for Indoor Scenes
Improving the Transferability of Targeted Adversarial Examples Through Object-Based Diverse Input
IRON: Inverse Rendering by Optimizing Neural SDFs and Materials From Photometric Images
ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer
Versatile Multi-Modal Pre-Training for Human-Centric Perception
360MonoDepth: High-Resolution 360° Monocular Depth Estimation
Splicing ViT Features for Semantic Appearance Transfer
Contrastive Regression for Domain Adaptation on Gaze Estimation
MUSE-VAE: Multi-Scale VAE for Environment-Aware Long Term Trajectory Prediction
Multi-View Consistent Generative Adversarial Networks for 3D-Aware Image Synthesis
Putting People in Their Place: Monocular Regression of 3D People in Depth
POCO: Point Convolution for Surface Reconstruction
Memory-Augmented Non-Local Attention for Video Super-Resolution
Neural Texture Extraction and Distribution for Controllable Person Image Synthesis
Classification-Then-Grounding: Reformulating Video Scene Graphs As Temporal Bipartite Graphs
Transformer-Empowered Multi-Scale Contextual Matching and Aggregation for Multi-Contrast MRI Super-Resolution
GazeOnce: Real-Time Multi-Person Gaze Estimation
GateHUB: Gated History Unit With Background Suppression for Online Action Detection
Few-Shot Font Generation by Learning Fine-Grained Local Styles
Bridging Video-Text Retrieval With Multiple Choice Questions
Depth-Aware Generative Adversarial Network for Talking Head Video Generation
Dual-Path Image Inpainting With Auxiliary GAN Inversion
DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis
Generative Flows With Invertible Attentions
Clipped Hyperbolic Classifiers Are Super-Hyperbolic Classifiers
Estimating Fine-Grained Noise Model via Contrastive Learning
DiffPoseNet: Direct Differentiable Camera Pose Estimation
The Flag Median and FlagIRLS
Implicit Feature Decoupling With Depthwise Quantization
Graph-Context Attention Networks for Size-Varied Deep Graph Matching
FENeRF: Face Editing in Neural Radiance Fields
CoNeRF: Controllable Neural Radiance Fields
Noise2NoiseFlow: Realistic Camera Noise Modeling Without Clean Images
ZeroWaste Dataset: Towards Deformable Object Segmentation in Cluttered Scenes
Remember Intentions: Retrospective-Memory-Based Trajectory Prediction
Measuring Compositional Consistency for Video Question Answering
Category Contrast for Unsupervised Domain Adaptation in Visual Tasks
SwapMix: Diagnosing and Regularizing the Over-Reliance on Visual Context in Visual Question Answering
UNIST: Unpaired Neural Implicit Shape Translation Network
Local-Adaptive Face Recognition via Graph-Based Meta-Clustering and Regularized Adaptation
The DEVIL Is in the Details: A Diagnostic Evaluation Benchmark for Video Inpainting
Mutual Information-Driven Pan-Sharpening
Shifting More Attention to Visual Backbone: Query-Modulated Refinement Networks for End-to-End Visual Grounding
A Framework for Learning Ante-Hoc Explainable Models via Concepts
Generating Useful Accident-Prone Driving Scenarios via a Learned Traffic Prior
FLOAT: Factorized Learning of Object Attributes for Improved Multi-Object Multi-Part Scene Parsing
Efficient Geometry-Aware 3D Generative Adversarial Networks
DO-GAN: A Double Oracle Framework for Generative Adversarial Networks
Dancing Under the Stars: Video Denoising in Starlight
FocusCut: Diving Into a Focus View in Interactive Segmentation
Medial Spectral Coordinates for 3D Shape Analysis
Contextualized Spatio-Temporal Contrastive Learning With Self-Supervision
Rethinking Architecture Design for Tackling Data Heterogeneity in Federated Learning
APES: Articulated Part Extraction From Sprite Sheets
Dressing in the Wild by Watching Dance Videos
SPAct: Self-Supervised Privacy Preservation for Action Recognition
Uni6D: A Unified CNN Framework Without Projection Breakdown for 6D Pose Estimation
De-Rendering 3D Objects in the Wild
SPAMs: Structured Implicit Parametric Models
Global Sensing and Measurements Reuse for Image Compressed Sensing
SeeThroughNet: Resurrection of Auxiliary Loss by Preserving Class Probability Information
Representing 3D Shapes With Probabilistic Directed Distance Fields
Learning ABCs: Approximate Bijective Correspondence for Isolating Factors of Variation With Weak Supervision
ABO: Dataset and Benchmarks for Real-World 3D Object Understanding
DETReg: Unsupervised Pretraining With Region Priors for Object Detection
Learning To Restore 3D Face From In-the-Wild Degraded Images
Practical Evaluation of Adversarial Robustness via Adaptive Auto Attack
Convolutions for Spatial Interaction Modeling
MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection
Salvage of Supervision in Weakly Supervised Object Detection
Cross-View Transformers for Real-Time Map-View Semantic Segmentation
Distinguishing Unseen From Seen for Generalized Zero-Shot Learning
Online Continual Learning on a Contaminated Data Stream With Blurry Task Boundaries
Controllable Dynamic Multi-Task Architectures
Learning To Imagine: Diversify Memory for Incremental Learning Using Unlabeled Data
SmartAdapt: Multi-Branch Object Detection Framework for Videos on Mobiles
VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks
Deep Hybrid Models for Out-of-Distribution Detection
Accelerating Video Object Segmentation With Compressed Video
Exploring Domain-Invariant Parameters for Source Free Domain Adaptation
FastDOG: Fast Discrete Optimization on GPU
Fire Together Wire Together: A Dynamic Pruning Approach With Self-Supervised Mask Prediction
Multi-Source Uncertainty Mining for Deep Unsupervised Saliency Detection
Self-Supervised Equivariant Learning for Oriented Keypoint Detection
Wavelet Knowledge Distillation: Towards Efficient Image-to-Image Translation
Focal and Global Knowledge Distillation for Detectors
Learning To Prompt for Continual Learning
Human Mesh Recovery From Multiple Shots
Improving Adversarial Transferability via Neuron Attribution-Based Attacks
Better Trigger Inversion Optimization in Backdoor Scanning
GANSeg: Learning To Segment by Unsupervised Hierarchical Image Generation
Dense Learning Based Semi-Supervised Object Detection
Fixing Malfunctional Objects With Learned Physical Simulation and Functional Prediction
Convolution of Convolution: Let Kernels Spatially Collaborate
Make It Move: Controllable Image-to-Video Generation With Text Descriptions
C2AM Loss: Chasing a Better Decision Boundary for Long-Tail Object Detection
Neural Points: Point Cloud Representation With Neural Fields for Arbitrary Upsampling
Distribution Consistent Neural Architecture Search
Video-Text Representation Learning via Differentiable Weak Temporal Alignment
Bi-Directional Object-Context Prioritization Learning for Saliency Ranking
FreeSOLO: Learning To Segment Objects Without Annotations
What Do Navigation Agents Learn About Their Environment?
Progressive Minimal Path Method With Embedded CNN
FIFO: Learning Fog-Invariant Features for Foggy Scene Segmentation
3D Human Tongue Reconstruction From Single "In-the-Wild" Images
Enhancing Adversarial Robustness for Deep Metric Learning
Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation
Lite-MDETR: A Lightweight Multi-Modal Detector
CoordGAN: Self-Supervised Dense Correspondences Emerge From GANs
A Simple Multi-Modality Transfer Learning Baseline for Sign Language Translation
Unsupervised Visual Representation Learning by Online Constrained K-Means
Neural Point Light Fields
Vehicle Trajectory Prediction Works, but Not Everywhere
PSMNet: Position-Aware Stereo Merging Network for Room Layout Estimation
MonoDTR: Monocular 3D Object Detection With Depth-Aware Transformer
Learning Graph Regularisation for Guided Super-Resolution
Instance-Wise Occlusion and Depth Orders in Natural Scenes
Look for the Change: Learning Object States and State-Modifying Actions From Untrimmed Web Videos
Attribute Surrogates Learning and Spectral Tokens Pooling in Transformers for Few-Shot Learning
Generalized Category Discovery
Maximum Consensus by Weighted Influences of Monotone Boolean Functions
TransforMatcher: Match-to-Match Attention for Semantic Correspondence
Robust Outlier Detection by De-Biasing VAE Likelihoods
Contour-Hugging Heatmaps for Landmark Detection
Voxel Field Fusion for 3D Object Detection
Divide and Conquer: Compositional Experts for Generalized Novel Class Discovery
Programmatic Concept Learning for Human Motion Description and Synthesis
Interpretable Part-Whole Hierarchies and Conceptual-Semantic Relationships in Neural Networks
Fast Algorithm for Low-Rank Tensor Completion in Delay-Embedded Space
Panoptic, Instance and Semantic Relations: A Relational Context Encoder To Enhance Panoptic Segmentation
Point2Seq: Detecting 3D Objects As Sequences
Less Is More: Generating Grounded Navigation Instructions From Landmarks
Task-Adaptive Negative Envision for Few-Shot Open-Set Recognition
DisARM: Displacement Aware Relation Module for 3D Detection
ETHSeg: An Amodel Instance Segmentation Network and a Real-World Dataset for X-Ray Waste Inspection
MixFormer: Mixing Features Across Windows and Dimensions
Killing Two Birds With One Stone: Efficient and Robust Training of Face Recognition CNNs by Partial FC
NeRF-Editing: Geometry Editing of Neural Radiance Fields
Optimal Correction Cost for Object Detection Evaluation
Contextual Similarity Distillation for Asymmetric Image Retrieval
FineDiving: A Fine-Grained Dataset for Procedure-Aware Action Quality Assessment
Artistic Style Discovery With Independent Components
HEAT: Holistic Edge Attention Transformer for Structured Reconstruction
HyperStyle: StyleGAN Inversion With HyperNetworks for Real Image Editing
DASO: Distribution-Aware Semantics-Oriented Pseudo-Label for Imbalanced Semi-Supervised Learning
Mobile-Former: Bridging MobileNet and Transformer
Exploiting Pseudo Labels in a Self-Supervised Learning Framework for Improved Monocular Depth Estimation
DESTR: Object Detection With Split Transformer
LTP: Lane-Based Trajectory Prediction for Autonomous Driving
CycleMix: A Holistic Strategy for Medical Image Segmentation From Scribble Supervision
VideoINR: Learning Video Implicit Neural Representation for Continuous Space-Time Super-Resolution
Towards End-to-End Unified Scene Text Detection and Layout Analysis
Image Based Reconstruction of Liquids From 2D Surface Detections
Contextual Outpainting With Object-Level Contrastive Learning
AP-BSN: Self-Supervised Denoising for Real-World Images via Asymmetric PD and Blind-Spot Network
AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation
ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior
Depth-Guided Sparse Structure-From-Motion for Movies and TV Shows
End-to-End Referring Video Object Segmentation With Multimodal Transformers
Unpaired Cartoon Image Synthesis via Gated Cycle Mapping
IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo
Not All Points Are Equal: Learning Highly Efficient Point-Based Detectors for 3D LiDAR Point Clouds
FedCorr: Multi-Stage Federated Learning for Label Noise Correction
Detecting Camouflaged Object in Frequency Domain
RigNeRF: Fully Controllable Neural 3D Portraits
CLIP-Forge: Towards Zero-Shot Text-To-Shape Generation
Style-Based Global Appearance Flow for Virtual Try-On
Source-Free Object Detection by Learning To Overlook Domain Style
Active Learning for Open-Set Annotation
SceneSqueezer: Learning To Compress Scene for Camera Relocalization
SelfRecon: Self Reconstruction Your Digital Avatar From Monocular Video
Instance-Dependent Label-Noise Learning With Manifold-Regularized Transition Matrix Estimation
Rethinking the Augmentation Module in Contrastive Learning: Learning Hierarchical Augmentation Invariance With Expanded Views
Self-Supervised Models Are Continual Learners
Dreaming To Prune Image Deraining Networks
Equivariant Point Cloud Analysis via Learning Orientations for Message Passing
When Does Contrastive Visual Representation Learning Work?
One Step at a Time: Long-Horizon Vision-and-Language Navigation With Milestones
Node Representation Learning in Graph via Node-to-Neighbourhood Mutual Information Maximization
Point Cloud Pre-Training With Natural 3D Structures
Scene Consistency Representation Learning for Video Scene Segmentation
Two Coupled Rejection Metrics Can Tell Adversarial Examples Apart
Exploiting Explainable Metrics for Augmented SGD
Semi-Supervised Video Semantic Segmentation With Inter-Frame Feature Reconstruction
GenDR: A Generalized Differentiable Renderer
Improving Neural Implicit Surfaces Geometry With Patch Warping
XYLayoutLM: Towards Layout-Aware Multimodal Networks for Visually-Rich Document Understanding
Amodal Segmentation Through Out-of-Task and Out-of-Distribution Generalization With a Bayesian Model
How Well Do Sparse ImageNet Models Transfer?
REX: Reasoning-Aware and Grounded Explanation
Dynamic Dual-Output Diffusion Models
StyleT2I: Toward Compositional and High-Fidelity Text-to-Image Synthesis
JoinABLe: Learning Bottom-Up Assembly of Parametric CAD Joints
CaDeX: Learning Canonical Deformation Coordinate Space for Dynamic Surface Representation via Neural Homeomorphism
Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes
V-Doc: Visual Questions Answers With Documents
AEGNN: Asynchronous Event-Based Graph Neural Networks
Layer-Wised Model Aggregation for Personalized Federated Learning
Polarity Sampling: Quality and Diversity Control of Pre-Trained Generative Networks via Singular Values
Style-Structure Disentangled Features and Normalizing Flows for Diverse Icon Colorization
Object-Aware Video-Language Pre-Training for Retrieval
OSKDet: Orientation-Sensitive Keypoint Localization for Rotated Object Detection
MAT: Mask-Aware Transformer for Large Hole Image Inpainting
Exploring Geometric Consistency for Monocular 3D Object Detection
Neural Window Fully-Connected CRFs for Monocular Depth Estimation
CodedVTR: Codebook-Based Sparse Voxel Transformer With Geometric Guidance
Uncertainty-Aware Deep Multi-View Photometric Stereo
Coherent Point Drift Revisited for Non-Rigid Shape Matching and Registration
Unleashing Potential of Unsupervised Pre-Training With Intra-Identity Regularization for Person Re-Identification
Align and Prompt: Video-and-Language Pre-Training With Entity Prompts
A Unified Query-Based Paradigm for Point Cloud Understanding
It's About Time: Analog Clock Reading in the Wild
MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens
Cross Modal Retrieval With Querybank Normalisation
Contrastive Dual Gating: Learning Sparse Features With Contrastive Learning
Universal Photometric Stereo Network Using Global Lighting Contexts
Hire-MLP: Vision MLP via Hierarchical Rearrangement
Ray3D: Ray-Based 3D Human Pose Estimation for Monocular Absolute 3D Localization
Occluded Human Mesh Recovery
Multi-Object Tracking Meets Moving UAV
ASM-Loc: Action-Aware Segment Modeling for Weakly-Supervised Temporal Action Localization
Uncertainty-Guided Probabilistic Transformer for Complex Action Recognition
Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs
End-to-End Multi-Person Pose Estimation With Transformers
REGTR: End-to-End Point Cloud Correspondences With Transformers
Neural 3D Scene Reconstruction With the Manhattan-World Assumption
V2C: Visual Voice Cloning
Revisiting AP Loss for Dense Object Detection: Adaptive Ranking Pair Selection
3DeformRS: Certifying Spatial Deformations on Point Clouds
ElePose: Unsupervised 3D Human Pose Estimation by Predicting Camera Elevation and Learning Normalizing Flows on 2D Poses
MAD: A Scalable Dataset for Language Grounding in Videos From Movie Audio Descriptions
EvUnroll: Neuromorphic Events Based Rolling Shutter Image Correction
Gait Recognition in the Wild With Dense 3D Representations and a Benchmark
ArtiBoost: Boosting Articulated 3D Hand-Object Pose Estimation via Online Exploration and Synthesis
Temporal Context Matters: Enhancing Single Image Prediction With Disease Progression Representations
QueryDet: Cascaded Sparse Query for Accelerating High-Resolution Small Object Detection
IDEA-Net: Dynamic 3D Point Cloud Interpolation via Deep Embedding Alignment
UniCon: Combating Label Noise Through Uniform Selection and Contrastive Learning
Learning From All Vehicles
BEHAVE: Dataset and Method for Tracking Human Object Interactions
Disentangled3D: Learning a 3D Generative Model With Disentangled Geometry and Appearance From Monocular Images
Revisiting Random Channel Pruning for Neural Network Compression
One-Bit Active Query With Contrastive Pairs
Estimating Egocentric 3D Human Pose in the Wild With External Weak Supervision
Performance-Aware Mutual Knowledge Distillation for Improving Neural Architecture Search
Does Text Attract Attention on E-Commerce Images: A Novel Saliency Prediction Dataset and Method
Topologically-Aware Deformation Fields for Single-View 3D Reconstruction
HyperInverter: Improving StyleGAN Inversion via Hypernetwork
Sparse Non-Local CRF
Dataset Distillation by Matching Training Trajectories
Towards Driving-Oriented Metric for Lane Detection Models
EPro-PnP: Generalized End-to-End Probabilistic Perspective-N-Points for Monocular Object Pose Estimation
Rethinking Reconstruction Autoencoder-Based Out-of-Distribution Detection
XYDeblur: Divide and Conquer for Single Image Deblurring
Generating Diverse and Natural 3D Human Motions From Text
E-CIR: Event-Enhanced Continuous Intensity Recovery
Towards Robust Rain Removal Against Adversarial Attacks: A Comprehensive Benchmark Analysis and Beyond
STCrowd: A Multimodal Dataset for Pedestrian Perception in Crowded Scenes
Deep Decomposition for Stochastic Normal-Abnormal Transport
Global Context With Discrete Diffusion in Vector Quantised Modelling for Image Generation
Symmetry and Uncertainty-Aware Object SLAM for 6DoF Object Pose Estimation
AziNorm: Exploiting the Radial Symmetry of Point Cloud for Azimuth-Normalized 3D Perception
Towards Multimodal Depth Estimation From Light Fields
Learning To Recognize Procedural Activities With Distant Supervision
Multimodal Material Segmentation
Multi-Frame Self-Supervised Depth With Transformers
Weakly Supervised Rotation-Invariant Aerial Object Detection Network
Modeling Motion With Multi-Modal Features for Text-Based Video Segmentation
Surface Reconstruction From Point Clouds by Learning Predictive Context Priors
Deformable Video Transformer
Self-Supervised Keypoint Discovery in Behavioral Videos
IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes
DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation
Connecting the Complementary-View Videos: Joint Camera Identification and Subject Association
End-to-End Trajectory Distribution Prediction Based on Occupancy Grid Maps
Fast, Accurate and Memory-Efficient Partial Permutation Synchronization
Quantization-Aware Deep Optics for Diffractive Snapshot Hyperspectral Imaging
Weakly Supervised Temporal Action Localization via Representative Snippet Knowledge Propagation
Parametric Scattering Networks
SketchEdit: Mask-Free Local Image Manipulation With Partial Sketches
ScaleNet: A Shallow Architecture for Scale Estimation
E2EC: An End-to-End Contour-Based Method for High-Quality High-Speed Instance Segmentation
Bounded Adversarial Attack on Deep Content Features
BatchFormer: Learning To Explore Sample Relationships for Robust Representation Learning
Self-Supervised Image-Specific Prototype Exploration for Weakly Supervised Semantic Segmentation
CAD: Co-Adapting Discriminative Features for Improved Few-Shot Classification
Fingerprinting Deep Neural Networks Globally via Universal Adversarial Perturbations
Learning Multi-View Aggregation in the Wild for Large-Scale 3D Semantic Segmentation
ManiTrans: Entity-Level Text-Guided Image Manipulation via Token-Wise Semantic Alignment and Generation
Improving Video Model Transfer With Dynamic Representation Learning
PIE-Net: Photometric Invariant Edge Guided Network for Intrinsic Image Decomposition
Clothes-Changing Person Re-Identification With RGB Modality Only
ChiTransformer: Towards Reliable Stereo From Cues
Robust Image Forgery Detection Over Online Social Network Shared Images
QS-Attn: Query-Selected Attention for Contrastive Learning in I2I Translation
Physically Disentangled Intra- and Inter-Domain Adaptation for Varicolored Haze Removal
Modality-Agnostic Learning for Radar-Lidar Fusion in Vehicle Detection
A Re-Balancing Strategy for Class-Imbalanced Classification Based on Instance Difficulty
Representation Compensation Networks for Continual Semantic Segmentation
Adaptive Gating for Single-Photon 3D Imaging
Tracking People by Predicting 3D Appearance, Location and Pose
Text2Mesh: Text-Driven Neural Stylization for Meshes
Learning To Solve Hard Minimal Problems
H4D: Human 4D Modeling by Learning Neural Compositional Representation
FWD: Real-Time Novel View Synthesis With Forward Warping and Depth
Non-Generative Generalized Zero-Shot Learning via Task-Correlated Disentanglement and Controllable Samples Synthesis
C-CAM: Causal CAM for Weakly Supervised Semantic Segmentation on Medical Image
Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection
Forward Compatible Few-Shot Class-Incremental Learning
BaLeNAS: Differentiable Architecture Search via the Bayesian Learning Rule
Cannot See the Forest for the Trees: Aggregating Multiple Viewpoints To Better Classify Objects in Videos
Learning Canonical F-Correlation Projection for Compact Multiview Representation
DIFNet: Boosting Visual Information Flow for Image Captioning
Weakly Supervised Object Localization As Domain Adaption
Tencent-MVSE: A Large-Scale Benchmark Dataset for Multi-Modal Video Similarity Evaluation
Dynamic Prototype Convolution Network for Few-Shot Semantic Segmentation
Deep Orientation-Aware Functional Maps: Tackling Symmetry Issues in Shape Matching
Tree Energy Loss: Towards Sparsely Annotated Semantic Segmentation
Mr.BiQ: Post-Training Non-Uniform Quantization Based on Minimizing the Reconstruction Error
MatteFormer: Transformer-Based Image Matting via Prior-Tokens
Video Shadow Detection via Spatio-Temporal Interpolation Consistency Training
Ranking Distance Calibration for Cross-Domain Few-Shot Learning
Robust and Accurate Superquadric Recovery: A Probabilistic Approach
Zero-Shot Text-Guided Object Generation With Dream Fields
Learning Pixel Trajectories With Multiscale Contrastive Random Walks
Self-Supervised Correlation Mining Network for Person Image Generation
Grounding Answers for Visual Questions Asked by Visually Impaired People
Task Adaptive Parameter Sharing for Multi-Task Learning
Sparse Instance Activation for Real-Time Instance Segmentation
Automatic Color Image Stitching Using Quaternion Rank-1 Alignment
VisualGPT: Data-Efficient Adaptation of Pretrained Language Models for Image Captioning
ESCNet: Gaze Target Detection With the Understanding of 3D Scenes
Can You Spot the Chameleon? Adversarially Camouflaging Images From Co-Salient Object Detection
Finding Badly Drawn Bunnies
Point2Cyl: Reverse Engineering 3D Objects From Point Clouds to Extrusion Cylinders
All-Photon Polarimetric Time-of-Flight Imaging
MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation
Surface-Aligned Neural Radiance Fields for Controllable 3D Human Synthesis
Learning From Temporal Gradient for Semi-Supervised Action Recognition
Towards Implicit Text-Guided 3D Shape Generation
Audio-Driven Neural Gesture Reenactment With Video Motion Graphs
SoftCollage: A Differentiable Probabilistic Tree Generator for Image Collage
Transforming Model Prediction for Tracking
A Unified Framework for Implicit Sinkhorn Differentiation
DGECN: A Depth-Guided Edge Convolutional Network for End-to-End 6D Pose Estimation
Unsupervised Vision-Language Parsing: Seamlessly Bridging Visual Scene Graphs With Language Structures via Dependency Relationships
Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling
Locality-Aware Inter- and Intra-Video Reconstruction for Self-Supervised Correspondence Learning
A Versatile Multi-View Framework for LiDAR-Based 3D Object Detection With Guidance From Panoptic Segmentation
Query and Attention Augmentation for Knowledge-Based Explainable Reasoning
Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality
RFNet: Unsupervised Network for Mutually Reinforcing Multi-Modal Image Registration and Fusion
Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection
Interactron: Embodied Adaptive Object Detection
3D Scene Painting via Semantic Image Synthesis
MeMOT: Multi-Object Tracking With Memory
Revisiting Weakly Supervised Pre-Training of Visual Perception Models
Semi-Supervised Semantic Segmentation With Error Localization Network
Meta Convolutional Neural Networks for Single Domain Generalization
Generalizing Gaze Estimation With Rotation Consistency
Anomaly Detection via Reverse Distillation From One-Class Embedding
Fine-Grained Object Classification via Self-Supervised Pose Alignment
Spatio-Temporal Gating-Adjacency GCN for Human Motion Prediction
CellTypeGraph: A New Geometric Computer Vision Benchmark
Clustering Plotted Data by Image Segmentation
Animal Kingdom: A Large and Diverse Dataset for Animal Behavior Understanding
Learning To Learn Across Diverse Data Biases in Deep Face Recognition
Back to Reality: Weakly-Supervised 3D Object Detection With Shape-Guided Label Enhancement
Long-Tail Recognition via Compositional Knowledge Transfer
EI-CLIP: Entity-Aware Interventional Contrastive Learning for E-Commerce Cross-Modal Retrieval
Multi-Dimensional, Nuanced and Subjective - Measuring the Perception of Facial Expressions
PyMiceTracking: An Open-Source Toolbox for Real-Time Behavioral Neuroscience Experiments
Self-Taught Metric Learning Without Labels
MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition
Fine-Grained Temporal Contrastive Learning for Weakly-Supervised Temporal Action Localization
Embracing Single Stride 3D Object Detector With Sparse Transformer
Multidimensional Belief Quantification for Label-Efficient Meta-Learning
UTC: A Unified Transformer With Inter-Task Contrastive Learning for Visual Dialog
Relieving Long-Tailed Instance Segmentation via Pairwise Class Balance
Online Convolutional Re-Parameterization
Mimicking the Oracle: An Initial Phase Decorrelation Approach for Class Incremental Learning
RIDDLE: Lidar Data Compression With Range Image Deep Delta Encoding
RelTransformer: A Transformer-Based Long-Tail Visual Relationship Recognition
HODEC: Towards Efficient High-Order DEcomposed Convolutional Neural Networks
RigidFlow: Self-Supervised Scene Flow Learning on Point Clouds by Local Rigidity Prior
Smooth Maximum Unit: Smooth Activation Function for Deep Networks Using Smoothing Maximum Technique
Learning Invisible Markers for Hidden Codes in Offline-to-Online Photography
Personalized Image Aesthetics Assessment With Rich Attributes
Task2Sim: Towards Effective Pre-Training and Transfer From Synthetic Data
Part-Based Pseudo Label Refinement for Unsupervised Person Re-Identification
Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation
HDNet: High-Resolution Dual-Domain Learning for Spectral Compressive Imaging
OW-DETR: Open-World Detection Transformer
Learning Deep Implicit Functions for 3D Shapes With Dynamic Code Clouds
Reversible Vision Transformers
Amodal Panoptic Segmentation
Gravitationally Lensed Black Hole Emission Tomography
3D-Aware Image Synthesis via Learning Structural and Textural Representations
Text-to-Image Synthesis Based on Object-Guided Joint-Decoding Transformer
Correlation Verification for Image Retrieval
Unsupervised Vision-and-Language Pre-Training via Retrieval-Based Multi-Granular Alignment
Protecting Facial Privacy: Generating Adversarial Identity Masks via Style-Robust Makeup Transfer
PONI: Potential Functions for ObjectGoal Navigation With Interaction-Free Learning
Noise Is Also Useful: Negative Correlation-Steered Latent Contrastive Learning
Temporal Feature Alignment and Mutual Information Maximization for Video-Based Human Pose Estimation
Spatially-Adaptive Multilayer Selection for GAN Inversion and Editing
Self-Supervised Transformers for Unsupervised Object Discovery Using Normalized Cut
Exploring Structure-Aware Transformer Over Interaction Proposals for Human-Object Interaction Detection
Towards Robust Adaptive Object Detection Under Noisy Annotations
Decoupled Multi-Task Learning With Cyclical Self-Regulation for Face Parsing
Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer
Learning To Memorize Feature Hallucination for One-Shot Image Generation
AUV-Net: Learning Aligned UV Maps for Texture Transfer and Synthesis
Open-Vocabulary One-Stage Detection With Hierarchical Visual-Language Knowledge Distillation
GLASS: Geometric Latent Augmentation for Shape Spaces
COAP: Compositional Articulated Occupancy of People
Counterfactual Cycle-Consistent Learning for Instruction Following and Generation in Vision-Language Navigation
Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions With Superior OOD Generalization
Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities
Deterministic Point Cloud Registration via Novel Transformation Decomposition
Motion-Adjustable Neural Implicit Video Representation
Neural Prior for Trajectory Estimation
DPICT: Deep Progressive Image Compression Using Trit-Planes
Rethinking Depth Estimation for Multi-View Stereo: A Unified Representation
Long-Tailed Recognition via Weight Balancing
Text to Image Generation With Semantic-Spatial Aware GAN
The Norm Must Go On: Dynamic Unsupervised Domain Adaptation by Normalization
ShapeFormer: Transformer-Based Shape Completion via Sparse Representation
PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures
Eigencontours: Novel Contour Descriptors Based on Low-Rank Approximation
Generalizable Cross-Modality Medical Image Segmentation via Style Augmentation and Dual Normalization
Learning Optical Flow With Kernel Patch Attention
Learning To Prompt for Open-Vocabulary Object Detection With Vision-Language Model
TimeReplayer: Unlocking the Potential of Event Cameras for Video Interpolation
General Incremental Learning With Domain-Aware Categorical Representations
Interactive Segmentation and Visualization for Tiny Objects in Multi-Megapixel Images
ActiveZero: Mixed Domain Learning for Active Stereovision With Zero Annotation
DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers
Global-Aware Registration of Less-Overlap RGB-D Scans
RayMVSNet: Learning Ray-Based 1D Implicit Fields for Accurate Multi-View Stereo
ContrastMask: Contrastive Learning To Segment Every Thing
Efficient Deep Embedded Subspace Clustering
Neural MoCon: Neural Motion Control for Physically Plausible Human Motion Capture
Revisiting Temporal Alignment for Video Restoration
Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning
Neural Reflectance for Shape Recovery With Shadow Handling
Rep-Net: Efficient On-Device Learning via Feature Reprogramming
Surface Representation for Point Clouds
Implicit Motion Handling for Video Camouflaged Object Detection
OVE6D: Object Viewpoint Encoding for Depth-Based 6D Object Pose Estimation
DeepLIIF: An Online Platform for Quantification of Clinical Pathology Slides
Joint Video Summarization and Moment Localization by Cross-Task Sample Transfer
WALT: Watch and Learn 2D Amodal Representation From Time-Lapse Imagery
Learning With Twin Noisy Labels for Visible-Infrared Person Re-Identification
Optical Flow Estimation for Spiking Camera
MetaFormer Is Actually What You Need for Vision
GradViT: Gradient Inversion of Vision Transformers
Spatial-Temporal Space Hand-in-Hand: Spatial-Temporal Video Super-Resolution via Cycle-Projected Mutual Learning
InstaFormer: Instance-Aware Image-to-Image Translation With Transformer
Revisiting Near/Remote Sensing With Geospatial Attention
Joint Global and Local Hierarchical Priors for Learned Image Compression
Knowledge Distillation via the Target-Aware Transformer
Recurring the Transformer for Video Action Recognition
Subspace Adversarial Training
3D-VField: Adversarial Augmentation of Point Clouds for Domain Generalization in 3D Object Detection
Image Segmentation Using Text and Image Prompts
AutoMine: An Unmanned Mine Dataset
Neural Data-Dependent Transform for Learned Image Compression
Background Activation Suppression for Weakly Supervised Object Localization
How Many Observations Are Enough? Knowledge Distillation for Trajectory Forecasting
Evaluation-Oriented Knowledge Distillation for Deep Face Recognition
Improving Subgraph Recognition With Variational Graph Information Bottleneck
Slot-VPS: Object-Centric Representation Learning for Video Panoptic Segmentation
Motion-From-Blur: 3D Shape and Motion Estimation of Motion-Blurred Objects in Videos
Efficient Video Instance Segmentation via Tracklet Query and Proposal
Synthetic Generation of Face Videos With Plethysmograph Physiology
TransRAC: Encoding Multi-Scale Temporal Correlation With Transformers for Repetitive Action Counting
Hallucinated Neural Radiance Fields in the Wild
NeuralHDHair: Automatic High-Fidelity Hair Modeling From a Single Image Using Implicit Neural Representations
The Two Dimensions of Worst-Case Training and Their Integrated Effect for Out-of-Domain Generalization
Global Tracking Transformers
Backdoor Attacks on Self-Supervised Learning
Multimodal Token Fusion for Vision Transformers
Exploring Frequency Adversarial Attacks for Face Forgery Detection
GMFlow: Learning Optical Flow via Global Matching
Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation
FLAVA: A Foundational Language and Vision Alignment Model
Signing at Scale: Learning to Co-Articulate Signs for Large-Scale Photo-Realistic Sign Language Production
Explore Spatio-Temporal Aggregation for Insubstantial Object Detection: Benchmark Dataset and Baseline
OCSampler: Compressing Videos to One Clip With Single-Step Sampling
Learning Bayesian Sparse Networks With Full Experience Replay for Continual Learning
Graph-Based Spatial Transformer With Memory Replay for Multi-Future Pedestrian Trajectory Prediction
Scanline Homographies for Rolling-Shutter Plane Absolute Pose
TableFormer: Table Structure Understanding With Transformers
Exemplar-Based Pattern Synthesis With Implicit Periodic Field Network
Grounded Language-Image Pre-Training
Spectral Unsupervised Domain Adaptation for Visual Recognition
AdaInt: Learning Adaptive Intervals for 3D Lookup Tables on Real-Time Image Enhancement
PatchFormer: An Efficient Point Transformer With Patch Attention
Recurrent Glimpse-Based Decoder for Detection With Transformer
Generating 3D Bio-Printable Patches Using Wound Segmentation and Reconstruction To Treat Diabetic Foot Ulcers
SimMIM: A Simple Framework for Masked Image Modeling
OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion
Label Matching Semi-Supervised Object Detection
RegionCLIP: Region-Based Language-Image Pretraining
Video Frame Interpolation Transformer
An MIL-Derived Transformer for Weakly Supervised Point Cloud Segmentation
Fast Light-Weight Near-Field Photometric Stereo
BCOT: A Markerless High-Precision 3D Object Tracking Benchmark
Omni-DETR: Omni-Supervised Object Detection With Transformers
Uniform Subdivision of Omnidirectional Camera Space for Efficient Spherical Stereo Matching
High-Resolution Image Synthesis With Latent Diffusion Models
Improving Adversarially Robust Few-Shot Image Classification With Generalizable Representations
Transferable Sparse Adversarial Attack
CREAM: Weakly Supervised Object Localization via Class RE-Activation Mapping
Semi-Weakly-Supervised Learning of Complex Actions From Instructional Task Videos
APRIL: Finding the Achilles' Heel on Privacy for Vision Transformers
Text Spotting Transformers
Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields
VALHALLA: Visual Hallucination for Machine Translation
StyleSDF: High-Resolution 3D-Consistent Image and Geometry Generation
Incorporating Semi-Supervised and Positive-Unlabeled Learning for Boosting Full Reference Image Quality Assessment
GLAMR: Global Occlusion-Aware Human Mesh Recovery With Dynamic Cameras
HINT: Hierarchical Neuron Concept Explainer
Capturing and Inferring Dense Full-Body Human-Scene Contact
Advancing High-Resolution Video-Language Representation With Large-Scale Video Transcriptions
Target-Aware Dual Adversarial Learning and a Multi-Scenario Multi-Modality Benchmark To Fuse Infrared and Visible for Object Detection
En-Compactness: Self-Distillation Embedding & Contrastive Generation for Generalized Zero-Shot Learning
Neural Face Identification in a 2D Wireframe Projection of a Manifold Object
LC-FDNet: Learned Lossless Image Compression With Frequency Decomposition Network
Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation
Deep Rectangling for Image Stitching: A Learning Baseline
PCL: Proxy-Based Contrastive Learning for Domain Generalization
SurfEmb: Dense and Continuous Correspondence Distributions for Object Pose Estimation With Learnt Surface Embeddings
Diverse Plausible 360-Degree Image Outpainting for Efficient 3DCG Background Creation
Learning 3D Object Shape and Layout Without 3D Supervision
An Empirical Study of End-to-End Temporal Action Detection
SimVP: Simpler Yet Better Video Prediction
Object Localization Under Single Coarse Point Supervision
Unsupervised Learning of Accurate Siamese Tracking
Bayesian Nonparametric Submodular Video Partition for Robust Anomaly Detection
Brain-Supervised Image Editing
3D Shape Variational Autoencoder Latent Disentanglement via Mini-Batch Feature Swapping for Bodies and Faces
Unified Transformer Tracker for Object Tracking
Non-Parametric Depth Distribution Modelling Based Depth Inference for Multi-View Stereo
Equalized Focal Loss for Dense Long-Tailed Object Detection
Generating High Fidelity Data From Low-Density Regions Using Diffusion Models
DeepDPM: Deep Clustering With an Unknown Number of Clusters
Spiking Transformers for Event-Based Single Object Tracking
FocalClick: Towards Practical Interactive Image Segmentation
ISDNet: Integrating Shallow and Deep Networks for Efficient Ultra-High Resolution Segmentation
Unsupervised Domain Adaptation for Nighttime Aerial Tracking
Balanced Multimodal Learning via On-the-Fly Gradient Modulation
RestoreFormer: High-Quality Blind Face Restoration From Undegraded Key-Value Pairs
Understanding Uncertainty Maps in Vision With Statistical Testing
CAFE: Learning To Condense Dataset by Aligning Features
Causality Inspired Representation Learning for Domain Generalization
Mask-Guided Spectral-Wise Transformer for Efficient Hyperspectral Image Reconstruction
A Variational Bayesian Method for Similarity Learning in Non-Rigid Image Registration
Not Just Selection, but Exploration: Online Class-Incremental Continual Learning via Dual View Consistency
PPDL: Predicate Probability Distribution Based Loss for Unbiased Scene Graph Generation
Block-NeRF: Scalable Large Scene Neural View Synthesis
Coupling Vision and Proprioception for Navigation of Legged Robots
Fine-Grained Predicates Learning for Scene Graph Generation
Generalized Few-Shot Semantic Segmentation
Exploiting Rigidity Constraints for LiDAR Scene Flow Estimation
Neural Head Avatars From Monocular RGB Videos
B-Cos Networks: Alignment Is All We Need for Interpretability
EMOCA: Emotion Driven Monocular Face Capture and Animation
Burst Image Restoration and Enhancement
What Makes Transfer Learning Work for Medical Images: Feature Reuse & Other Factors
Towards Diverse and Natural Scene-Aware 3D Human Motion Synthesis
Quarantine: Sparsity Can Uncover the Trojan Attack Trigger for Free
Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis
Localized Adversarial Domain Generalization
X-Trans2Cap: Cross-Modal Knowledge Transfer Using Transformer for 3D Dense Captioning
How Much Does Input Data Type Impact Final Face Model Accuracy?
Image-to-Lidar Self-Supervised Distillation for Autonomous Driving Data
HumanNeRF: Free-Viewpoint Rendering of Moving People From Monocular Video
PoseKernelLifter: Metric Lifting of 3D Human Pose Using Sound
Which Images To Label for Few-Shot Medical Landmark Detection?
Why Discard if You Can Recycle?: A Recycling Max Pooling Module for 3D Point Cloud Analysis
Explaining Deep Convolutional Neural Networks via Latent Visual-Semantic Filter Attention
AlignQ: Alignment Quantization With ADMM-Based Correlation Preservation
Self-Distillation From the Last Mini-Batch for Consistency Regularization
Interactive Multi-Class Tiny-Object Detection
Learning From Pixel-Level Noisy Label: A New Perspective for Light Field Saliency Detection
UBoCo: Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection
Multi-View Depth Estimation by Fusing Single-View Depth Probability With Multi-View Geometry
Learning To Collaborate in Decentralized Learning of Personalized Models
CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields
ART-Point: Improving Rotation Robustness of Point Cloud Classifiers via Adversarial Rotation
Ref-NeRF: Structured View-Dependent Appearance for Neural Radiance Fields
360-Attack: Distortion-Aware Perturbations From Perspective-Views
Targeted Supervised Contrastive Learning for Long-Tailed Recognition
Both Style and Fog Matter: Cumulative Domain Adaptation for Semantic Foggy Scene Understanding
Ev-TTA: Test-Time Adaptation for Event-Based Object Recognition
Balanced Contrastive Learning for Long-Tailed Visual Recognition
Slimmable Domain Adaptation
Bandits for Structure Perturbation-Based Black-Box Attacks To Graph Neural Networks With Theoretical Guarantees
NODEO: A Neural Ordinary Differential Equation Based Optimization Framework for Deformable Image Registration
DIP: Deep Inverse Patchmatch for High-Resolution Optical Flow
Few-Shot Object Detection With Fully Cross-Transformer
Pyramid Architecture for Multi-Scale Processing in Point Cloud Segmentation
Decoupling Makes Weakly Supervised Local Feature Better
Cross-Architecture Self-Supervised Video Representation Learning
High-Resolution Image Harmonization via Collaborative Dual Transformations
Homography Loss for Monocular 3D Object Detection
A Unified Model for Line Projections in Catadioptric Cameras With Rotationally Symmetric Mirrors
Dynamic Sparse R-CNN
MM-TTA: Multi-Modal Test-Time Adaptation for 3D Semantic Segmentation
Stable Long-Term Recurrent Video Super-Resolution
Dual-Generator Face Reenactment
Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization and Cycle Idempotence
Self-Supervised Neural Articulated Shape and Appearance Models
A Hybrid Quantum-Classical Algorithm for Robust Fitting
Topology Preserving Local Road Network Estimation From Single Onboard Camera Image
Eigenlanes: Data-Driven Lane Descriptors for Structurally Diverse Lanes
Human Instance Matting via Mutual Guidance and Multi-Instance Refinement
TCTrack: Temporal Contexts for Aerial Tracking
SpaceEdit: Learning a Unified Editing Space for Open-Domain Image Color Editing
GAN-Supervised Dense Visual Alignment
SwinTextSpotter: Scene Text Spotting via Better Synergy Between Text Detection and Text Recognition
Multi-Level Feature Learning for Contrastive Multi-View Clustering
RendNet: Unified 2D/3D Recognizer With Latent Space Rendering
iPLAN: Interactive and Procedural Layout Planning
Video Frame Interpolation With Transformer
GIFS: Neural Implicit Function for General Shape Representation
Deblur-NeRF: Neural Radiance Fields From Blurry Images
Egocentric Prediction of Action Target in 3D
TemporalUV: Capturing Loose Clothing With Temporally Coherent UV Coordinates
Whose Track Is It Anyway? Improving Robustness to Tracking Errors With Affinity-Based Trajectory Prediction
DoubleField: Bridging the Neural Surface and Radiance Fields for High-Fidelity Human Reconstruction and Rendering
Towards Real-World Navigation With Deep Differentiable Planners
An Iterative Quantum Approach for Transformation Estimation From Point Sets
Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation
UnweaveNet: Unweaving Activity Stories
Balanced MSE for Imbalanced Visual Regression
Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning
PhysFormer: Facial Video-Based Physiological Measurement With Temporal Difference Transformer
Dimension Embeddings for Monocular 3D Object Detection
Look Closer To Supervise Better: One-Shot Font Generation via Component-Based Discriminator
NeRFReN: Neural Radiance Fields With Reflections
Blind Image Super-Resolution With Elaborate Degradation Modeling on Noise and Kernel
Finding Good Configurations of Planar Primitives in Unorganized Point Clouds
PhyIR: Physics-Based Inverse Rendering for Panoramic Indoor Images
SCS-Co: Self-Consistent Style Contrastive Learning for Image Harmonization
Beyond Fixation: Dynamic Window Visual Transformer
Progressive End-to-End Object Detection in Crowded Scenes
FMCNet: Feature-Level Modality Compensation for Visible-Infrared Person Re-Identification
Improving GAN Equilibrium by Raising Spatial Awareness
Neural Convolutional Surfaces
HyperSegNAS: Bridging One-Shot Neural Architecture Search With 3D Medical Image Segmentation Using HyperNet
A Comprehensive Study of Image Classification Model Sensitivity to Foregrounds, Backgrounds, and Visual Attributes
ConDor: Self-Supervised Canonicalization of 3D Pose for Partial Shapes
Source-Free Domain Adaptation via Distribution Estimation
Robust Combination of Distributed Gradients Under Adversarial Perturbations
Exploring Endogenous Shift for Cross-Domain Detection: A Large-Scale Benchmark and Perturbation Suppression Network
VisCUIT: Visual Auditor for Bias in CNN Image Classifier
Automatic Synthesis of Diverse Weak Supervision Sources for Behavior Analysis
Transferability Estimation Using Bhattacharyya Class Separability
DirecFormer: A Directed Attention in Transformer Approach to Robust Action Recognition
Hierarchical Self-Supervised Representation Learning for Movie Understanding
Robust Egocentric Photo-Realistic Facial Expression Transfer for Virtual Reality
Does Robustness on ImageNet Transfer to Downstream Tasks?
Propagation Regularizer for Semi-Supervised Learning With Extremely Scarce Labeled Samples
Bailando: 3D Dance Generation by Actor-Critic GPT With Choreographic Memory
Faithful Extreme Rescaling via Generative Prior Reciprocated Invertible Representations
Distillation Using Oracle Queries for Transformer-Based Human-Object Interaction Detection
Proto2Proto: Can You Recognize the Car, the Way I Do?
Learning Local-Global Contextual Adaptation for Multi-Person Pose Estimation
Learning Video Representations of Human Motion From Synthetic Data
TVConv: Efficient Translation Variant Convolution for Layout-Aware Visual Processing
Dual Adversarial Adaptation for Cross-Device Real-World Image Super-Resolution
FS6D: Few-Shot 6D Pose Estimation of Novel Objects
Habitat-Web: Learning Embodied Object-Search Strategies From Human Demonstrations at Scale
The Probabilistic Normal Epipolar Constraint for Frame-to-Frame Rotation Optimization Under Uncertain Feature Positions
Vision-Language Pre-Training for Boosting Scene Text Detectors
Reflection and Rotation Symmetry Detection via Equivariant Learning
BoostMIS: Boosting Medical Image Semi-Supervised Learning With Adaptive Pseudo Labeling and Informative Active Annotation
Simple but Effective: CLIP Embeddings for Embodied AI
NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition
HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object Interaction
Collaborative Transformers for Grounded Situation Recognition
DyRep: Bootstrapping Training With Dynamic Re-Parameterization
Not All Labels Are Equal: Rationalizing the Labeling Costs for Training Object Detection
CPPF: Towards Robust Category-Level 9D Pose Estimation in the Wild
Interact Before Align: Leveraging Cross-Modal Knowledge for Domain Adaptive Action Recognition
Interactive Disentanglement: Learning Concepts by Interacting With Their Prototype Representations
CDGNet: Class Distribution Guided Network for Human Parsing
Recall@k Surrogate Loss With Large Batches and Similarity Mixup
Direct Voxel Grid Optimization: Super-Fast Convergence for Radiance Fields Reconstruction
Continual Test-Time Domain Adaptation
URetinex-Net: Retinex-Based Deep Unfolding Network for Low-Light Image Enhancement
Towards Multi-Domain Single Image Dehazing via Test-Time Training
Vox2Cortex: Fast Explicit Reconstruction of Cortical Surfaces From 3D MRI Scans With Geometric Deep Neural Networks
Deep Safe Multi-View Clustering: Reducing the Risk of Clustering Performance Degradation Caused by View Increase
Dynamic MLP for Fine-Grained Image Classification by Leveraging Geographical and Temporal Information
HP-Capsule: Unsupervised Face Part Discovery by Hierarchical Parsing Capsule Network
ScanQA: 3D Question Answering for Spatial Scene Understanding
MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-Based Visual Question Answering
Class-Incremental Learning by Knowledge Distillation With Adaptive Feature Consolidation
Learning Program Representations for Food Images and Cooking Recipes
Bending Graphs: Hierarchical Shape Matching Using Gated Optimal Transport
Transform-Retrieve-Generate: Natural Language-Centric Outside-Knowledge Visual Question Answering
Federated Learning With Position-Aware Neurons
Fair Contrastive Learning for Facial Attribute Classification
MDAN: Multi-Level Dependent Attention Network for Visual Emotion Analysis
Nested Hyperbolic Spaces for Dimensionality Reduction and Hyperbolic NN Design
BNUDC: A Two-Branched Deep Neural Network for Restoring Images From Under-Display Cameras
RGB-Depth Fusion GAN for Indoor Depth Completion
Training Object Detectors From Scratch: An Empirical Study in the Era of Vision Transformer
RCL: Recurrent Continuous Localization for Temporal Action Detection
C2SLR: Consistency-Enhanced Continuous Sign Language Recognition
Human Trajectory Prediction With Momentary Observation
FoggyStereo: Stereo Matching With Fog Volume Representation
Trajectory Optimization for Physics-Based Reconstruction of 3D Human Pose From Monocular Video
Directional Self-Supervised Learning for Heavy Image Augmentations
Lifelong Unsupervised Domain Adaptive Person Re-Identification With Coordinated Anti-Forgetting and Adaptation
No-Reference Point Cloud Quality Assessment via Domain Adaptation
Generating Representative Samples for Few-Shot Classification
Comprehending and Ordering Semantics for Image Captioning
Dynamic Scene Graph Generation via Anticipatory Pre-Training
A Large-Scale Comprehensive Dataset and Copy-Overlap Aware Evaluation Protocol for Segment-Level Video Copy Detection
GaTector: A Unified Framework for Gaze Object Prediction
ELIC: Efficient Learned Image Compression With Unevenly Grouped Space-Channel Contextual Adaptive Coding
CSWin Transformer: A General Vision Transformer Backbone With Cross-Shaped Windows
LaTr: Layout-Aware Transformer for Scene-Text VQA
Label Relation Graphs Enhanced Hierarchical Residual Network for Hierarchical Multi-Granularity Classification
ITSA: An Information-Theoretic Approach to Automatic Shortcut Avoidance and Domain Generalization in Stereo Matching Networks
Enhancing Face Recognition With Self-Supervised 3D Reconstruction
HeadNeRF: A Real-Time NeRF-Based Parametric Head Model
FvOR: Robust Joint Shape and Pose Optimization for Few-View Object Reconstruction
Reduce Information Loss in Transformers for Pluralistic Image Inpainting
Replacing Labeled Real-Image Datasets With Auto-Generated Contours
Cross-Modal Transferable Adversarial Attacks From Images to Videos
Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection
Do Explanations Explain? Model Knows Best
WebQA: Multihop and Multimodal QA
Occlusion-Robust Face Alignment Using a Viewpoint-Invariant Hierarchical Network Architecture
BasicVSR++: Improving Video Super-Resolution With Enhanced Propagation and Alignment
IDR: Self-Supervised Image Denoising via Iterative Data Refinement
MogFace: Towards a Deeper Appreciation on Face Detection
GuideFormer: Transformers for Image Guided Depth Completion
Multi-Label Iterated Learning for Image Classification With Label Ambiguity
Region-Aware Face Swapping
Towards Language-Free Training for Text-to-Image Generation
Learning Affinity From Attention: End-to-End Weakly-Supervised Semantic Segmentation With Transformers
Pushing the Envelope of Gradient Boosting Forests via Globally-Optimized Oblique Trees
Physical Simulation Layer for Accurate 3D Modeling
Deformable Sprites for Unsupervised Video Decomposition
CamLiFlow: Bidirectional Camera-LiDAR Fusion for Joint Optical Flow and Scene Flow Estimation
FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos
Learning To Detect Mobile Objects From LiDAR Scans Without Labels
BNV-Fusion: Dense 3D Reconstruction Using Bi-Level Neural Volume Fusion
Probabilistic Representations for Video Contrastive Learning
EnvEdit: Environment Editing for Vision-and-Language Navigation
Omnivore: A Single Model for Many Visual Modalities
Neural Shape Mating: Self-Supervised Object Assembly With Adversarial Shape Priors
Reflash Dropout in Image Super-Resolution
WildNet: Learning Domain Generalized Semantic Segmentation From the Wild
Auditing Privacy Defenses in Federated Learning via Generative Gradient Leakage
DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection
DECORE: Deep Compression With Reinforcement Learning
Time3D: End-to-End Joint Monocular 3D Object Detection and Tracking for Autonomous Driving
MonoJSG: Joint Semantic and Geometric Cost Volume for Monocular 3D Object Detection
Task Discrepancy Maximization for Fine-Grained Few-Shot Classification
FedDC: Federated Learning With Non-IID Data via Local Drift Decoupling and Correction
Efficient Classification of Very Large Images With Tiny Objects
SWEM: Towards Real-Time Video Object Segmentation With Sequential Weighted Expectation-Maximization
Point-to-Voxel Knowledge Distillation for LiDAR Semantic Segmentation
Leveling Down in Computer Vision: Pareto Inefficiencies in Fair Deep Classifiers
Generating Diverse 3D Reconstructions From a Single Occluded Face Image
RBGNet: Ray-Based Grouping for 3D Object Detection
Stand-Alone Inter-Frame Attention in Video Models
Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation
Open-Domain, Content-Based, Multi-Modal Fact-Checking of Out-of-Context Images via Online Resources
Memory-Augmented Deep Conditional Unfolding Network for Pan-Sharpening
Semi-Supervised Wide-Angle Portraits Correction by Multi-Scale Transformer
Large-Scale Pre-Training for Person Re-Identification With Noisy Labels
Adiabatic Quantum Computing for Multi Object Tracking
Feature Erasing and Diffusion Network for Occluded Person Re-Identification
Is Mapping Necessary for Realistic PointGoal Navigation?
Node-Aligned Graph Convolutional Network for Whole-Slide Image Representation and Classification
Represent, Compare, and Learn: A Similarity-Aware Framework for Class-Agnostic Counting
Masked Feature Prediction for Self-Supervised Visual Pre-Training
Critical Regularizations for Neural Surface Reconstruction in the Wild
EASE: Unsupervised Discriminant Subspace Learning for Transductive Few-Shot Learning
Object-Relation Reasoning Graph for Action Recognition
Semantic Segmentation by Early Region Proxy
GIQE: Generic Image Quality Enhancement via Nth Order Iterative Degradation
Instance Segmentation With Mask-Supervised Polygonal Boundary Transformers
FaceVerse: A Fine-Grained and Detail-Controllable 3D Face Morphable Model From a Hybrid Dataset
Bring Evanescent Representations to Life in Lifelong Class Incremental Learning
Single-Stage 3D Geometry-Preserving Depth Estimation Model Training on Dataset Mixtures With Uncalibrated Stereo Data
LD-ConGR: A Large RGB-D Video Dataset for Long-Distance Continuous Gesture Recognition
SimVQA: Exploring Simulated Environments for Visual Question Answering
Thin-Plate Spline Motion Model for Image Animation
Learning Local Displacements for Point Cloud Completion
Human Hands As Probes for Interactive Object Understanding
Understanding and Increasing Efficiency of Frank-Wolfe Adversarial Training
Certified Patch Robustness via Smoothed Vision Transformers
Look Back and Forth: Video Super-Resolution With Explicit Temporal Difference Modeling
UCC: Uncertainty Guided Cross-Head Co-Training for Semi-Supervised Semantic Segmentation
HVH: Learning a Hybrid Neural Volumetric Representation for Dynamic Hair Performance Capture
RADU: Ray-Aligned Depth Update Convolutions for ToF Data Denoising
Rethinking Visual Geo-Localization for Large-Scale Applications
Learning Based Multi-Modality Image and Video Compression
A Stitch in Time Saves Nine: A Train-Time Regularizing Loss for Improved Neural Network Calibration
The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy
Deep Image-Based Illumination Harmonization
ViM: Out-of-Distribution With Virtual-Logit Matching