Commit eedf2f3: update README.md

abikaki committed Feb 29, 2024
1 parent 5c93b8e commit eedf2f3
Showing 2 changed files with 10 additions and 10 deletions.
README.md (4 changes: 2 additions & 2 deletions)
@@ -295,8 +295,8 @@ CVPR 2023 Papers: Explore a comprehensive collection of cutting-edge research pa
<td>
<a href="https://github.com/DmitryRyumin/CVPR-2023-Papers/blob/main/sections/2023/main/vision-language-and-reasoning.md">Vision, Language, and Reasoning</a>
</td>
- <!--20/118-->
- <td colspan="4" align="center"><img src="https://geps.dev/progress/17?successColor=006600" alt="" /></td>
+ <!--28/118-->
+ <td colspan="4" align="center"><img src="https://geps.dev/progress/24?successColor=006600" alt="" /></td>
</tr>
<tr>
<td>
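Note on the badge update above: the HTML comment tracks the raw count (28 of 118 papers now have links, up from 20), while the geps.dev URL carries the rounded percentage (28/118 ≈ 24%). A minimal sketch of that arithmetic; `progress_cell` is a hypothetical helper name, not tooling from this repository:

```python
# Sketch: regenerate the progress-badge cell shown in the diff above.
# `progress_cell` is an illustrative assumption, not repository tooling.

def progress_cell(done: int, total: int, color: str = "006600") -> str:
    """Build the comment + <td> pair; geps.dev/progress/<N> appears to render an N% bar."""
    pct = round(100 * done / total)  # e.g. 28/118 -> 24, 20/118 -> 17
    url = f"https://geps.dev/progress/{pct}?successColor={color}"
    return (f"<!--{done}/{total}-->\n"
            f'<td colspan="4" align="center"><img src="{url}" alt="" /></td>')

print(progress_cell(28, 118))  # matches the two added lines in this hunk
```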
sections/2023/main/vision-language-and-reasoning.md (16 changes: 8 additions & 8 deletions)
@@ -49,14 +49,14 @@
| EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding | [![GitHub](https://img.shields.io/github/stars/yanmin-wu/EDA?style=flat)](https://github.com/yanmin-wu/EDA) | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Wu_EDA_Explicit_Text-Decoupling_and_Dense_Alignment_for_3D_Visual_Grounding_CVPR_2023_paper.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2209.14941-b31b1b.svg)](http://arxiv.org/abs/2209.14941) | [![YouTube](https://img.shields.io/badge/YouTube-%23FF0000.svg?style=for-the-badge&logo=YouTube&logoColor=white)](https://www.youtube.com/watch?v=YBpPqYU07Es) |
| RefTeacher: A Strong Baseline for Semi-Supervised Referring Expression Comprehension | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://refteacher.github.io/) <br /> [![GitHub](https://img.shields.io/github/stars/Disguiser15/RefTeacher?style=flat)](https://github.com/Disguiser15/RefTeacher) | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Sun_RefTeacher_A_Strong_Baseline_for_Semi-Supervised_Referring_Expression_Comprehension_CVPR_2023_paper.pdf) | :heavy_minus_sign: |
| Mobile User Interface Element Detection via Adaptively Prompt Tuning | [![GitHub](https://img.shields.io/github/stars/antmachineintelligence/MUI-zh?style=flat)](https://github.com/antmachineintelligence/MUI-zh) | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Gu_Mobile_User_Interface_Element_Detection_via_Adaptively_Prompt_Tuning_CVPR_2023_paper.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2305.09699-b31b1b.svg)](http://arxiv.org/abs/2305.09699) | [![YouTube](https://img.shields.io/badge/YouTube-%23FF0000.svg?style=for-the-badge&logo=YouTube&logoColor=white)](https://www.youtube.com/watch?v=dMC26H1DQWw) |
- | Context-Aware Alignment and Mutual Masking for 3D-Language Pre-Training | | | |
- | Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation | | | |
- | Meta Compositional Referring Expression Segmentation | | | |
- | VindLU: A Recipe for Effective Video-and-Language Pretraining | | | |
- | Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning | | | |
- | GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods | | | |
- | Learning Customized Visual Models with Retrieval-Augmented Knowledge | | | |
- | LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling | | | |
+ | Context-Aware Alignment and Mutual Masking for 3D-Language Pre-Training <br/> [![CVPR - Highlight](https://img.shields.io/badge/CVPR-Highlight-FFFF00)]() | [![GitHub](https://img.shields.io/github/stars/leolyj/3D-VLP?style=flat)](https://github.com/leolyj/3D-VLP) | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Jin_Context-Aware_Alignment_and_Mutual_Masking_for_3D-Language_Pre-Training_CVPR_2023_paper.pdf) | :heavy_minus_sign: |
+ | Tell Me What Happened: Unifying Text-Guided Video Completion via Multimodal Masked Video Generation | [![GitHub](https://img.shields.io/github/stars/tsujuifu/pytorch_tvc?style=flat)](https://github.com/tsujuifu/pytorch_tvc) | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Fu_Tell_Me_What_Happened_Unifying_Text-Guided_Video_Completion_via_Multimodal_CVPR_2023_paper.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2211.12824-b31b1b.svg)](http://arxiv.org/abs/2211.12824) | [![YouTube](https://img.shields.io/badge/YouTube-%23FF0000.svg?style=for-the-badge&logo=YouTube&logoColor=white)](https://www.youtube.com/watch?v=dnBzUfsf9Cc) |
+ | Meta Compositional Referring Expression Segmentation | :heavy_minus_sign: | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Xu_Meta_Compositional_Referring_Expression_Segmentation_CVPR_2023_paper.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2304.04415-b31b1b.svg)](http://arxiv.org/abs/2304.04415) | :heavy_minus_sign: |
+ | VindLU: A Recipe for Effective Video-and-Language Pretraining | [![GitHub](https://img.shields.io/github/stars/klauscc/VindLU?style=flat)](https://github.com/klauscc/VindLU) | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Cheng_VindLU_A_Recipe_for_Effective_Video-and-Language_Pretraining_CVPR_2023_paper.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2212.05051-b31b1b.svg)](http://arxiv.org/abs/2212.05051) | [![YouTube](https://img.shields.io/badge/YouTube-%23FF0000.svg?style=for-the-badge&logo=YouTube&logoColor=white)](https://www.youtube.com/watch?v=9koWpSPcYBQ) |
+ | Super-CLEVR: A Virtual Benchmark To Diagnose Domain Robustness in Visual Reasoning <br/> [![CVPR - Highlight](https://img.shields.io/badge/CVPR-Highlight-FFFF00)]() | [![GitHub](https://img.shields.io/github/stars/Lizw14/Super-CLEVR?style=flat)](https://github.com/Lizw14/Super-CLEVR) | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Li_Super-CLEVR_A_Virtual_Benchmark_To_Diagnose_Domain_Robustness_in_Visual_CVPR_2023_paper.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2212.00259-b31b1b.svg)](https://arxiv.org/abs/2212.00259) | [![YouTube](https://img.shields.io/badge/YouTube-%23FF0000.svg?style=for-the-badge&logo=YouTube&logoColor=white)](https://www.youtube.com/watch?v=DWRp_70ypiA) |
+ | GIVL: Improving Geographical Inclusivity of Vision-Language Models With Pre-Training Methods | :heavy_minus_sign: | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Yin_GIVL_Improving_Geographical_Inclusivity_of_Vision-Language_Models_With_Pre-Training_Methods_CVPR_2023_paper.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2301.01893-b31b1b.svg)](http://arxiv.org/abs/2301.01893) | :heavy_minus_sign: |
+ | Learning Customized Visual Models With Retrieval-Augmented Knowledge <br/> [![CVPR - Highlight](https://img.shields.io/badge/CVPR-Highlight-FFFF00)]() | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://react-vl.github.io/) <br /> [![GitHub](https://img.shields.io/github/stars/microsoft/react?style=flat)](https://github.com/microsoft/react) | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Liu_Learning_Customized_Visual_Models_With_Retrieval-Augmented_Knowledge_CVPR_2023_paper.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2301.07094-b31b1b.svg)](http://arxiv.org/abs/2301.07094) | :heavy_minus_sign: |
+ | LAVENDER: Unifying Video-Language Understanding As Masked Language Modeling | [![GitHub](https://img.shields.io/github/stars/microsoft/LAVENDER?style=flat)](https://github.com/microsoft/LAVENDER) | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Li_LAVENDER_Unifying_Video-Language_Understanding_As_Masked_Language_Modeling_CVPR_2023_paper.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2206.07160-b31b1b.svg)](http://arxiv.org/abs/2206.07160) | [![YouTube](https://img.shields.io/badge/YouTube-%23FF0000.svg?style=for-the-badge&logo=YouTube&logoColor=white)](https://www.youtube.com/watch?v=f8scI82_caE) |
| An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling | | | |
| NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations | | | |
| Clover: Towards a Unified Video-Language Alignment and Fusion Model | | | |
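All eight added rows follow the file's four-column convention: title (plus an optional CVPR Highlight badge), code links, paper links (thecvf PDF and arXiv), and video, with :heavy_minus_sign: filling any empty cell. A minimal sketch of how such a row could be assembled; the function and its parameters are illustrative assumptions, not repository tooling:

```python
# Sketch: build one 'Title | Code | Paper | Video' row in the style used above.
# `paper_row` and its parameters are illustrative assumptions, not repo tooling.

NONE = ":heavy_minus_sign:"  # placeholder the table uses for a missing link

def paper_row(title, github=None, pdf=None, arxiv=None, video=None):
    # Code column: shields.io GitHub-stars badge linking to the repo.
    code = (f"[![GitHub](https://img.shields.io/github/stars/{github}?style=flat)]"
            f"(https://github.com/{github})") if github else NONE
    # Paper column: thecvf PDF badge and/or arXiv badge, joined like the rows above.
    links = []
    if pdf:
        links.append(f"[![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)]({pdf})")
    if arxiv:
        links.append(f"[![arXiv](https://img.shields.io/badge/arXiv-{arxiv}-b31b1b.svg)]"
                     f"(http://arxiv.org/abs/{arxiv})")
    paper = " <br /> ".join(links) if links else NONE
    # Video column: YouTube badge linking to the talk.
    vid = (f"[![YouTube](https://img.shields.io/badge/YouTube-%23FF0000.svg"
           f"?style=for-the-badge&logo=YouTube&logoColor=white)]"
           f"(https://www.youtube.com/watch?v={video})") if video else NONE
    return f"| {title} | {code} | {paper} | {vid} |"

# Reproduces the VindLU row added in this hunk:
print(paper_row(
    "VindLU: A Recipe for Effective Video-and-Language Pretraining",
    github="klauscc/VindLU",
    pdf="https://openaccess.thecvf.com//content/CVPR2023/papers/"
        "Cheng_VindLU_A_Recipe_for_Effective_Video-and-Language_Pretraining_CVPR_2023_paper.pdf",
    arxiv="2212.05051",
    video="9koWpSPcYBQ"))
```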
