Commit eedf2f3: update README.md

abikaki committed Feb 29, 2024
1 parent 5c93b8e commit eedf2f3
Showing 2 changed files with 10 additions and 10 deletions.
README.md (4 changes: 2 additions & 2 deletions)
@@ -295,8 +295,8 @@ CVPR 2023 Papers: Explore a comprehensive collection of cutting-edge research pa
<td>
<a href="https://github.com/DmitryRyumin/CVPR-2023-Papers/blob/main/sections/2023/main/vision-language-and-reasoning.md">Vision, Language, and Reasoning</a>
</td>
- <!--20/118-->
- <td colspan="4" align="center"><img src="https://geps.dev/progress/17?successColor=006600" alt="" /></td>
+ <!--28/118-->
+ <td colspan="4" align="center"><img src="https://geps.dev/progress/24?successColor=006600" alt="" /></td>
</tr>
<tr>
<td>
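Note on the badge update above: the HTML comment tracks the raw count (28 of 118 papers now have links, up from 20), while the geps.dev URL carries the rounded percentage (28/118 ≈ 24%). A minimal sketch of that arithmetic; `progress_cell` is a hypothetical helper name, not tooling from this repository:

```python
# Sketch: regenerate the progress-badge cell shown in the diff above.
# `progress_cell` is an illustrative assumption, not repository tooling.

def progress_cell(done: int, total: int, color: str = "006600") -> str:
    """Build the comment + <td> pair; geps.dev/progress/<N> appears to render an N% bar."""
    pct = round(100 * done / total)  # e.g. 28/118 -> 24, 20/118 -> 17
    url = f"https://geps.dev/progress/{pct}?successColor={color}"
    return (f"<!--{done}/{total}-->\n"
            f'<td colspan="4" align="center"><img src="{url}" alt="" /></td>')

print(progress_cell(28, 118))  # matches the two added lines in this hunk
```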
sections/2023/main/vision-language-and-reasoning.md (16 changes: 8 additions & 8 deletions)
@@ -49,14 +49,14 @@
| EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding | [![GitHub](https://img.shields.io/github/stars/yanmin-wu/EDA?style=flat)](https://github.com/yanmin-wu/EDA) | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Wu_EDA_Explicit_Text-Decoupling_and_Dense_Alignment_for_3D_Visual_Grounding_CVPR_2023_paper.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2209.14941-b31b1b.svg)](http://arxiv.org/abs/2209.14941) | [![YouTube](https://img.shields.io/badge/YouTube-%23FF0000.svg?style=for-the-badge&logo=YouTube&logoColor=white)](https://www.youtube.com/watch?v=YBpPqYU07Es) |
| RefTeacher: A Strong Baseline for Semi-Supervised Referring Expression Comprehension | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://refteacher.github.io/) <br /> [![GitHub](https://img.shields.io/github/stars/Disguiser15/RefTeacher?style=flat)](https://github.com/Disguiser15/RefTeacher) | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Sun_RefTeacher_A_Strong_Baseline_for_Semi-Supervised_Referring_Expression_Comprehension_CVPR_2023_paper.pdf) | :heavy_minus_sign: |
| Mobile User Interface Element Detection via Adaptively Prompt Tuning | [![GitHub](https://img.shields.io/github/stars/antmachineintelligence/MUI-zh?style=flat)](https://github.com/antmachineintelligence/MUI-zh) | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Gu_Mobile_User_Interface_Element_Detection_via_Adaptively_Prompt_Tuning_CVPR_2023_paper.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2305.09699-b31b1b.svg)](http://arxiv.org/abs/2305.09699) | [![YouTube](https://img.shields.io/badge/YouTube-%23FF0000.svg?style=for-the-badge&logo=YouTube&logoColor=white)](https://www.youtube.com/watch?v=dMC26H1DQWw) |
- | Context-Aware Alignment and Mutual Masking for 3D-Language Pre-Training | | | |
- | Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation | | | |
- | Meta Compositional Referring Expression Segmentation | | | |
- | VindLU: A Recipe for Effective Video-and-Language Pretraining | | | |
- | Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning | | | |
- | GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods | | | |
- | Learning Customized Visual Models with Retrieval-Augmented Knowledge | | | |
- | LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling | | | |
+ | Context-Aware Alignment and Mutual Masking for 3D-Language Pre-Training <br/> [![CVPR - Highlight](https://img.shields.io/badge/CVPR-Highlight-FFFF00)]() | [![GitHub](https://img.shields.io/github/stars/leolyj/3D-VLP?style=flat)](https://github.com/leolyj/3D-VLP) | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Jin_Context-Aware_Alignment_and_Mutual_Masking_for_3D-Language_Pre-Training_CVPR_2023_paper.pdf) | :heavy_minus_sign: |
+ | Tell Me What Happened: Unifying Text-Guided Video Completion via Multimodal Masked Video Generation | [![GitHub](https://img.shields.io/github/stars/tsujuifu/pytorch_tvc?style=flat)](https://github.com/tsujuifu/pytorch_tvc) | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Fu_Tell_Me_What_Happened_Unifying_Text-Guided_Video_Completion_via_Multimodal_CVPR_2023_paper.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2211.12824-b31b1b.svg)](http://arxiv.org/abs/2211.12824) | [![YouTube](https://img.shields.io/badge/YouTube-%23FF0000.svg?style=for-the-badge&logo=YouTube&logoColor=white)](https://www.youtube.com/watch?v=dnBzUfsf9Cc) |
+ | Meta Compositional Referring Expression Segmentation | :heavy_minus_sign: | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Xu_Meta_Compositional_Referring_Expression_Segmentation_CVPR_2023_paper.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2304.04415-b31b1b.svg)](http://arxiv.org/abs/2304.04415) | :heavy_minus_sign: |
+ | VindLU: A Recipe for Effective Video-and-Language Pretraining | [![GitHub](https://img.shields.io/github/stars/klauscc/VindLU?style=flat)](https://github.com/klauscc/VindLU) | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Cheng_VindLU_A_Recipe_for_Effective_Video-and-Language_Pretraining_CVPR_2023_paper.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2212.05051-b31b1b.svg)](http://arxiv.org/abs/2212.05051) | [![YouTube](https://img.shields.io/badge/YouTube-%23FF0000.svg?style=for-the-badge&logo=YouTube&logoColor=white)](https://www.youtube.com/watch?v=9koWpSPcYBQ) |
+ | Super-CLEVR: A Virtual Benchmark To Diagnose Domain Robustness in Visual Reasoning <br/> [![CVPR - Highlight](https://img.shields.io/badge/CVPR-Highlight-FFFF00)]() | [![GitHub](https://img.shields.io/github/stars/Lizw14/Super-CLEVR?style=flat)](https://github.com/Lizw14/Super-CLEVR) | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Li_Super-CLEVR_A_Virtual_Benchmark_To_Diagnose_Domain_Robustness_in_Visual_CVPR_2023_paper.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2212.00259-b31b1b.svg)](https://arxiv.org/abs/2212.00259) | [![YouTube](https://img.shields.io/badge/YouTube-%23FF0000.svg?style=for-the-badge&logo=YouTube&logoColor=white)](https://www.youtube.com/watch?v=DWRp_70ypiA) |
+ | GIVL: Improving Geographical Inclusivity of Vision-Language Models With Pre-Training Methods | :heavy_minus_sign: | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Yin_GIVL_Improving_Geographical_Inclusivity_of_Vision-Language_Models_With_Pre-Training_Methods_CVPR_2023_paper.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2301.01893-b31b1b.svg)](http://arxiv.org/abs/2301.01893) | :heavy_minus_sign: |
+ | Learning Customized Visual Models With Retrieval-Augmented Knowledge <br/> [![CVPR - Highlight](https://img.shields.io/badge/CVPR-Highlight-FFFF00)]() | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://react-vl.github.io/) <br /> [![GitHub](https://img.shields.io/github/stars/microsoft/react?style=flat)](https://github.com/microsoft/react) | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Liu_Learning_Customized_Visual_Models_With_Retrieval-Augmented_Knowledge_CVPR_2023_paper.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2301.07094-b31b1b.svg)](http://arxiv.org/abs/2301.07094) | :heavy_minus_sign: |
+ | LAVENDER: Unifying Video-Language Understanding As Masked Language Modeling | [![GitHub](https://img.shields.io/github/stars/microsoft/LAVENDER?style=flat)](https://github.com/microsoft/LAVENDER) | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Li_LAVENDER_Unifying_Video-Language_Understanding_As_Masked_Language_Modeling_CVPR_2023_paper.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2206.07160-b31b1b.svg)](http://arxiv.org/abs/2206.07160) | [![YouTube](https://img.shields.io/badge/YouTube-%23FF0000.svg?style=for-the-badge&logo=YouTube&logoColor=white)](https://www.youtube.com/watch?v=f8scI82_caE) |
| An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling | | | |
| NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations | | | |
| Clover: Towards a Unified Video-Language Alignment and Fusion Model | | | |
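All eight added rows follow the file's four-column convention: title (plus an optional CVPR Highlight badge), code links, paper links (thecvf PDF and arXiv), and video, with :heavy_minus_sign: filling any empty cell. A minimal sketch of how such a row could be assembled; the function and its parameters are illustrative assumptions, not repository tooling:

```python
# Sketch: build one 'Title | Code | Paper | Video' row in the style used above.
# `paper_row` and its parameters are illustrative assumptions, not repo tooling.

NONE = ":heavy_minus_sign:"  # placeholder the table uses for a missing link

def paper_row(title, github=None, pdf=None, arxiv=None, video=None):
    # Code column: shields.io GitHub-stars badge linking to the repo.
    code = (f"[![GitHub](https://img.shields.io/github/stars/{github}?style=flat)]"
            f"(https://github.com/{github})") if github else NONE
    # Paper column: thecvf PDF badge and/or arXiv badge, joined like the rows above.
    links = []
    if pdf:
        links.append(f"[![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)]({pdf})")
    if arxiv:
        links.append(f"[![arXiv](https://img.shields.io/badge/arXiv-{arxiv}-b31b1b.svg)]"
                     f"(http://arxiv.org/abs/{arxiv})")
    paper = " <br /> ".join(links) if links else NONE
    # Video column: YouTube badge linking to the talk.
    vid = (f"[![YouTube](https://img.shields.io/badge/YouTube-%23FF0000.svg"
           f"?style=for-the-badge&logo=YouTube&logoColor=white)]"
           f"(https://www.youtube.com/watch?v={video})") if video else NONE
    return f"| {title} | {code} | {paper} | {vid} |"

# Reproduces the VindLU row added in this hunk:
print(paper_row(
    "VindLU: A Recipe for Effective Video-and-Language Pretraining",
    github="klauscc/VindLU",
    pdf="https://openaccess.thecvf.com//content/CVPR2023/papers/"
        "Cheng_VindLU_A_Recipe_for_Effective_Video-and-Language_Pretraining_CVPR_2023_paper.pdf",
    arxiv="2212.05051",
    video="9koWpSPcYBQ"))
```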
