Commit a57918f — update video-action-and-event-understanding.md
abikaki committed Jan 19, 2024 (1 parent: d12d804)
1 changed file: `sections/video-action-and-event-understanding.md` (14 additions, 14 deletions)
| Text-Visual Prompting for Efficient 2D Temporal Video Grounding | [![GitHub](https://img.shields.io/github/stars/intel/TVP?style=flat)](https://github.com/intel/TVP) | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com/content/CVPR2023/papers/Zhang_Text-Visual_Prompting_for_Efficient_2D_Temporal_Video_Grounding_CVPR_2023_paper.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2303.04995-b31b1b.svg)](http://arxiv.org/abs/2303.04995) | [![YouTube](https://img.shields.io/badge/YouTube-%23FF0000.svg?style=for-the-badge&logo=YouTube&logoColor=white)](https://www.youtube.com/watch?v=zj2s_G3066s) |
| Enlarging Instance-Specific and Class-Specific Information for Open-Set Action Recognition | [![GitHub](https://img.shields.io/github/stars/Jun-CEN/PSL?style=flat)](https://github.com/Jun-CEN/PSL) | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com/content/CVPR2023/papers/Cen_Enlarging_Instance-Specific_and_Class-Specific_Information_for_Open-Set_Action_Recognition_CVPR_2023_paper.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2303.15467-b31b1b.svg)](http://arxiv.org/abs/2303.15467) | [![YouTube](https://img.shields.io/badge/YouTube-%23FF0000.svg?style=for-the-badge&logo=YouTube&logoColor=white)](https://www.youtube.com/watch?v=SofkzNeymP4) |
| TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition | [![GitHub](https://img.shields.io/github/stars/DAVEISHAN/TimeBalance?style=flat)](https://github.com/DAVEISHAN/TimeBalance) | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com/content/CVPR2023/papers/Dave_TimeBalance_Temporally-Invariant_and_Temporally-Distinctive_Video_Representations_for_Semi-Supervised_Action_Recognition_CVPR_2023_paper.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2303.16268-b31b1b.svg)](http://arxiv.org/abs/2303.16268) | [![YouTube](https://img.shields.io/badge/YouTube-%23FF0000.svg?style=for-the-badge&logo=YouTube&logoColor=white)](https://www.youtube.com/watch?v=2c5LM6YqPKQ) |
| Learning Video Representations from Large Language Models <br/> [![CVPR - Highlight](https://img.shields.io/badge/CVPR-Highlight-FFFF00)]() | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://facebookresearch.github.io/LaViLa/) <br /> [![GitHub](https://img.shields.io/github/stars/facebookresearch/LaViLa?style=flat)](https://github.com/facebookresearch/LaViLa) | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com/content/CVPR2023/papers/Zhao_Learning_Video_Representations_From_Large_Language_Models_CVPR_2023_paper.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2212.04501-b31b1b.svg)](http://arxiv.org/abs/2212.04501) | [![YouTube](https://img.shields.io/badge/YouTube-%23FF0000.svg?style=for-the-badge&logo=YouTube&logoColor=white)](https://www.youtube.com/watch?v=tbQaP07xQ4c) |
| Fine-tuned CLIP Models are Efficient Video Learners | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://muzairkhattak.github.io/ViFi-CLIP/) <br /> [![GitHub](https://img.shields.io/github/stars/muzairkhattak/ViFi-CLIP?style=flat)](https://github.com/muzairkhattak/ViFi-CLIP) | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com/content/CVPR2023/papers/Rasheed_Fine-Tuned_CLIP_Models_Are_Efficient_Video_Learners_CVPR_2023_paper.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2212.03640-b31b1b.svg)](http://arxiv.org/abs/2212.03640) | [![YouTube](https://img.shields.io/badge/YouTube-%23FF0000.svg?style=for-the-badge&logo=YouTube&logoColor=white)](https://www.youtube.com/watch?v=uqPLPIyWBb0) |
| Efficient Movie Scene Detection Using State-Space Transformers | [![GitHub](https://img.shields.io/github/stars/md-mohaiminul/TranS4mer?style=flat)](https://github.com/md-mohaiminul/TranS4mer) | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Islam_Efficient_Movie_Scene_Detection_Using_State-Space_Transformers_CVPR_2023_paper.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2212.14427-b31b1b.svg)](http://arxiv.org/abs/2212.14427) | [![YouTube](https://img.shields.io/badge/YouTube-%23FF0000.svg?style=for-the-badge&logo=YouTube&logoColor=white)](https://www.youtube.com/watch?v=EOmVAByPQbE) |
| AdamsFormer for Spatial Action Localization in the Future | :heavy_minus_sign: | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Chi_AdamsFormer_for_Spatial_Action_Localization_in_the_Future_CVPR_2023_paper.pdf) | [![YouTube](https://img.shields.io/badge/YouTube-%23FF0000.svg?style=for-the-badge&logo=YouTube&logoColor=white)](https://www.youtube.com/watch?v=PK0O-ynPgr0) |
| STMixer: A One-Stage Sparse Action Detector | [![GitHub](https://img.shields.io/github/stars/MCG-NJU/STMixer?style=flat)](https://github.com/MCG-NJU/STMixer) | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Wu_STMixer_A_One-Stage_Sparse_Action_Detector_CVPR_2023_paper.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2303.15879-b31b1b.svg)](http://arxiv.org/abs/2303.15879) | [![YouTube](https://img.shields.io/badge/YouTube-%23FF0000.svg?style=for-the-badge&logo=YouTube&logoColor=white)](https://www.youtube.com/watch?v=Sy4jozsQLM0) |
| Revisiting Temporal Modeling for CLIP-Based Image-to-Video Knowledge Transferring | [![GitHub](https://img.shields.io/github/stars/farewellthree/STAN?style=flat)](https://github.com/farewellthree/STAN) | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Liu_Revisiting_Temporal_Modeling_for_CLIP-Based_Image-to-Video_Knowledge_Transferring_CVPR_2023_paper.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2301.11116-b31b1b.svg)](http://arxiv.org/abs/2301.11116) | [![YouTube](https://img.shields.io/badge/YouTube-%23FF0000.svg?style=for-the-badge&logo=YouTube&logoColor=white)](https://www.youtube.com/watch?v=kaDItcB1iFw) |
| Distilling Vision-Language Pre-Training To Collaborate With Weakly-Supervised Temporal Action Localization | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://voide1220.github.io/distillation_collaboration/) <br /> [![GitHub](https://img.shields.io/github/stars/ju-chen/Efficient-Prompt?style=flat)](https://github.com/ju-chen/Efficient-Prompt) | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Ju_Distilling_Vision-Language_Pre-Training_To_Collaborate_With_Weakly-Supervised_Temporal_Action_Localization_CVPR_2023_paper.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2212.09335-b31b1b.svg)](http://arxiv.org/abs/2212.09335) | :heavy_minus_sign: |
| Real-Time Multi-Person Eyeblink Detection in the Wild for Untrimmed Video | [![GitHub](https://img.shields.io/github/stars/wenzhengzeng/MPEblink?style=flat)](https://github.com/wenzhengzeng/MPEblink) | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Zeng_Real-Time_Multi-Person_Eyeblink_Detection_in_the_Wild_for_Untrimmed_Video_CVPR_2023_paper.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2303.16053-b31b1b.svg)](http://arxiv.org/abs/2303.16053) | [![YouTube](https://img.shields.io/badge/YouTube-%23FF0000.svg?style=for-the-badge&logo=YouTube&logoColor=white)](https://www.youtube.com/watch?v=ngME7dym0Uk) |
| Modeling Video As Stochastic Processes for Fine-Grained Video Representation Learning <br/> ![CVPR - Highlight](https://img.shields.io/badge/CVPR-Highlight-FFFF00) | [![GitHub](https://img.shields.io/github/stars/hengRUC/VSP?style=flat)](https://github.com/hengRUC/VSP) | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Zhang_Modeling_Video_As_Stochastic_Processes_for_Fine-Grained_Video_Representation_Learning_CVPR_2023_paper.pdf) | :heavy_minus_sign: |
| Re<sup>2</sup>TAL: <u>Re</u>wiring Pretrained Video Backbones for <u>Re</u>versible <u>T</u>emporal <u>A</u>ction <u>L</u>ocalization | [![GitHub](https://img.shields.io/github/stars/coolbay/Re2TAL?style=flat)](https://github.com/coolbay/Re2TAL) | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Zhao_Re2TAL_Rewiring_Pretrained_Video_Backbones_for_Reversible_Temporal_Action_Localization_CVPR_2023_paper.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2211.14053-b31b1b.svg)](https://arxiv.org/abs/2211.14053) | [![YouTube](https://img.shields.io/badge/YouTube-%23FF0000.svg?style=for-the-badge&logo=YouTube&logoColor=white)](https://www.youtube.com/watch?v=Oa29cFo_nMY) |
| Learning Discriminative Representations for Skeleton Based Action Recognition | [![GitHub](https://img.shields.io/github/stars/zhysora/FR-Head?style=flat)](https://github.com/zhysora/FR-Head) | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Zhou_Learning_Discriminative_Representations_for_Skeleton_Based_Action_Recognition_CVPR_2023_paper.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2303.03729-b31b1b.svg)](http://arxiv.org/abs/2303.03729) | [![YouTube](https://img.shields.io/badge/YouTube-%23FF0000.svg?style=for-the-badge&logo=YouTube&logoColor=white)](https://www.youtube.com/watch?v=ix6rADaCjNs) |
| Learning Procedure-Aware Video Representation From Instructional Videos and Their Narrations | [![GitHub](https://img.shields.io/github/stars/facebookresearch/ProcedureVRL?style=flat)](https://github.com/facebookresearch/ProcedureVRL) | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Yu_Learning_Procedure-Aware_Video_Representation_From_Instructional_Videos_and_Their_Narrations_CVPR_2023_paper.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2303.17839-b31b1b.svg)](http://arxiv.org/abs/2303.17839) | [![YouTube](https://img.shields.io/badge/YouTube-%23FF0000.svg?style=for-the-badge&logo=YouTube&logoColor=white)](https://www.youtube.com/watch?v=YPq-rziL8Jo) |
| Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception | [![GitHub](https://img.shields.io/github/stars/MengyuanChen21/CVPR2023-CMPAE?style=flat)](https://github.com/MengyuanChen21/CVPR2023-CMPAE) | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Gao_Collecting_Cross-Modal_Presence-Absence_Evidence_for_Weakly-Supervised_Audio-Visual_Event_Perception_CVPR_2023_paper.pdf) | :heavy_minus_sign: |
| PivoTAL: Prior-Driven Supervision for Weakly-Supervised Temporal Action Localization | :heavy_minus_sign: | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Rizve_PivoTAL_Prior-Driven_Supervision_for_Weakly-Supervised_Temporal_Action_Localization_CVPR_2023_paper.pdf) | [![YouTube](https://img.shields.io/badge/YouTube-%23FF0000.svg?style=for-the-badge&logo=YouTube&logoColor=white)](https://www.youtube.com/watch?v=6kAoQjXfzio) |
| Cascade Evidential Learning for Open-World Weakly-Supervised Temporal Action Localization | [![GitHub](https://img.shields.io/github/stars/zhenyingfang/Awesome-Temporal-Action-Detection-Temporal-Action-Proposal-Generation?style=flat)](https://github.com/zhenyingfang/Awesome-Temporal-Action-Detection-Temporal-Action-Proposal-Generation) | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Chen_Cascade_Evidential_Learning_for_Open-World_Weakly-Supervised_Temporal_Action_Localization_CVPR_2023_paper.pdf) | :heavy_minus_sign: |
| Soft-Landing Strategy for Alleviating the Task Discrepancy Problem in Temporal Action Localization Tasks | :heavy_minus_sign: | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Kang_Soft-Landing_Strategy_for_Alleviating_the_Task_Discrepancy_Problem_in_Temporal_CVPR_2023_paper.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2211.06023-b31b1b.svg)](http://arxiv.org/abs/2211.06023) | :heavy_minus_sign: |
| SVFormer: Semi-Supervised Video Transformer for Action Recognition | [![GitHub](https://img.shields.io/github/stars/ChenHsing/SVFormer?style=flat)](https://github.com/ChenHsing/SVFormer) | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Xing_SVFormer_Semi-Supervised_Video_Transformer_for_Action_Recognition_CVPR_2023_paper.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2211.13222-b31b1b.svg)](http://arxiv.org/abs/2211.13222) | [![YouTube](https://img.shields.io/badge/YouTube-%23FF0000.svg?style=for-the-badge&logo=YouTube&logoColor=white)](https://www.youtube.com/watch?v=6kAoQjXfzio) |
| AutoAD: Movie Description in Context <br/> [![CVPR - Highlight](https://img.shields.io/badge/CVPR-Highlight-FFFF00)]() | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://www.robots.ox.ac.uk/~vgg/research/autoad/) | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Han_AutoAD_Movie_Description_in_Context_CVPR_2023_paper.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2303.16899-b31b1b.svg)](http://arxiv.org/abs/2303.16899) | [![YouTube](https://img.shields.io/badge/YouTube-%23FF0000.svg?style=for-the-badge&logo=YouTube&logoColor=white)](https://www.youtube.com/watch?v=gMQSoib6lSI) |
| STMT: A Spatial-Temporal Mesh Transformer for MoCap-Based Action Recognition | [![GitHub](https://img.shields.io/github/stars/zgzxy001/STMT?style=flat)](https://github.com/zgzxy001/STMT) | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Zhu_STMT_A_Spatial-Temporal_Mesh_Transformer_for_MoCap-Based_Action_Recognition_CVPR_2023_paper.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2303.18177-b31b1b.svg)](http://arxiv.org/abs/2303.18177) | :heavy_minus_sign: |
| Boosting Weakly-Supervised Temporal Action Localization With Text Information | [![GitHub](https://img.shields.io/github/stars/lgzlIlIlI/Boosting-WTAL?style=flat)](https://github.com/lgzlIlIlI/Boosting-WTAL) | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Li_Boosting_Weakly-Supervised_Temporal_Action_Localization_With_Text_Information_CVPR_2023_paper.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2305.00607-b31b1b.svg)](http://arxiv.org/abs/2305.00607) | [![YouTube](https://img.shields.io/badge/YouTube-%23FF0000.svg?style=for-the-badge&logo=YouTube&logoColor=white)](https://www.youtube.com/watch?v=n8p4ZU85LXM) |
| Aligning Step-by-Step Instructional Diagrams to Video Demonstrations | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://academic.davidz.cn/en/publication/zhang-cvpr-2023/) <br /> [![GitHub](https://img.shields.io/github/stars/DavidZhang73/AssemblyVideoManualAlignment?style=flat)](https://github.com/DavidZhang73/AssemblyVideoManualAlignment) | [![thecvf](https://img.shields.io/badge/pdf-thecvf-7395C5.svg)](https://openaccess.thecvf.com//content/CVPR2023/papers/Zhang_Aligning_Step-by-Step_Instructional_Diagrams_to_Video_Demonstrations_CVPR_2023_paper.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2303.13800-b31b1b.svg)](http://arxiv.org/abs/2303.13800) | [![YouTube](https://img.shields.io/badge/YouTube-%23FF0000.svg?style=for-the-badge&logo=YouTube&logoColor=white)](https://www.youtube.com/watch?v=8iC5QyP8U6o) |
| Improving Weakly Supervised Temporal Action Localization by Bridging Train-Test Gap in Pseudo Labels | | | |
| Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos | | | |
| Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline | | | |
