feat: add CNN #6

Open · wants to merge 4 commits into base: main
Empty file added modules/CNN/ByteNet/model.py
Empty file.
Empty file added modules/CNN/ConvNeXt/model.py
Empty file.
Empty file added modules/CNN/ConvS2S/model.py
Empty file.
1 change: 1 addition & 0 deletions modules/CNN/README.md
@@ -13,6 +13,7 @@
- [Deep Voice: Real-time Neural Text-to-Speech](https://arxiv.org/abs/1702.07825) (see Appendix WaveNet detail arch)
- [Neural Machine Translation in Linear Time](https://arxiv.org/abs/1610.10099) (ByteNet in char-level NMT encoder-decoder arch)
- [Convolutional Sequence to Sequence Learning](https://arxiv.org/abs/1705.03122) (ConvS2S in NMT with Multi-step Attention in decoder)
- [2022. A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) (modernizes ResNet-200 following the design of the ViT(encoder) variant Swin Transformer: MSA (multi-headed self-attention) -> 7x7 conv2d; MLP (linear -> 1x1-kernel conv2d, activation ReLU -> GeLU), BN -> LN. A pure ConvNet architecture whose performance matches Swin Transformer while being lighter and faster at inference. ConvNeXt may be better suited to tasks such as image classification, object detection, and instance/semantic segmentation, while Transformers can be more flexible and generalize better, especially for tasks that need discrete, sparse, or structured outputs. The architecture should be chosen to fit the task at hand while staying simple; see the block sketch after this diff.)


## CNN limitations
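
The ConvNeXt changes listed above boil down to a simple residual block. Below is a minimal sketch, assuming a PyTorch implementation for the (currently empty) modules/CNN/ConvNeXt/model.py; the class name, `mlp_ratio`, and layer layout are illustrative assumptions and omit details such as layer scale and stochastic depth.

```python
# Sketch of a ConvNeXt-style block (assumption, not part of this PR):
# 7x7 depthwise conv (in place of MSA) -> LayerNorm (in place of BN)
# -> 1x1 pointwise layers as the MLP, with GELU instead of ReLU -> residual add.
import torch
import torch.nn as nn

class ConvNeXtBlock(nn.Module):
    def __init__(self, dim: int, mlp_ratio: int = 4):
        super().__init__()
        # 7x7 depthwise conv replaces windowed multi-headed self-attention
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        # LayerNorm over the channel dimension replaces BatchNorm
        self.norm = nn.LayerNorm(dim)
        # 1x1 convs written as Linear layers act as the MLP
        self.pwconv1 = nn.Linear(dim, mlp_ratio * dim)
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(mlp_ratio * dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        x = self.dwconv(x)                 # (N, C, H, W)
        x = x.permute(0, 2, 3, 1)          # (N, H, W, C) for LayerNorm/Linear
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        x = x.permute(0, 3, 1, 2)          # back to (N, C, H, W)
        return residual + x

# Usage: the block keeps the feature-map shape unchanged, e.g.
# ConvNeXtBlock(96)(torch.randn(1, 96, 56, 56)).shape == (1, 96, 56, 56)
```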
Empty file added modules/CNN/WaveNet/model.py
Empty file.
8 changes: 5 additions & 3 deletions modules/ViT/README.md
@@ -1,13 +1,15 @@
# reference
- [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929)
- [2020. **An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale**](https://arxiv.org/abs/2010.11929)
- https://github.com/google-research/vision_transformer
- https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/vision_transformer.py
- https://github.com/lucidrains/vit-pytorch
- https://github.com/lucidrains/vit-pytorch (ViTs)

- [2021. **Swin Transformer: Hierarchical Vision Transformer using Shifted Windows**](https://arxiv.org/abs/2103.14030)
- https://github.com/microsoft/Swin-Transformer

------

use transformer encoder, BERT (for nlp task, predict)
use transformer encoder, like BERT (for NLP tasks, prediction); see the sketch after the references below
- [BERT: Pre-training of deep bidirectional transformers for language understanding](https://arxiv.org/abs/1810.04805)
- https://github.com/google-research/bert
- https://github.com/datawhalechina/learn-nlp-with-transformers/blob/main/docs/%E7%AF%87%E7%AB%A02-Transformer%E7%9B%B8%E5%85%B3%E5%8E%9F%E7%90%86/2.3-%E5%9B%BE%E8%A7%A3BERT.md
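
For orientation, here is a minimal sketch of the recipe noted above (my assumption, not code from the linked repos): split the image into 16x16 patches, linearly embed them, prepend a class token, add position embeddings, and run a BERT-style transformer encoder. The class name, sizes, and head are illustrative.

```python
# Tiny ViT-style classifier sketch using PyTorch's built-in transformer encoder.
import torch
import torch.nn as nn

class TinyViT(nn.Module):
    def __init__(self, img_size=224, patch=16, dim=192, depth=4, heads=3, num_classes=1000):
        super().__init__()
        # Patchify + linear embedding in one strided conv
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        num_patches = (img_size // patch) ** 2
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, dim_feedforward=4 * dim,
                                           activation="gelu", batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.patch_embed(x).flatten(2).transpose(1, 2)   # (N, num_patches, dim)
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed      # prepend [CLS], add positions
        x = self.encoder(x)                                  # BERT-style encoder stack
        return self.head(x[:, 0])                            # classify from the [CLS] token

# Usage: TinyViT()(torch.randn(2, 3, 224, 224)) -> logits of shape (2, 1000)
```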