feat: add CNN #6

Open · wants to merge 4 commits into base: main
Empty file added modules/CNN/ByteNet/model.py
Empty file.
Empty file added modules/CNN/ConvNeXt/model.py
Empty file.
Empty file added modules/CNN/ConvS2S/model.py
Empty file.
1 change: 1 addition & 0 deletions modules/CNN/README.md
@@ -13,6 +13,7 @@
- [Deep Voice: Real-time Neural Text-to-Speech](https://arxiv.org/abs/1702.07825) (see Appendix WaveNet detail arch)
- [Neural Machine Translation in Linear Time](https://arxiv.org/abs/1610.10099) (ByteNet in char-level NMT encoder-decoder arch)
- [Convolutional Sequence to Sequence Learning](https://arxiv.org/abs/1705.03122) (ConvS2S in NMT with Multi-step Attention in decoder)
- [2022. A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) (modernizes ResNet-200 following the design of the ViT(encoder) variant Swin Transformer: MSA (multi-headed self-attention) -> 7x7 conv2d; MLP (linear -> 1x1-kernel conv2d, activation ReLU -> GeLU), BN -> LN. A pure ConvNet architecture whose performance matches Swin Transformer while being lighter and faster at inference. ConvNeXt may be better suited to tasks such as image classification, object detection, and instance/semantic segmentation, while Transformers can be more flexible and generalize better, especially for tasks that need discrete, sparse, or structured outputs. The architecture should be chosen to fit the task at hand while staying simple; see the block sketch after this diff.)


## CNN limitations
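
The ConvNeXt changes listed above boil down to a simple residual block. Below is a minimal sketch, assuming a PyTorch implementation for the (currently empty) modules/CNN/ConvNeXt/model.py; the class name, `mlp_ratio`, and layer layout are illustrative assumptions and omit details such as layer scale and stochastic depth.

```python
# Sketch of a ConvNeXt-style block (assumption, not part of this PR):
# 7x7 depthwise conv (in place of MSA) -> LayerNorm (in place of BN)
# -> 1x1 pointwise layers as the MLP, with GELU instead of ReLU -> residual add.
import torch
import torch.nn as nn

class ConvNeXtBlock(nn.Module):
    def __init__(self, dim: int, mlp_ratio: int = 4):
        super().__init__()
        # 7x7 depthwise conv replaces windowed multi-headed self-attention
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        # LayerNorm over the channel dimension replaces BatchNorm
        self.norm = nn.LayerNorm(dim)
        # 1x1 convs written as Linear layers act as the MLP
        self.pwconv1 = nn.Linear(dim, mlp_ratio * dim)
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(mlp_ratio * dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        x = self.dwconv(x)                 # (N, C, H, W)
        x = x.permute(0, 2, 3, 1)          # (N, H, W, C) for LayerNorm/Linear
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        x = x.permute(0, 3, 1, 2)          # back to (N, C, H, W)
        return residual + x

# Usage: the block keeps the feature-map shape unchanged, e.g.
# ConvNeXtBlock(96)(torch.randn(1, 96, 56, 56)).shape == (1, 96, 56, 56)
```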
Empty file added modules/CNN/WaveNet/model.py
Empty file.
8 changes: 5 additions & 3 deletions modules/ViT/README.md
@@ -1,13 +1,15 @@
# reference
- [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929)
- [2020. **An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale**](https://arxiv.org/abs/2010.11929)
- https://github.com/google-research/vision_transformer
- https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/vision_transformer.py
- https://github.com/lucidrains/vit-pytorch
- https://github.com/lucidrains/vit-pytorch (ViTs)

- [2021. **Swin Transformer: Hierarchical Vision Transformer using Shifted Windows**](https://arxiv.org/abs/2103.14030)
- https://github.com/microsoft/Swin-Transformer

------

use transformer encoder, BERT (for nlp task, predict)
use transformer encoder, like BERT (for NLP tasks, prediction); see the sketch after the references below
- [BERT: Pre-training of deep bidirectional transformers for language understanding](https://arxiv.org/abs/1810.04805)
- https://github.com/google-research/bert
- https://github.com/datawhalechina/learn-nlp-with-transformers/blob/main/docs/%E7%AF%87%E7%AB%A02-Transformer%E7%9B%B8%E5%85%B3%E5%8E%9F%E7%90%86/2.3-%E5%9B%BE%E8%A7%A3BERT.md
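
For orientation, here is a minimal sketch of the recipe noted above (my assumption, not code from the linked repos): split the image into 16x16 patches, linearly embed them, prepend a class token, add position embeddings, and run a BERT-style transformer encoder. The class name, sizes, and head are illustrative.

```python
# Tiny ViT-style classifier sketch using PyTorch's built-in transformer encoder.
import torch
import torch.nn as nn

class TinyViT(nn.Module):
    def __init__(self, img_size=224, patch=16, dim=192, depth=4, heads=3, num_classes=1000):
        super().__init__()
        # Patchify + linear embedding in one strided conv
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        num_patches = (img_size // patch) ** 2
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, dim_feedforward=4 * dim,
                                           activation="gelu", batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.patch_embed(x).flatten(2).transpose(1, 2)   # (N, num_patches, dim)
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed      # prepend [CLS], add positions
        x = self.encoder(x)                                  # BERT-style encoder stack
        return self.head(x[:, 0])                            # classify from the [CLS] token

# Usage: TinyViT()(torch.randn(2, 3, 224, 224)) -> logits of shape (2, 1000)
```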