Skip to content

Commit

Permalink
Merge pull request #107 from llm-jp/f/japanese-stable-clip
Browse files Browse the repository at this point in the history
add Japanese Stable CLIP
  • Loading branch information
kaisugi authored Dec 3, 2023
2 parents 8b3ae02 + 7923914 commit 36be509
Show file tree
Hide file tree
Showing 3 changed files with 12 additions and 6 deletions.
6 changes: 4 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -219,8 +219,9 @@

| | モデル | 学習画像/テキスト | 開発元 | ライセンス | HuggingFace ですぐ使える? |
|:---|:---:|:---:|:---:|:---:|:---:|
| [日本語CLIP](https://rinna.co.jp/news/2022/05/20220512.html) | CLIP <br>(画像エンコーダは google/vit-base-patch16-224 で重みが初期化された ViT-B/16、<br>テキストエンコーダは rinna RoBERTa で重みが初期化された RoBERTa(base)) | CC12M のキャプションを日本語に翻訳したもの | rinna | Apache 2.0 | [](https://huggingface.co/rinna/japanese-clip-vit-b-16) |
| [日本語CLOOB](https://rinna.co.jp/news/2022/05/20220512.html) | CLOOB <br>(画像エンコーダは google/vit-base-patch16-224 で重みが初期化された ViT-B/16、<br>テキストエンコーダは rinna RoBERTa で重みが初期化された RoBERTa(base)) | CC12M のキャプションを日本語に翻訳したもの | rinna | Apache 2.0 | [](https://huggingface.co/rinna/japanese-cloob-vit-b-16) |
| [Japanese Stable CLIP](https://ja.stability.ai/blog/japanese-stable-clip) | SigLIP | CC12M のキャプションを日本語に翻訳したもの、STAIR Captions | Stability AI | STABILITY AI JAPANESE STABLE CLIP COMMUNITY LICENSE | [](https://huggingface.co/stabilityai/japanese-stable-clip-vit-l-16) |
| [rinna CLIP](https://rinna.co.jp/news/2022/05/20220512.html) | CLIP | CC12M のキャプションを日本語に翻訳したもの | rinna | Apache 2.0 | [](https://huggingface.co/rinna/japanese-clip-vit-b-16) |
| [rinna CLOOB](https://rinna.co.jp/news/2022/05/20220512.html) | CLOOB | CC12M のキャプションを日本語に翻訳したもの | rinna | Apache 2.0 | [](https://huggingface.co/rinna/japanese-cloob-vit-b-16) |
| [Japanese Stable Diffusion XL](https://ja.stability.ai/blog/japanese-stable-diffusion-xl) | Stable Diffusion | 不明 | Stability AI | STABILITY AI JAPANESE STABLE DIFFUSION XL COMMUNITY LICENSE | [](https://huggingface.co/stabilityai/japanese-stable-diffusion-xl) |
| [東北大Stable Diffusion](https://huggingface.co/cl-tohoku/stable-diffusion-xl-jp-base-1.0) | Stable Diffusion | WMT2023 Shared Task の日英対訳コーパス、laion2B-multi のキャプション約 1,300 万件 | 東北大<br>自然言語処理研究グループ | CreativeML OpenRAIL-M License | ◯ ([base](https://huggingface.co/cl-tohoku/stable-diffusion-xl-jp-base-1.0), [refiner](https://huggingface.co/cl-tohoku/stable-diffusion-xl-jp-refiner-1.0)) |
| [rinna Stable Diffusion](https://rinna.co.jp/news/2022/09/20220909.html) | Stable Diffusion | LAION-5B データセットのうちキャプションが日本語のもの(画像約 1 億枚)| rinna | CreativeML OpenRAIL-M License | [](https://huggingface.co/rinna/japanese-stable-diffusion) |
Expand Down Expand Up @@ -284,6 +285,7 @@
| BLIP-2 | 2023.01.30 | ICML 2023 | [BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models](https://arxiv.org/abs/2301.12597) |
| Llama | 2023.02.27 | - | [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971) |
| GPT-4 | 2023.03.15 | - | [GPT-4 Technical Report](https://arxiv.org/abs/2303.08774) |
| SigLIP | 2023.03.27 | ICCV 2023 | [Sigmoid Loss for Language Image Pre-Training](https://arxiv.org/abs/2303.15343) |
| LLaVA | 2023.04.17 | NeurIPS 2023 | [Visual Instruction Tuning](https://arxiv.org/abs/2304.08485) |
| MiniGPT-4 | 2023.04.20 | - | [MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models](https://arxiv.org/abs/2304.10592) |
| InstructBLIP | 2023.05.11 | - | [InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning](https://arxiv.org/abs/2305.06500) |
Expand Down
6 changes: 4 additions & 2 deletions README_en.md
Original file line number Diff line number Diff line change
Expand Up @@ -219,8 +219,9 @@ Please point out any errors on the [issues page](https://github.com/llm-jp/aweso

| | Architecture | Training Data | Developer | License | HuggingFace? |
|:---|:---:|:---:|:---:|:---:|:---:|
| [JapaneseCLIP](https://rinna.co.jp/news/2022/05/20220512.html) | CLIP <br>(Image encoding with google/vit-base-patch16-224 initialized ViT-B/16 model, <br>text encoding with rinna RoBERTa initialized RoBERTa(base) model) | CC12M translated to Japanese | rinna | Apache 2.0 | [](https://huggingface.co/rinna/japanese-clip-vit-b-16) |
| [JapaneseCLOOB](https://rinna.co.jp/news/2022/05/20220512.html) | CLOOB <br>(Image encoding with google/vit-base-patch16-224 initialized ViT-B/16 model, <br>text encoding with rinna RoBERTa initialized RoBERTa(base) model) | CC12M translated to Japanese | rinna | Apache 2.0 | [](https://huggingface.co/rinna/japanese-cloob-vit-b-16) |
| [Japanese Stable CLIP](https://ja.stability.ai/blog/japanese-stable-clip) | SigLIP | CC12M translated to Japanese, STAIR Captions | Stability AI | STABILITY AI JAPANESE STABLE CLIP COMMUNITY LICENSE | [](https://huggingface.co/stabilityai/japanese-stable-clip-vit-l-16) |
| [rinna CLIP](https://rinna.co.jp/news/2022/05/20220512.html) | CLIP | CC12M translated to Japanese | rinna | Apache 2.0 | [](https://huggingface.co/rinna/japanese-clip-vit-b-16) |
| [rinna CLOOB](https://rinna.co.jp/news/2022/05/20220512.html) | CLOOB | CC12M translated to Japanese | rinna | Apache 2.0 | [](https://huggingface.co/rinna/japanese-cloob-vit-b-16) |
| [Japanese Stable Diffusion XL](https://ja.stability.ai/blog/japanese-stable-diffusion-xl) | Stable Diffusion | Unknown | Stability AI | STABILITY AI JAPANESE STABLE DIFFUSION XL COMMUNITY LICENSE | [](https://huggingface.co/stabilityai/japanese-stable-diffusion-xl) |
| [TohokuUniversity Stable Diffusion](https://huggingface.co/cl-tohoku/stable-diffusion-xl-jp-base-1.0) | Stable Diffusion | WMT2023 Shared Task English-Japanese parallel corpus, about 13 million captions from laion2B-multi | Tohoku University NLP Group | CreativeML OpenRAIL-M License | ◯ ([base](https://huggingface.co/cl-tohoku/stable-diffusion-xl-jp-base-1.0), [refiner](https://huggingface.co/cl-tohoku/stable-diffusion-xl-jp-refiner-1.0)) |
| [rinna Stable Diffusion](https://rinna.co.jp/news/2022/09/20220909.html) | Stable Diffusion | LAION-5B Japanese Subset (100M images) | rinna | CreativeML OpenRAIL-M License | [](https://huggingface.co/rinna/japanese-stable-diffusion) |
Expand Down Expand Up @@ -284,6 +285,7 @@ Please point out any errors on the [issues page](https://github.com/llm-jp/aweso
| BLIP-2 | 2023.01.30 | ICML 2023 | [BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models](https://arxiv.org/abs/2301.12597) |
| Llama | 2023.02.27 | - | [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971) |
| GPT-4 | 2023.03.15 | - | [GPT-4 Technical Report](https://arxiv.org/abs/2303.08774) |
| SigLIP | 2023.03.27 | ICCV 2023 | [Sigmoid Loss for Language Image Pre-Training](https://arxiv.org/abs/2303.15343) |
| LLaVA | 2023.04.17 | NeurIPS 2023 | [Visual Instruction Tuning](https://arxiv.org/abs/2304.08485) |
| MiniGPT-4 | 2023.04.20 | - | [MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models](https://arxiv.org/abs/2304.10592) |
| InstructBLIP | 2023.05.11 | - | [InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning](https://arxiv.org/abs/2305.06500) |
Expand Down
6 changes: 4 additions & 2 deletions README_fr.md
Original file line number Diff line number Diff line change
Expand Up @@ -219,8 +219,9 @@ N'hésitez pas à signaler les erreurs sur la page [issues](https://github.com/l

| | Architecture | Données d'entraînement | Développeur | Licence | HuggingFace? |
|:---|:---:|:---:|:---:|:---:|:---:|
| [JapaneseCLIP](https://rinna.co.jp/news/2022/05/20220512.html) | CLIP <br>(Encodage d'image avec google/vit-base-patch16-224 initialisé par modèle ViT-B/16, <br>Encodage textuelle avec rinna RoBERTa initialisé RoBERTa(base) model) | CC12M traduit en japonais | rinna | Apache 2.0 | [](https://huggingface.co/rinna/japanese-clip-vit-b-16) |
| [JapaneseCLOOB](https://rinna.co.jp/news/2022/05/20220512.html) | CLOOB <br>(Image encoding with google/vit-base-patch16-224 initialized ViT-B/16 model, <br>Encodage textuelle avec rinna RoBERTa initialisé RoBERTa(base) model) | CC12M traduit en japonais | rinna | Apache 2.0 | [](https://huggingface.co/rinna/japanese-cloob-vit-b-16) |
| [Japanese Stable CLIP](https://ja.stability.ai/blog/japanese-stable-clip) | SigLIP | CC12M traduit en japonais, STAIR Captions | Stability AI | STABILITY AI JAPANESE STABLE CLIP COMMUNITY LICENSE | [](https://huggingface.co/stabilityai/japanese-stable-clip-vit-l-16) |
| [rinna CLIP](https://rinna.co.jp/news/2022/05/20220512.html) | CLIP | CC12M traduit en japonais | rinna | Apache 2.0 | [](https://huggingface.co/rinna/japanese-clip-vit-b-16) |
| [rinna CLOOB](https://rinna.co.jp/news/2022/05/20220512.html) | CLOOB | CC12M traduit en japonais | rinna | Apache 2.0 | [](https://huggingface.co/rinna/japanese-cloob-vit-b-16) |
| [Japanese Stable Diffusion XL](https://ja.stability.ai/blog/japanese-stable-diffusion-xl) | Stable Diffusion | Inconnu | Stability AI | STABILITY AI JAPANESE STABLE DIFFUSION XL COMMUNITY LICENSE | [](https://huggingface.co/stabilityai/japanese-stable-diffusion-xl) |
| [TohokuUniversity Stable Diffusion](https://huggingface.co/cl-tohoku/stable-diffusion-xl-jp-base-1.0) | Stable Diffusion | Corpus parallèle anglais-japonais de la tâche partagée WMT2023, environ 13 millions de légendes de laion2B-multi | Université de Tohoku - Groupe TAL | CreativeML OpenRAIL-M License | ◯ ([base](https://huggingface.co/cl-tohoku/stable-diffusion-xl-jp-base-1.0), [refiner](https://huggingface.co/cl-tohoku/stable-diffusion-xl-jp-refiner-1.0)) |
| [rinna Stable Diffusion](https://rinna.co.jp/news/2022/09/20220909.html) | Stable Diffusion | LAION-5B Japanese Subset (100M images) | rinna | CreativeML OpenRAIL-M License | [](https://huggingface.co/rinna/japanese-stable-diffusion) |
Expand Down Expand Up @@ -284,6 +285,7 @@ N'hésitez pas à signaler les erreurs sur la page [issues](https://github.com/l
| BLIP-2 | 2023.01.30 | ICML 2023 | [BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models](https://arxiv.org/abs/2301.12597) |
| Llama | 2023.02.27 | - | [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971) |
| GPT-4 | 2023.03.15 | - | [GPT-4 Technical Report](https://arxiv.org/abs/2303.08774) |
| SigLIP | 2023.03.27 | ICCV 2023 | [Sigmoid Loss for Language Image Pre-Training](https://arxiv.org/abs/2303.15343) |
| LLaVA | 2023.04.17 | NeurIPS 2023 | [Visual Instruction Tuning](https://arxiv.org/abs/2304.08485) |
| MiniGPT-4 | 2023.04.20 | - | [MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models](https://arxiv.org/abs/2304.10592) |
| InstructBLIP | 2023.05.11 | - | [InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning](https://arxiv.org/abs/2305.06500) |
Expand Down

0 comments on commit 36be509

Please sign in to comment.