Add OpenFlamingo #2237
Merged

Commits (44):
915f7cc  add openflamingo  (michiyasunaga)
367aa30  fix ver  (michiyasunaga)
799f41f  fix ver  (michiyasunaga)
0b7bcb4  fix ver  (michiyasunaga)
70a8e34  fix ver  (michiyasunaga)
1eef0a5  fix ver  (michiyasunaga)
abe19e3  fix ver  (michiyasunaga)
b221cbf  fix ver  (michiyasunaga)
63011e7  fix ver  (michiyasunaga)
7729aca  add openflamingo  (michiyasunaga)
1d23a9a  add openflamingo  (michiyasunaga)
d27893a  add openflamingo  (michiyasunaga)
c9fd286  add openflamingo  (michiyasunaga)
5367f05  add openflamingo  (michiyasunaga)
0afa051  add openflamingo  (michiyasunaga)
014aada  add openflamingo  (michiyasunaga)
1bee70c  add openflamingo  (michiyasunaga)
3613907  fix GHA build - define openflamingo dependencies  (teetone)
1affed4  Merge branch 'main' of https://github.com/stanford-crfm/benchmarking …  (teetone)
9c1ad0a  address code review  (michiyasunaga)
1ca8408  fix transformers version  (michiyasunaga)
292456e  Merge branch 'main' into michi_openflamingo  (michiyasunaga)
12d535e  merge main  (JosselinSomervilleRoberts)
84cc573  Add some parameters to the model deployment  (JosselinSomervilleRoberts)
3756295  Fixing einops dependency conflict  (JosselinSomervilleRoberts)
9dc70dc  Remove duplicated crfm-helm['image'] dependency  (JosselinSomervilleRoberts)
6f531ac  Merge branch 'main' of https://github.com/stanford-crfm/benchmarking …  (teetone)
8c44df7  more logging for model init  (teetone)
1fd7961  fix token init in openflamingo  (teetone)
472bacb  fix token init in openflamingo  (teetone)
5a64bb3  Merge branch 'main' of https://github.com/stanford-crfm/benchmarking …  (teetone)
65e2950  resolve  (teetone)
0ea9ced  fix tokenizer  (teetone)
6128ad2  update conf  (teetone)
163b13c  Merge branch 'main' of https://github.com/stanford-crfm/benchmarking …  (teetone)
e51e550  Merge branch 'main' of https://github.com/stanford-crfm/benchmarking …  (teetone)
865b33a  disable temporarily  (teetone)
a4f0586  resolve merge conflicts  (teetone)
8f0e763  undo  (teetone)
e2404a9  fix paths  (teetone)
83eaefa  get in-context learning examples to work  (teetone)
1404de4  fix decoding  (teetone)
ac19049  fix sequence construction  (teetone)
8c6cdcb  include num_completions in cache key  (teetone)
```diff
@@ -52,7 +52,7 @@ install_requires=
     scikit-learn~=1.1.2

     # Models and Metrics Extras
-    transformers~=4.36.0  # For anthropic_client, vision_language.huggingface_vlm_client, huggingface_client, huggingface_tokenizer, test_openai_token_cost_estimator, model_summac (via summarization_metrics)
+    transformers>=4.28.0  # For anthropic_client, vision_language.huggingface_vlm_client, huggingface_client, huggingface_tokenizer, test_openai_token_cost_estimator, model_summac (via summarization_metrics)

     # TODO: Upgrade torch - we need > 2.0.0 for newer versions of transformers
     torch>=1.12.1,<3.0.0  # For huggingface_client, yalm_tokenizer, model_summac (via summarization_metrics)
     torchvision>=0.13.1,<3.0.0  # For huggingface_client, yalm_tokenizer, model_summac (via summarization_metrics)
```

Review comment on the `transformers` line: "Same, we need"
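The change swaps a compatible-release pin (`~=`) for a floor-only one (`>=`). A quick sketch, using the `packaging` library (the same specifier logic pip uses; assumed installed), of what each pin admits:

```python
# Sketch: compare what the old and new transformers pins admit.
from packaging.specifiers import SpecifierSet

old_pin = SpecifierSet("~=4.36.0")  # compatible release: >=4.36.0, ==4.36.*
new_pin = SpecifierSet(">=4.28.0")  # floor only, no upper bound

print("4.32.0" in old_pin)  # False: rejected by the compatible-release pin
print("4.32.0" in new_pin)  # True
print("4.36.2" in new_pin)  # True: newer releases remain allowed
```

The trade-off: the new specifier unblocks older versions that OpenFlamingo tolerates, but also stops protecting against future breaking releases.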
```diff
@@ -135,7 +135,13 @@ models =
     crfm-helm[yandex]

 vlm =
-    torch~=2.1.2  # For IDEFICS
+    # For OpenFlamingo
+    einops~=0.7.0
+    einops-exts~=0.0.4
+    open-clip-torch~=2.24.0
+
+    # For IDEFICS
+    torch~=2.1.2

 heim =
     # HEIM scenarios
```
```diff
@@ -223,6 +229,7 @@ exclude =
     venv/*
     src/helm/proxy/clients/image_generation/dalle_mini/*
     src/helm/proxy/clients/image_generation/mindalle/*
+    src/helm/proxy/clients/vision_language/open_flamingo/*

 # Ignore completely:
 # E203 - White space before ':', (conflicts with black)
```
```diff
@@ -240,7 +247,7 @@ check_untyped_defs = True
 disable_error_code = annotation-unchecked
 # TODO: Change disallow_untyped_defs to True
 disallow_untyped_defs = False
-exclude = dalle_mini|mindalle
+exclude = dalle_mini|mindalle|open_flamingo

 [tool:pytest]
 addopts =
```
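mypy's `exclude` option is a regular expression matched against file paths, so the widened alternation can be checked directly (the first path below is from this diff, the second is an existing HELM file used as a counterexample):

```python
# Sketch: verify the widened mypy exclude regex skips the vendored code.
import re

exclude = re.compile(r"dalle_mini|mindalle|open_flamingo")

print(bool(exclude.search("src/helm/proxy/clients/vision_language/open_flamingo/src/factory.py")))  # True
print(bool(exclude.search("src/helm/proxy/clients/huggingface_client.py")))  # False
```

Excluding the vendored directory keeps upstream open_flamingo code out of type checking, matching the `disallow_untyped_defs` exemptions already granted to `dalle_mini` and `mindalle`.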
```diff
@@ -165,6 +165,12 @@ tokenizer_configs:
     class_name: "helm.proxy.tokenizers.huggingface_tokenizer.HuggingFaceTokenizer"
     end_of_text_token: "</s>"
     prefix_token: "<s>"
+
+  - name: anas-awadalla/mpt-7b
+    tokenizer_spec:
+      class_name: "helm.proxy.tokenizers.huggingface_tokenizer.HuggingFaceTokenizer"
+    end_of_text_token: "<|endoftext|>"
+    prefix_token: ""

   # Huggingface
   - name: huggingface/gpt2
```

Review comment on `- name: anas-awadalla/mpt-7b`: "Is this different from the pre-existing"
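The `prefix_token` / `end_of_text_token` pair controls how a prompt is bracketed for each tokenizer. A minimal sketch of the effect for the two configs shown above (the `wrap` helper is hypothetical, not HELM's actual API):

```python
# Hypothetical helper showing the effect of the prefix_token /
# end_of_text_token fields from the two tokenizer configs above.
def wrap(prompt: str, prefix_token: str, end_of_text_token: str) -> str:
    return f"{prefix_token}{prompt}{end_of_text_token}"

print(wrap("Hello", "<s>", "</s>"))        # Llama-style config
print(wrap("Hello", "", "<|endoftext|>"))  # anas-awadalla/mpt-7b config
```

Note the MPT config uses an empty `prefix_token`, so nothing is prepended before the prompt.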
src/helm/proxy/clients/vision_language/open_flamingo/__init__.py (2 additions):

```python
from .src.flamingo import Flamingo
from .src.factory import create_model_and_transforms
```
Empty file.
src/helm/proxy/clients/vision_language/open_flamingo/src/factory.py (147 additions):
@@ -0,0 +1,147 @@ | ||
""" | ||
Source: https://github.com/mlfoundations/open_flamingo | ||
""" | ||
|
||
from typing import Optional | ||
|
||
from transformers import AutoModelForCausalLM, AutoTokenizer | ||
|
||
from helm.common.general import handle_module_not_found_error | ||
from .flamingo import Flamingo | ||
from .flamingo_lm import FlamingoLMMixin | ||
from .utils import extend_instance | ||
|
||
|
||
def create_model_and_transforms( | ||
clip_vision_encoder_path: str, | ||
clip_vision_encoder_pretrained: str, | ||
lang_encoder_path: str, | ||
tokenizer_path: str, | ||
cross_attn_every_n_layers: int = 1, | ||
use_local_files: bool = False, | ||
decoder_layers_attr_name: str = None, | ||
freeze_lm_embeddings: bool = False, | ||
cache_dir: Optional[str] = None, | ||
**flamingo_kwargs, | ||
): | ||
""" | ||
Initialize a Flamingo model from a pretrained vision encoder and language encoder. | ||
Appends special tokens to the tokenizer and freezes backbones. | ||
|
||
Args: | ||
clip_vision_encoder_path (str): path to pretrained clip model (e.g. "ViT-B-32") | ||
clip_vision_encoder_pretrained (str): name of pretraining dataset for clip model (e.g. "laion2b_s32b_b79k") | ||
lang_encoder_path (str): path to pretrained language encoder | ||
tokenizer_path (str): path to pretrained tokenizer | ||
cross_attn_every_n_layers (int, optional): determines how often to add a cross-attention layer. Defaults to 1. | ||
use_local_files (bool, optional): whether to use local files. Defaults to False. | ||
decoder_layers_attr_name (str, optional): name of the decoder layers attribute. Defaults to None. | ||
freeze_lm_embeddings (bool, optional): whether to freeze LM input embeddings when configuring Perceiver. | ||
cache_dir (str, optional): path to cache directory for downloading OpenClip/HF weights. | ||
Returns: | ||
Flamingo: Flamingo model from pretrained vision and language encoders | ||
Image processor: Pipeline to preprocess input images | ||
Tokenizer: A tokenizer for the language model | ||
""" | ||
try: | ||
import open_clip | ||
except ModuleNotFoundError as e: | ||
handle_module_not_found_error(e, ["vlm"]) | ||
|
||
vision_encoder, _, image_processor = open_clip.create_model_and_transforms( | ||
clip_vision_encoder_path, | ||
pretrained=clip_vision_encoder_pretrained, | ||
cache_dir=cache_dir, | ||
) | ||
# set the vision encoder to output the visual features | ||
vision_encoder.visual.output_tokens = True | ||
|
||
text_tokenizer = AutoTokenizer.from_pretrained( | ||
tokenizer_path, | ||
local_files_only=use_local_files, | ||
trust_remote_code=True, | ||
cache_dir=cache_dir, | ||
) | ||
# add Flamingo special tokens to the tokenizer | ||
text_tokenizer.add_special_tokens({"additional_special_tokens": ["<|endofchunk|>", "<image>"]}) | ||
if text_tokenizer.pad_token is None: | ||
# Issue: GPT models don't have a pad token, which we use to | ||
# modify labels for the loss. | ||
text_tokenizer.add_special_tokens({"pad_token": "<PAD>"}) | ||
|
||
lang_encoder = AutoModelForCausalLM.from_pretrained( | ||
lang_encoder_path, | ||
local_files_only=use_local_files, | ||
trust_remote_code=True, | ||
cache_dir=cache_dir, | ||
) | ||
|
||
# hacks for MPT-1B, which doesn't have a get_input_embeddings method | ||
if "mpt-1b-redpajama-200b" in lang_encoder_path: | ||
|
||
class EmbeddingFnMixin: | ||
def get_input_embeddings(self): | ||
return self.transformer.wte | ||
|
||
def set_input_embeddings(self, new_embeddings): | ||
self.transformer.wte = new_embeddings | ||
|
||
extend_instance(lang_encoder, EmbeddingFnMixin) | ||
|
||
# convert LM to FlamingoLM | ||
extend_instance(lang_encoder, FlamingoLMMixin) | ||
|
||
if decoder_layers_attr_name is None: | ||
decoder_layers_attr_name = _infer_decoder_layers_attr_name(lang_encoder) | ||
lang_encoder.set_decoder_layers_attr_name(decoder_layers_attr_name) | ||
lang_encoder.resize_token_embeddings(len(text_tokenizer)) | ||
|
||
model = Flamingo( | ||
vision_encoder, | ||
lang_encoder, | ||
text_tokenizer.encode("<|endofchunk|>")[-1], | ||
text_tokenizer.encode("<image>")[-1], | ||
vis_dim=open_clip.get_model_config(clip_vision_encoder_path)["vision_cfg"]["width"], | ||
cross_attn_every_n_layers=cross_attn_every_n_layers, | ||
**flamingo_kwargs, | ||
) | ||
|
||
# Freeze all parameters | ||
model.requires_grad_(False) | ||
assert sum(p.numel() for p in model.parameters() if p.requires_grad) == 0 | ||
|
||
# Unfreeze perceiver, gated_cross_attn_layers, and LM input embeddings | ||
model.perceiver.requires_grad_(True) | ||
model.lang_encoder.gated_cross_attn_layers.requires_grad_(True) | ||
if not freeze_lm_embeddings: | ||
model.lang_encoder.get_input_embeddings().requires_grad_(True) | ||
# TODO: investigate also training the output embeddings when untied | ||
|
||
print( | ||
f"Flamingo model initialized with {sum(p.numel() for p in model.parameters() if p.requires_grad)} trainable parameters" | ||
) | ||
|
||
return model, image_processor, text_tokenizer | ||
|
||
|
||
def _infer_decoder_layers_attr_name(model): | ||
for k in __KNOWN_DECODER_LAYERS_ATTR_NAMES: | ||
if k.lower() in model.__class__.__name__.lower(): | ||
return __KNOWN_DECODER_LAYERS_ATTR_NAMES[k] | ||
|
||
raise ValueError( | ||
"We require the attribute name for the nn.ModuleList in the decoder storing the transformer block layers. " | ||
"Please supply this string manually." | ||
) | ||
|
||
|
||
__KNOWN_DECODER_LAYERS_ATTR_NAMES = { | ||
"opt": "model.decoder.layers", | ||
"gptj": "transformer.h", | ||
"gpt-j": "transformer.h", | ||
"pythia": "gpt_neox.layers", | ||
"llama": "model.layers", | ||
"gptneoxforcausallm": "gpt_neox.layers", | ||
"mpt": "transformer.blocks", | ||
"mosaicgpt": "transformer.blocks", | ||
} |
Review comment: "Why did you revert this? This is going to break Llava"
Reply: Thanks for the review! This was a bit tricky. OpenFlamingo seems to require 4.32.0; I tried 4.36.0 but encountered errors like `ImportError: cannot import name '_expand_mask' from 'transformers.models.bloom.modeling_bloom'` (similar to salesforce/LAVIS#571).
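Given that observation (4.32.0 works, 4.36.0 breaks), one option is to fail fast with a version guard. The sketch below is hypothetical, not part of the PR: the floor comes from the setup.cfg change, and the ceiling is an assumption taken from the observed `_expand_mask` ImportError on 4.36.0:

```python
# Hypothetical guard (not in the PR): accept transformers versions in
# [4.28.0, 4.36.0). Uses the `packaging` library, assumed installed.
from packaging.version import Version

def transformers_ok(installed: str, floor: str = "4.28.0", first_bad: str = "4.36.0") -> bool:
    """True if `installed` falls in the half-open range [floor, first_bad)."""
    v = Version(installed)
    return Version(floor) <= v < Version(first_bad)

print(transformers_ok("4.32.0"))  # True: the version OpenFlamingo tolerates
print(transformers_ok("4.36.0"))  # False: the version that raised ImportError
```

An explicit ceiling in setup.cfg (e.g. `<4.36.0`) would encode the same constraint declaratively, at the cost of blocking future fixes; the bare `>=4.28.0` floor in the diff leaves that decision to the installer.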