
colpali v1.3 by AndrewOgn #427

Open · wants to merge 13 commits into main
Conversation

@joein (Member) commented Dec 18, 2024

This is a draft of the second iteration of work on colpali (#394).

@joein changed the title from "wip: design draft" to "wip: colpali design draft" on Dec 18, 2024
@I8dNLo (Contributor) commented Dec 23, 2024

To check the expected values for the tests, I use the code examples from here.
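For illustration, a minimal sketch of what such a value check can look like in the test file (the embed_text method name and the CANONICAL_VALUES vector are placeholders for illustration, not fastembed's actual API or the real reference values):

import numpy as np

# Placeholder values, NOT the real reference vectors: in practice these
# would be copied from the output of the reference implementation.
CANONICAL_VALUES = np.array([[0.0, 0.1, 0.2], [0.3, 0.4, 0.5]])

def test_embedding_matches_reference(model):
    # Compare the first rows/columns of the produced embedding against
    # the precomputed reference slice, within a small tolerance.
    embeddings = list(model.embed_text(["Describe the image."]))
    produced = embeddings[0][: CANONICAL_VALUES.shape[0], : CANONICAL_VALUES.shape[1]]
    assert np.allclose(produced, CANONICAL_VALUES, atol=1e-3)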

@joein (Member, Author) left a comment


Resolved (outdated) review threads on:
- fastembed/late_interaction_multimodal/__init__.py
- tests/test_late_interaction_multimodal.py (2 threads)
- fastembed/late_interaction_multimodal/colpali.py (3 threads)
PAD_TOKEN = "<pad>"
QUERY_MARKER_TOKEN_ID = [2, 9413]
IMAGE_PLACEHOLDER_SIZE = (3, 448, 448)
# Token ids of '<image>' * 1024 + '<bos>Describe the image.\n' (see the discussion below)
EMPTY_TEXT_PLACEHOLDER = np.array([257152] * 1024 + [2, 50721, 573, 2416, 235265, 108])
@joein (Member, Author) commented:

These are actually the token ids of the following string: '<image>' * 1024 + '<bos>Describe the image.\n'.
Could we make it nicer? It's not really readable at the moment.

EVEN_ATTENTION_MASK is also not really readable; instead of having this even_attention_mask, maybe we could assign 1030 to a named constant, which seems a bit more reasonable.
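A minimal sketch of one way to name these magic numbers (the constant names and the all-ones mask are assumptions for illustration, not the PR's actual code; the id 257152 for '<image>' and the prompt ids come from this thread):

import numpy as np

IMAGE_TOKEN_ID = 257152  # id of '<image>' (from the thread above)
N_IMAGE_TOKENS = 1024    # number of image placeholder tokens
PROMPT_TOKEN_IDS = [2, 50721, 573, 2416, 235265, 108]  # '<bos>Describe the image.\n'

# Same array as EMPTY_TEXT_PLACEHOLDER above, but with each magic number named
EMPTY_TEXT_PLACEHOLDER = np.array([IMAGE_TOKEN_ID] * N_IMAGE_TOKENS + PROMPT_TOKEN_IDS)

# 1024 image tokens + 6 prompt tokens = 1030, the length the mask must cover
PLACEHOLDER_LENGTH = N_IMAGE_TOKENS + len(PROMPT_TOKEN_IDS)
# Assumption: an all-ones mask of that length, matching the 1030 mentioned above
EVEN_ATTENTION_MASK = np.ones(PLACEHOLDER_LENGTH, dtype=np.int64)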

@I8dNLo (Contributor) replied:

Those are the tokens, right. But there are slight differences between a real text input and this placeholder.
As you can see here, the original preprocessor does not add the '\n' token there, while it should be added everywhere else. So we should refactor the tokenize logic and trigger the tokenizer each time (even though the output is constant).

@I8dNLo changed the title from "wip: colpali design draft" to "colpali v1.3 by AndrewOgn" on Jan 13, 2025