[VLM] Image resize model #1256

Draft: wants to merge 9 commits into base: master

yatarkan
Contributor

No description provided.

@github-actions bot added the labels "category: visual language", "category: GHA", and "category: tokenizers" on Nov 26, 2024
src/cpp/src/visual_language/image_resize.cpp (outdated)
result,
ov::ParameterVector{input, sizes_param},
"image_resizer"
);

I have similar code for the image-to-image / inpainting scenario: https://github.com/ilya-lavrenov/openvino.genai/blob/ce3e1e3b095c2dc3d0e613c15bfa5fe778285d77/src/cpp/src/image_generation/image_processor.hpp#L105. Does my implementation produce the same results? I'm not sure why the clamp and round operations are needed; my implementation is taken from the translator of the PyTorch operation to OpenVINO in the PyTorch frontend (PT FE).
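
On the clamp and round question, one plausible reason (an assumption, not confirmed anywhere in this thread) is the conversion back to 8-bit pixels. Cubic interpolation kernels have negative lobes, so an interpolated value can overshoot the [0, 255] range, and a plain float-to-integer cast truncates instead of rounding. The sketch below uses only the C++ standard library; `cubic_weight`, `cubic_interpolate`, `to_uint8`, and the coefficient `a = -0.5` are illustrative names and values, not code from this PR or from OpenVINO.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Cubic convolution (Keys) kernel weight for a sample at distance x.
// The coefficient a = -0.5 is an assumption chosen for illustration.
inline float cubic_weight(float x, float a = -0.5f) {
    x = std::fabs(x);
    if (x <= 1.0f) return ((a + 2.0f) * x - (a + 3.0f)) * x * x + 1.0f;
    if (x < 2.0f)  return ((a * x - 5.0f * a) * x + 8.0f * a) * x - 4.0f * a;
    return 0.0f;
}

// 1D cubic interpolation between p1 and p2 at fractional offset t in [0, 1],
// using the two outer neighbours p0 and p3.
inline float cubic_interpolate(float p0, float p1, float p2, float p3, float t) {
    return p0 * cubic_weight(t + 1.0f) + p1 * cubic_weight(t) +
           p2 * cubic_weight(1.0f - t) + p3 * cubic_weight(2.0f - t);
}

// Round to nearest, then clamp to the valid 8-bit range before casting.
// Casting an out-of-range float directly to uint8_t is undefined behaviour.
inline std::uint8_t to_uint8(float v) {
    const float rounded = std::round(v);
    return static_cast<std::uint8_t>(std::min(std::max(rounded, 0.0f), 255.0f));
}
```

With samples 0, 255, 255, 0 and t = 0.5, `cubic_interpolate` yields a value above 255, and with 255, 0, 0, 255 it yields a negative value, so `to_uint8` is what keeps the result a valid pixel. Whether a translator-derived implementation needs the same steps depends on the output element type it produces.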

This is now part of master:

class ImageResizer {
public:
    ImageResizer(const std::string& device, ov::element::Type type, ov::Layout layout, ov::op::v11::Interpolate::InterpolateMode interpolation_mode);
    ov::Tensor execute(ov::Tensor image, int64_t dst_height, int64_t dst_width);
private:
    size_t get_and_check_width_idx(const Layout& layout, const PartialShape& shape);
    size_t get_and_check_height_idx(const Layout& layout, const PartialShape& shape);
    ov::InferRequest m_request;
};

CC @Wovchena: can this be tried in minicpm or other VLMs?

@ilya-lavrenov self-assigned this on Nov 26, 2024
@yatarkan force-pushed the yt/image-resize-model branch from b0fb1d8 to 2216349 on November 27, 2024 15:25
2 participants