[VLM] Image resize model #1256

yatarkan · 2024-11-26T13:50:55Z

No description provided.

src/cpp/src/visual_language/image_resize.cpp

ilya-lavrenov · 2024-11-26T20:33:44Z

src/cpp/src/visual_language/image_resize.cpp

+        result,
+        ov::ParameterVector{input, sizes_param},
+        "image_resizer"
+    );


I have similar code for the image-to-image / inpainting scenario: https://github.com/ilya-lavrenov/openvino.genai/blob/ce3e1e3b095c2dc3d0e613c15bfa5fe778285d77/src/cpp/src/image_generation/image_processor.hpp#L105. Does my implementation produce the same results? I'm not sure why the clamp and round operations are needed; my implementation is taken from the translator of the PyTorch operation to OpenVINO in the PyTorch frontend (PT FE).

Now, it's a part of master

openvino.genai/src/cpp/src/image_generation/image_processor.hpp

Lines 35 to 47 in aef1591

class ImageResizer {
public:
    ImageResizer(const std::string& device, ov::element::Type type, ov::Layout layout, ov::op::v11::Interpolate::InterpolateMode interpolation_mode);

    ov::Tensor execute(ov::Tensor image, int64_t dst_height, int64_t dst_width);

private:
    size_t get_and_check_width_idx(const Layout& layout, const PartialShape& shape);
    size_t get_and_check_height_idx(const Layout& layout, const PartialShape& shape);

    ov::InferRequest m_request;
};

CC @Wovchena: can this be tried in MiniCPM or other VLMs?
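On the clamp and round question raised in this comment, one plausible reason (a sketch, not a description of what this PR's graph actually does) is that bicubic kernels have negative lobes, so an interpolated value can land outside [0, 255], and a plain float-to-`uint8_t` cast truncates and is undefined for out-of-range values. The standalone snippet below illustrates this with the Keys cubic convolution kernel in one dimension. The function names `cubic_weights`, `interpolate` and `to_u8` are hypothetical, and the coefficient `a = -0.75` is an assumption chosen for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

// Keys cubic convolution weights for the four neighbours p0..p3 of a sample
// located a fraction t (0 <= t < 1) past p1. The coefficient a = -0.75 is an
// assumption made for this illustration.
std::array<float, 4> cubic_weights(float t, float a = -0.75f) {
    assert(t >= 0.0f && t < 1.0f);
    // Kernel branch for distances |x| <= 1.
    auto w_near = [a](float x) { return ((a + 2.0f) * x - (a + 3.0f)) * x * x + 1.0f; };
    // Kernel branch for distances 1 < |x| < 2 (this one goes negative).
    auto w_far = [a](float x) { return ((a * x - 5.0f * a) * x + 8.0f * a) * x - 4.0f * a; };
    return {w_far(1.0f + t), w_near(t), w_near(1.0f - t), w_far(2.0f - t)};
}

// Weighted sum of four neighbouring pixel values; the result is NOT limited
// to the range of the inputs.
float interpolate(const std::array<float, 4>& p, float t) {
    const std::array<float, 4> w = cubic_weights(t);
    return p[0] * w[0] + p[1] * w[1] + p[2] * w[2] + p[3] * w[3];
}

// Safe conversion back to 8-bit: clamp to [0, 255] first, then round to the
// nearest integer instead of truncating.
std::uint8_t to_u8(float v) {
    return static_cast<std::uint8_t>(std::lround(std::clamp(v, 0.0f, 255.0f)));
}
```

For example, `interpolate({0.0f, 255.0f, 255.0f, 0.0f}, 0.5f)` overshoots above 255 and `interpolate({255.0f, 0.0f, 0.0f, 255.0f}, 0.5f)` undershoots below 0, so both need `to_u8` before being stored as 8-bit pixels. Whether two resize implementations match bit-for-bit can then depend on whether both apply this clamp-and-round step.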

github-actions bot added the category: visual language, category: GHA, and category: tokenizers labels Nov 26, 2024

ilya-lavrenov reviewed Nov 26, 2024


ilya-lavrenov self-assigned this Nov 26, 2024

yatarkan added 8 commits November 27, 2024 19:17

Move clip image convertion functions to clip.cpp

2e1ed08

Add image resize model

7b61c22

Switch bicubic resize function to image resize model

928fe72

Bump tokenizers to master

16f43ca

Add optional steps to compare llava output with reference

17e19c9

Enable llava 1.5 reference comparison

b48e0ae

Apply comment from code review for partial dynamic shape

471f55d

Fix generating llava reference in vlm job

2216349

yatarkan force-pushed the yt/image-resize-model branch from b0fb1d8 to 2216349 Compare November 27, 2024 15:25

Enable comparing llava next with ref in GH Actions

9c52dcf


ilya-lavrenov Dec 11, 2024

