add performance statistics for image generation #1405
base: master
Conversation
Could you please provide an example of such prints?
@xufang-lisa let's add a custom struct (raw metrics declared first so the aggregate struct can hold them by value):

struct OPENVINO_GENAI_EXPORTS RawImageGenerationPerfMetrics {
    std::vector<MicroSeconds> unet_inference_durations;        // UNet duration for each step
    std::vector<MicroSeconds> transformer_inference_durations; // transformer duration for each step
    std::vector<MicroSeconds> iteration_durations;             // duration of each step
};

struct OPENVINO_GENAI_EXPORTS ImageGenerationPerfMetrics {
    float load_time;                 // model load time (includes reshape & read_model time)
    float generate_duration;         // duration of the generate(...) method
    MeanStdPair iteration_duration;  // mean/std time of one generation iteration
    std::map<std::string, float> encoder_inference_duration; // inference durations for each encoder
    MeanStdPair unet_inference_duration;        // inference duration for the UNet model, filled with zeros if there is no UNet
    MeanStdPair transformer_inference_duration; // inference duration for the transformer model, filled with zeros if there is no transformer
    float vae_encoder_inference_duration; // inference duration of the vae_encoder model, filled with zeros if it is not used
    float vae_decoder_inference_duration; // inference duration of the vae_decoder model
    bool m_evaluated = false;
    RawImageGenerationPerfMetrics raw_metrics;
};

I'd also like to propose return
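For illustration, a minimal sketch of how a caller might consume such a struct. The `get_performance_metrics()` accessor name and the millisecond units are assumptions for the sketch, not something this thread fixes:

```cpp
// Hypothetical usage sketch; assumes the pipeline exposes a get_performance_metrics()
// accessor returning the ImageGenerationPerfMetrics proposed above.
ov::genai::Text2ImagePipeline pipe(models_path, device);
ov::Tensor image = pipe.generate(prompt);

auto metrics = pipe.get_performance_metrics();
std::cout << "Load time: " << metrics.load_time << " ms\n";                 // assumed to be ms
std::cout << "Generate duration: " << metrics.generate_duration << " ms\n"; // assumed to be ms
std::cout << "Iteration time: " << metrics.iteration_duration.mean << " +/- "
          << metrics.iteration_duration.std << " ms\n";
```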
Updated.
Could you please add a benchmark for image generation similar to the benchmark for GenAI LLMs, which can print all these detailed statistics? And provide an output of such a sample here. BTW, the original samples should be kept as is and should not print timings.
@ilya-lavrenov Do you mean to add support for image generation in benchmark_genai?
No, create a dedicated benchmark application for image generation, similar to the VLM / LLM benchmarks.
cxxopts::Options options("benchmark_image_generation", "Help command");

options.add_options()
    ("m,model", "Path to model and tokenizers base directory", cxxopts::value<std::string>()->default_value("."))
Please don't imply that the model is located in the current folder by default.
I would make this parameter required, as in the other benchmark applications:
openvino.genai/samples/cpp/text_generation/benchmark_genai.cpp (lines 10 to 11 in 4fb48de):

options.add_options()
    ("m,model", "Path to model and tokenizers base directory", cxxopts::value<std::string>())
std::cout << std::fixed << std::setprecision(2);
std::cout << "Load time: " << load_time << " ms" << std::endl;
std::cout << "One generate avg time: " << generate_mean << " ms" << std::endl;
std::cout << "Total inference for one generate avg time: " << inference_mean << " ms" << std::endl;
Can we print more information? E.g. how much time is taken by the text encoders, VAE encode / decode, and the first / other iterations of the main denoising loop.
Currently the printed information is not informative (it says nothing about what happens inside the pipeline) and could be obtained by external benchmarking around the generate method.
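A rough sketch of what such a detailed report could look like, assuming the ImageGenerationPerfMetrics fields proposed earlier in this thread and that MicroSeconds is a std::chrono duration; the first-vs-other iteration split is computed here from raw_metrics and is illustrative:

```cpp
// Illustrative detailed report; field names follow the struct proposed above.
std::cout << std::fixed << std::setprecision(2);
std::cout << "Load time: " << metrics.load_time << " ms\n";
std::cout << "Generate duration: " << metrics.generate_duration << " ms\n";
for (const auto& [encoder_name, duration_ms] : metrics.encoder_inference_duration)
    std::cout << encoder_name << " inference: " << duration_ms << " ms\n";
std::cout << "UNet per-step inference: " << metrics.unet_inference_duration.mean << " +/- "
          << metrics.unet_inference_duration.std << " ms\n";
std::cout << "VAE decoder inference: " << metrics.vae_decoder_inference_duration << " ms\n";

// First vs. remaining denoising iterations, taken from the raw per-step durations.
const auto& iters = metrics.raw_metrics.iteration_durations;
if (iters.size() > 1) {
    float rest_sum_us = 0.0f;
    for (size_t i = 1; i < iters.size(); ++i)
        rest_sum_us += iters[i].count();
    std::cout << "First iteration: " << iters.front().count() / 1000.0f << " ms\n";
    std::cout << "Other iterations (avg): " << rest_sum_us / (iters.size() - 1) / 1000.0f << " ms\n";
}
```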
## Benchmarking sample for text-to-image pipeline

This `benchmark_text2image.cpp` sample demonstrates how to benchmark the text-to-image pipeline. The sample includes functionality for warm-up iterations, image generation, and calculating various performance metrics.
Can we generalize this benchmark to support inpainting / image-to-image as well?
We could add an argument to specify the pipeline type, e.g. as sketched below.
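A sketch of the proposed argument; the option name, accepted values, and dispatch are assumptions:

```cpp
// Sketch: one benchmark binary, with the pipeline kind chosen via a CLI argument.
options.add_options()
    ("t,pipeline_type", "Pipeline type: text2image, image2image or inpainting",
     cxxopts::value<std::string>()->default_value("text2image"));

const std::string pipeline_type = result["pipeline_type"].as<std::string>();
if (pipeline_type == "text2image") {
    ov::genai::Text2ImagePipeline pipe(models_path, device);
    // warm-up + timed pipe.generate(prompt, ...) calls go here
} else if (pipeline_type == "image2image") {
    // ov::genai::Image2ImagePipeline additionally needs an initial image tensor
} else if (pipeline_type == "inpainting") {
    // ov::genai::InpaintingPipeline additionally needs an image and a mask
}
```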
m_decoder_request.infer();
infer_duration = ov::genai::PerfMetrics::get_microsec(std::chrono::steady_clock::now() - infer_start);
Can external code (within the image generation pipeline) measure the time of the decode function? With such an approach we don't need extra output arguments for this method.
The same approach applies to all other models within the image_generation/models folder.
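A sketch of the suggested approach, where the pipeline code times the call itself rather than the model returning a duration; variable names and the millisecond conversion are illustrative:

```cpp
// Sketch: the pipeline times the VAE decode call, so AutoencoderKL::decode()
// does not need an extra output argument for the duration.
const auto decode_start = std::chrono::steady_clock::now();
ov::Tensor image = m_vae->decode(latent);
const float decode_duration_us =
    ov::genai::PerfMetrics::get_microsec(std::chrono::steady_clock::now() - decode_start);
m_perf_metrics.vae_decoder_inference_duration = decode_duration_us / 1000.0f;  // stored as ms (assumed unit)
```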
private:
    std::shared_ptr<DiffusionPipeline> m_impl;
    ImageGenerationPerfMetrics m_perf_metrics;
This field should be hidden inside `std::shared_ptr<DiffusionPipeline> m_impl`.
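A sketch of what hiding the field inside the implementation could look like; the accessor name is an assumption:

```cpp
// Sketch: the public pipeline facade only delegates; the metrics live in the impl.
class Text2ImagePipeline {
public:
    ImageGenerationPerfMetrics get_performance_metrics() const {
        return m_impl->get_performance_metrics();
    }
private:
    std::shared_ptr<DiffusionPipeline> m_impl;
    // no ImageGenerationPerfMetrics member here
};
```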
@@ -477,6 +500,7 @@ class FluxPipeline : public DiffusionPipeline {
    std::shared_ptr<T5EncoderModel> m_t5_text_encoder = nullptr;
    std::shared_ptr<AutoencoderKL> m_vae = nullptr;
    ImageGenerationConfig m_custom_generation_config;
    ImageGenerationPerfMetrics m_perf_metrics;
Should it be moved to the base DiffusionPipeline class? All derived pipelines would then inherit this field.
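A sketch of the suggested layout, with the field on the base class; member placement is illustrative:

```cpp
// Sketch: keep the metrics on the base class so every derived pipeline reuses it.
class DiffusionPipeline {
    // ... existing virtual interface ...
protected:
    ImageGenerationPerfMetrics m_perf_metrics;
};

class FluxPipeline : public DiffusionPipeline {
    // no per-pipeline ImageGenerationPerfMetrics member needed;
    // FluxPipeline fills the inherited m_perf_metrics during generate()
};
```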
tickets: CVS-157338