Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reserving CPU resource in CPU inference #27321

Merged

Conversation

sunxiaoxia2022
Copy link
Contributor

@sunxiaoxia2022 sunxiaoxia2022 commented Oct 30, 2024

Details:

  • Add property ov::hint::enable_cpu_reservation to reserve CPU resource in CPU inference
  • ov::hint::enable_cpu_reservation defaults to false, user can explicitly set it to true to enable CPU reservation.
  • update proc_type_table before stream scheduling in compile_model()

Tickets:

@sunxiaoxia2022 sunxiaoxia2022 requested review from a team as code owners October 30, 2024 01:49
@github-actions github-actions bot added category: inference OpenVINO Runtime library - Inference category: GPU OpenVINO GPU plugin category: CPU OpenVINO CPU plugin category: CPP API OpenVINO CPP API bindings labels Oct 30, 2024
}
}
if (get_property(ov::hint::enable_cpu_reservation) && !get_property(ov::hint::enable_cpu_pinning)) {
set_property(ov::hint::enable_cpu_pinning(true));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need this logic here? User can set enable_cpu_reservation to true and enable_cpu_pinning to false.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if enable_cpu_reservation is true, enable_cpu_pinning must be true. Because reserving cpus means pinning tasks to fixed cpus firstly, then reserving these cpus.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

User still can use enable_cpu_reservation true with enable_cpu_pinning false together. Then the next model only can use remained CPU cores for latency/throughput hit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, done.

@sunxiaoxia2022
Copy link
Contributor Author

@wangleis Could you please clarify why we need #28117 to be part of this PR?

@dmitry-gorokhov There is two changes in #28117 for CPU reservation.

  1. OV RT will identify the numa node of app thread and update proc_type_table to create inference stream on same numa node. In CPU reservation use case, user may load different models on different threads. So move identifying and updating work from loading OV core stage to compiling model stage.
  2. On multiple sockets platform, if same app thread load two models on same socket with latency hint, the first model will reserve all CPU core of this socket. The logic of master branch will continue loading second model to same socket and loading fail due to no CPU resource. Update proc_type_table in compile_model() #28117 will fix this issue and load second model to another socket which has CPU resource.

I am wondering how parallel model compilation will work in that case? Like if both compilation processes tries to update proc_table. Given proc_table is singleton how do we guarantee thread safity?

@dmitry-gorokhov I added _streams_executor_mutex to guarantee the thread safety when multiple models compilation are in progress at the same time. Please have a look.

@sunxiaoxia2022
Copy link
Contributor Author

We need to add some behavior tests that will cover cpu_reservation logic. Like we should validate that ov throws error if no available CPUs left during compilation. We also need to implement basic test that compiles to model with cpu_reservation=true and specific nthreads, and then check that compiled model parameters (like nthreads and cpu_reservation) via get_property API.

@dmitry-gorokhov These test cases are added in streams_e2e_test.cpp, please have a look, thank you!

@dmitry-gorokhov
Copy link
Contributor

@wangleis Could you please clarify why we need #28117 to be part of this PR?

@dmitry-gorokhov There is two changes in #28117 for CPU reservation.

  1. OV RT will identify the numa node of app thread and update proc_type_table to create inference stream on same numa node. In CPU reservation use case, user may load different models on different threads. So move identifying and updating work from loading OV core stage to compiling model stage.
  2. On multiple sockets platform, if same app thread load two models on same socket with latency hint, the first model will reserve all CPU core of this socket. The logic of master branch will continue loading second model to same socket and loading fail due to no CPU resource. Update proc_type_table in compile_model() #28117 will fix this issue and load second model to another socket which has CPU resource.

I am wondering how parallel model compilation will work in that case? Like if both compilation processes tries to update proc_table. Given proc_table is singleton how do we guarantee thread safity?

@dmitry-gorokhov I added _streams_executor_mutex to guarantee the thread safety when multiple models compilation are in progress at the same time. Please have a look.

get_proc_type_table() is called in several places, not only in get_num_streams(). So how the mutex here actually helps?
Like how we can guarantee correct behavior of IStreamsExecutor::Config::get_default_num_streams() in case proc_type_table is beign updated concurantely?

@dmitry-gorokhov
Copy link
Contributor

We need to add some behavior tests that will cover cpu_reservation logic. Like we should validate that ov throws error if no available CPUs left during compilation. We also need to implement basic test that compiles to model with cpu_reservation=true and specific nthreads, and then check that compiled model parameters (like nthreads and cpu_reservation) via get_property API.

@dmitry-gorokhov These test cases are added in streams_e2e_test.cpp, please have a look, thank you!

streams_e2e_test.cpp contains unit tests for proc_type_table. I was talking about behavior tests that validate whole plugin behavior.

@sunxiaoxia2022
Copy link
Contributor Author

We need to add some behavior tests that will cover cpu_reservation logic. Like we should validate that ov throws error if no available CPUs left during compilation. We also need to implement basic test that compiles to model with cpu_reservation=true and specific nthreads, and then check that compiled model parameters (like nthreads and cpu_reservation) via get_property API.

@dmitry-gorokhov These test cases are added in streams_e2e_test.cpp, please have a look, thank you!

streams_e2e_test.cpp contains unit tests for proc_type_table. I was talking about behavior tests that validate whole plugin behavior.

@dmitry-gorokhov Ok, added test case in src/plugins/intel_cpu/tests/functional/custom/behavior/ov_executable_network/properties.cpp. Please have a look.

@sunxiaoxia2022 sunxiaoxia2022 requested review from a team as code owners January 13, 2025 09:51
@github-actions github-actions bot added the category: IE Tests OpenVINO Test: plugins and common label Jan 13, 2025
@github-actions github-actions bot removed the category: IE Tests OpenVINO Test: plugins and common label Jan 14, 2025
@wangleis wangleis enabled auto-merge January 16, 2025 09:34
@wangleis wangleis added this pull request to the merge queue Jan 16, 2025
Merged via the queue into openvinotoolkit:master with commit c849f72 Jan 16, 2025
185 checks passed
@wangleis wangleis deleted the xiaoxia/cpu_reservation branch January 16, 2025 15:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: CPP API OpenVINO CPP API bindings category: CPU OpenVINO CPU plugin category: docs OpenVINO documentation category: GPU OpenVINO GPU plugin category: inference OpenVINO Runtime library - Inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants