Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xrt-smi validate test for spatial-sharing overhead (VITIS-13000) #8395

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

aktondak
Copy link
Collaborator

@aktondak aktondak commented Sep 5, 2024

Problem solved by the commit

VITIS-13000 : Multi-application spatial scheduling overhead
New micro-benchmark test for xrt-smi validate :

Bug / issue (if any) fixed, which PR introduced the bug, how it was discovered

New Enhancement.

How problem was solved, alternative solutions (if any) and why they were rejected

The enhancement was made via a 2-phase test comparison between single threaded single hardware context run and a multi-threaded spatially shared hardware contexts to calculate time overhead. More details : https://confluence.amd.com/display/~aktondak/spatial+sharing+overhead+via+xrt-smi+validate

Z:\Repos\XRT-MCDM-FORK\XRT-MCDM\build\WRelease\xilinx\xrt>xrt-smi validate --run spatial-sharing-overhead
Validate Device           : [00c5:00:01.1]
    Platform              : NPU
    Power Mode            : Default
-------------------------------------------------------------------------------
Verbose: Enabling Verbosity
Test 1 [00c5:00:01.1]     : spatial-sharing-overhead
    Description           : Run Spatial Sharing Overhead Test
    Xclbin                : C:\WINDOWS\System32\DriverStore\FileRepository\ipukmddrv.inf_amd64_eba19bf0f9f62474\validate_17f0_20.xclbin
    Details               : Kernel name is 'DPU_PDI_0'
                            Single context latency: '198.3' ms
                            Spatially shared multiple context latency: '225.1' ms
                            Overhead: '26.9' ms
    Test Status           : [PASSED]

Risks (if any) associated the changes in the commit

What has been tested and how, request additional testing if necessary

Needs further testing from DSV.

Documentation impact (if any)

https://confluence.amd.com/display/~aktondak/spatial+sharing+overhead+via+xrt-smi+validate

@gbuildx
Copy link
Collaborator

gbuildx commented Sep 5, 2024

Can one of the admins verify this patch?

Copy link
Collaborator

@AShivangi AShivangi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation looks good, just a few restructuring and minor comments!

@aktondak aktondak marked this pull request as ready for review September 11, 2024 23:09
@aktondak aktondak changed the title Spatial-sharing overhead implementation xrt-smi validate test for spatial-sharing overhead (VITIS-13000) Sep 12, 2024
Copy link
Member

@sonals sonals left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please document the overall design and add comments to the key functions to explain the control flow logic.

@aktondak
Copy link
Collaborator Author

Please document the overall design and add comments to the key functions to explain the control flow logic.

Updated the PR with API level comments.
The Documentation for overall design is at https://confluence.amd.com/display/~aktondak/spatial+sharing+overhead+via+xrt-smi+validate

Copy link
Collaborator

@AShivangi AShivangi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Minor comments.
Thanks for the verbose documentation :)

@@ -0,0 +1,100 @@
// SPDX-License-Identifier: Apache-2.0
// Copyright (C) 2023-2024 Advanced Micro Devices, Inc. All rights reserved.

This comment was marked as resolved.

@@ -0,0 +1,55 @@
// SPDX-License-Identifier: Apache-2.0
// Copyright (C) 2023-2024 Advanced Micro Devices, Inc. All rights reserved.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Copyright (C) 2023-2024 Advanced Micro Devices, Inc. All rights reserved.
// Copyright (C) 2024 Advanced Micro Devices, Inc. All rights reserved.

src/runtime_src/core/tools/common/tests/TestHelper.h Outdated Show resolved Hide resolved
Akshay Tondak added 2 commits September 17, 2024 12:15
Signed-off-by: Akshay Tondak <[email protected]>
Signed-off-by: Akshay Tondak <[email protected]>
…mbda for thread signalling

Signed-off-by: Akshay Tondak <[email protected]>
@aktondak
Copy link
Collaborator Author

Please document the overall design and add comments to the key functions to explain the control flow logic.

@sonals Based on our offline sync, the updated patch moves thread synchronization variables to the TestSpatialSharing class and avoids passing them around with help from thread_ready_to_run lambda. Cleaner this way.
I plan to re-use the TestCase class for Temporal sharing test as well and I'll explore if the thread synchronization can be avoided completely with that code change.

bo_param = xrt::bo(device, buffer_size, XRT_BO_FLAGS_HOST_ONLY, kernel.group_id(2));
bo_ofm = xrt::bo(device, buffer_size, XRT_BO_FLAGS_HOST_ONLY, kernel.group_id(3));
bo_inter = xrt::bo(device, buffer_size, XRT_BO_FLAGS_HOST_ONLY, kernel.group_id(4));
bo_mc = xrt::bo(device, buffer_size, XRT_BO_FLAGS_HOST_ONLY, kernel.group_id(7));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use member initializer list to construct the bo objects

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea !
I have updated the latest patch with this change.
Thanks

bo_param (xrt::bo(device, buffer_size, XRT_BO_FLAGS_HOST_ONLY, kernel.group_id(2))),
bo_ofm (xrt::bo(device, buffer_size, XRT_BO_FLAGS_HOST_ONLY, kernel.group_id(3))),
bo_inter (xrt::bo(device, buffer_size, XRT_BO_FLAGS_HOST_ONLY, kernel.group_id(4))),
bo_mc (xrt::bo(device, buffer_size, XRT_BO_FLAGS_HOST_ONLY, kernel.group_id(7)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we are first creating a temporary xrt::bo() and using to initialize the member class xrt::bo()
Please directly invoke the xrt::bo() ctor.

run_list[cnt].wait2();
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should move the for loop at line 80 into a thread function that you kick off as part of thread constructor in TestSpatialSharingOvd::run(). The timer can be started just before the threads are constructed.

You shouldn't need thread_ready_to_run();

TestCase(xrt::xclbin& xclbin, std::string& kernel, xrt::device& device)
: device(device), xclbin(xclbin), kernel_name(kernel), buffer_size(1024), itr_count(1000)
{
hw_ctx = xrt::hw_context(device, xclbin.get_uuid());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

member initializer list please.


public:
// Constructor to initialize the test case with xclbin and kernel name with hardware context creation
TestCase(xrt::xclbin& xclbin, std::string& kernel, xrt::device& device)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should some of the args be const ref?


// Wait for both threads to be ready to begin clocking
wait_for_threads_ready((int)threads.size());

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed.


/* Run 2 */
// Create a single test case and run it in a single thread
TestCase t(xclbin, kernelName, working_dev);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For run 2 use the same approach but call the loop with only one entry.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants