-
Notifications
You must be signed in to change notification settings - Fork 27
CPUArch dispatching and unified shared library #113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
/intelci |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add Inner product and L2 distances as well. Thanks!
/intelci |
Thank you so much for the effort! How do we test if the correct supported instructions are called? Will we have a pipeline to test different architectures? |
I'm planning to write a testing script which disassembles dispatched functions and validates usage of correct instructions. |
bindings/python/setup.py
Outdated
# "icelake_server", | ||
"sapphirerapids", | ||
# "graniterapids", | ||
# "graniterapids_d", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@napetrov I remember we had a discussion earlier on how many microarchs to build the Python package with as the time/size increase with each new entry here. Do we need all of them? Also, we need to keep them in sync with the private repository for building the shared library
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that we should be careful with adding those - we not necessary need to build entire library for codepath to utilize some of the features.
We can call for bf16 even from sse codepath - the only thing that compilation for particular target would do is compiler code optimization and i doubt that there is perf difference for generic code despite specific functionality usage -this is usually works out to be just run.
Also we would need older isa set - prior to broadwell for clean run on systems that doesn't support AVX2
Can you please run clang formatting with the below commands from the repository's root?
|
cmake/options.cmake
Outdated
@@ -146,7 +146,7 @@ endif() | |||
|
|||
add_library(svs_native_options INTERFACE) | |||
add_library(svs::native_options ALIAS svs_native_options) | |||
target_compile_options(svs_native_options INTERFACE -march=native -mtune=native) | |||
target_compile_options(svs_native_options INTERFACE -DSVS_CPUARCH_NATIVE -march=native -mtune=native) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, SVS_CPUARCH_NATIVE
should be an option similar to the other options, such as SVS_NO_AVX512
. By default, it can be set to OFF
. In this case, it will compile for all x86_64
architectures if building on an x86 platform.
d25c704
to
b3e2ecc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you summarize how the dispatching will be used or configured? Maybe run me through an example where the library was built with all optimizations instantiated and is then executed on an M2 MacBook.
class MicroArchEnvironment { | ||
public: | ||
static MicroArchEnvironment& get_instance() { | ||
// TODO: ensure thread safety |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How/when will this TODO be addressed?
|
||
#if defined(__x86_64__) | ||
|
||
#define SVS_DISPATCH_CLASS_BY_MICROARCH(cls, method, args) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to avoid these macros and replace them with templated functions?
* limitations under the License. | ||
*/ | ||
|
||
#define SVS_PACK_ARGS(...) __VA_ARGS__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, just wondering if this could be handled differently & with templates in modern C++?
This reverts commit 9eb7a4b.
6ec0eb9
to
d07fead
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the way of defining and retrieving available microarchs can be improved.
return supported_archs_; | ||
} | ||
|
||
const std::vector<MicroArch>& get_compiled_microarchs() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The (to be) compiled microarchs are known at compile time and should therefore be constexpr
. The relevant bits here should be rewritten. This will make testing easier, as it should allow us to loop over the archs and test all available distance specializations.
out << std::endl; | ||
|
||
// Print all compiled microarchitectures | ||
const auto& compiled_archs = arch_env.get_compiled_microarchs(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As said in the other comments, this should be retrievable via
constexpr auto& compiled archs = /* whatever */;
@@ -139,9 +141,17 @@ float compute(DistanceCosineSimilarity distance, std::span<Ea, Da> a, std::span< | |||
assert(a.size() == b.size()); | |||
constexpr size_t extent = lib::extract_extent(Da, Db); | |||
if constexpr (extent == Dynamic) { | |||
return CosineSimilarity::compute(a.data(), b.data(), distance.norm_, a.size()); | |||
SVS_DISPATCH_CLASS_BY_MICROARCH( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't like the usage of preprocessor macros here. We lose the benefits of the type system here. compute
is processed as a string literal and downstream you invoke cls<uarch>::method
. That's very different from the style this library is written in. It's probably not trivial, but we should rewrite this and make use of the compiler as much as possible, i.e. provide templated classes. If you think this is not possible, please provide some justification for the implementation you chose.
The relevant piece of code that must be executed is svs::arch::MicroArchEnvironment::get_instance().get_microarch()
and there must be better ways of bringing it into the plumbing.
What I particularly dislike is that we do have some compile-time dispatching in this very function (if constexpr (extent == Dynamic)
), and the we clutter it with runtime dispatching. This will nullify all the benefits of the compile-time dispatching for extent.
Changes:
MicroArchEnvironment
singleton to get optimal supported arch for the machineCurrent limitations: