CPUArch dispatching and unified shared library #113

Alexsandruss · 2025-04-23T11:07:39Z

Changes:

CPUID (x86_64), MSR (Linux-ARM) and brand string (MacOS-ARM) readers to get supported instructions
MicroArchEnvironment singleton to get optimal supported arch for the machine
Dispatching macros to instantiate required template class instances and dispatch to optimal branch
CMake config changes to enable dispatcher for all build targets
Rework of python binding for compilation of unified shared library

Current limitations:

x86_64 architectures are limited to server platforms listed in archspec
ARM support has basic implementation only (2 targets with simple checks per Linux and MacOS)
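
For readers following the thread, here is a minimal sketch of what a feature reader plus "optimal supported arch" selection could look like on x86_64. It is not the PR's code: `MicroArch`, `CpuFeatures`, `best_microarch` and `detect_features` are hypothetical names, and the per-arch requirements are deliberately simplified. The only real APIs used are the GCC/Clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports`.

```cpp
// Hypothetical subset of targets; the PR's real list follows archspec.
enum class MicroArch { baseline, haswell, skylake_avx512 };

// Hypothetical summary of the feature bits a CPUID reader would fill in.
struct CpuFeatures {
    bool avx2 = false;
    bool fma = false;
    bool avx512f = false;
};

// Pure selection logic: return the most capable arch whose (simplified)
// requirements are all met, falling back to the baseline.
constexpr MicroArch best_microarch(const CpuFeatures& f) {
    if (f.avx512f && f.avx2 && f.fma) {
        return MicroArch::skylake_avx512;
    }
    if (f.avx2 && f.fma) {
        return MicroArch::haswell;
    }
    return MicroArch::baseline;
}

// Query the running CPU. On anything other than x86_64 with GCC/Clang this
// sketch reports no features, so selection falls back to the baseline.
inline CpuFeatures detect_features() {
    CpuFeatures f;
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();
    f.avx2 = __builtin_cpu_supports("avx2") != 0;
    f.fma = __builtin_cpu_supports("fma") != 0;
    f.avx512f = __builtin_cpu_supports("avx512f") != 0;
#endif
    return f;
}

// Convenience wrapper: detect once and map to a target.
inline MicroArch detect_microarch() { return best_microarch(detect_features()); }
```

Keeping the selection logic separate from the hardware query makes it testable on any machine, which is relevant to the testing question raised later in this thread.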

napetrov · 2025-04-23T22:18:32Z

/intelci

ibhati

Please add Inner product and L2 distances as well. Thanks!

ibhati · 2025-04-23T23:19:54Z

/intelci

dian-lun-lin · 2025-04-24T17:26:38Z

Thank you so much for the effort! How do we test if the correct supported instructions are called? Will we have a pipeline to test different architectures?

Alexsandruss · 2025-04-24T18:59:57Z

> Thank you so much for the effort! How do we test if the correct supported instructions are called? Will we have a pipeline to test different architectures?

I'm planning to write a testing script which disassembles dispatched functions and validates usage of correct instructions.

ibhati · 2025-04-24T22:42:24Z

bindings/python/setup.py

+        # "icelake_server",
+        "sapphirerapids",
+        # "graniterapids",
+        # "graniterapids_d",


@napetrov I remember we had a discussion earlier about how many microarchs to build the Python package with, as the build time and package size increase with each new entry here. Do we need all of them? Also, we need to keep them in sync with the private repository for building the shared library.

napetrov · 2025-04-25

Agreed that we should be careful about adding those. We don't necessarily need to build the entire library for a given target just so that a code path can use some of its features.

We can call bf16 even from the SSE code path. The only thing that compiling for a particular target adds is compiler code optimization, and I doubt there is a performance difference for generic code apart from the use of specific functionality; this usually works out to just run.

Also, we would need an older ISA set, prior to Broadwell, for a clean run on systems that don't support AVX2.

include/svs/lib/arch.h

ibhati · 2025-04-25T22:16:09Z

Can you please run clang formatting with the below commands from the repository's root?

sudo apt install clang-format-15
bash tools/clang-format.sh clang-format-15

include/svs/lib/arch.h

include/svs/lib/cpuid.h

ibhati · 2025-04-25T23:03:42Z

cmake/options.cmake

@@ -146,7 +146,7 @@ endif()

 add_library(svs_native_options INTERFACE)
 add_library(svs::native_options ALIAS svs_native_options)
-target_compile_options(svs_native_options INTERFACE -march=native -mtune=native)
+target_compile_options(svs_native_options INTERFACE -DSVS_CPUARCH_NATIVE -march=native -mtune=native)


If I understand correctly, SVS_CPUARCH_NATIVE should be an option similar to the other options, such as SVS_NO_AVX512. By default, it can be set to OFF. In this case, it will compile for all x86_64 architectures if building on an x86 platform.

bindings/python/setup.py

ahuber21

Can you summarize how the dispatching will be used or configured? Maybe run me through an example where the library was built with all optimizations instantiated and is then executed on an M2 MacBook.

.github/workflows/test-dispatcher.yml

.github/scripts/print_cpu_info.sh

.github/workflows/test-dispatcher.yml

ahuber21 · 2025-05-08T07:17:33Z

include/svs/lib/arch.h

+class MicroArchEnvironment {
+  public:
+    static MicroArchEnvironment& get_instance() {
+        // TODO: ensure thread safety


How/when will this TODO be addressed?
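
One common way to address a TODO like this is a function-local static (a "Meyers singleton"): since C++11 the language guarantees that its initialization runs exactly once, even if several threads make the first call concurrently. A minimal sketch under that assumption; the class is trimmed down, `detect()` is a placeholder, and none of this is the PR's actual implementation:

```cpp
#include <cassert>

// Hypothetical subset of targets.
enum class MicroArch { baseline, haswell };

class MicroArchEnvironment {
  public:
    // The function-local static is initialized once; concurrent first calls
    // block until initialization has finished (guaranteed since C++11).
    static MicroArchEnvironment& get_instance() {
        static MicroArchEnvironment instance;
        return instance;
    }

    // Read-only after construction, so concurrent reads need no locking.
    MicroArch get_microarch() const { return arch_; }

    MicroArchEnvironment(const MicroArchEnvironment&) = delete;
    MicroArchEnvironment& operator=(const MicroArchEnvironment&) = delete;

  private:
    MicroArchEnvironment()
        : arch_(detect()) {}

    // Placeholder for the real CPUID / MSR / brand-string detection.
    static MicroArch detect() { return MicroArch::baseline; }

    MicroArch arch_;
};

// Small self-check: repeated calls must yield the same object.
inline bool singleton_is_unique() {
    const MicroArchEnvironment* first = &MicroArchEnvironment::get_instance();
    const MicroArchEnvironment* second = &MicroArchEnvironment::get_instance();
    assert(first == second);
    return first == second;
}
```

This only covers initialization. If the environment ever gains mutable state (for example a runtime override of the selected arch), that state would still need its own synchronization.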

ahuber21 · 2025-05-08T07:19:20Z

include/svs/lib/arch.h

+
+#if defined(__x86_64__)
+
+#define SVS_DISPATCH_CLASS_BY_MICROARCH(cls, method, args)                                 \


Is it possible to avoid these macros and replace them with templated functions?
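
To make the question concrete, here is one possible shape for a template-based replacement, offered as a sketch rather than a drop-in for the PR. A `switch` bridges the runtime arch value to a compile-time tag, and a generic lambda picks the class and method, so both remain visible to the type system. `dispatch_by_microarch`, `L2Impl` and `l2` are hypothetical names; in a real build each `L2Impl<Arch>` would presumably be instantiated in a translation unit compiled with matching `-march` flags.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <type_traits>
#include <utility>

// Hypothetical subset of targets.
enum class MicroArch { baseline, haswell, skylake_avx512 };

// Stand-in for a per-arch implementation class (the macro's `cls` argument).
template <MicroArch Arch> struct L2Impl {
    static float compute(const float* a, const float* b, std::size_t n) {
        float sum = 0.0f;
        for (std::size_t i = 0; i < n; ++i) {
            const float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
};

// Runtime-to-compile-time bridge: the callable receives the selected arch
// as an std::integral_constant, so it can use it as a template argument.
template <typename F> decltype(auto) dispatch_by_microarch(MicroArch arch, F&& f) {
    switch (arch) {
        case MicroArch::skylake_avx512:
            return std::forward<F>(f)(
                std::integral_constant<MicroArch, MicroArch::skylake_avx512>{}
            );
        case MicroArch::haswell:
            return std::forward<F>(f)(
                std::integral_constant<MicroArch, MicroArch::haswell>{}
            );
        case MicroArch::baseline:
            return std::forward<F>(f)(
                std::integral_constant<MicroArch, MicroArch::baseline>{}
            );
    }
    throw std::invalid_argument("unknown microarch");
}

// Call site: class, method and arguments are ordinary C++ expressions.
inline float l2(MicroArch arch, const float* a, const float* b, std::size_t n) {
    assert(a != nullptr && b != nullptr);
    return dispatch_by_microarch(arch, [&](auto tag) {
        return L2Impl<decltype(tag)::value>::compute(a, b, n);
    });
}
```

Compared with the macro, a misspelled method name or a mismatched argument list here produces an ordinary compiler error at the call site, and arguments are captured by the lambda, so no argument-packing helper is needed.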

ahuber21 · 2025-05-08T07:20:29Z

include/svs/lib/arch_defines.h

+ * limitations under the License.
+ */
+
+#define SVS_PACK_ARGS(...) __VA_ARGS__


Again, just wondering if this could be handled differently & with templates in modern C++?

ahuber21

I think the way of defining and retrieving available microarchs can be improved.

ahuber21 · 2025-05-12T09:51:07Z

include/svs/lib/arch.h

+        return supported_archs_;
+    }
+
+    const std::vector<MicroArch>& get_compiled_microarchs() const {


The (to be) compiled microarchs are known at compile time and should therefore be constexpr. The relevant bits here should be rewritten. This will make testing easier, as it should allow us to loop over the archs and test all available distance specializations.
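
A minimal sketch of what a `constexpr` list could look like, assuming the build system defines one macro per compiled target. The `SVS_SKETCH_HAVE_*` macro names, `CompiledArchs`, `compiled_microarchs` and `is_compiled` are all hypothetical and only illustrate the idea:

```cpp
#include <array>
#include <cstddef>

// Hypothetical subset of targets.
enum class MicroArch { baseline, haswell, skylake_avx512 };

// Fixed-capacity list usable in constant expressions (C++17).
struct CompiledArchs {
    std::array<MicroArch, 3> data{};
    std::size_t size = 0;

    constexpr void push(MicroArch a) { data[size++] = a; }
    constexpr const MicroArch* begin() const { return data.data(); }
    constexpr const MicroArch* end() const { return data.data() + size; }
};

// Assumption: the build defines e.g. -DSVS_SKETCH_HAVE_HASWELL for every
// target it compiles. The baseline is always present.
constexpr CompiledArchs make_compiled_microarchs() {
    CompiledArchs out{};
    out.push(MicroArch::baseline);
#if defined(SVS_SKETCH_HAVE_HASWELL)
    out.push(MicroArch::haswell);
#endif
#if defined(SVS_SKETCH_HAVE_SKYLAKE_AVX512)
    out.push(MicroArch::skylake_avx512);
#endif
    return out;
}

inline constexpr CompiledArchs compiled_microarchs = make_compiled_microarchs();

// Compile-time membership test over the list.
constexpr bool is_compiled(MicroArch a) {
    for (MicroArch c : compiled_microarchs) {
        if (c == a) {
            return true;
        }
    }
    return false;
}

static_assert(is_compiled(MicroArch::baseline), "baseline must always be built");
```

A test could then iterate over `compiled_microarchs` and exercise every available distance specialization without going through the runtime singleton.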

ahuber21 · 2025-05-12T09:52:11Z

examples/cpp/microarch_info.cpp

+    out << std::endl;
+
+    // Print all compiled microarchitectures
+    const auto& compiled_archs = arch_env.get_compiled_microarchs();


As said in the other comments, this should be retrievable via

constexpr auto& compiled_archs = /* whatever */;

ahuber21 · 2025-05-12T12:24:08Z

include/svs/core/distance/cosine.h

@@ -139,9 +141,17 @@ float compute(DistanceCosineSimilarity distance, std::span<Ea, Da> a, std::span<
    assert(a.size() == b.size());
    constexpr size_t extent = lib::extract_extent(Da, Db);
    if constexpr (extent == Dynamic) {
-        return CosineSimilarity::compute(a.data(), b.data(), distance.norm_, a.size());
+        SVS_DISPATCH_CLASS_BY_MICROARCH(


I really don't like the usage of preprocessor macros here. We lose the benefits of the type system here. compute is processed as a string literal and downstream you invoke cls<uarch>::method. That's very different from the style this library is written in. It's probably not trivial, but we should rewrite this and make use of the compiler as much as possible, i.e. provide templated classes. If you think this is not possible, please provide some justification for the implementation you chose.

The relevant piece of code that must be executed is svs::arch::MicroArchEnvironment::get_instance().get_microarch() and there must be better ways of bringing it into the plumbing.

What I particularly dislike is that we do have some compile-time dispatching in this very function (if constexpr (extent == Dynamic)), and then we clutter it with runtime dispatching. This will nullify all the benefits of the compile-time dispatching for extent.
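
One way the two kinds of dispatch could coexist, sketched under stated assumptions: keep the extent as a template parameter of the kernel, and resolve the microarch once per extent into a function pointer instead of branching on every call. `ip_kernel`, `select_kernel`, `inner_product` and `current_microarch` are hypothetical names; `current_microarch()` stands in for `svs::arch::MicroArchEnvironment::get_instance().get_microarch()`, and `Dynamic` is a local stand-in for the library's constant.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical subset of targets and a local stand-in for svs's Dynamic.
enum class MicroArch { baseline, haswell };
constexpr std::size_t Dynamic = static_cast<std::size_t>(-1);

// Placeholder for the runtime query on the MicroArchEnvironment singleton.
inline MicroArch current_microarch() { return MicroArch::baseline; }

// The kernel keeps the extent as a compile-time parameter, so a static
// extent still gives the compiler a known trip count.
template <MicroArch Arch, std::size_t Extent>
float ip_kernel(const float* a, const float* b, std::size_t n) {
    float sum = 0.0f;
    if constexpr (Extent == Dynamic) {
        for (std::size_t i = 0; i < n; ++i) {
            sum += a[i] * b[i];
        }
    } else {
        assert(n == Extent);
        for (std::size_t i = 0; i < Extent; ++i) {
            sum += a[i] * b[i];
        }
    }
    return sum;
}

using KernelFn = float (*)(const float*, const float*, std::size_t);

// Map the runtime arch to the matching instantiation for a given extent.
template <std::size_t Extent> KernelFn select_kernel(MicroArch arch) {
    switch (arch) {
        case MicroArch::haswell:
            return &ip_kernel<MicroArch::haswell, Extent>;
        case MicroArch::baseline:
            break;
    }
    return &ip_kernel<MicroArch::baseline, Extent>;
}

// The arch is resolved on the first call for each Extent (thread-safe static
// initialization) and reused afterwards.
template <std::size_t Extent = Dynamic>
float inner_product(const float* a, const float* b, std::size_t n) {
    static const KernelFn fn = select_kernel<Extent>(current_microarch());
    return fn(a, b, n);
}
```

The trade-off is an indirect call per invocation, which prevents inlining of the kernel into its caller; whether that matters relative to a per-call `switch` would need to be measured.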

napetrov requested review from ahuber21, ethanglaser, mihaic, dian-lun-lin and ibhati and removed request for ahuber21 and ethanglaser April 23, 2025 19:32

ibhati requested changes Apr 23, 2025

Alexsandruss marked this pull request as ready for review April 24, 2025 14:18

ibhati reviewed Apr 24, 2025

napetrov reviewed Apr 25, 2025

ibhati requested changes Apr 25, 2025

Alexsandruss force-pushed the dev/unified-so branch 2 times, most recently from d25c704 to b3e2ecc Compare May 6, 2025 07:34

ahuber21 requested changes May 8, 2025

Alexsandruss added 8 commits May 8, 2025 08:15

Initial implementation of CPUArch dispatcher and unified shared library

9378b05

Fix cpuid header

d812859

Remove dynamic _svs_* loading

08a272e

Remove tests for _svs_* loader

c7343aa

TEMP: enable cmake verbosity options

56a84e7

Correct instantiation macros workflow

6cecfed

Extend support to L2 and IP distances

edd5c9b

Revert "TEMP: enable cmake verbosity options"

5ae4861

This reverts commit 9eb7a4b.

yuejiaointel and others added 16 commits May 8, 2025 08:15

fix: add compielrs and os to test dispatcher

6360464

fix: test x86 only

2d594f5

fix: test x86 only

bb397db

fix: use actual os name instead of self hosted runners

936635a

fix: checkout erros

37184df

fix: checkout erros

a54d187

fix: add comiler matric for x86 targets as well

b3ac750

add: add cpu name and avx support info print

1095355

fix: fix cpu info print for arm and macos

449d49c

fix: format

687a5fd

Add microarch_info example

aa5bf2b

Merge dispatcher testing with SDE into build-linux pipeline

bd4bcef

Fix typo in pipeline

658c770

Add describe static function to svs.microarchand fix pipelines

d520b37

Remove merged dispatcher testing; macos pipeline fix; test_microarch.…

f3f076b

…py corrections

Fix for SDE

d07fead

Alexsandruss force-pushed the dev/unified-so branch from 6ec0eb9 to d07fead Compare May 8, 2025 15:16

meiravgri mentioned this pull request May 9, 2025

remove flags RedisAI/VectorSimilarity#671

Merged

ahuber21 requested changes May 12, 2025

ahuber21 reviewed May 12, 2025

Alexsandruss added 10 commits May 12, 2025 08:23

Replace x86_64 base uarch (nehalem) with x86_64_v2

e9ee015

Fix for base uarch flags

ee2309c

Exp.: change base uarch to sandybridge

11c6c09

Exp.: change base uarch to haswell

9010819

Revert x86_64 base uarch changes

5d639a5

Change template args for DistanceImpl

ce48782

Extend macros to correct linking

b1a51c5

Linting

74dc321

Fix for ARM platforms and correct naming

aea0390

Fix for ARM platforms

e539cac
