Develop pre-processor for naive SQ8 [MOD-9238] #688

dor-forer · 2025-05-29T14:13:14Z

Describe the changes in the pull request

Change the preprocess call to use separate variables for storage_blob_size and query_blob_size, allowing the option of different vector sizes for each (like in the quantization preprocessor).
Added the QuantPreprocessor, which receives a dim in the constructor and calculates the storage_bytes_count.
The QuantPreprocessor takes a vector of float32 with size dim, and converts it to a uint8 vector with the following structure:
[dim x uint8, min_value, detla]
With a storage_bytes_count of: dim * sizeof(uint8_t) + 2 * sizeof(float).
In the preprocess function, there are 3 main cases:
a. Allocate storage_blob if it was never allocated.
b. If storage_blob and query_blob share the same memory, allocate new memory for storage_blob.
c. If storage_blob was allocated and they don't share memory, check if enough memory was allocated for the new vector, if not, reallocate.
Added tests to test the following scenarios:
QuantizationTest:
Tests only one preprocessor of type QuantPreprocessor, checks that the preprocessor works as expected.
QuantizationTestWithCosine:
Tests QuantPreprocessor after normalization using the CosinePreprocessor.
ReallocateVectorQuantizationTest:
Tests the scenario of QuantPreprocessor handling memory that was already allocated, and needs to be reallocated.
(using DummyStoragePreprocessor to allocate the storage_blob memory withouth allocating the query_blob.
ReallocateVectorCosineQuantizationTest:
Tests the relocating scenario in the case where query_blob and storage_blob share memory.
QuantizationInPlaceTest:
Tests the case where enough memory was already allocated, and needs to use the quantization in place.

Which issues this PR fixes
1. MOD-9238

Main objects this PR modified
1.Preprocessors
2. ...

Mark if applicable

This PR introduces API changes
This PR introduces serialization changes

…er file add logic to determine the cosine processed_bytes_count add processed_bytes_count to the cosine component change blob_size param to the original blob size instead of processed_bytes_count in all preprocessing functions.

…_size

add DummyChangeAllocSizePreprocessor class to test pp that changes the size add tests one will fail because the input blob size is not updated

…tes_count add valgrind macro value_to_add of DummyQueryPreprocessor and DummyMixedPreprocessor is now DataType instead of int

…to dorer-preprocessor-SQ8

codecov · 2025-05-29T15:55:15Z

Codecov Report

Attention: Patch coverage is 74.35897% with 20 lines in your changes missing coverage. Please review.

Project coverage is 96.00%. Comparing base (5ba2efa) to head (a8aee99).

Files with missing lines	Patch %	Lines
src/VecSim/spaces/computer/preprocessors.h	72.97%	20 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #688      +/-   ##
==========================================
- Coverage   96.29%   96.00%   -0.29%     
==========================================
  Files         111      111              
  Lines        6288     6365      +77     
==========================================
+ Hits         6055     6111      +56     
- Misses        233      254      +21

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copilot

Pull Request Overview

Adds support for dual-size preprocessing of storage and query blobs across existing preprocessors and updates tests to cover quantization and dummy preprocessor scenarios.
Key changes:

Introduce a new two-size preprocess overload in PreprocessorInterface and implement it in all preprocessors.
Update MultiPreprocessorsContainer to invoke the two-size overload and track separate storage/query sizes.
Add extensive unit tests for quantization and dummy preprocessors covering both single- and multi-preprocessor chains.

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

File	Description
tests/unit/test_hnsw_tiered.cpp	Add two-size `preprocess` override for `PreprocessorDoubleValue`.
tests/unit/test_components.cpp	Implement two-size overloads for dummy preprocessors and add tests.
src/VecSim/spaces/computer/preprocessors.h	Declare and implement two-size `preprocess` in interface and classes
src/VecSim/spaces/computer/preprocessor_container.h	Update container to call two-size `preprocess` and track sizes.

Comments suppressed due to low confidence (1)

src/VecSim/spaces/computer/preprocessor_container.h:175

Typo in comment: Sepreated should be Separate.

// Sepreated variables for the storage blob size and query_blob_size,

Copilot · 2025-06-03T14:22:20Z

tests/unit/test_hnsw_tiered.cpp

+        // i.e., both blobs are of the same size.
+        // In order to use different sizes, the preprocessor should be modified.
+        assert(storage_blob_size == query_blob_size);
+        preprocess(original_blob, storage_blob, query_blob, storage_blob_size, alignment);


After calling the single-size overload, query_blob_size is never updated. You should assign query_blob_size = storage_blob_size to keep sizes in sync.

Suggested change

preprocess(original_blob, storage_blob, query_blob, storage_blob_size, alignment);

preprocess(original_blob, storage_blob, query_blob, storage_blob_size, alignment);

query_blob_size = storage_blob_size;

I see that storage_blob is never updated. So does it affect it?

tests/unit/test_components.cpp

src/VecSim/spaces/computer/preprocessors.h

…to dorer-preprocessor-SQ8

meiravgri and others added 17 commits May 8, 2025 04:52

plan for the tests

cc4281a

Merge remote-tracking branch 'origin/main' into meiravg_fix_blob_copy…

74885a3

…_size

rename original_blob_size-> input_blob_size

86a44a9

add DummyChangeAllocSizePreprocessor class to test pp that changes the size add tests one will fail because the input blob size is not updated

preprocessors now change the blob size

3e15e76

fix test

1863722

fix tiered test

55837ba

add assert storage_blob == nullptr || input_blob_size == processed_by…

b1699ad

…tes_count add valgrind macro value_to_add of DummyQueryPreprocessor and DummyMixedPreprocessor is now DataType instead of int

enable assert only in debug

6dc543d

use constexpr for blob size

3e673b7

small docs changes

8967d40

review fixes

674b136

ש

d529f5e

notes and changes

af11142

Merge branch 'main' of https://github.com/RedisAI/VectorSimilarity in…

5461b97

…to dorer-preprocessor-SQ8

Added tests and changes to the PP

eacd40f

frmat

adec86b

Fix and add tests

31a0c7d

dor-forer requested a review from Copilot June 3, 2025 14:15

Copilot AI reviewed Jun 3, 2025

View reviewed changes

dor-forer requested a review from meiravgri June 3, 2025 15:01

dor-forer added 7 commits June 4, 2025 10:16

added tests for coverege

59fb16d

format

866d8cb

Remove the tests

ec4a3a7

Fix test

985c2c8

change to input output type

b7aeb2d

Merge branch 'main' of https://github.com/RedisAI/VectorSimilarity in…

b1fad81

…to dorer-preprocessor-SQ8

format

a8aee99

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Develop pre-processor for naive SQ8 [MOD-9238] #688

Develop pre-processor for naive SQ8 [MOD-9238] #688

Uh oh!

dor-forer commented May 29, 2025 •

edited

Loading

Uh oh!

codecov bot commented May 29, 2025 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Jun 3, 2025

Uh oh!

dor-forer Jun 4, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

	preprocess(original_blob, storage_blob, query_blob, storage_blob_size, alignment);
	preprocess(original_blob, storage_blob, query_blob, storage_blob_size, alignment);
	query_blob_size = storage_blob_size;

Develop pre-processor for naive SQ8 [MOD-9238] #688

Are you sure you want to change the base?

Develop pre-processor for naive SQ8 [MOD-9238] #688

Uh oh!

Conversation

dor-forer commented May 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov bot commented May 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Copilot AI Jun 3, 2025

Choose a reason for hiding this comment

Uh oh!

dor-forer Jun 4, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

dor-forer commented May 29, 2025 •

edited

Loading

codecov bot commented May 29, 2025 •

edited

Loading