Non-uniform vector quantization #374
base: main
Conversation
# Conflicts:
#	jvector-examples/src/main/java/io/github/jbellis/jvector/example/distancesNVQ.java
# Conflicts:
#	jvector-base/src/main/java/io/github/jbellis/jvector/vector/VectorUtil.java
#	jvector-base/src/main/java/io/github/jbellis/jvector/vector/VectorUtilSupport.java
#	jvector-native/src/main/java/io/github/jbellis/jvector/vector/NativeVectorUtilSupport.java
#	jvector-twenty/src/main/java/io/github/jbellis/jvector/vector/PanamaVectorUtilSupport.java
#	jvector-twenty/src/main/java/io/github/jbellis/jvector/vector/SimdOps.java
…or debugging but is not truly exercised now. Replace a couple of FMA patterns in the tail computations.
Some minor nits/questions. Overall, looks quite good.
        var function = scorer.scoreFunctionFor(queryVector, vsf);

        return new ScoreFunction.ExactScoreFunction() {
            private final QuantizedVector scratch = NVQuantization.QuantizedVector.createEmpty(nvq.subvectorSizesAndOffsets, nvq.bitsPerDimension);
In most other places like this, we use a reusable thread-local scratch. I'm not sure if it's worth it, so consider this a possible deferred optimization.
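For reference, a minimal sketch of what I mean, using a plain `java.lang.ThreadLocal` (JVector's own thread-local helper may be a better fit; the `createEmpty` call and field names are taken from the diff above):

```java
// Sketch only: reusable per-thread scratch instead of allocating one per score function.
// Assumes nvq is an (effectively) final field visible here.
private final ThreadLocal<QuantizedVector> scratch = ThreadLocal.withInitial(
        () -> NVQuantization.QuantizedVector.createEmpty(nvq.subvectorSizesAndOffsets, nvq.bitsPerDimension));

// inside the score function, use scratch.get() wherever the current scratch field is used
```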
@@ -324,6 +324,8 @@ public OnDiskGraphIndexWriter build() throws IOException {
             int dimension;
             if (features.containsKey(FeatureId.INLINE_VECTORS)) {
                 dimension = ((InlineVectors) features.get(FeatureId.INLINE_VECTORS)).dimension();
+            } else if (features.containsKey(FeatureId.NVQ_VECTORS)) {
+                dimension = ((NVQ) features.get(FeatureId.NVQ_VECTORS)).dimension();
             } else {
                 throw new IllegalArgumentException("Inline vectors must be provided.");
We should update this error message to indicate that either inline vectors or NVQ vectors must be provided.
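Something along these lines, say (wording is just a suggestion):

```java
throw new IllegalArgumentException("Either inline vectors or NVQ vectors must be provided.");
```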
     */
    public LossFunction(int nDims) {
        if (nDims <= 0) {
            throw new IllegalArgumentException("The standard deviation initSigma must be positive");
Bad error message copy/paste: the check is on nDims, but the message refers to initSigma.
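Presumably it should talk about nDims instead, e.g. (suggested wording only):

```java
if (nDims <= 0) {
    throw new IllegalArgumentException("The number of dimensions nDims must be positive");
}
```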
        minBounds = new float[nDims];
        maxBounds = new float[nDims];
        for (int d = 0; d < nDims; d++) {
            minBounds[d] = Float.NEGATIVE_INFINITY;
Could use Arrays.fill here
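For example (the maxBounds line assumes it is meant to start at +Infinity, mirroring minBounds; java.util.Arrays needs to be imported):

```java
Arrays.fill(minBounds, Float.NEGATIVE_INFINITY);
Arrays.fill(maxBounds, Float.POSITIVE_INFINITY); // assumption: maxBounds starts unbounded above
```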
 */
public class NESOptimizer {
    public enum Distribution {
        MULTINORMAL, // This is for adding future support for the multinormal case (Algorithm 5 in [1])
Do we think we'll get back to MULTINORMAL relatively soon? If not, is it worth cleaning this up for now?
    float[] nvqCosine8bit(VectorFloat<?> vector, ByteSequence<?> bytes, float growthRate, float midpoint, float minValue, float maxValue, VectorFloat<?> centroid);

    /**
     * When using 4-bit NVQ quantization and vector instructions, it is easier to unpack all even entries, and then all
Why the reference to 4-bit quantization in the 8-bit javadoc?
        }
    }

    {
Introducing the new variable scopes here is perfectly fine, but IMO it would also be fine to put this all in one scope and re-use the variables. There are definitely codebases where the scoped style is more idiomatic, but I can't think of other places we do this in JVector. Just thinking out loud -- don't feel like this has to change.
    public void testSaveLoadNVQ() throws Exception {

        int[][] testsConfigAndResults = {
                // Tuples of: nDimensions, nSubvectors, number of bots per dimension, and the expected number of bytes
"number of bots per dimension" can be removed from this comment (there's also a typo, but it will get removed anyway).
    // NVQ quantization instructions start here
    //---------------------------------------------

    static FloatVector const1f = FloatVector.broadcast(FloatVector.SPECIES_PREFERRED, 1.f);
same static final discussion here as in VectorSimdOps
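i.e., something like this, assuming the broadcast constant never changes after class initialization:

```java
static final FloatVector const1f = FloatVector.broadcast(FloatVector.SPECIES_PREFERRED, 1.f);
```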
    /**
     * Quantize a subvector as an 8-bit quantized subvector.
     * All values of the vector must be in [0, 1]. For example, the input vector has been
I'm confused by "All values of the vector..." and then by the references to bias/scale in this/nvqLoss/nvqUniformLoss. Are those remnants of a previous approach?
This PR adds a new feature that uses quantized vectors for re-ranking. The quantization is computed on subvectors of each vector in the index, using a non-uniform quantizer.
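For readers unfamiliar with the idea, here is a toy, self-contained sketch of non-uniform 8-bit quantization of a subvector using a logistic warp. It only borrows the parameter names (growthRate, midpoint, minValue, maxValue) from the nvqCosine8bit signature above; the actual NVQ formulation in this PR may differ.

```java
import java.util.Arrays;

// Toy illustration only -- not the implementation in this PR.
public class NonUniformQuantizationSketch {
    // Map each value to an 8-bit code, spending more codes near `midpoint`
    // by warping the normalized value through a logistic curve.
    static int[] quantize8bit(float[] subvector, float minValue, float maxValue,
                              float growthRate, float midpoint) {
        int[] codes = new int[subvector.length];
        for (int i = 0; i < subvector.length; i++) {
            float u = (subvector[i] - minValue) / (maxValue - minValue);   // normalize to [0, 1]
            double warped = 1.0 / (1.0 + Math.exp(-growthRate * (u - midpoint)));
            codes[i] = (int) Math.round(warped * 255);                     // 8-bit code in [0, 255]
        }
        return codes;
    }

    public static void main(String[] args) {
        float[] sub = {0.10f, 0.40f, 0.45f, 0.90f};
        System.out.println(Arrays.toString(quantize8bit(sub, 0f, 1f, 6f, 0.5f)));
    }
}
```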