Question about MultipleNegativesRankingLoss and gradient accumulation steps #2916
How does the MultipleNegativesRankingLoss function when used with gradient accumulation steps?
According to the docs, the other samples in a batch are used as negatives.
Are the negatives from other steps used (during accumulation), or are only the negatives from the samples in the current batch (per_device_train_batch_size) used?
Comments
Hello! Great question! It's the latter: only the negatives from the samples in the current batch (per_device_train_batch_size) are used. If you want more in-batch negatives, I would recommend using the Cached losses, such as CachedMultipleNegativesRankingLoss. In short, this loss is equivalent to MultipleNegativesRankingLoss, but it cleverly uses caching and mini-batches to reach very high (effective) batch sizes without the memory usage that such batch sizes would normally require.
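A minimal sketch of what that could look like (the model name is a placeholder; the loss class and its mini_batch_size parameter are from the library):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CachedMultipleNegativesRankingLoss

# Placeholder base model; any SentenceTransformer checkpoint works.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Equivalent to MultipleNegativesRankingLoss, but the batch is processed in
# mini-batches of 16 with cached embeddings/gradients, so the effective batch
# size (and thus the number of in-batch negatives) can be much larger than
# what fits in memory at once.
loss = CachedMultipleNegativesRankingLoss(model, mini_batch_size=16)
```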
Thank you for the fast answer!
Last question: how does BatchSamplers.NO_DUPLICATES work with gradient accumulation steps?
The "no duplicates" works on a per-batch level, so with e.g. a per_device_train_batch_size of 16 and a gradient accumulation steps of 4, then you'll get 4 batches per loss propagation where each batch does not have duplicate samples in them. With other words, no issues due to duplicates. There's no "cross-batch communication" when doing gradient accumulation other than that the losses from each batch get added together. If you instead use CachedMNRL with no duplicates with e.g. a per_device_train_batch_size of 64 and a mini-batch size of 16, then you will get just 1 batch per loss propagation. Duplicates are also avoided in this batch, so there's no issues here either. For context for those who don't know why not having "no duplicates" can be problematic for in-batch negative losses: if you have e.g. question-answer pairs, and answer Y for an unrelated question Y is the same as answer X for question X, then that answer will both be considered a positive and a negative, negating the usefulness of this sample. Does that clear it up?
Yes. This raises another question: does "no duplicates" check for repeated anchors or positives?
And suppose I use per_device_train_batch_size = the size of the training data. Will "no duplicates" delete the duplicates, or divide the batch into N batches where there are no duplicates in each batch?
Sorry for the question spam. If we use triplets instead of anchor-positive pairs, does the deduplication described above still happen?
Hello! The following code section (sentence-transformers/sentence_transformers/sampler.py, lines 146 to 151 at 0a32ec8) ensures that there are no duplicates among anchor, positive, and negative:
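The linked snippet isn't reproduced here, so below is a rough sketch of the idea, assuming a dataset that maps an index to a dict of column values. The helper name build_no_duplicate_batch is hypothetical; it approximates the logic at the permalink rather than copying the library's actual code:

```python
def build_no_duplicate_batch(dataset, remaining_indices, batch_size):
    """Greedily pick samples whose column values don't collide with any
    value already placed in the batch (hypothetical helper; the real
    NoDuplicatesBatchSampler lives at the permalink above)."""
    batch_values = set()  # every anchor/positive/negative value in the batch so far
    batch_indices = []
    for index in remaining_indices:
        sample_values = set(dataset[index].values())
        # Skip this sample if any of its values already appears in the batch;
        # it remains available when building a later batch.
        if sample_values & batch_values:
            continue
        batch_indices.append(index)
        batch_values.update(sample_values)
        if len(batch_indices) == batch_size:
            break
    return batch_indices
```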
When using anchor, positive, and negative triplets instead of anchor-positive pairs, sample_values would contain all three values of a sample. To illustrate with a specific example: given

batch_values = {"anchor1", "positive1", "negative1", "anchor2", "positive2", "negative2"}
sample_values = {"anchor3", "positive1", "negative3"}

the new sample is skipped, because its "positive1" already occurs in batch_values. In this way, it guarantees that there are no duplicates across all of the anchor, positive, and negative values. Therefore, I believe the answer to the question "If we use triplets instead of anchor-positive pairs, does the deduplication still happen?" would be Yes.
I also think the answer to the question of whether "no duplicates" divides the batch into multiple batches without duplicates (rather than deleting the duplicates) would be Yes.