forked from pytorch/torchrec
Add feature pools to torchrec OSS (pytorch#2126)
Summary:
Pull Request resolved: pytorch#2126

We are open sourcing new TorchRec modules for fast, scalable and efficient indexing of tensors: TensorPool and KeyedJaggedTensorPool, for dense and sparse tensors respectively.

The proposed modules provide abstractions for reading and writing large tensor and KeyedJaggedTensor values, with support for sharding and flexible data placement (e.g. HBM, UVM, CPU). They expose APIs to update and look up values based on arbitrary indices. Sharding distributes the tensors across multiple devices, and the modules abstract away the collective communications needed for distributed lookups and updates.

# Motivation

When working with recommender systems, there is often a need to transform or augment the model's feature inputs in various ways. For example, when training retrieval/candidate generation models, it is common to extend the training data with negative samples. In the context of video recommendation, negative samples might be the IDs of videos that the user did not click on (i.e. **hard negative samples**). Retrieval models are then trained to produce a list of positive samples as candidates for further ranking downstream.

In such cases, it may not be practical to store all the necessary features in the batched data. For example, during candidate generation, extracting features for a large corpus of candidate items may be prohibitively expensive. Instead, auxiliary features can be stored in memory and indexed, so that features can be looked up efficiently when needed to augment the given samples during training or inference.

These modules can also be used to implement a distributed cache for embeddings that supports index-based lookups and updates.

Note: this is joint work from various technical contributors: xing-liu strisunshinewentingwang murphymatt Michael-JY-He YLGH jiayisuse gnahzg yanxia hongweitian SeanXiaohengMao cz171

Reviewed By: joshuadeng, gnahzg

Differential Revision: D58355479
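To make the idea of index-based lookup and update over a sharded pool concrete, here is a minimal pure-Python sketch. It is not the TorchRec API: the class `ToyShardedPool`, its `update` and `lookup` methods, and the row-wise sharding scheme are all hypothetical names and choices made for illustration, and plain lists stand in for tensors and devices. It only shows how a global index could be routed to a (shard, local row) pair so that callers never deal with shards directly.

```python
from typing import List, Sequence, Tuple


class ToyShardedPool:
    """Hypothetical illustration only; not TorchRec's TensorPool.

    Stores `pool_size` rows of width `dim`, split row-wise across
    `num_shards` in-memory lists that stand in for devices.
    """

    def __init__(self, pool_size: int, dim: int, num_shards: int) -> None:
        if pool_size <= 0 or dim <= 0 or num_shards <= 0:
            raise ValueError("pool_size, dim and num_shards must be positive")
        self.pool_size = pool_size
        self.dim = dim
        # Ceiling division: every shard holds this many rows, except that
        # trailing shards may hold fewer (or none).
        self.rows_per_shard = -(-pool_size // num_shards)
        self.shards: List[List[List[float]]] = []
        for shard_id in range(num_shards):
            start = shard_id * self.rows_per_shard
            n_rows = max(0, min(self.rows_per_shard, pool_size - start))
            self.shards.append([[0.0] * dim for _ in range(n_rows)])

    def _locate(self, idx: int) -> Tuple[int, int]:
        """Map a global row index to (shard id, row index within shard)."""
        if not 0 <= idx < self.pool_size:
            raise IndexError(f"index {idx} out of range for pool of size {self.pool_size}")
        return divmod(idx, self.rows_per_shard)

    def update(self, ids: Sequence[int], values: Sequence[Sequence[float]]) -> None:
        """Write one row per id; the caller never sees shard boundaries."""
        if len(ids) != len(values):
            raise ValueError("ids and values must have the same length")
        for idx, row in zip(ids, values):
            if len(row) != self.dim:
                raise ValueError(f"expected row of width {self.dim}")
            shard_id, local = self._locate(idx)
            self.shards[shard_id][local] = list(row)

    def lookup(self, ids: Sequence[int]) -> List[List[float]]:
        """Read rows in the order requested, regardless of owning shard."""
        out = []
        for idx in ids:
            shard_id, local = self._locate(idx)
            out.append(list(self.shards[shard_id][local]))
        return out


pool = ToyShardedPool(pool_size=10, dim=2, num_shards=4)
pool.update([9, 0], [[1.0, 2.0], [3.0, 4.0]])
rows = pool.lookup([0, 9, 5])  # row 5 was never written, so it is still zeros
```

In a real distributed setting, the routing step in `_locate` is where collective communication would come in, since each shard would live on a different device; this sketch deliberately leaves that out.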
1 parent a9a5c06, commit e9b1bc2
Showing 21 changed files with 6,158 additions and 3 deletions.