Fix false alarm of Adagrad (pytorch#2601)
Summary:
Pull Request resolved: pytorch#2601

Some non-Ads PyPer models use "Adagrad" for the sparse optimizer. However, this path is not supported. It only appears to work on the current TTK sharder because there are no sparse parameters at all. To have the TorchRec sharder honor the same behavior, this diff re-examines this path and makes it on par with the TTK sharder.

Differential Revision: D65957418

fbshipit-source-id: 70368b8a072b46a139834a9928f0e5f8f06a7fe0
ge0405 authored and facebook-github-bot committed Dec 3, 2024
1 parent 99162b5 commit f0768a0
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion torchrec/distributed/model_parallel.py
@@ -598,7 +598,7 @@ def named_buffers(
         yield key, param
 
     @property
-    def fused_optimizer(self) -> KeyedOptimizer:
+    def fused_optimizer(self) -> CombinedOptimizer:
         return self._optim
 
     @property
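
For context, here is a minimal sketch of why the narrower annotation fits. This is a hand-written illustration, not the actual torchrec code; it assumes `CombinedOptimizer` from `torchrec.optim.keyed`, which subclasses `KeyedOptimizer`. DistributedModelParallel combines its per-module fused optimizers into a single `CombinedOptimizer`, so the property can expose that more specific type directly.

# Minimal sketch only; a hand-written stand-in for DistributedModelParallel,
# not the actual torchrec implementation.
from typing import List, Tuple

from torchrec.optim.keyed import CombinedOptimizer, KeyedOptimizer


class ModelParallelSketch:
    def __init__(self, fused_optims: List[Tuple[str, KeyedOptimizer]]) -> None:
        # The per-module fused optimizers are combined into one
        # CombinedOptimizer, so _optim always has that concrete type.
        self._optim: CombinedOptimizer = CombinedOptimizer(fused_optims)

    @property
    def fused_optimizer(self) -> CombinedOptimizer:
        # Annotating the narrower CombinedOptimizer (rather than the base
        # KeyedOptimizer) lets callers use CombinedOptimizer-specific
        # helpers without casting.
        return self._optim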
