From f0768a0076dc0ff93c6708ac55fcaa9850cb808e Mon Sep 17 00:00:00 2001
From: Angel Yang
Date: Mon, 2 Dec 2024 18:47:46 -0800
Subject: [PATCH] Fix false alarm of Adagrad (#2601)

Summary:
Pull Request resolved: https://github.com/pytorch/torchrec/pull/2601

Some non-Ads PyPer models use "Adagrad" for the sparse optimizer, but this path is not supported. The reason it works with the current TTK sharder is that there are no sparse parameters at all. For the TorchRec sharder to honor that, this diff re-examines this path and makes the behavior on par with the TTK sharder.

Differential Revision: D65957418

fbshipit-source-id: 70368b8a072b46a139834a9928f0e5f8f06a7fe0
---
 torchrec/distributed/model_parallel.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/torchrec/distributed/model_parallel.py b/torchrec/distributed/model_parallel.py
index 8b917cce1..0f60362b6 100644
--- a/torchrec/distributed/model_parallel.py
+++ b/torchrec/distributed/model_parallel.py
@@ -598,7 +598,7 @@ def named_buffers(
             yield key, param
 
     @property
-    def fused_optimizer(self) -> KeyedOptimizer:
+    def fused_optimizer(self) -> CombinedOptimizer:
         return self._optim
 
     @property
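
Note (not part of the patch): a minimal sketch of how a caller might consume fused_optimizer after this change. DistributedModelParallel and CombinedOptimizer are existing TorchRec classes; the helper function below, its name, and its use of param_groups are illustrative assumptions, and the empty-optimizer behavior described in the comments reflects the summary above rather than anything verified against the patched code.

# sketch.py -- hypothetical caller-side usage, assuming a model has already
# been wrapped and sharded via DistributedModelParallel elsewhere.
from torchrec.distributed.model_parallel import DistributedModelParallel
from torchrec.optim.keyed import CombinedOptimizer


def inspect_fused_optimizer(dmp: DistributedModelParallel) -> None:
    # After this patch the property advertises the concrete CombinedOptimizer
    # type instead of the KeyedOptimizer base class.
    fused: CombinedOptimizer = dmp.fused_optimizer
    # Per the summary, a model with no sparse parameters (the "Adagrad" case)
    # is expected to yield a combined optimizer that simply wraps nothing,
    # rather than tripping a false alarm in the sharder.
    print(type(fused).__name__, len(fused.param_groups))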