Glove cpu #57

Merged: 4 commits merged on Oct 25, 2024
Conversation

@boranhan (Collaborator) commented Oct 24, 2024

Issue #, if available:

Description of changes:

In this PR, when no CUDA is detected, it will automatically fall back to glove-twitter-100.
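
A minimal sketch of the fallback described above, assuming gensim's downloader provides glove-twitter-100 and that the GPU path loads a sentence-transformers model; the function name and structure are illustrative, not the exact code in this PR:

```python
import logging

import torch
import gensim.downloader as api
from sentence_transformers import SentenceTransformer

logger = logging.getLogger(__name__)


def load_text_encoder(model_name: str):
    """Use a GPU sentence-transformers model when CUDA is available;
    otherwise fall back to CPU-friendly glove-twitter-100 word vectors."""
    if torch.cuda.is_available():
        try:
            return SentenceTransformer(model_name)
        except Exception:
            logger.warning(f"No model {model_name} is found.")
    # CPU fallback: 100-dimensional GloVe vectors trained on Twitter data.
    return api.load("glove-twitter-100")
```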

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@boranhan requested a review from AnirudhDagar on October 24, 2024, 22:34
Comment on lines 53 to 58
if 0: #torch.cuda.is_available():
    try:
        self.model = SentenceTransformer(self.model_name)
    except:
        logger.warning(f"No model {self.model_name} is found.")

Collaborator

This is dead code; it will always evaluate to false.

Collaborator Author

corrected

@AnirudhDagar (Collaborator) left a comment

Please address the comments. Looking at the Metaflow run for benchmarking GloVe, I don't see the gains that were expected (seen previously here). Could it be due to the limited runtime?

Comment on lines 79 to 81
if 0: #torch.cuda.is_available():
    transformed_train_column = huggingface_run(self.model, np.transpose(train_X[series_name].to_numpy()).T)
    transformed_test_column = huggingface_run(self.model, np.transpose(test_X[series_name].to_numpy()).T)
Collaborator

Same issue here as above.

Collaborator Author

corrected

@@ -62,5 +91,5 @@ def _transform_dataframes(self, train_X: pd.DataFrame, test_X: pd.DataFrame) ->
            ]
            train_X = pd.concat([train_X.drop([series_name], axis=1), transformed_train_column], axis=1)
            test_X = pd.concat([test_X.drop([series_name], axis=1), transformed_test_column], axis=1)

        return train_X, test_X

Collaborator

Suggested change (not rendered)

@boranhan (Collaborator Author)

> the limited runtime?

No, it's because we are using different pre-trained GloVe models. The HF GloVe is more advanced and better pretrained; however, it can't be used on GPU.
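
For reference, a hedged sketch of how gensim's glove-twitter-100 vectors could embed a text column on CPU (mean-pooled word vectors); the helper below is hypothetical and may differ from the transform used in this PR:

```python
import numpy as np
import gensim.downloader as api

# Illustrative only: 100-dim GloVe vectors trained on Twitter, loaded via gensim.
glove = api.load("glove-twitter-100")


def embed_texts(texts):
    """Mean-pool GloVe word vectors per text; rows stay zero when no word is in vocabulary."""
    out = np.zeros((len(texts), glove.vector_size), dtype=np.float32)
    for i, text in enumerate(texts):
        vecs = [glove[w] for w in str(text).lower().split() if w in glove]
        if vecs:
            out[i] = np.mean(vecs, axis=0)
    return out


print(embed_texts(["glove runs fine on cpu", "no cuda needed"]).shape)  # (2, 100)
```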

@AnirudhDagar (Collaborator) commented Oct 25, 2024

> > the limited runtime?
>
> No, it's because we are using different pre-trained GloVe models. The HF GloVe is more advanced and better pretrained; however, it can't be used on GPU.

I don't think that's the case, since DBInfer also uses GloVe (see here), and our previous benchmarks with their preprocessing showed that the StumbleUpon competition specifically benefited heavily from GloVe embeddings, even with them running on CPU.

Anyway, the benchmark I have running at the moment will get the 4-hour results for this as well, but based on previous benchmarks I think GloVe embeddings are capable of pushing the scores in this competition.

@boranhan (Collaborator Author) commented Oct 25, 2024

> > > the limited runtime?
> >
> > No, it's because we are using different pre-trained GloVe models. The HF GloVe is more advanced and better pretrained; however, it can't be used on GPU.
>
> I don't think that's the case, since DBInfer also uses GloVe (see here), and our previous benchmarks with their preprocessing showed that the StumbleUpon competition specifically benefited heavily from GloVe embeddings, even with them running on CPU.
>
> Anyway, the benchmark I have running at the moment will get the 4-hour results for this as well, but based on previous benchmarks I think GloVe embeddings are capable of pushing the scores in this competition.

I'm using the same model as their setting. I think it's likely you were previously looking at the public leaderboard; it was 100% on the public one and 60% on the private one.

@boranhan (Collaborator Author) commented Oct 25, 2024

Please focus on your task of benchmarking. I'll take the weekend to look at the results and will prioritize the tasks based on them. Thanks! Depending on the results, I may or may not prioritize this task.

@boranhan merged commit 8ecdde6 into autogluon:main on Oct 25, 2024
@AnirudhDagar (Collaborator)

sorted_competitions_341.csv
Previous DFS benchmark showing StumbleUpon performance gains with GloVe on CPU.

@boranhan (Collaborator Author)

> sorted_competitions_341.csv: previous DFS benchmark showing StumbleUpon performance gains with GloVe on CPU.

I appreciate the results. However, I don't have any context to make a good call on them. What was the code that ran this, and why isn't that code being implemented instead?

Overall, all I wanted to say is:

  1. So far, it is unknown whether FE is the highest-priority task.
  2. Making further efforts to improve a single benchmark can ONLY happen when we don't have any other high-priority task.

@AnirudhDagar (Collaborator)

I think it is not just a single benchmark: all future datasets that have text columns will benefit from this. As for the code used, see here: #2

It is the code from the dbinfer/tab2graph library.
