
Add comet scorer #118 (Open)

anthdr wants to merge 8 commits into main
Conversation

@anthdr commented Sep 25, 2024

Added COMET scorers for validation in NMT training.
I left progress_bar=True, num_workers=0, batch_size=64 and gpus=0 for CPU inference; should those be the defaults, and should they be modifiable via the config?
I did not modify the tests, as that would greatly extend their duration (downloading COMET models and running inference on CPU).
I also took the liberty of lightly modifying the wmt17 recipe/docs to make them more beginner-friendly.

Fix #34

@vince62s (Contributor)

Thanks for your PR! I think it would be better to use a single block of code for all COMET-like scorers and to add a setting for the model path at start-up.
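
A minimal sketch of what that could look like, assuming a hypothetical CometLikeScorer class (names are illustrative, not the PR's code): one class shared by all COMET-like metrics, with the checkpoint path passed in at start-up.

    from comet import load_from_checkpoint

    class CometLikeScorer:
        def __init__(self, comet_model_path, reference_free=False):
            # Single code path for COMET, COMET-KIWI, etc.: only the checkpoint
            # path and the reference-free flag differ between metrics.
            self.reference_free = reference_free
            self.comet_model = load_from_checkpoint(comet_model_path)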

@vince62s (Contributor) commented Oct 2, 2024

In the real world, COMET requires a GPU, so it would be great to unload the model being trained and/or specify another GPU for COMET. What do you think?

@francoishernandez (Member) left a comment:

👋 @anthdr !
Here are a few comments.

eole/scorers/__init__.py (resolved)
eole/scorers/comet.py (resolved, outdated)
Comment on lines 31 to 38
if self.model_name == "COMET-KIWI":
for _src, _hyp in zip(texts_srcs, preds):
current_segment = {"src": _src, "mt": _hyp}
data.append(current_segment)
else:
for _src, _tgt, _hyp in zip(texts_srcs, texts_refs, preds):
current_segment = {"src": _src, "mt": _hyp, "ref": _tgt}
data.append(current_segment)
Member:

Not a super fan of this condition, but I guess that works. Not sure we could do much cleaner without defining subclasses for each metric, which would probably be overkill here.
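
One possible way to flatten the condition, sketched here against the same surrounding variables (texts_srcs, texts_refs, preds, self.model_name); this is only an illustration, not a requested change:

    # Build every segment the same way and attach "ref" only when the metric
    # uses references, so reference-free models share the loop.
    data = []
    reference_free = self.model_name == "COMET-KIWI"
    refs = [None] * len(preds) if reference_free else texts_refs
    for _src, _ref, _hyp in zip(texts_srcs, refs, preds):
        segment = {"src": _src, "mt": _hyp}
        if _ref is not None:
            segment["ref"] = _ref
        data.append(segment)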

        data.append(current_segment)
if len(preds) > 0:
    score = self.comet_model.predict(
        data, batch_size=64, gpus=0, num_workers=0, progress_bar=True
    )
Member:

batch_size is a bit hidden here. Do we expect to need to change it at some point?
It might not be necessary to have a dedicated flag for it, but maybe set it explicitly in the init so that it's more visible.

Author:

Maybe we should indeed let these 3 (4?) options be customizable.
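
A sketch of that option, keeping today's hard-coded values as defaults; the class name and signature are placeholders, not the PR's actual code.

    from comet import load_from_checkpoint

    class CometScorerSketch:
        def __init__(self, comet_model_path, batch_size=64, gpus=0,
                     num_workers=0, progress_bar=True):
            # The predict() knobs become visible, overridable attributes.
            self.comet_model = load_from_checkpoint(comet_model_path)
            self.batch_size = batch_size
            self.gpus = gpus
            self.num_workers = num_workers
            self.progress_bar = progress_bar

        def compute_score(self, data):
            return self.comet_model.predict(
                data,
                batch_size=self.batch_size,
                gpus=self.gpus,
                num_workers=self.num_workers,
                progress_bar=self.progress_bar,
            )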

@anthdr (Author) commented Oct 14, 2024

Continuing the discussion here. There are multiple ways of doing this: off-loading the current model to run COMET scoring on GPU as @vince62s suggested, running it in parallel with the training (subprocessing a COMET CPU scoring job), or dedicating a second GPU to scoring.
That's why I first implemented it on CPU during eval (so as not to interfere with VRAM) and have it work transparently for a more naive user.
What would you recommend?

@vince62s (Contributor)

I think the easiest, for 1) user convenience and 2) speed, is to unload the model being trained, load the COMET model on the same GPU, and reload the trained model after scoring. Scoring on CPU will be way too slow, and dedicating a second GPU will be a nightmare to handle from the user's standpoint.
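
A rough sketch of that unload/score/reload sequence, assuming a PyTorch model being trained and an already-loaded COMET model; the function and argument names are illustrative, not the eole implementation.

    import torch

    def score_with_comet(train_model, comet_model, data, device="cuda:0"):
        train_model.to("cpu")          # unload the model being trained
        torch.cuda.empty_cache()       # free its cached VRAM
        prediction = comet_model.predict(data, batch_size=64, gpus=1)
        comet_model.to("cpu")          # move COMET off the GPU again
        torch.cuda.empty_cache()
        train_model.to(device)         # reload the trained model
        return prediction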

@anthdr (Author) commented Dec 27, 2024

Tested how VRAM is impacted with this method:
Loading a COMET model does not seem to allocate VRAM directly:

    self.comet_model = load_from_checkpoint(comet_model_path)

but after running prediction:

            score = self.comet_model.predict(
                data, batch_size=64, gpus=1, num_workers=0, progress_bar=True
            )

this leaves a 300 MiB trace in VRAM.
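
For reference, a hedged sketch of how that residual allocation can usually be reduced after predict(); part of the footprint may be the CUDA context itself, which cannot be released from Python. The helper name is hypothetical.

    import gc
    import torch

    def predict_and_release(comet_model, data):
        prediction = comet_model.predict(
            data, batch_size=64, gpus=1, num_workers=0, progress_bar=True
        )
        comet_model.to("cpu")       # move the weights back to host memory
        gc.collect()
        torch.cuda.empty_cache()    # return cached blocks to the driver
        return prediction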

Successfully merging this pull request may close these issues: COMET scoring.