model config manager class #73

Vela-zz · 2023-12-10T13:31:06Z

Implement a model manager class for centralized model management.

The model manager manages the transformer models used by different metrics for different languages. It supports list_model_in_use, set_model_for_metric, reset, load_from_file, and save_to_disk. With this, I think LangCheck can:

provide centralized evaluation model management
let users pick the models they prefer, or build an eval pipeline from their own models
version-control evaluations: users can save evaluation results together with the models that produced them, which makes experiments easier to manage and reproduce
replace the near-identical pydoc of the same metric across languages with a single shared one
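
As a rough illustration of the name-level management described above, here is a minimal sketch built on the standard-library `configparser` (the PR adds a `modelconfig.ini`, so an INI layout seemed a natural fit). Only the five method names come from the description; the class name `ModelConfigManager`, the section/key layout (language as section, metric as key), and the model names in `DEFAULT_CONFIG` are assumptions made for illustration and are not taken from the actual implementation.

```python
import configparser

# Hypothetical defaults: section = language, key = metric, value = model name.
DEFAULT_CONFIG = {
    "zh": {"semantic_similarity": "example-org/zh-embedding-model"},
    "en": {"semantic_similarity": "example-org/en-embedding-model"},
}


class ModelConfigManager:
    """Tracks which model name each (language, metric) pair uses."""

    def __init__(self):
        self.reset()

    def reset(self):
        """Discard all changes and restore the built-in defaults."""
        self._config = configparser.ConfigParser()
        self._config.read_dict(DEFAULT_CONFIG)

    def list_model_in_use(self, language=None):
        """Return {language: {metric: model_name}}, optionally for one language."""
        return {
            lang: dict(self._config[lang])
            for lang in self._config.sections()
            if language in (None, lang)
        }

    def set_model_for_metric(self, language, metric, model_name):
        """Point one metric of one language at a different model."""
        if not self._config.has_section(language):
            self._config.add_section(language)
        self._config[language][metric] = model_name

    def save_to_disk(self, path):
        """Write the current mapping to an INI file."""
        with open(path, "w", encoding="utf-8") as f:
            self._config.write(f)

    def load_from_file(self, path):
        """Replace the current mapping with the contents of an INI file."""
        loaded = configparser.ConfigParser()
        with open(path, encoding="utf-8") as f:
            loaded.read_file(f)
        self._config = loaded
```

Saving the INI file next to an evaluation result is one way to record which models produced it, which is the version-control use case mentioned in the list.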

Since this is a prototype, I based it on the zh-support branch so that it does not interfere with any other in-progress feature work.

It can be used like this,

Changes to be committed:
modified: src/langcheck/metrics/__init__.py
new file: src/langcheck/metrics/_model_management.py
new file: src/langcheck/metrics/modelconfig.ini
modified: src/langcheck/metrics/zh/reference_based_text_quality.py

yosukehigashi · 2023-12-12T01:53:58Z

Thank you for writing this up!! I like the idea of a Model Manager, and agree that the properties you listed are all nice to have. I feel like it's not quite a "Model Manager" yet though, since it mostly just manages the model name/path.

Concretely, I think that it would be helpful to make the Model Manager return an actual model, rather than just the model name. Here’s an example of what I have in mind:

# Define various model loaders
from sentence_transformers import SentenceTransformer
from transformers import AutoModelForSequenceClassification, AutoTokenizer


def load_sentence_transformer(name):
    return SentenceTransformer(name)


def load_auto_model_for_sequence_classification(name):
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(name)
    return tokenizer, model

# Other loaders


# Model Manager loads a model based on the config
class ModelManager:
    def __init__(self, config):
        # Read config: config[language][metric] is a dict with 'type' and 'name'
        self.config = config

    def get_model(self, language, metric):
        # NOTE: will also need to ensure that we don't reload models unnecessarily
        model_config = self.config[language][metric]
        if model_config['type'] == 'sentence_transformer':
            return load_sentence_transformer(model_config['name'])
        # Handle other types
        raise ValueError(f"Unsupported model type: {model_config['type']!r}")

    def validate_config(self, language='all', metric='all'):
        # Check that the model(s) can be loaded for the specified language and metric given the config
        raise NotImplementedError

    # Other methods like set, reset, etc.
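
One possible way to address the NOTE about not reloading models is to cache loaded models inside the manager. The sketch below is only an illustration under some assumptions: the class name `CachingModelManager`, the `loaders` argument (a dict from model type to loader function), and the choice to key the cache on `(type, name)` are all hypothetical. A stand-in loader is used so the snippet runs without downloading any model.

```python
class CachingModelManager:
    """Loads each distinct (type, name) pair at most once."""

    def __init__(self, config, loaders):
        self.config = config    # config[language][metric] -> {'type': ..., 'name': ...}
        self.loaders = loaders  # model type -> function taking a model name
        self._cache = {}        # (type, name) -> loaded model

    def get_model(self, language, metric):
        model_config = self.config[language][metric]
        key = (model_config["type"], model_config["name"])
        if key not in self._cache:
            loader = self.loaders[model_config["type"]]
            self._cache[key] = loader(model_config["name"])
        return self._cache[key]


# Stand-in loader that records how often it is called (hypothetical).
load_calls = []


def fake_loader(name):
    load_calls.append(name)
    return object()


shared = {"type": "sentence_transformer", "name": "example-org/zh-embedding-model"}
config = {"zh": {"semantic_similarity": shared, "answer_relevance": dict(shared)}}
manager = CachingModelManager(config, {"sentence_transformer": fake_loader})

first = manager.get_model("zh", "semantic_similarity")
second = manager.get_model("zh", "answer_relevance")
```

Keying on the model rather than on the metric means that two metrics configured with the same model share a single loaded instance, so `fake_loader` is called only once here.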

A couple of additional benefits that I think this could bring:

This makes it easy for us to add validation code. Otherwise, the user will need to actually run the metric to see whether their config is valid.
This unifies the LangCheck code for loading local models.
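
To make the validation benefit concrete, here is a minimal sketch of what a config check could look like. It is written as a standalone function rather than a method, and everything beyond the `language='all'` / `metric='all'` parameters is an assumption: the `loaders` dict, the returned list of problem strings, and the stand-in loader are hypothetical and only illustrate the idea of trying each load up front.

```python
def validate_config(config, loaders, language="all", metric="all"):
    """Try to load every selected model and return a list of problems found."""
    problems = []
    for lang, metrics in config.items():
        if language not in ("all", lang):
            continue
        for metric_name, model_config in metrics.items():
            if metric not in ("all", metric_name):
                continue
            where = f"{lang}/{metric_name}"
            model_type = model_config.get("type")
            loader = loaders.get(model_type)
            if loader is None:
                problems.append(f"{where}: unknown model type {model_type!r}")
                continue
            try:
                loader(model_config["name"])
            except Exception as exc:  # any load failure counts as invalid
                problems.append(f"{where}: {exc}")
    return problems


# Stand-in loader (hypothetical): fails for one specific model name.
def fake_loader(name):
    if name == "missing-model":
        raise OSError(f"cannot find model {name}")
    return object()


config = {
    "zh": {
        "semantic_similarity": {"type": "sentence_transformer", "name": "good-model"},
        "toxicity": {"type": "sentence_transformer", "name": "missing-model"},
    },
    "en": {"semantic_similarity": {"type": "mystery", "name": "good-model"}},
}
problems = validate_config(config, {"sentence_transformer": fake_loader})
```

Returning all problems at once, instead of raising on the first one, lets a user fix a whole config in one pass before running any metric.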

Let me know what you think!

yosukehigashi

Thanks for the change, this looks great overall!

I committed a few cleanup changes directly, and made a few minor comments in this initial review. I'll play around with the model manager a bit more and try out the caching / config interface and send another follow-up review

src/langcheck/metrics/config/metric_config.ini

src/langcheck/metrics/_model_management.py

yosukehigashi

Thanks for the updates @Vela-zz!! This is looking really good now 😄

(FYI I made a bunch of cleanup commits, docstrings and comments mostly)

src/langcheck/metrics/__init__.py

src/langcheck/metrics/model_manager/__init__.py

src/langcheck/metrics/model_manager/_model_management.py

src/langcheck/metrics/zh/reference_free_text_quality.py

src/langcheck/metrics/zh/source_based_text_quality.py

yosukehigashi

Basically LGTM! Couple last questions, but looks good to merge after that 😄

tests/metrics/model_manager/test_model_loader.py

tests/metrics/model_manager/test_model_manager.py

src/langcheck/metrics/model_manager/_model_loader.py

yosukehigashi

LGTM, thanks for all of your work on this @Vela-zz!! I'll merge this to the zh branch once tests pass 🎉

Vela-zz added 12 commits December 10, 2023 17:26

implent a model config manager class

87016a2

add test case for model management

2bc3b75

apply format suggestion

34e49fc

apply format suggestion

083c612

pydoc update & fix test case

2cdf43c

add test case for model management

2843c88

apply format suggestion

49983cd

pydoc update & fix test case

63aa6e6

apply format suggestion

66ef0c6

Merge branch 'zh_model_config' of https://github.com/Vela-zz/langcheck …

6cd3755

…into zh_model_config

pydoc update & fix test case

b89ed30

Vela-zz marked this pull request as draft December 10, 2023 13:31

Vela-zz changed the title ~~model config Manager class~~ model config manager class Dec 10, 2023

Vela-zz and others added 12 commits December 23, 2023 23:58

Merge branch 'zh_model_config' of https://github.com/Vela-zz/langcheck …

1a2b720

…into zh_model_config

add model loader

2506ece

re-implent a model manager class

57ea217

add update_metrics_for_model method

3bf196c

apply format suggestion

bb70f64

clean up model loader docstrings

db15318

fix format

ddac3cf

clean up docstrings in model management

6b2a382

make self.config not None

0e23c39

remove unnecessary noqa tags

a1fd972

clean up comments

e80285d

fix fetch_model format

d841711

yosukehigashi reviewed Dec 27, 2023

yosukehigashi added 2 commits December 27, 2023 05:13

fix ref based and source based format

2aacbf2

fix format in ref free

6d0a530

yosukehigashi added 3 commits February 9, 2024 03:17

minor cleanup

8b247fd

clean validate_config

42ccc66

remove unused import

f4bf665

yosukehigashi requested changes Feb 9, 2024

Vela-zz added 8 commits February 9, 2024 13:55

Merge branch 'zh_model_config' of https://github.com/Vela-zz/langcheck …

6f33c0a

…into zh_model_config

add test case for model manager class

e7d42e9

remove global value in metrics

1086826

apply format suggestions

89d9aaa

make jp char and zh char show formally in test, not unicode

a09dbd3

fix import error in en detoxify raised by pyright

daeb706

apply format check suggestion

64f7e95

apply format check suggestions and remove useless import

6155623

Vela-zz marked this pull request as ready for review February 19, 2024 10:00

yosukehigashi added 3 commits February 20, 2024 01:57

remove unused imports

7f45a8c

remove unused import

0f15528

cleanup and docstrings

5371f86

yosukehigashi reviewed Feb 20, 2024

Vela-zz and others added 7 commits February 26, 2024 15:41

add tokenizer_revision for fine grained control

57b864f

apply tokenizer_revision update to test case

160a0d3

clean up docstring and comments

10593fc

fix typo

7c770b0

specify which fields are optional in config

2ce534f

removed unnecessary noqa

86f55fe

fix yapf format

f540159

yosukehigashi approved these changes Feb 28, 2024

yosukehigashi added 2 commits February 28, 2024 10:04

fix yapf format

fdd353e

maximize disk space

b1d5d76

yosukehigashi merged commit 14fa90e into citadel-ai:Vela-zz/zh-support Feb 28, 2024
2 checks passed

Vela-zz deleted the zh_model_config branch March 5, 2024 13:20

yosukehigashi mentioned this pull request Mar 8, 2024

Refactoring of the local metrics #94

Merged
