Reduce package size #251

Goldziher · 2024-11-24T09:54:55Z

Hi and thanks for this great library.

I am encountering a common problem with ML libraries when using KeyBERT: the package is very large due to its dependencies. For example, torch (used by the sentence-transformers library) is gigantic, and scikit-learn is also very large. This makes it very difficult to use this library in a serverless context due to cloud function size limitations and cold start issues.

I would like to suggest working to reduce the package size. This can be done by making some dependencies optional and adding guards against them.


MaartenGr · 2024-11-25T05:17:37Z

Thank you for the suggestion! Reducing package size would be helpful, but I'm not quite sure what the suggested implementation would look like. Take scikit-learn, for example: when I look through the source code, I cannot find any way to make it optional, as it is a necessary dependency. The same can be said for sentence-transformers, as most users are using that as a backend.

How would you suggest removing those packages while still keeping the functionality of KeyBERT? Also, if these packages are needed but for some reason need a separate installation, how would you suggest making it possible that pip install keybert remains unchanged? For instance, pip install keybert[minimum] is not supported by pip.

Goldziher · 2024-11-25T09:14:47Z

I'm glad you are looking positively on this suggestion.

To make dependencies optional, there are a few elements that can be used:

import blocks

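try:
    # Prefer the optional, faster dependency when it is installed.
    from fast_query_parsers import parse_query_string as parse_qsl
except ImportError:
    # Otherwise fall back to the standard library under the same name.
    from urllib.parse import parse_qsl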

In this example (see source here) we try to import an optional dependency. If an ImportError is raised, a fallback is assigned instead.

This could be used for example to implement alternative logic in utils etc.

validation:

The simplest approach is to validate that at least one backend is installed when the library loads, i.e. runtime validation.
Another approach is to raise an error during installation using a setup.py post-install script. See for example this StackOverflow thread.
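
As a rough illustration of the runtime validation idea, the sketch below checks whether any candidate backend module can be found without actually importing it. The module names in CANDIDATE_BACKENDS and both function names are assumptions made for this example; they are not taken from KeyBERT's source.

```python
import importlib.util

# Hypothetical list of backend module names, chosen only for illustration.
CANDIDATE_BACKENDS = ("sentence_transformers", "flair", "spacy")


def installed_backends(candidates=CANDIDATE_BACKENDS):
    """Return the candidate top-level modules that are importable, without importing them."""
    return [name for name in candidates if importlib.util.find_spec(name) is not None]


def require_backend(candidates=CANDIDATE_BACKENDS):
    """Raise a descriptive ImportError when none of the candidates is installed."""
    found = installed_backends(candidates)
    if not found:
        raise ImportError(
            "No embedding backend found. Install one of: " + ", ".join(candidates)
        )
    return found
```

A library could call something like require_backend() once at import time, so that a missing backend fails early with a clear message instead of failing later inside a model call.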

how to allow backend selection?

The answer is to switch to runtime backend selection. This is a breaking change, and thus it will need to be implemented in a v1.0.0 to work. Basically, the user has to install a backend, either using an extra dependency group or by installing it separately.
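
A minimal sketch of what runtime backend selection could look like, assuming a registry that maps a user-facing backend name to the package it needs. The registry contents and the load_backend function are hypothetical and are not part of KeyBERT's API.

```python
import importlib

# Hypothetical registry: user-facing backend name -> module that must be installed.
BACKEND_MODULES = {
    "sbert": "sentence_transformers",
    "spacy": "spacy",
}


def load_backend(name, registry=BACKEND_MODULES):
    """Import and return the module behind `name`, only when it is requested."""
    if name not in registry:
        raise ValueError(f"Unknown backend {name!r}; choose from {sorted(registry)}")
    module_name = registry[name]
    try:
        # The heavy dependency is imported lazily, so it is only needed if selected.
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"Backend {name!r} requires the {module_name!r} package, which is not installed."
        ) from exc
```

With this shape, the base package would not import any backend at load time, and a user who selects a backend that is not installed gets an error naming the missing package.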

MaartenGr · 2024-11-25T10:15:56Z

In this example (see source here) we try to import an optional dependency. If an ImportError is raised, a fallback is assigned instead.

How would something like this be relevant for KeyBERT? I believe this is already implemented. There are many optional installations that you can do outside of the main package for different backends: https://maartengr.github.io/KeyBERT/guides/embeddings.html

validation:
The simplest approach is to validate that at least one backend is installed when the library loads, i.e. runtime validation.
Another approach is to raise an error during installation using a setup.py post-install script. See for example this StackOverflow thread.

The thing is, nearly all users will make use of sentence-transformers, as that is typically not only the most performant backend but also the one that I believe 99% of users will use. I believe it's the industry standard.

how to allow backend selection?
The answer is to switch to runtime backend selection. This is a breaking change, and thus it will need to be implemented in a v1.0.0 to work. Basically, the user has to install a backend, either using an extra dependency group or by installing it separately.

I'm not sure whether this is ideal, as this would mean that pip install keybert will result in a package that cannot be used, since it does not come with a necessary backend. You would always have to run something like pip install keybert[sbert]. One of the most important aspects of my packages is ease of use, and I believe adding more steps would make the user experience less pleasant.

Let me rephrase my initial question. I believe you cannot remove sentence-transformers or scikit-learn since the former is a backend that almost all users will use and the package really cannot work without the latter. Thus, how do you propose reducing the package size when these dependencies are necessary?

If these packages are not necessary for the functionality of KeyBERT, could you explain why?

Goldziher · 2024-11-25T11:04:06Z

Let me rephrase my initial question. I believe you cannot remove sentence-transformers or scikit-learn since the former is a backend that almost all users will use and the package really cannot work without the latter. Thus, how do you propose reducing the package size when these dependencies are necessary?

In this case it is impossible.

This, though, makes it difficult to use this library in contexts where the size of the library is an issue.

MaartenGr · 2024-11-25T11:14:29Z

in this case it is impossible.

That's too bad. I had hoped that, since you specifically mentioned sentence-transformers and scikit-learn, you knew of a specific way to remove these dependencies that relates to the internals of KeyBERT.
