support user specified language only installation #97

Vela-zz · 2024-03-07T10:05:01Z

Split the dependencies into several optional dependency groups according to which language uses them.
Users can now install like this:

pip install langcheck  # equivalent to langcheck[en]
pip install langcheck[en]
pip install langcheck[ja]
pip install langcheck[all]
...

TODOs (added by Kenny):

Test recursive optional dependencies with an older version of pip
Decide on lazy loading or not
Decide what to do about [optional]
Decide what to do about optional pytests
Update installation documentation and README
~~Add custom error message~~ (not possible)

Vela-zz · 2024-03-07T10:06:37Z

#88
@kennysong please check whether this update aligns with the requirements, and update the GH Action.

Vela-zz · 2024-03-09T13:03:00Z

It seems we should update metrics/__init__.py first. langcheck initializes all languages at once and doesn't check which languages were installed.
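One way to check which languages are installed without importing anything heavy is `importlib.util.find_spec`. The following is only a rough sketch: the `REQUIRED_MODULES` mapping and the `installed_languages` helper are hypothetical names, and `json` merely stands in for a dependency that is present. Only `janome` is taken from this thread.

```python
import importlib.util

# Hypothetical mapping from a language extra to one top-level module that the
# extra would install. "json" is a stand-in for an always-available dependency.
REQUIRED_MODULES = {"en": "json", "ja": "janome"}


def installed_languages(required_modules):
    """Return the language codes whose representative module can be found."""
    return sorted(
        lang
        for lang, module_name in required_modules.items()
        # find_spec returns None for a missing top-level module
        if importlib.util.find_spec(module_name) is not None
    )


print(installed_languages(REQUIRED_MODULES))
```

Note that `find_spec` only locates the module; it doesn't verify that the module imports cleanly.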

kennysong · 2024-03-09T14:41:54Z

Yeah, I had the exact same conclusion. I'm wondering if we should lazy load the language sub-packages so they're still visible at the package level, but won't actually import any dependencies until a function is called.

I'll explore this a bit more tomorrow

kennysong · 2024-03-10T07:29:48Z

Okay, I've implemented lazy loading of language-specific packages in langcheck.augment and langcheck.metrics.

Without lazy loading, the user is required to explicitly load each language, such as import langcheck.metrics.ja. Otherwise, they'll see this error which IMO is a bit mysterious:

>>> import langcheck
>>> langcheck.metrics.ja
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'langcheck.metrics' has no attribute 'ja'

With lazy loading, the user will see the exact same behavior as current LangCheck:

>>> import langcheck
>>> langcheck.metrics.ja
<module 'langcheck.metrics.ja' from '/usr/local/lib/python3.8/site-packages/langcheck/metrics/ja/__init__.py'>

If the user didn't install Japanese dependencies with pip install langcheck[ja], then they'll see:

>>> import langcheck  # The import error doesn't happen here due to lazy loading! 
>>> langcheck.metrics.ja
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<frozen importlib._bootstrap>", line 271, in _module_repr
  File "/usr/local/lib/python3.8/importlib/util.py", line 245, in __getattribute__
    self.__spec__.loader.exec_module(self)
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/usr/local/lib/python3.8/site-packages/langcheck/metrics/ja/__init__.py", line 1, in <module>
    from langcheck.metrics.ja._tokenizers import JanomeTokenizer, MeCabTokenizer
  File "/usr/local/lib/python3.8/site-packages/langcheck/metrics/ja/_tokenizers.py", line 6, in <module>
    from janome.tokenizer import Tokenizer
ModuleNotFoundError: No module named 'janome'

I think this is the best UX, but it also adds some complexity to metrics/__init__.py and augment/__init__.py. I've never used lazy loading before so I don't know if it'll cause unexpected problems.
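For reference, here is a minimal sketch of lazy loading with the standard library's `importlib.util.LazyLoader`, following the recipe in the `importlib` documentation. It isn't the exact code in this PR: the `lazy_import` helper and the use of `colorsys` as a demo module are assumptions for illustration.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module whose body only runs on first attribute access."""
    if name in sys.modules:
        return sys.modules[name]  # already imported, nothing to defer
    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        raise ModuleNotFoundError(f"No module named {name!r}", name=name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    # With LazyLoader this only sets up deferred execution; the module body
    # runs when an attribute is first accessed.
    loader.exec_module(module)
    return module


colorsys = lazy_import("colorsys")          # module body not executed yet
print(callable(colorsys.rgb_to_hsv))        # first attribute access triggers the real import
```

This matches the behavior described above: a missing dependency inside the lazily loaded module only surfaces on first attribute access, not when the lazy module is created.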

Also, since this is a non-trivial change, I think we should merge this PR after cutting the 0.5.0 release so we have extra time to test it.

Would appreciate your thoughts @Vela-zz and @liwii!

Vela-zz · 2024-03-10T14:31:20Z

FYI @kennysong, SPEC1 discusses lazy loading and the latent risks of using the lazy_loader library.
I followed the instructions in SPEC1 and changed all attributes in metrics to lazy loading, but the load time saved is not noticeable.
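To illustrate what attribute-level lazy loading looks like, here is a small sketch built on the module-level `__getattr__` hook from PEP 562. It is not the lazy_loader API and not the code from my experiment. `attach_lazy_attrs` and `demo_metrics` are hypothetical names, and `math.sqrt` stands in for a metric function.

```python
import importlib
import types


def attach_lazy_attrs(module, attr_to_module):
    """Install a PEP 562 __getattr__ that imports each attribute on first use."""

    def __getattr__(name):
        if name not in attr_to_module:
            raise AttributeError(
                f"module {module.__name__!r} has no attribute {name!r}"
            )
        # Import the providing module only now, then fetch the attribute.
        value = getattr(importlib.import_module(attr_to_module[name]), name)
        setattr(module, name, value)  # cache so the hook isn't hit again
        return value

    module.__getattr__ = __getattr__
    return module


# A throwaway module object stands in for a package's __init__.py.
demo_metrics = attach_lazy_attrs(types.ModuleType("demo_metrics"), {"sqrt": "math"})
print(demo_metrics.sqrt(9.0))  # 3.0
```

A static type checker can't see attributes resolved this way unless they are also declared for it separately (for example in a stub file).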

kennysong · 2024-03-11T13:36:34Z

Ah nice find, I hadn't heard of SPEC1 before!

The SPEC1 lazy_loader highlights an important difference with the standard library's LazyLoader (which I used).

The stdlib LazyLoader doesn't import the actual contents of the package, so while langcheck.metrics.ja is visible, langcheck.metrics.ja.factual_consistency() is not visible.

This means that code completion doesn't work:

And more importantly, type checking doesn't work:

Note that type checks do work if you import the full path:

Given this, I think that the current implementation with LazyLoader isn't a good solution. We have two alternatives:

1. Don't import language-specific packages in langcheck.metrics and langcheck.augment
2. Use a more sophisticated and complex option like SPEC1's lazy_loader

However, option 2 is fairly complex, not under active development (only 96 stars, last release in July 2023), and IMO beyond our team's current bandwidth to properly understand and maintain.

I think we should proceed with option 1. This will require users to explicitly import the full path like import langcheck.metrics.ja, which is annoying and a breaking change, but I think it's the only reasonable option right now.

Let me know what you think!

Vela-zz · 2024-03-11T15:25:38Z

I uploaded my experiment (it supports type checking / static checks). I think the example shows that the lazy loader used in SPEC1 is quite flexible: it can mix eager imports with lazy imports, for example eagerly importing the ja subpackage while lazily importing the functions under it. If that flexibility is abused, then over time no one will know which imports are lazy and which are eager.
So I made a slight change to the two options:

1. Don't import language-specific packages in langcheck.metrics and langcheck.augment
2. Apply lazy import to all functions under each language package.

Since PEP 690 was rejected, it seems Python will not have a standard way to do lazy imports in the near future.
So I think option 1 is more reasonable. It only requires the user to import the language package explicitly. It may affect backward compatibility (which might not be a problem since the version is only 0.4.0?).

kennysong · 2024-03-13T07:15:04Z

Thanks for the analysis! I agree that lazy_loader is more flexible than LazyLoader but hard to maintain. Let's go with option 1.

@Vela-zz could you then remove all of the LazyLoader code and fix tests if needed?

I can work on updating all of the documentation to reflect the new explicit imports.

kennysong · 2024-03-22T06:59:13Z

Thanks @Vela-zz! I think this PR is good to go. I was playing around with a bunch of tweaks, and was able to get import langcheck working the same as before by wrapping the language imports in a try/except. This way users don't need to import langcheck.metrics.ja anymore.
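Roughly, the try/except idea looks like the sketch below. This is a simplified, self-contained illustration rather than the code in this PR: `LANGUAGE_MODULES`, `import_available_languages`, and the placeholder module names are hypothetical, with `json` standing in for a language whose dependencies are installed.

```python
import importlib

# Hypothetical mapping from language code to the module that needs that
# language's dependencies. "json" plays the role of an installed language;
# the second name is deliberately missing.
LANGUAGE_MODULES = {
    "en": "json",
    "ja": "hypothetical_missing_ja_backend",
}


def import_available_languages(language_modules):
    """Import each language module, silently skipping ones that aren't installed."""
    loaded = {}
    for lang, module_name in language_modules.items():
        try:
            loaded[lang] = importlib.import_module(module_name)
        except ModuleNotFoundError:
            # The dependencies for this language aren't installed; skip it so
            # the top-level import still succeeds.
            continue
    return loaded


available = import_available_languages(LANGUAGE_MODULES)
print(sorted(available))  # ['en']
```

Because the imports stay as ordinary eager imports, static analysis tools can still follow them, which a lazily constructed module doesn't allow.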

kennysong · 2024-03-22T06:59:47Z

@yosukehigashi or @liwii: since this is a big change, could one of you test this PR before merging?

The key workflow should look like:

# Open a clean Python environment
docker run -it -v "$(pwd)":/langcheck python:3.8 bash -c "cd /langcheck && bash"

# Install English support
pip install .

# Test LangCheck
python
>>> import langcheck
>>> langcheck.metrics.toxicity  # Works
>>> langcheck.metrics.en.toxicity  # Works
>>> langcheck.metrics.ja.toxicity  # Doesn't work
>>> import langcheck.metrics  # Works
>>> import langcheck.metrics.en  # Works
>>> import langcheck.metrics.ja  # Doesn't work

# Install Japanese support
pip install .[ja]

# Test LangCheck
python
>>> import langcheck
>>> langcheck.metrics.toxicity  # Works
>>> langcheck.metrics.en.toxicity  # Works
>>> langcheck.metrics.ja.toxicity  # Works
>>> import langcheck.metrics  # Works
>>> import langcheck.metrics.en  # Works
>>> import langcheck.metrics.ja  # Works

Also, test in VSCode. Install only one language in a venv. Enable that venv in VSCode. Then create a test.py file in the root directory and make sure that VSCode's static analysis features still work for all languages.

Vela-zz · 2024-03-23T10:24:44Z

Thanks, learn a lot.

liwii · 2024-04-02T06:31:10Z

Confirmed that:

In a clean Docker environment, import langcheck.metrics.ja works if and only if langcheck[ja] is installed.
In a venv without langcheck[ja] installed, the VSCode extensions properly recognize the langcheck.metrics.ja module.

kennysong · 2024-04-02T07:15:36Z

Thanks for testing! I'll merge this now. The only test that's failing is because the HanLP server is down and is unrelated.

support only specified language install

9c6a2f8

Vela-zz marked this pull request as draft March 7, 2024 10:05

kennysong added 6 commits March 9, 2024 13:01

Merge branch 'main' into lang_spcific_installation

25e9f29

Add more japanese dependencies

b922119

Fix quotes [no ci]

5821ce3

Try to add langauges into pip_install_matrix.yml

aa4f844

Fix variable name

d5d106f

Fix import in pip_install_matrix

7444425

kennysong added 2 commits March 9, 2024 13:30

Try not importing all languages in langcheck.augment and langcheck.metrics

0b3c41e

Fix formatting.yml

280a21b

kennysong added 2 commits March 10, 2024 07:18

Use lazy import

9a50b12

Don't import langcheck.augment in tests since it's not available for all languages

c7907b9

Fix flake8

38902e8

Vela-zz and others added 6 commits March 18, 2024 23:25

remove lazy loading

3194459

notebook & readme content fixing

1acaad9

Fix typo [no ci]

4d710c8

Remove unnecessary imports

a3abc62

Merge branch 'main' into lang_spcific_installation

06fb24d

Merge branch 'lang_spcific_installation' of https://github.com/vela-zz/langcheck into pr/Vela-zz/97

b69a889

Vela-zz marked this pull request as ready for review March 21, 2024 14:56

kennysong added 2 commits March 21, 2024 15:28

Update pip install instructions

958fa0c

Bash highlighting

fd3ace2

kennysong added 3 commits March 21, 2024 15:57

Bring back [optional] as [ja_optional]

78698a4

Put back pytestmark

5a7c016

Import inside try/except

bffcffb

kennysong mentioned this pull request Mar 21, 2024

Update documentation #103

Merged

kennysong added 5 commits March 21, 2024 17:35

Fix issue with underscores and dashes

5f0b750

Revert README_de

c461db9

Revert notebook changes

56a5fdb

Rename notebook

4a67330

Add FAQ

ba07374

kennysong requested review from liwii and yosukehigashi March 22, 2024 06:59

Merge branch 'main' into lang_spcific_installation

19e9def

kennysong approved these changes Apr 2, 2024

kennysong merged commit c750d2d into citadel-ai:main Apr 2, 2024
37 of 38 checks passed
