Remove NLTK dependency #85

Open
feanil opened this issue Jul 15, 2024 · 0 comments
Labels
code health: Proactive technical investment via refactorings, removals, etc.
enhancement: Relates to new features or improvements to existing features
help wanted: Ready to be picked up by anyone in the community
maintenance: Routine upkeep necessary for the health of the platform
performance: Relates to improving latency/throughput or reducing resource usage

Comments


feanil commented Jul 15, 2024

An LMS web worker process uses over 300 MB of RAM once initialized. Over 10% of that (more than 30 MB) comes from loading nltk, which we use in only one place: parsing chemical equation inputs in ProblemBlocks (see chemcalc.py in the openedx-chem repo).
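As a rough way to reproduce this kind of measurement, here is a minimal sketch using the standard library's `tracemalloc`. It is not the method used in the linked discussion; the helper name `import_cost_bytes` is hypothetical, and `tracemalloc` only counts allocations made through Python's allocator, so the number will not match process RSS exactly.

```python
import importlib
import tracemalloc


def import_cost_bytes(module_name: str) -> int:
    """Return the change in traced Python memory caused by importing a module.

    Hypothetical helper for illustration. The result is only meaningful if
    the module has not already been imported in this interpreter.
    """
    was_tracing = tracemalloc.is_tracing()
    if not was_tracing:
        tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    importlib.import_module(module_name)
    after, _peak = tracemalloc.get_traced_memory()
    if not was_tracing:
        tracemalloc.stop()
    return after - before
```

Calling `import_cost_bytes("nltk")` in a fresh interpreter (assuming nltk is installed) should give a ballpark figure to compare before and after any change.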

It's not clear why the grammar is specified using NLTK instead of pyparsing. It may have been to work around a limitation that pyparsing had twelve years ago, or the author may simply have been more familiar with NLTK and able to write the code faster that way.

Task: Remove our dependency on NLTK by changing the parser implementation in this repo. This would likely involve some digging, and exact backwards compatibility would be extremely important.
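One way to check exact backwards compatibility is a differential test that runs the old and new parsers over the same corpus of inputs and reports any difference, including differences in which exception type is raised. The sketch below assumes each parser can be wrapped as a callable taking a string; `find_mismatches`, `old_parser`, and `new_parser` are hypothetical names, not functions from chemcalc.py.

```python
from typing import Any, Callable, Iterable, List, Tuple

Outcome = Tuple[str, Any]


def _outcome(parser: Callable[[str], Any], text: str) -> Outcome:
    """Run a parser and capture either its result or the exception type name."""
    try:
        return ("ok", parser(text))
    except Exception as exc:  # broad on purpose: error behavior must match too
        return ("error", type(exc).__name__)


def find_mismatches(
    old_parser: Callable[[str], Any],
    new_parser: Callable[[str], Any],
    inputs: Iterable[str],
) -> List[Tuple[str, Outcome, Outcome]]:
    """Return (input, old outcome, new outcome) for every input that differs."""
    mismatches = []
    for text in inputs:
        old = _outcome(old_parser, text)
        new = _outcome(new_parser, text)
        if old != new:
            mismatches.append((text, old, new))
    return mismatches
```

In practice `old_parser` would wrap the existing NLTK-based code and `new_parser` the replacement, with `inputs` drawn from real learner submissions and the existing test cases. An empty result over a large corpus is evidence of compatibility, not proof.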

More details about the process of getting this info: https://discuss.openedx.org/t/reducing-memory-usage-nltk/13406

feanil added the enhancement, maintenance, code health, good first issue, and help wanted labels, and removed the good first issue label, on Jul 15, 2024
ormsbee added the performance label on Aug 5, 2024