Remove NLTK dependency #85
Labels: code health, enhancement, help wanted, maintenance, performance
An LMS web worker process uses over 300 MB of RAM once initialized. Over 10% of that comes from loading nltk, which we only use in one place: parsing chemical equation inputs in ProblemBlocks (see chemcalc.py in the openedx-chem repo).
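One rough way to see what a single import costs is to import the module in a fresh interpreter with Python's `tracemalloc` running and read how much memory is still allocated afterwards. The helper below (`import_cost_bytes` is a hypothetical name) is a minimal sketch of that approach, not necessarily the method behind the figures above. Because `tracemalloc` only sees allocations made through Python's allocators, it will undercount compared with process RSS.

```python
import subprocess
import sys

# Script run in a child interpreter: start tracing, import the module named
# in argv[1], then print how many traced bytes are still allocated.
_PROBE = """
import importlib, sys, tracemalloc
tracemalloc.start()
importlib.import_module(sys.argv[1])
current, _peak = tracemalloc.get_traced_memory()
print(current)
"""


def import_cost_bytes(module_name: str) -> int:
    """Approximate Python-heap bytes retained after importing `module_name`.

    A fresh interpreter is used so that modules already loaded in the
    current process don't hide the cost. tracemalloc only tracks memory
    allocated through Python's allocators, so native allocations made by
    C extensions are not counted.
    """
    result = subprocess.run(
        [sys.executable, "-c", _PROBE, module_name],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if the import fails
    )
    return int(result.stdout.strip())
```

Run inside the LMS virtualenv, `import_cost_bytes("nltk")` gives a lower bound on the heap NLTK adds; comparing worker RSS with and without the import would also capture native allocations.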
It's not clear why the grammar is specified using NLTK instead of pyparsing. It may have been to work around a limitation pyparsing had twelve years ago, or the author may simply have been more familiar with NLTK and able to write the code faster that way.
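For a sense of how small a non-NLTK parser could be, here is a minimal recursive-descent sketch that uses only the standard library, so it avoids adding pyparsing as a dependency too. It is hypothetical and deliberately incomplete: it handles element symbols, integer coefficients and subscripts, parentheses, `+`, and `->`. It does not reproduce the grammar in chemcalc.py, which may accept notation this sketch rejects.

```python
import re
from collections import Counter

# Tokens: the arrow, an element symbol (capital + optional lowercase letter),
# an integer, or one of '(', ')', '+'. Leading whitespace is skipped.
_TOKEN_RE = re.compile(r"\s*(?:(->)|([A-Z][a-z]?)|(\d+)|([()+]))")


def tokenize_equation(text):
    """Split an equation string into (kind, value) tokens."""
    tokens, pos, text = [], 0, text.strip()
    while pos < len(text):
        m = _TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at position {pos}: {text[pos:]!r}")
        arrow, elem, num, punct = m.groups()
        if arrow:
            tokens.append(("ARROW", arrow))
        elif elem:
            tokens.append(("ELEM", elem))
        elif num:
            tokens.append(("NUM", int(num)))
        else:
            tokens.append(("PUNCT", punct))
        pos = m.end()
    return tokens


class _Parser:
    # Grammar (hypothetical):
    #   equation := side "->" side
    #   side     := term ("+" term)*
    #   term     := [NUM] formula
    #   formula  := (ELEM [NUM] | "(" formula ")" [NUM])+
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else (None, None)

    def take(self, kind, value=None):
        tok_kind, tok_value = self.peek()
        if tok_kind != kind or (value is not None and tok_value != value):
            raise ValueError(f"expected {value or kind}, got {tok_value!r}")
        self.i += 1
        return tok_value

    def equation(self):
        left = self.side()
        self.take("ARROW")
        right = self.side()
        if self.i != len(self.tokens):
            raise ValueError("unexpected trailing input")
        return left, right

    def side(self):
        terms = [self.term()]
        while self.peek() == ("PUNCT", "+"):
            self.i += 1
            terms.append(self.term())
        return terms

    def term(self):
        coeff = self.take("NUM") if self.peek()[0] == "NUM" else 1
        return coeff, self.formula()

    def formula(self):
        counts = Counter()  # element symbol -> atom count
        while True:
            kind, value = self.peek()
            if kind == "ELEM":
                self.i += 1
                counts[value] += self.subscript()
            elif (kind, value) == ("PUNCT", "("):
                self.i += 1
                inner = self.formula()
                self.take("PUNCT", ")")
                n = self.subscript()
                for element, k in inner.items():
                    counts[element] += k * n
            else:
                break
        if not counts:
            raise ValueError("expected a formula")
        return counts

    def subscript(self):
        return self.take("NUM") if self.peek()[0] == "NUM" else 1


def parse_equation(text):
    """Return (left, right), each a list of (coefficient, Counter) terms."""
    return _Parser(tokenize_equation(text)).equation()


def is_balanced(text):
    """True if both sides contain the same number of atoms of each element."""
    def total(side):
        acc = Counter()
        for coeff, counts in side:
            for element, n in counts.items():
                acc[element] += coeff * n
        return acc

    left, right = parse_equation(text)
    return total(left) == total(right)
```

A pyparsing grammar would express the same rules declaratively; whichever approach is chosen, its output has to match what the current NLTK grammar produces for every input.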
Task: Remove our dependency on NLTK by changing the parser implementation in this repo. This would likely involve some digging, and exact backwards compatibility would be extremely important.
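Given how much backwards compatibility matters, one option is a differential check: run the existing NLTK-backed parser and a replacement over the same corpus of inputs (for example, every chemical-equation test case and answer you can collect) and report every disagreement. The sketch below is hypothetical (`compare_parsers` is not an existing function) and assumes both parsers' outputs have already been converted to a comparable form such as plain dicts. Inputs that both parsers reject count as a match, since their error messages may legitimately differ.

```python
from typing import Any, Callable, Iterable, List, Tuple

Outcome = Tuple[bool, Any]  # (accepted?, parsed value or error description)


def _outcome(parse: Callable[[str], Any], text: str) -> Outcome:
    try:
        return True, parse(text)
    except Exception as exc:  # any exception is treated as "input rejected"
        return False, f"{type(exc).__name__}: {exc}"


def _same(a: Outcome, b: Outcome) -> bool:
    ok_a, val_a = a
    ok_b, val_b = b
    if not ok_a and not ok_b:
        return True  # both rejected the input
    return ok_a == ok_b and val_a == val_b


def compare_parsers(
    old: Callable[[str], Any],
    new: Callable[[str], Any],
    inputs: Iterable[str],
) -> List[Tuple[str, Outcome, Outcome]]:
    """Return (input, old_outcome, new_outcome) for every input where they differ."""
    mismatches = []
    for text in inputs:
        old_out, new_out = _outcome(old, text), _outcome(new, text)
        if not _same(old_out, new_out):
            mismatches.append((text, old_out, new_out))
    return mismatches
```

An empty result over a large, representative corpus is not proof of equivalence, but any non-empty result pinpoints an input whose behavior would change for learners.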
More details on how this information was gathered: https://discuss.openedx.org/t/reducing-memory-usage-nltk/13406