basics:
- BM25
- https://umap-learn.readthedocs.io/en/latest/basic_usage.html
- https://github.com/bmeaut/python_nlp_2018_spring/blob/master/course_material/14_Semantics_II/14_Semantics_2_lab.ipynb
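
A minimal sketch of BM25 scoring, for reference next to the links above. It assumes pre-tokenized documents, the common defaults `k1=1.5` and `b=0.75`, and the non-negative IDF variant; the function name `bm25_scores` is made up for illustration and is not taken from any of the linked material.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per tokenized document for a tokenized query."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Non-negative IDF variant (assumption, one of several in use).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [["cat", "sat", "mat"], ["dog", "barked"], ["cat", "cat", "purred"]]
scores = bm25_scores(["cat"], docs)
```

Documents without the query term score 0; of two equally long documents, the one with more occurrences of the term scores higher.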
DL:
- https://www.tensorflow.org/tutorials/text/word2vec
- Embedding!!!
- ** https://github.com/bmeaut/python_nlp_2020_fall/blob/master/labs/08_Deep_learning_nlp_lab.ipynb
- corresponding lecture: https://github.com/bmeaut/python_nlp_2020_fall/blob/master/lectures/08_Deep_learning_nlp.ipynb
- https://github.com/bmeaut/python_nlp_2020_fall/blob/master/lectures/09_Sequence_modeling.ipynb
- fastai Text
- https://github.com/fastai/fastbook/blob/master/10_nlp.ipynb
- https://github.com/fastai/fastbook/blob/master/12_nlp_dive.ipynb
- stemming/lemmatization: discards a lot of content. Better to encode the text appropriately with custom tokens
- https://github.com/huggingface/transformers/tree/master/examples/text-classification
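
A rough sketch of the stemming/lemmatization note: instead of reducing words to stems, keep the word forms and mark special content with dedicated tokens. The token names (`xxurl`, `xxnum`, `xxup`), the regexes, and the function name are assumptions chosen for illustration, not a fixed scheme from any of the linked notebooks.

```python
import re

# Hypothetical placeholder tokens (names are illustrative assumptions).
URL_TOKEN = "xxurl"
NUM_TOKEN = "xxnum"
UP_TOKEN = "xxup"  # marks that the following word was written in all caps

URL_RE = re.compile(r"https?://\S+")
NUM_RE = re.compile(r"\d+(?:[.,]\d+)*")


def encode_with_custom_tokens(text):
    """Whitespace-tokenize text, replacing URLs and numbers with placeholders.

    Inflected forms are kept as they are (no stemming), and all-caps words
    are lowercased but preceded by a marker token so the emphasis survives.
    """
    text = URL_RE.sub(f" {URL_TOKEN} ", text)
    text = NUM_RE.sub(f" {NUM_TOKEN} ", text)
    tokens = []
    for word in text.split():
        if word.isupper() and len(word) > 1:
            tokens.extend([UP_TOKEN, word.lower()])
        else:
            tokens.append(word.lower())
    return tokens


tokens = encode_with_custom_tokens(
    "Running costs ROSE by 3.5 percent see https://example.com/report"
)
```

Here "Running" stays "running" rather than being cut to "run", and the URL, the number, and the capitalization are each kept as an explicit token.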
trees:
- lightgbm
- xgboost
- catboost
useful stuff:
- https://allennlp.org/ (demo)
- https://github.com/allenai/kb/issues/13
- https://www.tensorflow.org/tutorials/text/classify_text_with_bert
- https://github.com/fastai/fastbook