Throughout this course, we will explore many aspects of natural language processing, starting with the latest developments in language models, specifically large language models. From there, we go back to more fundamental topics such as part-of-speech tagging, grammars and dependency parsing, as well as tasks like sentiment analysis and topic modeling.
All labs will be provided as Jupyter Notebooks (.ipynb). The first lab consists only of questions and answers in markdown cells, so that you can get familiar with the format. The remaining labs require you to use the full notebook environment, with a mix of markdown and code cells.
You must pass all labs to be eligible for the exam.
Each lab will have files starting with the prefix `lab{N}`:

- `lab{N}_description.md` - a description of the lab
- `lab{N}_exercises.ipynb` - the main notebook with the exercises - you will submit this file to Blackboard

By the deadline for each lab, you will submit your `lab{N}_exercises_{your-username}.ipynb` file to Blackboard. You can submit as many times as you want - only the last submission will be considered.
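
The naming convention above can be sketched in a few lines of Python. This is only an illustration, not an official course tool: the helper names `submission_name` and `make_submission_copy` are hypothetical, and it assumes that the submitted file is simply a copy of `lab{N}_exercises.ipynb` saved under the new name.

```python
from pathlib import Path
import shutil


def submission_name(lab: int, username: str) -> str:
    """Build the filename lab{N}_exercises_{your-username}.ipynb."""
    return f"lab{lab}_exercises_{username}.ipynb"


def make_submission_copy(lab: int, username: str, folder: Path = Path(".")) -> Path:
    """Copy lab{N}_exercises.ipynb in `folder` to the submission filename.

    Assumption: the notebook you worked in keeps its original name, and the
    file you upload is a copy of it carrying your username.
    """
    source = folder / f"lab{lab}_exercises.ipynb"
    target = folder / submission_name(lab, username)
    shutil.copyfile(source, target)  # overwrites an older copy if one exists
    return target
```

For example, `make_submission_copy(2, "abc123")` would create `lab2_exercises_abc123.ipynb` next to the original notebook, where `abc123` stands in for your own username.
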
Lab | Link | Published | Deadline | Topic | Libraries | Chapters |
---|---|---|---|---|---|---|
1 | Lab1 | Jan. 8 | Jan. 22 | Large language models | transformers | - |
2 | Lab2 | Jan. 22 | Feb. 5 | Tokenization, introduction to word vectors and language modeling | NLTK | 2, 3 |
3 | Lab3 | Feb. 5 | Feb. 19 | Part-of-speech tagging, stemming/lemmatization, TF-IDF | NLTK, spaCy | 4, 5, 6 |
4 | Lab4 | Feb. 19 | Mar. 4 | WordNet and SentiWordNet, dependency parsing, POS chunking | spaCy, Scikit-learn | 7, 8 |
5 | Lab5 | Mar. 4 | Mar. 18 | Unsupervised topic modeling and named entities | Gensim | 9, 10, 11 |
The course curriculum is mostly based on the 2022 book Getting Started with Natural Language Processing by Ekaterina Kochmar. It is available on Akademika.