University of Michigan: Milestone Project 2
Project Description: This project applies supervised and unsupervised learning techniques to Wikipedia text to predict which sentences need to be simplified so that they are easier for readers to understand. Target readers include students, children, adults with learning or reading disabilities, and non-native English speakers.
Project Workflow: This project contains 5 Jupyter notebooks. It begins by extracting features from the original text and then implements supervised and unsupervised learning models using the extracted features together with text representations and tokenizers such as TF-IDF, SentencePiece, and the Keras Tokenizer. The goal was to assess how effectively each feature representation classifies text difficulty, and to understand which manual feature extraction steps worked well and which could be improved in the future.
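As a rough illustration of what manual feature extraction for text difficulty can look like, here is a minimal sketch using only the Python standard library. The function name `extract_features` and the three features (word count, average word length, share of long words) are assumptions chosen for illustration; they are not necessarily the features computed in the notebooks.

```python
import re


def extract_features(sentence: str) -> dict:
    """Return a few simple surface features for one sentence.

    Hypothetical feature set for illustration only; the project's
    notebooks may compute different or additional features.
    """
    # Treat runs of letters (and apostrophes) as words.
    words = re.findall(r"[A-Za-z']+", sentence)
    n_words = len(words)
    if n_words == 0:
        return {"n_words": 0, "avg_word_len": 0.0, "long_word_ratio": 0.0}

    avg_word_len = sum(len(w) for w in words) / n_words
    # Assumption: a "long" word has 7 or more characters.
    long_word_ratio = sum(1 for w in words if len(w) >= 7) / n_words
    return {
        "n_words": n_words,
        "avg_word_len": avg_word_len,
        "long_word_ratio": long_word_ratio,
    }


simple = extract_features("The cat sat on the mat.")
hard = extract_features(
    "Photosynthesis converts electromagnetic radiation into chemical energy."
)
print(simple)
print(hard)
```

Feature dictionaries like these could then be stacked into a table and passed to a classifier, alongside or instead of a tokenizer-based representation.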
Please refer to the following Jupyter notebooks for the code implementation:
- Text Difficulty-Feature Extraction-Final
- Text Difficulty-Supervised Models-Final
- Text Difficulty- Deep Learning-Final
- Text Difficulty-Unsupervised Models-Final
- Text Difficulty-Topic Modelling-Final

Features extracted in the first notebook, “Text Difficulty-Feature Extraction-Final”, are reused extensively in all the other notebooks to save computational time.
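Reusing features across notebooks usually means computing them once, writing them to disk, and loading them wherever they are needed. The sketch below shows one way to do that with a CSV file. The helper names `save_features` and `load_features`, the file name, and the sample values are all hypothetical; the notebooks may store features in a different format.

```python
import csv
import os
import tempfile


def save_features(rows, path):
    """Write a non-empty list of feature dicts to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def load_features(path):
    """Read the CSV back, converting every value to float."""
    with open(path, newline="", encoding="utf-8") as f:
        return [{k: float(v) for k, v in row.items()} for row in csv.DictReader(f)]


# Made-up feature rows standing in for the output of the first notebook.
rows = [
    {"n_words": 6, "avg_word_len": 2.83},
    {"n_words": 7, "avg_word_len": 9.14},
]

# Hypothetical cache location; any shared path would work.
cache_path = os.path.join(tempfile.gettempdir(), "text_difficulty_features.csv")
save_features(rows, cache_path)

# A later notebook loads the cached features instead of recomputing them.
loaded = load_features(cache_path)
print(loaded)
```

Caching this way trades a small amount of disk space for not repeating the feature extraction step in each notebook.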
Please click on the dataset to view the file.