text-difficulty-prediction

University of Michigan: Milestone Project 2

Project Description: This project applies supervised and unsupervised learning techniques to Wikipedia text to predict which sentences need to be simplified so that they are easier to understand. Intended readers include students, children, adults with learning or reading disabilities, and non-native English speakers.

Project Workflow: This project contains five Jupyter notebooks. It begins by extracting features from the original text and then implements supervised and unsupervised learning models using the extracted features and text representations such as TF-IDF, SentencePiece, and the Keras Tokenizer. The goal was to assess how effective each feature representation is at classifying text difficulty, and to understand which steps in manual feature extraction worked well and which could be improved in the future.
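As a rough illustration of what manual feature extraction for text difficulty can look like, the sketch below computes a few surface-level features for a single sentence. The feature names, the vowel-group syllable heuristic, and the 7-character threshold for a "long" word are assumptions made for this example; they are not taken from the project notebooks, which may use a different and richer feature set.

```python
import re

# Assumed heuristics for illustration only; not the notebooks' actual features.
VOWEL_GROUPS = re.compile(r"[aeiouy]+")
WORDS = re.compile(r"[A-Za-z']+")
LONG_WORD_MIN_LENGTH = 7  # hypothetical threshold for a "long" word


def estimate_syllables(word):
    """Rough syllable estimate: number of consecutive-vowel groups, at least 1."""
    return max(1, len(VOWEL_GROUPS.findall(word.lower())))


def extract_features(sentence):
    """Return simple surface features that tend to correlate with reading difficulty."""
    words = WORDS.findall(sentence)
    n = len(words)
    if n == 0:
        # Empty or punctuation-only input: return zeros instead of dividing by zero.
        return {
            "word_count": 0,
            "avg_word_length": 0.0,
            "avg_syllables": 0.0,
            "long_word_ratio": 0.0,
        }
    return {
        "word_count": n,
        "avg_word_length": sum(len(w) for w in words) / n,
        "avg_syllables": sum(estimate_syllables(w) for w in words) / n,
        "long_word_ratio": sum(len(w) >= LONG_WORD_MIN_LENGTH for w in words) / n,
    }


if __name__ == "__main__":
    print(extract_features("The cat sat."))
    print(extract_features("Photosynthesis converts electromagnetic radiation."))
```

Feature dictionaries like these could be stacked into a table and passed to a classifier, alongside or instead of tokenizer-based representations.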

Please refer to the following Jupyter notebooks for the code implementation.

Text Difficulty-Feature Extraction-Final
Text Difficulty-Supervised Models-Final
Text Difficulty- Deep Learning-Final
Text Difficulty-Unsupervised Models-Final
Text Difficulty-Topic Modelling-Final

Features extracted in the first notebook, “Text Difficulty-Feature Extraction-Final”, are reused extensively in all the other notebooks to save computation time.
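One common way to reuse extracted features across notebooks is to compute them once, write them to disk, and load the file on later runs. The sketch below shows that pattern with a CSV cache. The function name `load_or_compute_features`, the `compute` callback, and the choice of CSV are all hypothetical; how the notebooks actually store and share their features is not stated here.

```python
import csv
from pathlib import Path


def load_or_compute_features(sentences, cache_path, compute):
    """Return one dict of numeric features per sentence, using a CSV cache.

    `compute` is a hypothetical callback that maps a sentence to a dict of
    numeric features. It is only called when the cache file does not exist.
    """
    cache_path = Path(cache_path)

    if cache_path.exists():
        # Cache hit: read the stored features and convert values back to floats.
        with cache_path.open(newline="", encoding="utf-8") as f:
            return [{k: float(v) for k, v in row.items()} for row in csv.DictReader(f)]

    # Cache miss: compute features, normalising every value to float so that
    # both code paths return the same types.
    rows = [{k: float(v) for k, v in compute(s).items()} for s in sentences]
    if not rows:
        return rows  # nothing to cache for empty input

    with cache_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Note that this simple cache does not check whether the input sentences have changed; deleting the file forces recomputation.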

Please click on the dataset to view the file.

psanghal/text-difficulty-prediction

Folders and files

Latest commit

History

Repository files navigation

text-difficulty-prediction

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages