Skip to content

Labelled data set of ShitMyReviewersSay quotes

Notifications You must be signed in to change notification settings

rordi/shitmyreviewerssay-data-set

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 

Repository files navigation

"Shit My Reviewers Say" Data Set

Data set of 780 unique quotes collected from the ShitMyReviewersSay Tumblr blog (source: https://shitmyreviewerssay.tumblr.com/).

All quotes have been labeled with either one of the following categories:

  • unprofessional: The reviewer's comment is unprofessional, impolite or excessively harsh.
  • unconstructive: The reviewer's comment is unconstructive or very vague.

Certain quotes were duplicated or irrelevant and are removed from the dataset. Additionally, some quotes referred to English errors in papers but themselves contained English errors. Since these examples are evidently posted for irony, they are discarded from the dataset. This is because vector representations of similar comments without such English errors may be semantically very close. As the data set may use for training a model to detect unprofessional or unconstructive comments, it is important to avoid such examples.

Labelling was performed by Dietrich Rordorf, 2023 as part of a Master's thesis at the University of Applied Sciences Northwestern Switzerland (FHNW).

Somewhat related data sets can be found in:

About

Labelled data set of ShitMyReviewersSay quotes

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published