In this assignment, I examined a dataset of internet comments and their scores, formulated my own queries, and used the Perspective API client to score the toxicity of each comment. The assignment was designed to investigate the concept of bias by interrogating an established natural language processing model, specifically the Perspective API released by Google Jigsaw.
Body shaming is a common form of online harassment, and fat shaming often receives more attention than skinny shaming. Skinny shaming tends to go unnoticed and is usually disregarded, even though both are equally harmful to individuals. I chose this contrast as my research topic and gathered body-shaming comments from various sources, including TikTok, online comment sections, and popular media such as movies and TV shows.
To conduct this study, I compiled six harmful comments for each category, fat-shaming and skinny-shaming, from these sources and developed a hypothesis based on them. My hypothesis was that the API would rate fat-shaming comments as more toxic because of learned associations between certain words and harm, while skinny-shaming comments would be rated as less toxic, even though both can cause mental distress.
I wrote out all 12 comments, starting with the six fat-shaming comments followed by the six skinny-shaming comments, and ran the toxicity assessment. The results showed significantly higher toxicity scores for the fat-shaming comments: all six scored above 0.5, the threshold I treated as marking a comment toxic. Only two of the six skinny-shaming comments scored above 0.5, even though the two groups were similar in harshness to what we see in offensive online comments.
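The scoring step above can be sketched with the Perspective API's Python client. This is a minimal sketch, not my exact code: it assumes an API key is stored in a hypothetical `PERSPECTIVE_API_KEY` environment variable and that the `google-api-python-client` package is installed.

```python
# Sketch: requesting a TOXICITY score for one comment from the
# Perspective API via google-api-python-client (discovery-based client).
import os

DISCOVERY_URL = (
    "https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1"
)

def analyze_request(text):
    """Build the request body asking Perspective for a TOXICITY score."""
    return {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }

def toxicity_score(text):
    """Send one comment to the API and return its summary score (0.0-1.0)."""
    # Import here so the helpers above work without the package installed.
    from googleapiclient import discovery  # pip install google-api-python-client

    client = discovery.build(
        "commentanalyzer",
        "v1alpha1",
        developerKey=os.environ["PERSPECTIVE_API_KEY"],  # hypothetical var name
        discoveryServiceUrl=DISCOVERY_URL,
        static_discovery=False,
    )
    response = client.comments().analyze(body=analyze_request(text)).execute()
    return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
```

Each of the 12 comments would be passed to `toxicity_score` in turn, and any score above 0.5 counted as toxic.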
After this, I calculated the average toxicity score for each group. The results showed a clear difference, with fat-shaming comments scoring significantly more toxic on average than skinny-shaming comments. These findings support my hypothesis: the API treats fat-shaming comments as more toxic than skinny-shaming comments, despite both being harmful.
As for the biases that may exist in the model, it is possible that the model reflects biases rooted in societal norms. In this context, the bias appears to stem from the common belief that being “fat” signals unhealthiness and laziness, even though this is not true, while being “skinny” is often taken as a compliment or as the default beauty standard. These widespread societal biases may have influenced the model's results.
One theory explaining the results is that the model's treatment of fat shaming as more offensive than skinny shaming mirrors what people in society think and feel. It is important to recognize that both forms of body shaming are equally harmful; the gap in the model's scores, with fat-shaming comments labeled toxic and most skinny-shaming comments labeled non-toxic, likely stems from these word associations and societal norms rather than from any real difference in harm.