This is a combined workshop paper repo for The First Bangla Language Processing Workshop (BLP 2023), co-located with EMNLP 2023 in Singapore.

rudro12356/nlp-blp-23-paper

NLP Bangla Workshop Paper

Workshop information: https://blp-workshop.github.io/

Paper Abstract

Bangla music is a treasure trove of cultural heritage that has thrived for more than a century. This paper presents a new Bangla music dataset with unique features that reflect the thematic, phonemic and stylometric evolution of Bangla music from the 20th to the 21st century. The dataset is accompanied by a thorough exploratory analysis that uncovers the ever-evolving elements of Bangla music from a temporal and lyrical perspective. Additionally, we show that our dataset is a good fit for various classification tasks using deep neural classifiers. We have strategically fine-tuned the BanglaBERT model to achieve an average accuracy of 60% across several phonemic classification tasks and artist identification from lyrics.

Word cloud

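A word cloud sizes each word by how often it occurs in the corpus. The snippet below is a minimal sketch of that frequency-counting step, assuming the lyrics are plain Unicode strings and that whitespace plus punctuation (including the Bangla danda `।` and double danda `॥`) delimit tokens. The function name `token_frequencies` and the optional stop-word set are hypothetical illustrations, not the tokenizer used in the paper.

```python
import re
from collections import Counter

# Hypothetical punctuation set: common Latin punctuation plus the Bangla
# danda (।) and double danda (॥). Matches are replaced with spaces.
_PUNCT = re.compile(r"[।॥,.;:!?'\"()\-]")


def token_frequencies(lyrics, stop_words=frozenset()):
    """Count whitespace-delimited tokens across a list of lyric strings."""
    counts = Counter()
    for text in lyrics:
        cleaned = _PUNCT.sub(" ", text)
        # str.split() with no argument splits on any run of whitespace.
        counts.update(tok for tok in cleaned.split() if tok not in stop_words)
    return counts
```

For example, `token_frequencies(["আমার সোনার বাংলা, আমি তোমায় ভালোবাসি।", "আমার দেশ"])` counts `আমার` twice and strips the trailing danda from `ভালোবাসি`. The resulting `Counter` could then be handed to a renderer such as the `wordcloud` package's `WordCloud.generate_from_frequencies`; rendering Bangla glyphs generally requires passing a Bangla-capable font via its `font_path` parameter.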

Zipf's Law

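Zipf's law says that the frequency of the r-th most common word falls off roughly as f(r) ∝ r^(-s), with s close to 1 for many natural-language corpora. A common way to check this is to fit a straight line to log-frequency against log-rank and read off the slope. The sketch below does an ordinary least-squares fit in pure Python; the function name `zipf_exponent` is hypothetical, and the README does not state how the paper's own plot was fitted.

```python
import math


def zipf_exponent(counts):
    """Estimate the Zipf exponent s from a token -> count mapping.

    Fits log(f) = log(C) - s * log(r) by ordinary least squares, where r is
    the frequency rank (1 = most frequent), and returns s.
    """
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct tokens")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Least-squares slope = cov(x, y) / var(x); Zipf's s is its negation.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return -cov / var
```

For counts that follow f(r) = 720 / r exactly, `zipf_exponent` returns 1.0 up to floating-point error. On real lyrics the long tail of words seen once or twice is noisy, so restricting the fit to the top few hundred ranks is a common design choice.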

Heatmap


K-means clustering

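K-means partitions feature vectors into k clusters by alternating two steps: assign each point to its nearest centroid, then move each centroid to the mean of the points assigned to it, until the assignments stop changing. The README does not say which features or library were used, so the following is a self-contained sketch of this procedure (Lloyd's algorithm) on plain lists of floats. The function name `kmeans`, the random initialization from the data, and the `max_iter` cap are assumptions for illustration.

```python
import random


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` (lists of floats) into k groups; returns (centroids, labels)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    rng = random.Random(seed)
    # Initialize centroids with k points sampled without replacement.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: _sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels
```

Because k-means relies on Euclidean distance, features on very different scales (for example raw token counts next to ratios) should usually be standardized before clustering, and the result can depend on initialization, so trying several seeds is a cheap safeguard.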

Note

Our paper was not accepted. We have decided to make the minor changes suggested by the reviewers and upload it to the Archive. Overall, it was good hands-on practice that gave me exposure to NLP with the help of my mentor Ishrak Hayet, who guided me throughout the work. One lesson I will take forward is that research is an iterative process. The clustering algorithm I used needed better data, and I could have experimented more with feature engineering and feature selection. Our data collection process was laborious, so the dataset has only around 300 observations; even so, it contains a large number of tokens because song lyrics are long. The field of AI is extremely vast, and the only way to get better at it is to keep trying. The experience has been fruitful, as I learnt to be patient and to keep digging into problems.

Achievement

We generated phonemic Bangla datasets, all of which are cited in our paper. They are open source and free to use under the MIT License. We hope they will be useful to others in the future.

Collaborators: Ishraq Hayet, RokunuzJahan Rudro