NLP Bangla Workshop Paper

This is the combined workshop paper repo for the First Bangla Language Processing Workshop (BLP 2023), co-located with EMNLP 2023 in Singapore.

Link to the information on the workshop: https://blp-workshop.github.io/

Paper Abstract

Bangla music is a treasure trove of cultural heritage that has been prevalent and thriving for more than a century. This paper presents a new Bangla music dataset with unique features that reflect the thematic, phonemic and stylometric evolution of Bangla music from the 20th to the 21st century. The dataset is accompanied by a thorough exploratory analysis to unfold the ever-evolving elements of Bangla music from a temporal and lyrical perspective. Additionally, we show that our dataset is a good fit for various classification tasks using deep neural classifiers. We have strategically fine-tuned the BanglaBERT model to achieve an average accuracy of 60% for various phonemic classification and artist identification from the lyrics.

Word cloud

Zipf's Law
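
For readers who want to try a Zipf-style rank-frequency check on their own lyrics, here is a minimal sketch. It is not the code from EDA_BNLP.ipynb: the function names `rank_frequency` and `zipf_slope`, the whitespace tokenization, and the toy input are all assumptions made for illustration. Under Zipf's law, the log of a token's count falls roughly linearly with the log of its rank, so the fitted slope should be negative (close to -1 on a large natural-language corpus).

```python
from collections import Counter
import math


def rank_frequency(texts):
    """Return (rank, token, count) tuples, most frequent token first.

    Assumption: tokens are split on whitespace. Real Bangla lyrics would
    likely need proper tokenization and punctuation handling.
    """
    counts = Counter(token for text in texts for token in text.split())
    # Sort by descending count, then alphabetically so ties are deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, tok, cnt) for rank, (tok, cnt) in enumerate(ordered, start=1)]


def zipf_slope(table):
    """Least-squares slope of log(count) against log(rank).

    Needs at least two rows in the table.
    """
    xs = [math.log(rank) for rank, _, _ in table]
    ys = [math.log(count) for _, _, count in table]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical toy input standing in for song lyrics.
toy_lyrics = ["a a a a b b c"]
table = rank_frequency(toy_lyrics)
slope = zipf_slope(table)
print(table)
print(slope)
```

Plotting `log(rank)` against `log(count)` from `table` gives the usual Zipf plot; the slope is only a rough summary and is sensitive to how the lyrics are tokenized.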

Heatmap

KMEANS cluster

Note

Our paper was not accepted. We have decided to make the minor changes suggested by the reviewers and upload it to arXiv. Overall, it was good hands-on practice for me and gave me exposure to NLP with the help of my mentor Ishrak Hayet, who guided me throughout the work. My main takeaway is that research is an iterative process. The clustering algorithm I used needed better data, and I could have experimented more with feature engineering and feature selection. Our data collection process was hectic, which is why the dataset has only around 300 observations, although that is still a large number of tokens because the song lyrics are long. The field of AI is extremely vast, and there is no way to get better at it other than trying, trying, and trying again. The experience has been fruitful: I learned how to be patient and to keep digging into problems.

Achievement

We have been able to generate phonemic Bangla datasets, all of which are cited in our paper. They are open source and free to use under the MIT License. Hopefully, they will help someone in the future.

Collaborators: Ishraq Hayet, RokunuzJahan Rudro

rudro12356/nlp-blp-23-paper

Folders and files

Latest commit

History

Repository files navigation

NLP Bangla Workshop Paper

Paper Abstract

Word cloud

Zipf's Law

Heatmap

KMEANS cluster

Note

Achievement

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages