Beautiful words from humans, for humans.
I started this project back in 2015, when it was just a single Python spider and a dump of poems from poetryfoundation.org. (You can still find the original code on the legacy branch if you are interested.)
But now the year is 2025. The world has changed. And I feel we need poetry now more than ever before.
My mission is to create an open and refined poetry dataset:
- Automated Collection
  - Design a system to continuously scour the web for poetry
  - Extract poems with full metadata
  - Append findings to Avro files
  - Prioritize poems from established authors
- Community Collaboration
  - Set up a bot to ingest poems submitted manually via GitHub issues
  - Create a publicly available poetry store
  - Develop a public API for the collection
- Developer Resources
  - Build Jupyter Notebooks to jumpstart developer exploration
  - Publish a Python module for easy dataset interaction
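
To make the "extract poems with full metadata" and "append findings to Avro files" goals a bit more concrete, here is a minimal sketch of what a poem record might look like. Everything in it is an assumption for illustration: the schema name, the namespace, every field name, and the helper functions `make_poem_record` and `missing_fields` are hypothetical, and the project has not settled on a schema. It uses only the standard library, so it describes the record shape rather than writing real Avro files.

```python
import json
from datetime import datetime, timezone

# Hypothetical Avro record schema. Every field name here is an assumption,
# not the project's actual layout.
POEM_SCHEMA = {
    "type": "record",
    "name": "Poem",
    "namespace": "poetry.example",
    "fields": [
        {"name": "title", "type": "string"},
        {"name": "author", "type": "string"},
        {"name": "text", "type": "string"},
        {"name": "source_url", "type": ["null", "string"], "default": None},
        {"name": "collected_at", "type": "string"},
    ],
}

# Avro schemas are plain JSON, so the dict can be serialized as-is.
POEM_SCHEMA_JSON = json.dumps(POEM_SCHEMA, indent=2)


def make_poem_record(title, author, text, source_url=None, collected_at=None):
    """Build a dict whose keys match the fields of POEM_SCHEMA."""
    if collected_at is None:
        collected_at = datetime.now(timezone.utc).isoformat()
    # Normalize line endings and drop trailing whitespace on each line,
    # while keeping the poem's line breaks intact.
    lines = [line.rstrip() for line in text.replace("\r\n", "\n").split("\n")]
    return {
        "title": title.strip(),
        "author": author.strip(),
        "text": "\n".join(lines).strip("\n"),
        "source_url": source_url,
        "collected_at": collected_at,
    }


def missing_fields(record, schema=POEM_SCHEMA):
    """Return schema field names absent from the record.

    This is a shallow presence check, not full Avro validation.
    """
    return [f["name"] for f in schema["fields"] if f["name"] not in record]
```

Records shaped like this could then be handed to an Avro library (for example `fastavro.writer`) together with the schema; that step is left out here so the sketch runs without third-party dependencies.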
LLM-generated poems are not allowed in this dataset.
While AI can generate poems, this project celebrates human creativity. My reasons for excluding AI-generated content:
- Soulful Expression: My favorite part of poetry is catching a glimpse of the author's soul in their words.
- Data Management: Excluding AI-generated content keeps an endless stream of it from overwhelming our public dataset.
Want AI poems? Check out friuns2/BlackFriday-GPTs-Prompts