I have done the following sub-tasks using LLMs and "prompt engineering", which would be far from trivial with traditional NLP tools like spaCy:
- Character identification and name clustering (e.g., "Tom", "Tom Sawyer", "Mr. Sawyer", "Thomas Sawyer" -> TOM_SAWYER)
- Referential gender inference (TOM_SAWYER -> he/him/his)
- Dialogue attribution or quotation speaker identification with coreference resolution
- Producing the audio output for each dialogue by converting Speech Synthesis Markup Language (SSML) input into audio data, then stitching the clips together into one audiobook.
For the LLM part I used the Gemini 1.5 model, a choice motivated by GCP's $150 in credits and its integrated TTS API.
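
As a rough illustration of how the outputs of the sub-tasks listed above could fit together, here is a minimal sketch that maps a speaker mention to its canonical character ID, picks a voice from the inferred gender, and wraps the line in SSML. It is not the project's actual code: the `ALIASES`, `GENDER`, and `VOICES` tables, the voice names, and the helper functions `build_alias_index` and `to_segment` are all hypothetical, and the LLM calls that would produce the tables are left out.

```python
from xml.sax.saxutils import escape

# Hypothetical results of the LLM steps (name clustering, gender inference).
ALIASES = {
    "TOM_SAWYER": ["Tom", "Tom Sawyer", "Mr. Sawyer", "Thomas Sawyer"],
}
GENDER = {"TOM_SAWYER": "male"}

# Hypothetical voice names; substitute the voices your TTS service offers.
VOICES = {
    "male": "voice-male-1",
    "female": "voice-female-1",
    "unknown": "voice-narrator",
}


def build_alias_index(aliases):
    """Invert {canonical: [alias, ...]} into {lowercased alias: canonical}."""
    index = {}
    for canonical, names in aliases.items():
        for name in names:
            index[name.lower()] = canonical
    return index


def to_segment(text, speaker, alias_index, gender, voices):
    """Return the voice to use and the SSML payload for one attributed line."""
    canonical = alias_index.get(speaker.lower())  # None if speaker is unknown
    voice = voices[gender.get(canonical, "unknown")]
    # Escape &, <, > so the dialogue text cannot break the SSML markup.
    ssml = f"<speak>{escape(text)}</speak>"
    return {"speaker": canonical, "voice": voice, "ssml": ssml}


if __name__ == "__main__":
    index = build_alias_index(ALIASES)
    print(to_segment("I won't & you can't make me!", "Mr. Sawyer", index, GENDER, VOICES))
```

Each segment would then be sent to the TTS service with the selected voice, and the resulting audio clips concatenated in reading order. Unattributed lines fall back to the narrator voice, which keeps the pipeline running when speaker identification fails.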