This is an AI-based audio analyzer app.
Within the scope of this technical specification, the task is to implement the audio-analysis logic: breaking audio down into prompts that can then be used to generate images in MidJourney or another image-generation service. You can use any APIs; I used AssemblyAI for converting audio to text and LangChain for generating prompts.
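As a rough illustration of the "audio → text → prompts" idea, here is a minimal, framework-free sketch. In the real app the transcript text comes from AssemblyAI and the prompts are refined by an LLM via LangChain; the chunking helper below is a plain-Python assumption, not the project's actual logic.

```python
# Hypothetical sketch: turn a transcript into short phrases that can serve as
# image-generation prompts. The real app gets `transcript` from AssemblyAI and
# post-processes prompts with LangChain; this helper is illustrative only.

def split_transcript_into_prompts(text: str, max_words: int = 12) -> list[str]:
    """Split a transcript into word-chunks usable as image prompts."""
    words = text.split()
    return [
        " ".join(words[i : i + max_words])
        for i in range(0, len(words), max_words)
    ]

transcript = "a calm ocean at sunset with seagulls circling above the waves"
prompts = split_transcript_into_prompts(transcript, max_words=6)
# Each chunk can now be sent to MidJourney or another image service.
```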
- First, fill in the `.env` file with your data. The `API_KEY_ASSEMBLYAI` can be obtained for free on the AssemblyAI website.
- Then fill in the `docker-compose.env` file (if you want to run it via Docker Compose); see the example data for the Docker Compose setup.
- Finally, run `docker-compose up --build` and wait for it to start.
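A minimal sketch of what the `.env` file might look like. Only `API_KEY_ASSEMBLYAI` is named in this README; any other variables your deployment needs (database credentials, secret keys, etc.) would be additions of your own.

```env
# Only API_KEY_ASSEMBLYAI is documented here; add any other variables
# your setup requires. The placeholder value is, of course, an assumption.
API_KEY_ASSEMBLYAI=your_assemblyai_key_here
```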
- Full JWT authentication with login, logout, and registration, with robust validation.
- Protected endpoints only for authorized users.
- Ability to create prompt tasks both with audio files and text.
- Optimized querysets to avoid the N+1 problem, and a reliable database schema.
- Pagination to avoid returning large querysets in a single response.
- Docker and Docker Compose files.
- CRUD operations for the analyzer task, including bulk update/destroy, optimized for large datasets.
- Swagger documentation for the endpoints, available via the Swagger Documentation link.
- I did not manage to implement bulk update of prompts through nested serialization, so prompts are currently read-only.
- I planned to move the audio-conversion and LLM prompt-creation logic into Celery tasks (you may notice its configuration in the project), but decided not to do it for now.
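The N+1 fix mentioned above can be illustrated without the Django ORM (where `prefetch_related` plays this role). The sketch below uses `sqlite3` with illustrative table and column names, which are assumptions and not the project's actual schema.

```python
# Framework-agnostic illustration of avoiding N+1 queries: fetch all tasks in
# one query and all related prompts in a second query, instead of issuing one
# prompt-query per task. Table/column names here are illustrative assumptions.
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE task (id INTEGER PRIMARY KEY, text TEXT);
    CREATE TABLE prompt (id INTEGER PRIMARY KEY, task_id INTEGER, content TEXT);
    INSERT INTO task VALUES (1, 'ocean audio'), (2, 'forest audio');
    INSERT INTO prompt VALUES
        (1, 1, 'sunset waves'), (2, 1, 'seagulls'), (3, 2, 'pine trees');
""")

# Exactly two queries total, like Django's prefetch_related,
# instead of 1 (tasks) + N (one per task's prompts).
tasks = conn.execute("SELECT id, text FROM task").fetchall()
by_task = defaultdict(list)
for task_id, content in conn.execute("SELECT task_id, content FROM prompt"):
    by_task[task_id].append(content)

result = {text: by_task[task_id] for task_id, text in tasks}
```

In the Django ORM the same effect is what `prefetch_related("prompts")` achieves for a reverse foreign-key relation.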
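The JWT flow can also be sketched in isolation. This example uses the PyJWT library directly, while a Django project would more likely use a dedicated package; the secret, claims, and token lifetime here are illustrative assumptions.

```python
# Standalone sketch of issuing and verifying a JWT with PyJWT.
# Secret, claims, and lifetime are illustrative assumptions.
import datetime

import jwt  # pip install PyJWT

SECRET = "change-me"

def issue_token(user_id: int) -> str:
    payload = {
        "sub": str(user_id),  # subject claim; PyJWT expects a string here
        "exp": datetime.datetime.now(datetime.timezone.utc)
        + datetime.timedelta(hours=1),
    }
    return jwt.encode(payload, SECRET, algorithm="HS256")

def verify_token(token: str) -> str:
    # Raises jwt.ExpiredSignatureError / jwt.InvalidTokenError on bad tokens.
    return jwt.decode(token, SECRET, algorithms=["HS256"])["sub"]
```

Protected endpoints would then accept a request only when `verify_token` succeeds on the presented token.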