EchoMind is a Dockerized Python project that extracts audio from video files, transcribes the audio using OpenAI's Whisper model, and sends the transcription to the Ollama LLM for further processing based on a provided prompt. The transcription and LLM results are saved in files named after the input file.
- Extracts audio from video files using `ffmpeg`.
- Transcribes audio files using OpenAI's Whisper model inside the Docker container.
- Sends the transcription to the Ollama LLM API for further processing based on a provided prompt.
- Dynamically names the transcription and LLM output files based on the input filename.
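Put together, the whole pipeline boils down to three steps. The sketch below is illustrative only, not the project's actual code: it assumes the `openai-whisper` and `requests` packages, reuses the example values from `config.yml` further down, and uses hypothetical filenames. The Bearer auth scheme is also an assumption.

```python
import subprocess
import requests
import whisper

# 1. Extract the audio track with ffmpeg (hypothetical filenames).
subprocess.run(["ffmpeg", "-i", "meeting.mp4", "-vn", "meeting.mp3"], check=True)

# 2. Transcribe the audio with Whisper.
model = whisper.load_model("base")
transcription = model.transcribe("meeting.mp3")["text"]

# 3. Send the transcription and a prompt to the Ollama generate endpoint.
#    URL, token, and model name are the example values from config.yml below.
response = requests.post(
    "http://10.0.0.242:3000/ollama/api/generate",
    headers={"Authorization": "Bearer your-api-token-here"},
    json={
        "model": "smollm:1.7b",
        "prompt": "Summarize this meeting:\n\n" + transcription,
        "stream": False,
    },
    timeout=300,
)
print(response.json()["response"])
```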
- Docker
- A valid Ollama API URL and token.
First, you need to build the Docker image:
```bash
docker build -t echomind .
```
This project uses a `config.yml` file to store important API details like the Ollama server URL and API token. Here's an example `config.yml`:
```yaml
ollama_api_url: "http://10.0.0.242:3000/ollama/api/generate"
ollama_api_token: "your-api-token-here"
model_name: "smollm:1.7b"
```
- `ollama_api_url`: The URL for the Ollama API endpoint.
- `ollama_api_token`: Your Ollama API token.
- `model_name`: The name of the model used by Ollama for processing (e.g., `smollm:1.7b`).
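Loading these values in Python takes only a few lines. This is a sketch that assumes the PyYAML package and may not match the project's actual code:

```python
import yaml

# Read the Ollama connection details from config.yml.
with open("config.yml") as f:
    config = yaml.safe_load(f)

ollama_api_url = config["ollama_api_url"]
ollama_api_token = config["ollama_api_token"]
model_name = config["model_name"]
```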
To run the project, pass the names of the input file (video or audio), prompt file, and configuration file as environment variables. The Docker container will handle the audio extraction, transcription, and LLM interaction.
You can run the Docker container with the following command, mapping the necessary files into the container:
```bash
docker run -it --rm \
  -v /path/to/your/local/files:/app \
  -e INPUT_FILE="file.mp4" \
  -e PROMPT_FILE="meeting.md" \
  -e CONFIG_FILE="config.yml" \
  echomind
```
This command:
- Mounts the directory containing your files to the `/app` directory inside the Docker container.
- Sets environment variables for the input file, prompt file, and config file.
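Inside the container, the script can read these variables from the environment. A minimal sketch, not necessarily the actual implementation (the `config.yml` fallback is an assumption):

```python
import os

# Filenames resolve relative to the mounted /app directory.
input_file = os.environ["INPUT_FILE"]    # e.g. "file.mp4"
prompt_file = os.environ["PROMPT_FILE"]  # e.g. "meeting.md"
config_file = os.environ.get("CONFIG_FILE", "config.yml")  # default is an assumption
```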
The script will generate the following output files based on the input filename:
- Transcription File: `<input_filename>_transcription.txt` (e.g., `meeting_transcription.txt` for `meeting.mp4`).
- LLM Output File: `<input_filename>_llm_output.txt` (e.g., `meeting_llm_output.txt` for `meeting.mp4`).
For an input file called `meeting.mp4`, the following files will be generated:
- `meeting_transcription.txt` (the transcription)
- `meeting_llm_output.txt` (the LLM output)
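The naming rule is just a suffix on the input file's stem; one way to derive it in Python:

```python
from pathlib import Path

stem = Path("meeting.mp4").stem                   # "meeting"
transcription_file = f"{stem}_transcription.txt"  # "meeting_transcription.txt"
llm_output_file = f"{stem}_llm_output.txt"        # "meeting_llm_output.txt"
```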
The script logs its activities to a file called `echomind.log`. This log can be helpful for debugging if anything goes wrong during the process.
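With Python's standard logging module, that could be set up roughly like this (an assumption about the implementation, not a quote from it):

```python
import logging

logging.basicConfig(
    filename="echomind.log",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
logging.info("Transcribing %s", "meeting.mp4")
```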
If the script encounters an error during audio extraction, transcription, or interaction with the Ollama API, appropriate error messages will be logged, and execution will halt.
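A typical log-and-halt pattern around the ffmpeg step, shown for illustration only:

```python
import logging
import subprocess
import sys

try:
    # check=True raises CalledProcessError if ffmpeg exits non-zero.
    subprocess.run(["ffmpeg", "-i", "meeting.mp4", "-vn", "meeting.mp3"], check=True)
except subprocess.CalledProcessError as exc:
    logging.error("Audio extraction failed: %s", exc)
    sys.exit(1)
```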
Sample run against https://www.youtube.com/watch?v=lBVtvOpU80Q:
- Audio extraction: about 5 to 7 seconds
- Transcription: 4 minutes
- LLM processing: about 1 minute (Model: `mistral-small:22b`)
The sample prompt, `meeting.md` (needs work, for sure), is at https://github.com/nothingmn/echomind/blob/main/meeting.md. It is mostly borrowed from fabric: https://github.com/danielmiessler/fabric/blob/main/patterns/transcribe_minutes/system.md
This project is open source under the MIT License.