CleverCaption is a Python tool that processes images in subfolders of a given directory, generates captions using a remote API, and saves the results in text files corresponding to each image.
- Processes multiple images in bulk from nested folder structures.
- Utilizes a remote API to generate captions based on image content.
- Presents a progress UI using Tkinter.
- Handles concurrent API requests and manages timeouts.
- Converts images to base64 for API submission.
- Saves caption results as .txt files alongside the images.
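The folder handling described above can be pictured with the following minimal Python sketch. It is not CleverCaption's actual code. The function names find_images and image_to_base64, and the set of accepted file extensions, are hypothetical choices made for illustration. The sketch walks every subfolder of the master folder (including nested ones), pairs each image with a .txt path that has the same name, and base64-encodes the raw file bytes. The real tool may re-encode images with Pillow before encoding them.

```python
import base64
from pathlib import Path

# Hypothetical: the extensions treated as images; adjust to your data set.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}


def find_images(master_folder):
    """Yield (image_path, caption_path) for every image in the master folder's subfolders."""
    master = Path(master_folder)
    for sub in sorted(p for p in master.iterdir() if p.is_dir()):
        # rglob("*") also descends into nested folders.
        for image in sorted(sub.rglob("*")):
            if image.is_file() and image.suffix.lower() in IMAGE_EXTENSIONS:
                # The caption file sits next to the image with the same stem.
                yield image, image.with_suffix(".txt")


def image_to_base64(image_path):
    """Return the raw file bytes as a base64 ASCII string."""
    return base64.b64encode(Path(image_path).read_bytes()).decode("ascii")
```

Images placed directly in the master folder are skipped in this sketch, because the usage steps expect images to live in subfolders.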
Before you begin, ensure you have met the following requirements:
- Python 3.x installed
- Required Python packages installed:
  - requests
  - Pillow
  - httpx
  - asyncio (part of the Python 3 standard library; no separate install needed)
If conda is installed, simply run install.bat to create a conda environment.
To install CleverCaption manually, follow these steps:
- Clone or download the repository to your local machine.
- Use pip to install the necessary packages:
pip install requests Pillow httpx
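To confirm the manual installation worked, a small check like the one below can be run. It is a hypothetical helper, not a script shipped with CleverCaption. It looks up each required module without importing it; note that Pillow is imported under the module name PIL, and asyncio is left out because it comes with Python 3 (installing "asyncio" from PyPI fetches an outdated backport).

```python
import importlib.util

# pip package name -> importable module name (Pillow installs the "PIL" module).
REQUIRED_MODULES = {"requests": "requests", "Pillow": "PIL", "httpx": "httpx"}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip names of packages whose modules cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All required packages are installed.")
```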
If using the conda install method, simply double-click run.bat.
To use CleverCaption, follow these steps:
- Ensure your images are organized into subfolders within a master folder.
- Run the main script with the path to the master folder:
  python CleverCaption.py --folder "path/to/your/master/folder"
  If you don't provide a folder path, a GUI will prompt you to select a folder.
- The progress of the captioning process can be monitored through the GUI that pops up.
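The "--folder or GUI picker" behaviour in the steps above can be sketched as follows. This is an illustration under assumptions, not CleverCaption's implementation: resolve_master_folder and choose_folder_with_dialog are hypothetical names. Only the --folder flag comes from the usage instructions. Tkinter is imported lazily so the script does not need a display when a folder is passed on the command line.

```python
import argparse


def choose_folder_with_dialog():
    """Ask for the master folder with a Tkinter directory picker."""
    # Imported lazily so the command-line path works without a display.
    import tkinter as tk
    from tkinter import filedialog

    root = tk.Tk()
    root.withdraw()  # hide the empty main window; show only the dialog
    folder = filedialog.askdirectory(title="Select the master folder")
    root.destroy()
    return folder or None  # an empty result means the dialog was cancelled


def resolve_master_folder(argv=None, ask=choose_folder_with_dialog):
    """Return --folder if given, otherwise fall back to the folder picker."""
    parser = argparse.ArgumentParser(description="Caption images in subfolders.")
    parser.add_argument("--folder", help="path to the master folder of image subfolders")
    args = parser.parse_args(argv)
    return args.folder if args.folder else ask()
```

Passing the picker in as the ask parameter keeps the command-line logic testable without opening a window.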
For best results, I recommend modifying the prompt and caption_start_template values in config.json to suit your needs.
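As a rough illustration of how those two settings might be read and filled in per image, here is a hedged sketch. Only the key names prompt and caption_start_template come from this README. The example values, the {folder} and {image_name} placeholders (loosely modelled on the runtime text replacement mentioned in the roadmap), and the helper names are all hypothetical. Check the shipped config.json for the real format.

```python
import json
from pathlib import Path

# Hypothetical example values; the shipped config.json may differ.
EXAMPLE_CONFIG = {
    "prompt": "Describe the image from the '{folder}' set in one detailed sentence.",
    "caption_start_template": "A photo of {folder}, ",
}


def load_config(path="config.json"):
    """Read config.json, keeping the example values for any missing keys."""
    config = dict(EXAMPLE_CONFIG)
    p = Path(path)
    if p.exists():
        config.update(json.loads(p.read_text(encoding="utf-8")))
    return config


def render(template, image_path):
    """Fill the assumed {folder} and {image_name} placeholders for one image."""
    image_path = Path(image_path)
    # str.replace (not str.format) so other literal braces in a prompt are harmless.
    return (template.replace("{folder}", image_path.parent.name)
                    .replace("{image_name}", image_path.stem))
```

For example, render("A photo of {folder}, ", "imgs/cats/tom.png") gives "A photo of cats, ".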
Follow these steps to configure and use CleverCaption with OOBA BOOGA WebUI and the LLAVA multimodal model:
- Set up the OOBA BOOGA WebUI from its GitHub repository.
- Run OOBA BOOGA with the multimodal model using the following switches:
  --multimodal-pipeline llava-llama-2-13b --extensions multimodal --api
  If using the ooba 1-click install/run, the flags can be added to text-generation-webui\CMD_FLAGS.txt instead.
- Download the LLAVA model from Hugging Face.
- Modify the config.json file within the LLAVA model directory:
  - Change "model_type": "llava" to "model_type": "llama".
- Change
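The model_type edit can be done by hand in any text editor. For anyone who prefers a script, here is a minimal sketch; patch_model_type is a hypothetical helper, not part of CleverCaption or text-generation-webui. It keeps a config.json.bak copy and only touches the file when model_type is still "llava". Rewriting the JSON normalizes its indentation, which does not change its meaning.

```python
import json
from pathlib import Path


def patch_model_type(model_dir, old="llava", new="llama"):
    """Change "model_type" in <model_dir>/config.json from old to new.

    Returns True if the file was changed; a config.json.bak backup is written first.
    """
    config_path = Path(model_dir) / "config.json"
    text = config_path.read_text(encoding="utf-8")
    config = json.loads(text)
    if config.get("model_type") != old:
        return False  # already patched, or not the expected model
    config_path.with_name("config.json.bak").write_text(text, encoding="utf-8")
    config["model_type"] = new
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return True
```

Running it a second time returns False and leaves the file alone.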
Ensure all configurations are set before running the tool. The instructions above should work alongside the provided CleverCaption documentation and OOBA BOOGA's guidelines.
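Once the WebUI is running with --api, captions are requested over HTTP. The following sketch shows one way to send concurrent requests with a per-request timeout and a cap on requests in flight. It is not CleverCaption's code, and the API details are assumptions: the URL, the payload (the image inlined in the prompt as a base64 data-URI img tag), and the response shape follow the legacy text-generation-webui API and its multimodal extension, and may not match your ooba version. The sender is injectable so the concurrency logic can be tried without a server.

```python
import asyncio

# Assumption: legacy text-generation-webui API endpoint; verify for your version.
API_URL = "http://127.0.0.1:5000/api/v1/generate"


def build_payload(prompt, image_b64, caption_start=""):
    """Assumed request body: the image is inlined in the prompt as an <img> tag."""
    full_prompt = f'{prompt}\n<img src="data:image/jpeg;base64,{image_b64}">\n{caption_start}'
    return {"prompt": full_prompt, "max_new_tokens": 200}


async def post_with_httpx(payload, timeout):
    """Default sender; a fresh client per request keeps the sketch short."""
    import httpx  # imported lazily so a fake sender can be used without httpx

    async with httpx.AsyncClient(timeout=timeout) as client:
        response = await client.post(API_URL, json=payload)
        response.raise_for_status()
        return response.json()["results"][0]["text"]  # assumed response shape


async def caption_all(payloads, send=post_with_httpx, max_concurrent=2, timeout=120.0):
    """Return one caption per payload, or None where a request failed or timed out."""
    semaphore = asyncio.Semaphore(max_concurrent)  # limits requests in flight

    async def one(payload):
        async with semaphore:
            try:
                return await asyncio.wait_for(send(payload, timeout), timeout)
            except Exception:  # includes asyncio.TimeoutError and HTTP errors
                return None

    return await asyncio.gather(*(one(p) for p in payloads))
```

A low max_concurrent is deliberate: the WebUI processes generations largely one at a time, so extra parallel requests mostly wait in its queue.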
Planned improvements:
- Devise the best method to allow multiprocessing (ooba bottleneck)
- Update UI
- Update console logging
- Single folder mode
- Enhance and document runtime text replacement (folder name in prompt, image name in prompt and caption start)
- Process Character, Details, and concept text files for additional information
- Semi-automatic character tagging module
Contributions to CleverCaption are welcome. If you have a suggestion that would make this better, please fork the repo and create a pull request.
Distributed under the MIT License. See LICENSE for more information.