In this project, we develop a video captioning system that generates image captions for the keyframes of each story unit and detects the sound events within it. For image captioning we use ClipCap, a simple CLIP-based model representative of a newer captioning paradigm: rather than learning new semantic entities, it reuses large pre-trained models and trains only a small mapping network, steering the pre-trained models' existing semantic knowledge towards the style of the target dataset. We use the pre-trained models (trained on different datasets) offered in this GitHub repository: https://github.com/rmokady/CLIP_prefix_caption. Sound event detection (SED) with ASR is also implemented in this system to extract more information from a video; the SED system can detect the specific sound events defined here. Caption quality is evaluated with BLEU-1 to BLEU-4, CIDEr, SPICE, METEOR, and ROUGE-L.
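For intuition, here is a minimal, hypothetical sketch of the mapping-network idea behind ClipCap (the dimensions, names, and two-layer MLP are illustrative; see the repository linked above for the real implementation): a frozen CLIP image embedding is projected into a sequence of prefix embeddings that a frozen GPT-2 then decodes into a caption.

```python
import torch
import torch.nn as nn

class MLPMapper(nn.Module):
    """Illustrative mapper: CLIP embedding -> prefix_length GPT-2 embeddings."""
    def __init__(self, clip_dim=512, gpt_dim=768, prefix_length=10):
        super().__init__()
        self.prefix_length = prefix_length
        self.gpt_dim = gpt_dim
        self.net = nn.Sequential(
            nn.Linear(clip_dim, (gpt_dim * prefix_length) // 2),
            nn.Tanh(),
            nn.Linear((gpt_dim * prefix_length) // 2, gpt_dim * prefix_length),
        )

    def forward(self, clip_embedding):
        # (batch, clip_dim) -> (batch, prefix_length, gpt_dim)
        prefix = self.net(clip_embedding)
        return prefix.view(-1, self.prefix_length, self.gpt_dim)

# Only this mapper is trained; the prefix embeddings are fed to a frozen
# GPT-2 via `inputs_embeds`, which then generates the caption tokens.
mapper = MLPMapper()
dummy_clip_embedding = torch.randn(1, 512)
print(mapper(dummy_clip_embedding).shape)  # torch.Size([1, 10, 768])
```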
The link to the final report is here.
$ pip install -r requirement.txt
PySceneDetect relies on ffmpeg to split the video into story units (a short splitting sketch follows the notes below). You can download ffmpeg from: https://ffmpeg.org/download.html
Note:
- Linux users should use a package manager (e.g. sudo apt-get install ffmpeg).
- Windows users may require additional steps for PySceneDetect to detect ffmpeg - see the section "Manually Enabling split-video Support" in the PySceneDetect documentation for details.
- macOS users can use Homebrew to install ffmpeg as below:
$ brew uninstall ffmpeg
$ brew tap homebrew-ffmpeg/ffmpeg
$ brew install homebrew-ffmpeg/ffmpeg/ffmpeg --with-openh264  # --with-openh264 is optional
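As a reference, here is a minimal sketch of scene splitting using PySceneDetect's documented high-level API (assuming PySceneDetect 0.6+ and ffmpeg on the PATH; the file name is just an example):

```python
from scenedetect import detect, ContentDetector, split_video_ffmpeg

# Detect scene boundaries with the content-aware detector, then split
# the video into one clip per scene using ffmpeg.
scene_list = detect("video_uploads/covid.mp4", ContentDetector())
split_video_ffmpeg("video_uploads/covid.mp4", scene_list)
```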
- Move the video you want to caption into the folder named video_uploads.
- Download the COCO pre-trained model (Transformer), the COCO pre-trained model (MLP + finetuning), and the Conceptual Captions pre-trained model (MLP + finetuning), and move them to the pretrained_models folder.
Run the following command to start captioning.
$ python caption_video.py -i covid.mp4 --model mlp -k
- --input, -i: The name of the input video.
- --model, -m: The type of ClipCap model to use; the value can be either "mlp" or "transformer".
- --keepframes, -k: If set, keep the keyframe images after image captioning.
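For example, to caption the same video with the transformer mapping network and discard the keyframes after captioning (using only the flags documented above):

$ python caption_video.py -i covid.mp4 --model transformer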
You can also run the keyframe extraction, image captioning, and sound event detection sections separately; please refer to the Implementation section of the report.
The image captioning results and the detected sound events are stored in "covid.mp4-OUTPUT-SED.json".
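To inspect the results programmatically, the output file can be loaded as ordinary JSON (the exact schema is described in the report; this sketch only pretty-prints the beginning of the file):

```python
import json

with open("covid.mp4-OUTPUT-SED.json") as f:
    results = json.load(f)

# Pretty-print the first part of the output to explore its structure.
print(json.dumps(results, indent=2)[:500])
```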
For more detailed SED-with-ASR results, see the .xml files for each story unit in the ./SEDwithASR/predict_results folder.
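These per-story-unit files are plain XML, so the standard library is enough to scan them (the element names depend on the SED/ASR output schema, which is not reproduced here):

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# List each result file together with its root element and child count.
for xml_file in sorted(Path("SEDwithASR/predict_results").glob("*.xml")):
    root = ET.parse(xml_file).getroot()
    print(xml_file.name, root.tag, len(root))
```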
The evaluation of the image captioning results is based on custom reference captions, i.e. ground-truth annotations of the video. Write the reference captions for every keyframe of a video into a text file named 'referencen.txt' (where n numbers the reference set, e.g. reference1.txt, reference2.txt) and put it into the Image_caption folder. There is no limit on the number of custom references.
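nlg-eval aligns the hypothesis and reference files line by line, so each reference file should contain one caption per line, in the same keyframe order as the generated captions. A purely hypothetical example:

$ cat Image_caption/reference1.txt
a reporter wearing a face mask speaks to the camera
a crowd of people walking through a train station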
First, make sure the Python dependencies of nlg-eval are installed. If not, run:
$ pip install git+https://github.com/Maluuba/nlg-eval.git@master
Then set up the nlg-eval package:
$ nlg-eval --setup
Run the following command to get the evaluation metrics:
$ nlg-eval --hypothesis=examples/hyp.txt --references=examples/ref1.txt --references=examples/ref2.txt
- --hypothesis: The path to the file that stores the generated captions.
- --references: The path to a reference file (repeat the flag for multiple reference files).
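If you prefer to compute the metrics from Python, nlg-eval also exposes a documented compute_metrics helper that mirrors the CLI call above:

```python
from nlgeval import compute_metrics

# Same computation as the CLI: one hypothesis file plus any number of
# line-aligned reference files.
metrics = compute_metrics(hypothesis='examples/hyp.txt',
                          references=['examples/ref1.txt', 'examples/ref2.txt'])
print(metrics)  # e.g. Bleu_1..Bleu_4, METEOR, ROUGE_L, CIDEr, ...
```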