Video-Captioning-CS5422

Automatic video captioning, a final project for CS5422 Neural Networks and Deep Learning. This project uses a neural network to produce a simple, fixed three-word caption (<noun> <verb> <noun>) for each sequence of video frames.

Neural network architecture

The neural network is composed of 4 main components:

Feature extractor, a pretrained EfficientNet
Object classifier, a linear layer whose sole purpose is to capture the presence of objects in each frame
Encoder, a linear layer that helps capture the action happening in each frame
Decoder, an RNN that produces the caption
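
The four components above could be wired together roughly as in the following PyTorch sketch. It is an illustration under stated assumptions, not the code in src: the class name CaptionNet, all layer sizes, the mean pooling over frames, the GRU cell, the shared vocabulary, and the greedy three-step decoding are hypothetical choices, and a tiny convolutional stand-in replaces the pretrained EfficientNet so the example runs without downloading weights.

```python
import torch
import torch.nn as nn


class CaptionNet(nn.Module):
    """Hypothetical sketch of the four components; every size here is an assumption."""

    def __init__(self, backbone, feat_dim, vocab_size, num_objects, hidden_dim=128):
        super().__init__()
        # 1. Feature extractor: any module mapping (N, 3, H, W) -> (N, feat_dim).
        #    The project uses a pretrained EfficientNet; it is kept frozen here (assumption).
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False
        # 2. Object classifier: linear layer scoring object presence per frame.
        self.object_classifier = nn.Linear(feat_dim, num_objects)
        # 3. Encoder: linear layer summarising the action in each frame.
        self.encoder = nn.Linear(feat_dim, hidden_dim)
        # Glue layer (assumption): fuses both per-video summaries into the initial RNN state.
        self.to_hidden = nn.Linear(num_objects + hidden_dim, hidden_dim)
        # 4. Decoder: an RNN cell unrolled for exactly three words.
        self.embed = nn.Embedding(vocab_size + 1, hidden_dim)  # extra index = start token
        self.rnn = nn.GRUCell(hidden_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)
        self.start_token = vocab_size

    def forward(self, frames):
        # frames: (batch, time, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.backbone(frames.flatten(0, 1)).reshape(b, t, -1)  # (b, t, feat_dim)
        object_logits = self.object_classifier(feats)                 # (b, t, num_objects)
        encoded = torch.relu(self.encoder(feats))                     # (b, t, hidden_dim)
        # Average over frames to get one summary per video (assumption).
        context = torch.cat([object_logits.sigmoid().mean(dim=1), encoded.mean(dim=1)], dim=-1)
        h = torch.tanh(self.to_hidden(context))
        token = torch.full((b,), self.start_token, dtype=torch.long, device=frames.device)
        step_logits = []
        for _ in range(3):  # <noun> <verb> <noun>
            h = self.rnn(self.embed(token), h)
            logits = self.out(h)
            step_logits.append(logits)
            token = logits.argmax(dim=-1)  # greedy choice feeds the next step
        return torch.stack(step_logits, dim=1), object_logits


# Stand-in backbone so the sketch runs offline; it is NOT EfficientNet.
backbone = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
model = CaptionNet(backbone, feat_dim=16, vocab_size=20, num_objects=8)
frames = torch.randn(2, 5, 3, 32, 32)         # 2 clips, 5 frames each
caption_logits, object_logits = model(frames)
caption_ids = caption_logits.argmax(dim=-1)    # (2, 3): one word index per caption slot
```

Returning the per-frame object logits alongside the caption logits would let the object classifier receive its own training signal, which matches its stated purpose of capturing which objects are present; whether the project trains it that way is not stated here.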

ardiankr/Video-Captioning-CS5422

Folders and files

Latest commit

History

Repository files navigation

Video-Captioning-CS5422

Neural network architecture

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages