
Commit

Major update codebase, adding CSN models, with IG65M pre-trained models
dutran committed Aug 1, 2019
1 parent f93bea5 commit a5235b3
Showing 30 changed files with 2,035 additions and 547 deletions.
4 changes: 2 additions & 2 deletions CONTRIBUTING.md
@@ -1,4 +1,4 @@
# Contributing to R2Plus1D
# Contributing to VMZ
We want to make contributing to this project as easy and transparent as
possible.

@@ -35,5 +35,5 @@ outlined on that page and do not file a public issue.
* ...

## License
By contributing to R2Plus1D, you agree that your contributions will be licensed
By contributing to VMZ, you agree that your contributions will be licensed
under the LICENSE file in the root directory of this source tree.
68 changes: 34 additions & 34 deletions README.md
@@ -1,47 +1,47 @@
# R(2+1)D and Mixed-Convolutions for Action Recognition
# VMZ: Model Zoo for Video Modeling

![r2plus1d1](https://raw.githubusercontent.com/dutran/R2Plus1D/master/r2plus1d.png)
VMZ is a Caffe2 codebase for video modeling developed by the Computer Vision team at Facebook AI. The aim of this codebase is to help other researchers and industry practitioners:
+ reproduce some of our research results and
+ leverage our very strong pre-trained models.

[[project page](https://dutran.github.io/R2Plus1D/)] [[paper](https://arxiv.org/abs/1711.11248)]
Currently, this codebase supports the following models:
+ R(2+1)D, MCx models [[1]](https://research.fb.com/wp-content/uploads/2018/04/a-closer-look-at-spatiotemporal-convolutions-for-action-recognition.pdf).
+ CSN models [[2]](https://arxiv.org/pdf/1904.02811.pdf).
+ R(2+1)D and CSN models pre-trained on large-scale (65 million!) weakly-supervised public Instagram videos (**IG-65M**) [[3]](https://research.fb.com/wp-content/uploads/2019/05/Large-scale-weakly-supervised-pre-training-for-video-action-recognition.pdf).

If you find this work helpful for your research, please cite our following paper:
## Main Models

D. Tran, H. Wang, L. Torresani, J. Ray, Y. LeCun and M. Paluri. **A Closer Look at Spatiotemporal Convolutions for Action Recognition.** IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
We provide our latest video models, including R(2+1)D, ir-CSN, and ip-CSN (all with 152 layers), which are pre-trained on Sports-1M or **IG-65M** and then fine-tuned on Kinetics-400. Both the pre-trained and the fine-tuned models are listed in the tables below, and a minimal loading sketch follows the first table. We hope these models serve as valuable baselines and feature extractors for related video modeling tasks such as action detection, video captioning, and video Q&A. More models, e.g., shallower ones or models with shorter clip inputs, are provided in the [model zoo](tutorials/model_zoo.md).

```
@inproceedings{r2plus1d_cvpr18,
title = {A Closer Look at Spatiotemporal Convolutions for Action Recognition},
author = {Du Tran and Heng Wang and Lorenzo Torresani and Jamie Ray and Yann LeCun and
Manohar Paluri},
booktitle = {CVPR},
year = 2018
}
```
If you have any questions or feedback about the code, please contact: [email protected], [email protected].
### R(2+1)D-152

## Requirements
R2Plus1D requires the following dependencies:
* [OpenCV](https://opencv.org) (tested on 3.4.1) and [ffmpeg](https://trac.ffmpeg.org).
* [Caffe2](https://caffe2.ai) and its dependencies.
* You will need to build from source and install with `USE_OPENCV=1 USE_FFMPEG=1 USE_LMDB=1 python setup.py install` for OpenCV, ffmpeg, and lmdb support.
* And lmdb, python-lmdb, and pandas.
| Input size | Pretrained dataset | Pretrained model | Video@1 Kinetics | Video@5 Kinetics | Finetuned model | GFLOPs | params(M) |
| ---------- | --------| ---- | ------- | ------- | -------- | ----- | ------ |
| 32x112x112 | Sports1M | [link](https://www.dropbox.com/s/w5cdqeyqukuaqt7/r2plus1d_152_sports1m_from_scratch_f127111290.pkl?dl=0) | 79.5 | 94.0 | [link](https://www.dropbox.com/s/twvcpe30rxuaf45/r2plus1d_152_ft_kinetics_from_sports1m_f128957437.pkl?dl=0) | 329.1 | 118.0 |
| 32x112x112 | IG-65M | [link](https://www.dropbox.com/s/oqdg176p7nqc84v/r2plus1d_152_ig65m_from_scratch_f106380637.pkl?dl=0) | 81.6 | 95.3 | [link](https://www.dropbox.com/s/tmxuae8ubo5gipy/r2plus1d_152_ft_kinetics_from_ig65m_f107107466.pkl?dl=0) | 329.1 | 118.0 |
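
The pretrained and finetuned files linked above are Caffe2 parameter pickles. The sketch below is a rough illustration, not part of the original README: it assumes the common Caffe2 convention of a pickled dict, typically wrapping a `blobs` mapping from parameter names to numpy arrays, and uses one of the filenames from the table purely as an example.

```
# Hedged sketch: inspect a downloaded checkpoint. Assumes the usual Caffe2
# layout of a pickled dict, typically {"blobs": {name: ndarray, ...}}.
import pickle

import numpy as np

PKL = "r2plus1d_152_ft_kinetics_from_ig65m_f107107466.pkl"  # file from the table above

with open(PKL, "rb") as f:
    # On Python 3, Caffe2-era pickles usually need latin1 decoding.
    checkpoint = pickle.load(f, encoding="latin1")

blobs = checkpoint.get("blobs", checkpoint)  # fall back if there is no "blobs" wrapper
arrays = {k: np.asarray(v) for k, v in blobs.items() if hasattr(v, "shape")}

print("parameter blobs:", len(arrays))
print("total params (M): %.1f" % (sum(a.size for a in arrays.values()) / 1e6))
for name in sorted(arrays)[:5]:  # peek at the layer naming scheme
    print(name, arrays[name].shape)
```

The parameter count printed here should be roughly comparable to the params(M) column of the table above.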

## Installation
* You need to install ffmpeg, OpenCV, and Caffe2. Caffe2 source-build instructions can be found [here](https://caffe2.ai/docs/getting-started.html?configuration=compile), but make sure you build and install with `USE_OPENCV=1 USE_FFMPEG=1 USE_LMDB=1 python setup.py install`. You also need to install lmdb, python-lmdb, and pandas. A quick sanity check for the resulting build is sketched below.
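
As a quick check that the build actually picked up video and lmdb support, the following hedged sketch (not part of the original instructions) can be run; the `VideoInput` operator name is an assumption based on Caffe2's video ops and may differ in your build.

```
# Hedged sanity check for a Caffe2 build with USE_OPENCV=1 USE_FFMPEG=1 USE_LMDB=1.
from __future__ import print_function

import lmdb     # python-lmdb binding
import pandas   # used by data/create_video_db.py

from caffe2.python import core, workspace

# GPU count reported by the Caffe2 workspace (0 means a CPU-only build).
print("GPUs visible to Caffe2:", workspace.NumCudaDevices())
# Video decoding is exposed through the VideoInput operator, which is only
# registered when Caffe2 was built with ffmpeg/OpenCV support (assumed name).
print("VideoInput op registered:", core.IsOperator("VideoInput"))
print("python-lmdb version:", lmdb.version())
print("pandas version:", pandas.__version__)
```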

### ir-CSN-152
| Input size | Pretrained dataset | Pretrained model | Video@1 Kinetics | Video@5 Kinetics | Finetuned model | GFLOPs | params(M) |
| ---------- | ------| ------ | ------- | ------- | -------- | ----- | ------ |
| 32x224x224 | Sports1M | [link](https://www.dropbox.com/s/woh99y2hll1mlqv/irCSN_152_Sports1M_from_scratch_f99918785.pkl?dl=0) | 78.2 | 93.0 | [link](https://www.dropbox.com/s/zuoj1aqouh6bo6k/irCSN_152_ft_kinetics_from_Sports1M_f101599884.pkl?dl=0) | 96.7 | 29.6 |
| 32x224x224 | IG-65M | [link](https://www.dropbox.com/s/r0kppq7ox6c57no/irCSN_152_ig65m_from_scratch_f125286141.pkl?dl=0) | 82.6 | 95.3 | [link](https://www.dropbox.com/s/gmd8r87l3wmkn3h/irCSN_152_ft_kinetics_from_ig65m_f126851907.pkl?dl=0) | 96.7 | 29.6 |

## Tutorials
We provide some basic tutorials to help you get familiar with the code and tools.
* [Installation Guide](tutorials/Installation_guide.md)
* [Training Kinetics from scratch](tutorials/kinetics_train.md)
* [Finetuning R(2+1)D on HMDB51](tutorials/hmdb51_finetune.md)
* [Dense prediction](tutorials/dense_prediction.md)
* [Feature extraction](tutorials/feature_extraction.md)
* [Download and evaluate pre-trained models](tutorials/models.md)
### ip-CSN-152
| Input size | Pretrained dataset | Pretrained model | Video@1 Kinetics | Video@5 Kinetics | Finetuned model | GFLOPs | params(M) |
| ---------- | ------ | ------ | ------- | ------- | -------- | ----- | ------ |
| 32x224x224 | Sports1M | [link](https://www.dropbox.com/s/70di7o7qz6gjq6x/ipCSN_152_Sports1M_from_scratch_f111018543.pkl?dl=0) | 78.8 | 93.5 | [link](https://www.dropbox.com/s/ir7cr0hda36knux/ipCSN_152_ft_kinetics_from_Sports1M_f111279053.pkl?dl=0) | 108.8 | 32.8 |
| 32x224x224 | IG-65M | TBD | TBD | TBD | TBD | 108.8 | 32.8 |


## References
1. D. Tran, H. Wang, L. Torresani, J. Ray, Y. LeCun and M. Paluri. **A Closer Look at Spatiotemporal Convolutions for Action Recognition.** CVPR 2018.
2. D. Tran, H. Wang, L. Torresani and M. Feiszli. **Video Classification with Channel-Separated Convolutional Networks.** ICCV 2019.
3. D. Ghadiyaram, M. Feiszli, D. Tran, X. Yan, H. Wang and D. Mahajan, **Large-scale weakly-supervised pre-training for video action recognition.** CVPR 2019.


## License
R2Plus1D is Apache 2.0 licensed, as found in the LICENSE file.
VMZ is Apache 2.0 licensed, as found in the LICENSE file.

### Acknowledgements
The authors would like to thank Ahmed Taei, Aarti Basant, Aapo Kyrola, and the Facebook Caffe2 team for their help in implementing ND-convolution, in optimizing video I/O, and in providing support for distributed training. We are grateful to Joao Carreira for sharing I3D results on the Kinetics validation set.
## Supporting Team
This codebase is actively supported by members of the Computer Vision team at Facebook AI: @CHJoanna, @weiyaowang, @bjuncek, @hengcv, @deeptigp, and @dutran.
19 changes: 10 additions & 9 deletions data/create_video_db.py
@@ -11,6 +11,8 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import, division, print_function, unicode_literals

import lmdb
import pandas
import sys
@@ -37,7 +39,6 @@ def create_video_db(
# read csv list file
list = pandas.read_csv(list_file)


# checking necessary fields of the provided csv file
assert 'org_video' in list, \
"The input list does not have org_video column"
@@ -121,17 +122,17 @@ def main():
parser.add_argument("--output_file", type=str, default=None,
help="Path to output lmdb data")
parser.add_argument("--use_list", type=int, default=0,
help="0: write video encoded data to lmdb, " +
"1: write only full path to local video files")
help="0: write video encoded data to lmdb, "
+ "1: write only full path to local video files")
parser.add_argument("--use_video_id", type=int, default=0,
help="0: does not use video_id, " +
"1: write also video_id to lmdb")
help="0: does not use video_id, "
+ "1: write also video_id to lmdb")
parser.add_argument("--use_start_frame", type=int, default=0,
help="0: does not use start_frame, " +
"1: write also start_frame to lmdb")
help="0: does not use start_frame, "
+ "1: write also start_frame to lmdb")
parser.add_argument("--num_epochs", type=int, default=1,
help="Due to lmdb does not allow online shuffle" +
"we can write multiple shuffled list")
help="Due to lmdb does not allow online shuffle"
+ "we can write multiple shuffled list")
args = parser.parse_args()
create_video_db(
args.list_file,
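
To round out the `create_video_db.py` changes above, here is a hedged sketch (not from the repository) of preparing the csv list the script reads. Only the `org_video` column is asserted in the diff; the `label`, `video_id`, and `start_frame` column names are assumptions inferred from the `--use_video_id` and `--use_start_frame` flags.

```
# Hedged sketch: build a csv list for data/create_video_db.py.
# Column names other than "org_video" are assumptions inferred from the flags.
import pandas as pd

clips = pd.DataFrame({
    "org_video": ["/data/videos/clip_000.mp4", "/data/videos/clip_001.mp4"],
    "label": [0, 1],        # assumed integer class ids
    "video_id": [0, 1],     # only consumed with --use_video_id 1
    "start_frame": [0, 0],  # only consumed with --use_start_frame 1
})
clips.to_csv("my_list.csv", index=False)

# Then, for example:
#   python data/create_video_db.py --list_file=my_list.csv \
#       --output_file=/data/my_lmdb --use_list=1 --use_video_id=1
```

Per the flag's help text, `--use_list=1` stores only the full paths to local video files in the lmdb, so the videos must remain readable at those paths during training.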
Empty file added: lib/models/builder/__init__.py
