Fast Video Tagging is a project to tag short videos (about 15-30 seconds long) in under 150 ms. It is an application of short-video understanding, aimed at video multi-label classification and video retrieval. There are three main backbone networks for fast video tagging. All models are implemented with the MXNet framework.
Each of these methods is based on the following papers:
- R2Plus1D: A Closer Look at Spatiotemporal Convolutions for Action Recognition (CVPR 2018)
- MFNet: Multi-Fiber Networks for Video Recognition
- ECO: Efficient Convolutional Network for Online Video Understanding
- C2AE: Learning Deep Latent Spaces for Multi-Label Classification
The video tagging problem is a typical multi-label classification (MLC) problem, so we draw on the following MLC frameworks:
- WARP (Weighted Approximate Ranking Pairwise): Deep Convolutional Ranking for Multilabel Image Annotation
- LSEP (Log-Sum-Exp Pairwise): Improving Pairwise Ranking for Multi-label Image Classification
- CNN-RNN: CNN-RNN: A Unified Framework for Multi-label Image Classification; Exploring CNN-RNN Architectures for Multilabel Classification of the Amazon
- SCNN-RNN: Semantic Regularisation for Recurrent Image Annotation
- RIA: Annotation Order Matters: Recurrent Image Annotator for Arbitrary Length Image Tagging
- Binary Relevance (BCE): Binary relevance for multi-label learning: an overview
We use four of these loss functions/frameworks to optimize the deep model; a minimal sketch of one of them (LSEP) follows.
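As an illustration, here is a minimal sketch of the LSEP loss in MXNet NDArray code. The function name, tensor shapes, and the absence of negative sampling and numerical stabilization are assumptions for clarity, not the exact implementation behind `--loss_type lsep_nn`.

```python
import mxnet as mx
from mxnet import nd

def lsep_loss(scores, labels):
    """scores: (batch, C) raw class scores; labels: (batch, C) binary 0/1.
    LSEP = log(1 + sum over (negative v, positive u) of exp(s_v - s_u))."""
    pos = labels                  # positive-label mask
    neg = 1 - labels              # negative-label mask
    # diff[b, v, u] = s_v - s_u for every pair of classes
    diff = nd.broadcast_sub(scores.expand_dims(2), scores.expand_dims(1))
    # keep only pairs where v is a negative label and u is a positive label
    pair_mask = nd.broadcast_mul(neg.expand_dims(2), pos.expand_dims(1))
    loss = nd.log(1 + (pair_mask * nd.exp(diff)).sum(axis=(1, 2)))
    return loss.mean()  # sketch only: no numerical stabilization
```

During training, something like `lsep_loss(net(clip), label_vec)` would stand in for the BCE term; for large label sets the full pairwise sum becomes expensive, so sampling label pairs is the usual remedy.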
- UCF101: a typical single-label, multi-class video classification dataset.
- AI-Challenger 2018 FastVideoTagging: the Meitu short-video tagging dataset.
Unlike image data loaders, a video data loader consumes a lot of time if not optimized. Current state-of-the-art ways to decode video and load it into memory:
- ffmpeg: use ffmpeg to decode only the key frames or frames near key frames.
- nvvl & pynvvl: NVIDIA's nvvl (NVIDIA Video Loader) library decodes and loads video quickly, and pynvvl provides a PyTorch wrapper. Unfortunately, the current nvvl does not adapt to different video sizes and frame rates; worse, it does not free CUDA memory after fetching a video sequence.
- OpenCV: an easy way to get frames from a video; just use VideoCapture to read frames (see the sketch after this list).
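For reference, a minimal OpenCV sampling loop might look like the following. The function name, the uniform-sampling strategy, and the `num_frames` default are assumptions, not the project's actual loader; note that seeking to non-key frames is exactly the slow path the ffmpeg approach above avoids.

```python
import cv2
import numpy as np

def sample_frames(video_path, num_frames=32):
    """Uniformly sample num_frames RGB frames from a video file."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = np.linspace(0, max(total - 1, 0), num_frames).astype(int)
    frames = []
    for i in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(i))   # seek; slow off key frames
        ok, frame = cap.read()
        if ok:
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    return np.stack(frames)   # (num_frames, H, W, 3)
```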
Achieved 92.6% accuracy (Clip@1, prediction using only one clip) on the UCF101 dataset, which is 1.3% higher than the original Caffe2 model (91.3% accuracy).
$ python train.py --gpus 0,1,2,3,4,5,6,7 --pretrained ~/r2.5d_d34_l32.pkl --output ~/r2plus1d_output --batch_per_device 4 --lr 1e-4 \
    --model_depth 34 --wd 0.005 --num_class 101 --num_epoch 80
To train on the Meitu dataset:
$ python train_r3d.py --gpus 0,1 --pretrained ./r2.5d_d34_l32.pkl --output ./output --dataset meitu --loss
To train with the log-sum-exp pairwise (LSEP) loss, use the following command:
$ nohup python train_r3d.py --gpus 1 --pretrained ./output/test-0001.params --loss_type lsep_nn > mymeitu1.out 2>&1 &
To train with the weighted approximate ranking pairwise (WARP) loss, use the following command:
$ nohup python train_r3d.py --gpus 1 --pretrained ./output/test-0001.params --loss_type warp_nn > mywarpnn.out 2>&1 &
Assume the training output directory is ~/r2plus1d_output and the epoch number we want to test is 80.
$ python validation.py --gpus 0 --output ~/r2plus1d_output --eval_epoch 80 --batch_per_device 48 --model_prefix test
$ python train_r3d.py --gpus 1,2 --pretrained model.params
1. Change the data loader to nvvl; fix the pynvvl bugs so it adapts to different video sizes and frame rates.
2. Add a multi-label classification loss head.
3. Train a model on the Meitu short-video data.
4. Implement the unified CNN-RNN model structure (a sketch follows this list).
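For item 4, a minimal Gluon sketch of the CNN-RNN unified idea might look like this. Since train_unified.py is not yet implemented, every name and dimension here is an assumption, not the project's design.

```python
import mxnet as mx
from mxnet import nd
from mxnet.gluon import nn, rnn

class CNNRNN(nn.Block):
    """A clip-level feature from the CNN backbone conditions a GRU that
    emits labels one step at a time (teacher-forced here for training)."""
    def __init__(self, num_class, feat_dim=512, hidden=512, **kwargs):
        super(CNNRNN, self).__init__(**kwargs)
        self.embed = nn.Embedding(num_class, feat_dim)   # label embedding
        self.decoder = rnn.GRU(hidden)                   # expects 'TNC' layout
        self.proj = nn.Dense(num_class, flatten=False)   # per-step class scores

    def forward(self, clip_feat, prev_labels):
        # clip_feat: (batch, feat_dim) from a backbone such as R2Plus1D
        # prev_labels: (batch, T) indices of the ground-truth label sequence
        emb = self.embed(prev_labels)                          # (batch, T, feat_dim)
        x = nd.broadcast_add(emb, clip_feat.expand_dims(1))    # fuse clip and label features
        out = self.decoder(x.transpose((1, 0, 2)))             # (T, batch, hidden)
        return self.proj(out.transpose((1, 0, 2)))             # (batch, T, num_class)
```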
- Original training logs: /data/jh/notebooks/hudengjun/VideosFamous/R2Plus1D-MXNet
- train.py: the original UCF101 training implementation (MXNet symbol API).
- train_r3d.py: training with the simple Meitu and simple UCF101 data loaders.
- train_nvvl.py: training with the nvvl-based Meitu data loader.
- train_unified.py: training with the CNN-RNN unified framework (not yet implemented).