Real time video synthesis for robotic control development.


alina1021/synthesizeAI

 
 


synthesize.AI

An implementation of Video-to-Video Synthesis that generates realistic image sequences in real time from a depth image stream, intended to make robotic development in simulated environments more robust.

This project is split across two repositories: this one, which holds general-purpose scripts and documentation, and a fork of the vid2vid repository that has been modified to accept a 1-channel depth image as input. The presentation slides for this project are provided as Google Slides.

Prerequisites

  • Ubuntu 16.04 LTS
  • Python 3
  • NVIDIA GPU (compute capability 6.0+) with CUDA and cuDNN
  • PyTorch 0.4 or higher
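
The PyTorch and GPU prerequisites above can be checked with a short Python script. This is a minimal sketch rather than part of this repository: the helper names `parse_version`, `meets_requirements`, and `check_environment` are hypothetical, and the thresholds are simply copied from the list above. The only PyTorch calls used are `torch.__version__`, `torch.cuda.is_available()`, and `torch.cuda.get_device_capability()`.

```python
import re

MIN_TORCH = (0, 4)        # "PyTorch 0.4 or higher"
MIN_CAPABILITY = (6, 0)   # "compute capability 6.0+"


def parse_version(version):
    """Return the leading (major, minor) pair of a version string such as '1.13.1+cu117'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return int(match.group(1)), int(match.group(2))


def meets_requirements(torch_version, capability):
    """Check a PyTorch version string and a (major, minor) compute capability."""
    return (parse_version(torch_version) >= MIN_TORCH
            and tuple(capability) >= MIN_CAPABILITY)


def check_environment():
    """Describe whether the local machine appears to meet the prerequisites."""
    try:
        import torch  # imported lazily so the helpers above work without PyTorch
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return "PyTorch %s found, but no CUDA device is available" % torch.__version__
    capability = torch.cuda.get_device_capability(0)  # first visible GPU
    verdict = "OK" if meets_requirements(torch.__version__, capability) else "below requirements"
    return "PyTorch %s, compute capability %d.%d: %s" % (
        torch.__version__, capability[0], capability[1], verdict)
```

Running `print(check_environment())` in the Python environment intended for training gives a one-line summary. It does not verify the operating system or the cuDNN installation.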

Setup

Installation

  • Install the required Python libraries:
    pip install dominate requests streamlit
  • Clone this repository into ~/depth2room, the path that the training commands in this README assume:
    cd ~
    git clone https://github.com/fniroui/synthesizeAI.git depth2room
    cd depth2room
  • Clone the forked version of the vid2vid repository which has been modified for this project:
    git clone https://github.com/fniroui/vid2vid.git
    cd vid2vid
  • Download and compile a snapshot of FlowNet2 by running:
    python scripts/download_flownet2.py
    
  • Download the FlowNet2 checkpoint:
    python scripts/download_models_flownet2.py
    

Dataset

  • The SceneNet RGB-D dataset is used in this project. Download the complete or partial training dataset.
  • From the root directory of this repository, run:
    python scripts/data/sceneNet_format.py --dir "sceneNet directory"
    
    replacing "sceneNet directory" with the path of the downloaded dataset. The script moves and formats the dataset into ./vid2vid/datasets/sceneNet.
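
After formatting, it can be useful to confirm that every input sequence has a matching target sequence before starting a long training run. The sketch below assumes the usual vid2vid layout, where `<phase>_A` holds the input frames (here, depth) and `<phase>_B` the target frames, each with one sub-folder per sequence. That layout and the helper name `check_pairs` are assumptions for illustration. The formatting script in this repository is not confirmed to produce exactly this structure.

```python
import os


def check_pairs(dataroot, phase="train"):
    """Return a list of problems found between <phase>_A and <phase>_B under dataroot."""
    dir_a = os.path.join(dataroot, phase + "_A")
    dir_b = os.path.join(dataroot, phase + "_B")
    problems = ["missing folder: " + d for d in (dir_a, dir_b) if not os.path.isdir(d)]
    if problems:
        return problems

    def frames_per_sequence(root):
        # Map each sequence sub-folder to the number of frame files it contains.
        counts = {}
        for name in sorted(os.listdir(root)):
            path = os.path.join(root, name)
            if os.path.isdir(path):
                counts[name] = len([f for f in os.listdir(path)
                                    if os.path.isfile(os.path.join(path, f))])
        return counts

    seq_a = frames_per_sequence(dir_a)
    seq_b = frames_per_sequence(dir_b)
    for name in sorted(set(seq_a) | set(seq_b)):
        if name not in seq_a or name not in seq_b:
            problems.append("sequence %s is missing from one side" % name)
        elif seq_a[name] != seq_b[name]:
            problems.append("sequence %s: %d input frames vs %d target frames"
                            % (name, seq_a[name], seq_b[name]))
    return problems
```

For example, `check_pairs("vid2vid/datasets/sceneNet")` returns an empty list when both sides line up, and `check_pairs("vid2vid/datasets/sceneNet", phase="test")` does the same for the test sequences.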

Testing

  • Download the model and extract it to the ./vid2vid/checkpoints folder:
    https://drive.google.com/open?id=1ppXTHXsFaGB-vrNjJlPswWuVDrMka3zg
    
  • To run the model on the provided test sequence located at ./vid2vid/datasets/sceneNet/test_A and test_B, run:
    bash scripts/test/test_320.bash
    

Training

  • Download the dataset and format it by following the Dataset instructions above.
  • If you have a single GPU, run bash scripts/train/train_g1_320.sh or:
    cd ~/depth2room/vid2vid
    python train.py --name depth2room_320_0 --dataroot datasets/sceneNet --input_nc 1 --loadSize 320 --n_downsample_G 2 --n_frames_total 2 --n_scales_spatial 2 --num_D 3 --max_frames_per_gpu 4 --max_dataset_size 20 --tf_log --display_freq 10
  • For multi-GPU training, run bash scripts/train/train_320.sh or:
    cd ~/depth2room/vid2vid
    python train.py --name depth2room_320_8g --dataroot datasets/sceneNet --input_nc 1 --loadSize 320 --gpu_ids 0,1,2,3,4,5,6,7 --n_gpus_gen 4 --n_frames_total 6 --niter_step 2 --niter_fix_global 8 --num_D 3 --n_scales_spatial 2 --tf_log --display_freq 100 --max_dataset_size 50

Analysis

The current model, trained on 50 sequences, generates 2 synthetic images per second on a single NVIDIA Tesla V100 GPU. The generated surfaces show some texture, and shadows are rendered.
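
A throughput figure of this kind can be obtained by timing the generator over a fixed number of frames. The following is a generic sketch and not the measurement procedure used for this project: `measure_fps` is a hypothetical helper, and `generate_frame` stands in for whatever call produces one synthetic frame from the trained model.

```python
import time


def measure_fps(generate_frame, n_frames=20, warmup=2):
    """Return frames per second for a callable that produces one frame per call."""
    # Warm-up calls are excluded so one-off setup costs do not skew the result.
    for _ in range(warmup):
        generate_frame()
    start = time.perf_counter()
    for _ in range(n_frames):
        generate_frame()
    elapsed = time.perf_counter() - start
    return n_frames / elapsed
```

When timing GPU work with PyTorch, the callable should wait for the device to finish (for example by calling `torch.cuda.synchronize()` before returning), because CUDA kernels are launched asynchronously and the loop would otherwise measure only launch time.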

License

This project is licensed under the MIT License. See the LICENSE.md file for details and for the licenses of the other projects used within this repository.

Attribution

Thank you to Ting-Chun Wang¹, Ming-Yu Liu¹, Jun-Yan Zhu², Guilin Liu¹, Andrew Tao¹, Jan Kautz¹, and Bryan Catanzaro¹ for their fantastic work on Video-to-Video Synthesis.

¹NVIDIA Corporation, ²MIT CSAIL
