An implementation of Video-to-Video Synthesis for real-time synthesis of realistic image sequences from a depth image stream, designed for more robust robotic development in simulated environments.
This project is split across two repositories: this repository, which holds general-purpose scripts and documentation, and a forked version of the vid2vid repository, modified to accept a 1-channel depth image as input. The presentation slides for this project are available as Google Slides.
- Ubuntu 16.04 LTS
- Python 3
- NVIDIA GPU (compute capability 6.0+) with CUDA and cuDNN
- PyTorch 0.4 or higher
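The prerequisites above can be sanity-checked before installing anything else. The following is a minimal sketch, not part of this repository: the helper names `version_tuple`, `meets_minimum`, and `check_environment` are hypothetical, and it assumes PyTorch is importable as `torch` and that the GPU of interest is device 0.

```python
import re
import sys


def version_tuple(version):
    """Turn a version string such as '0.4.1' or '1.13.0+cu117' into (0, 4, 1)."""
    # Keep only the leading dotted numeric part; suffixes like '+cu117' are ignored.
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % (version,))
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version, minimum):
    """Return True if `version` is at least `minimum`, e.g. ('0.4.1', (0, 4))."""
    return version_tuple(version) >= tuple(minimum)


def check_environment():
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[0] < 3:
        problems.append("Python 3 is required")
    try:
        import torch  # assumption: PyTorch is installed under its usual module name
    except ImportError:
        problems.append("PyTorch is not installed")
        return problems
    if not meets_minimum(torch.__version__, (0, 4)):
        problems.append("PyTorch %s is older than 0.4" % torch.__version__)
    if not torch.cuda.is_available():
        problems.append("no CUDA-capable GPU is visible to PyTorch")
    else:
        # Assumption: device 0 is the GPU that will be used.
        capability = torch.cuda.get_device_capability(0)
        if tuple(capability) < (6, 0):
            problems.append("GPU compute capability %d.%d is below 6.0" % tuple(capability))
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Running the script prints one warning per unmet prerequisite and nothing if every check passes.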
- Install the required Python libraries:
pip install dominate requests streamlit
- Clone this repository to your home folder:
cd ~
git clone https://github.com/fniroui/synthesizeAI.git
cd depth2room
- Clone the forked version of the vid2vid repository which has been modified for this project:
git clone https://github.com/fniroui/vid2vid.git
cd vid2vid
- Download and compile a snapshot of FlowNet2 by running:
python scripts/download_flownet2.py
- Download the FlowNet2 checkpoint:
python scripts/download_models_flownet2.py
- The SceneNet RGB-D dataset is used in this project. Download the complete or partial training dataset.
- Navigate to the synthesizeAI directory and run the following command, passing the directory of the downloaded dataset, to move and format the dataset to ./vid2vid/datasets/Scenenet:
python scripts/data/sceneNet_format.py --dir "sceneNet directory"
- Download the model and extract it to the ./vid2vid/checkpoints folder:
https://drive.google.com/open?id=1ppXTHXsFaGB-vrNjJlPswWuVDrMka3zg
- To use the provided test sequence located in ./vid2vid/datasets/sceneNet/test_A and test_B, run:
bash scripts/test/test_320.bash
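Before testing or training, it can help to confirm that the formatted dataset has the paired layout the scripts expect. The sketch below is illustrative only: `list_sequences` and `check_split` are hypothetical helpers, and it assumes the upstream vid2vid convention of one sub-directory per sequence inside each `<split>_A` and `<split>_B` folder, with one file per frame. The `train` split name is likewise an assumption based on that convention.

```python
import os


def list_sequences(folder):
    """Return the sorted names of the sub-directories (one per sequence) in `folder`."""
    if not os.path.isdir(folder):
        return []
    return sorted(
        name for name in os.listdir(folder)
        if os.path.isdir(os.path.join(folder, name))
    )


def check_split(dataroot, split):
    """Compare <split>_A with <split>_B and return a list of problems found."""
    dir_a = os.path.join(dataroot, split + "_A")
    dir_b = os.path.join(dataroot, split + "_B")
    seqs_a = list_sequences(dir_a)
    seqs_b = list_sequences(dir_b)
    problems = []
    if not seqs_a:
        problems.append("%s_A is missing or empty" % split)
    if not seqs_b:
        problems.append("%s_B is missing or empty" % split)
    # Sequences must exist on both sides to form input/target pairs.
    for name in sorted(set(seqs_a) ^ set(seqs_b)):
        problems.append("sequence %r is present in only one of the two folders" % name)
    # Paired sequences should contain the same number of frames.
    for name in sorted(set(seqs_a) & set(seqs_b)):
        n_a = len(os.listdir(os.path.join(dir_a, name)))
        n_b = len(os.listdir(os.path.join(dir_b, name)))
        if n_a != n_b:
            problems.append("sequence %r has %d frames in A but %d in B" % (name, n_a, n_b))
    return problems


if __name__ == "__main__":
    # Path taken from the test instructions above; adjust if your layout differs.
    for problem in check_split("./vid2vid/datasets/sceneNet", "test"):
        print("WARNING:", problem)
```

An empty result means every sequence in the A folder has a counterpart of the same length in the B folder.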
- Download the dataset and format it by following the above instructions.
- If you have a single GPU, run
bash scripts/train/train_g1_320.sh
or:
cd ~/depth2room/vid2vid
python train.py --name depth2room_320_0 --dataroot datasets/sceneNet --input_nc 1 --loadSize 320 --n_downsample_G 2 --n_frames_total 2 --n_scales_spatial 2 --num_D 3 --max_frames_per_gpu 4 --max_dataset_size 20 --tf_log --display_freq 10
- For multi-GPU training, run
bash scripts/train/train_320.sh
or:
cd ~/depth2room/vid2vid
python train.py --name depth2room_320_8g --dataroot datasets/sceneNet --input_nc 1 --loadSize 320 --gpu_ids 0,1,2,3,4,5,6,7 --n_gpus_gen 4 --n_frames_total 6 --niter_step 2 --niter_fix_global 8 --num_D 3 --n_scales_spatial 2 --tf_log --display_freq 100 --max_dataset_size 50
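The single-GPU and multi-GPU commands share most of their flags and differ mainly in the GPU list and a few schedule and size options. The sketch below makes that split explicit. It is a convenience wrapper, not something shipped with the repository: `build_train_command` is a hypothetical helper, and it only reuses flags that appear in the commands above.

```python
import shlex


def build_train_command(name, gpu_ids, extra_flags):
    """Return the argument list for train.py, using only flags from the commands above.

    `extra_flags` is a list of (flag, value) pairs appended after the shared flags.
    """
    args = [
        "python", "train.py",
        "--name", name,
        "--dataroot", "datasets/sceneNet",
        "--input_nc", "1",  # single-channel depth image as input
        "--loadSize", "320",
        "--n_scales_spatial", "2",
        "--num_D", "3",
        "--tf_log",
    ]
    # The single-GPU command above omits --gpu_ids, so only add it for several GPUs.
    if len(gpu_ids) > 1:
        args += ["--gpu_ids", ",".join(str(i) for i in gpu_ids)]
    for flag, value in extra_flags:
        args += ["--" + flag, str(value)]
    return args


if __name__ == "__main__":
    single = build_train_command(
        "depth2room_320_0",
        [0],
        [("n_downsample_G", 2), ("n_frames_total", 2), ("max_frames_per_gpu", 4),
         ("max_dataset_size", 20), ("display_freq", 10)],
    )
    multi = build_train_command(
        "depth2room_320_8g",
        list(range(8)),
        [("n_gpus_gen", 4), ("n_frames_total", 6), ("niter_step", 2),
         ("niter_fix_global", 8), ("display_freq", 100), ("max_dataset_size", 50)],
    )
    for command in (single, multi):
        print(" ".join(shlex.quote(arg) for arg in command))
```

The printed commands are meant to be run from the vid2vid directory, as in the instructions above. The flag order differs from the handwritten commands, which should not matter for a standard argument parser.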
The current model, trained on 50 sequences, generates 2 synthetic images per second on a single NVIDIA Tesla V100 GPU. The generated surfaces show some texture, and shadows are rendered.
This project is licensed under the MIT License - see the LICENSE.md file for details and the license of the other projects used within this repository.
Thank you to Ting-Chun Wang¹, Ming-Yu Liu¹, Jun-Yan Zhu², Guilin Liu¹, Andrew Tao¹, Jan Kautz¹, and Bryan Catanzaro¹ for their fantastic work on Video-to-Video Synthesis.
¹NVIDIA Corporation, ²MIT CSAIL