# CenterSnap: Single-Shot Multi-Object 3D Shape Reconstruction and Categorical 6D Pose and Size Estimation
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/centersnap-single-shot-multi-object-3d-shape/6d-pose-estimation-using-rgbd-on-camera25)](https://paperswithcode.com/sota/6d-pose-estimation-using-rgbd-on-camera25?p=centersnap-single-shot-multi-object-3d-shape)<img src="demo/Pytorch_logo.png" width="10%">

This repository is the PyTorch implementation of our paper:
<a href="https://www.tri.global/" target="_blank">
<img align="right" src="demo/tri-logo.png" width="20%"/>
</a>

**CenterSnap: Single-Shot Multi-Object 3D Shape Reconstruction and Categorical 6D Pose and Size Estimation**<br>
[__***Muhammad Zubair Irshad***__](https://zubairirshad.com), [Thomas Kollar](http://www.tkollar.com/site/), [Michael Laskey](https://www.linkedin.com/in/michael-laskey-4b087ba2/), [Kevin Stone](https://www.linkedin.com/in/kevin-stone-51171270/), [Zsolt Kira](https://faculty.cc.gatech.edu/~zk15/) <br>
International Conference on Robotics and Automation (ICRA), 2022<br>

[[Project Page](https://zubair-irshad.github.io/projects/CenterSnap.html)] [[arXiv](https://arxiv.org/abs/2203.01929)] [[PDF](https://arxiv.org/pdf/2203.01929.pdf)] [[Video](https://www.youtube.com/watch?v=Bg5vi6DSMdM)] [[Poster](https://zubair-irshad.github.io/projects/resources/Poster%7CCenterSnap%7CICRA2022.pdf)]

[![Explore CenterSnap in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zubair-irshad/CenterSnap/blob/master/notebook/explore_CenterSnap.ipynb)<br>

<p align="center">
<img src="demo/POSE_CS.gif" width="100%">
</p>

<p align="center">
<img src="demo/Method_CS.gif" width="100%">
</p>
## Citation

If you find this repository useful, please consider citing:

```
@inproceedings{irshad2022centersnap,
  title={CenterSnap: Single-Shot Multi-Object 3D Shape Reconstruction and Categorical 6D Pose and Size Estimation},
  author={Muhammad Zubair Irshad and Thomas Kollar and Michael Laskey and Kevin Stone and Zsolt Kira},
  journal={IEEE International Conference on Robotics and Automation (ICRA)},
  year={2022},
  url={https://arxiv.org/abs/2203.01929},
}
```

### Contents
<div class="toc">
<ul>
<li><a href="#-environment">💻 Environment</a></li>
<li><a href="#-dataset">📊 Dataset</a></li>
<li><a href="#-training-and-inference">✨ Training and Inference</a></li>
<li><a href="#-faq">📝 FAQ</a></li>
</ul>
</div>

## 💻 Environment

Create a Python 3.8 virtual environment and install the requirements:

```bash
cd $CenterSnap_Repo
conda create -y --prefix ./env python=3.8
conda activate ./env/
./env/bin/python -m pip install --upgrade pip
./env/bin/python -m pip install -r requirements.txt -f https://download.pytorch.org/whl/torch_stable.html
```

The code was built and tested with **CUDA 10.2**.
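
Before moving on, it can help to confirm that the installed PyTorch build actually sees your GPU and was compiled against the expected CUDA version. This is a minimal sanity-check sketch, not part of the repository:

```python
# Sanity check (not from the repo): verify the PyTorch/CUDA pairing.
import torch

print("PyTorch:", torch.__version__)                 # e.g. 1.7.1
print("CUDA build:", torch.version.cuda)             # expected: 10.2
print("GPU available:", torch.cuda.is_available())   # should be True
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```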

## 📊 Dataset

1. Download the pre-processed dataset

We recommend downloading the pre-processed dataset to train and evaluate the CenterSnap model. Download and untar the [Synthetic](https://tri-robotics-public.s3.amazonaws.com/centersnap/CAMERA.tar.gz) (868GB) and [Real](https://tri-robotics-public.s3.amazonaws.com/centersnap/Real.tar.gz) (70GB) datasets. These files contain all the training and validation data needed to replicate our results.

```
cd $CenterSnap_REPO/data
wget https://tri-robotics-public.s3.amazonaws.com/centersnap/CAMERA.tar.gz
tar -xzvf CAMERA.tar.gz
wget https://tri-robotics-public.s3.amazonaws.com/centersnap/Real.tar.gz
tar -xzvf Real.tar.gz
```

The data directory structure should follow:

```
data
├── CAMERA
│   ├── train
│   └── val_subset
└── Real
    ├── train
    └── test
```
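
Optionally, you can verify the layout before launching training by checking that the expected folders exist. This is a hypothetical helper whose paths simply mirror the tree above, not a script shipped with the repo:

```python
# Hypothetical layout check: the paths mirror the directory tree above.
from pathlib import Path

expected = [
    "data/CAMERA/train",
    "data/CAMERA/val_subset",
    "data/Real/train",
    "data/Real/test",
]
for p in expected:
    print(p, "->", "ok" if Path(p).is_dir() else "MISSING")
```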

2. To prepare your own dataset, we provide additional scripts under [prepare_data](https://github.com/zubair-irshad/CenterSnap/tree/master/prepare_data).

## ✨ Training and Inference

1. Train on NOCS Synthetic (requires 13GB of GPU memory):

```bash
./runner.sh net_train.py @configs/net_config.txt
```

Note that *runner.sh* is equivalent to using *python* to run the script; additionally, it sets up the PYTHONPATH and the CenterSnap environment path automatically. Configs are passed to the script with an `@` prefix, as sketched below.
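
The `@configs/...` argument follows the common argparse convention of reading flags from a file. Assuming the training entry point uses `argparse` with `fromfile_prefix_chars='@'` (an assumption; the actual parser may differ), the mechanism looks roughly like this sketch:

```python
# Sketch (assumption): how "@file" flag files like configs/net_config.txt
# are typically consumed by an argparse-based script.
import argparse
import pathlib

# A tiny flag file in the same "--key=value per line" style.
pathlib.Path("my_config.txt").write_text("--max_steps=380000\n--model_name=res_fpn\n")

parser = argparse.ArgumentParser(fromfile_prefix_chars="@")
parser.add_argument("--max_steps", type=int)
parser.add_argument("--model_name")

# Equivalent to: python net_train.py @my_config.txt
args = parser.parse_args(["@my_config.txt"])
print(args.max_steps, args.model_name)  # 380000 res_fpn
```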

2. Finetune on NOCS Real Train (note that good results can be obtained after finetuning on the Real train set for only a few epochs, i.e. 1-5):

```bash
./runner.sh net_train.py @configs/net_config_real_resume.txt --checkpoint /path/to/best/checkpoint
```

3. Inference on a NOCS Real Test Subset

<p align="center">
<img src="demo/reconstruction.gif" width="100%">
</p>

Download a small NOCS Real subset from [[here](https://www.dropbox.com/s/yfenvre5fhx3oda/nocs_test_subset.tar.gz?dl=1)]

```bash
./runner.sh inference/inference_real.py @configs/net_config.txt --data_dir path_to_nocs_test_subset --checkpoint checkpoint_path_here
```

You should see the **visualizations** saved in ```results/CenterSnap```. Change the `--output_path` in *config.txt* to save them to a different folder.

4. Optional (Shape Auto-Encoder Pre-training)

We provide a pretrained model for the shape auto-encoder to be used for data collection and inference. Although our codebase doesn't require separately training the shape auto-encoder, if you would like to do so, we provide additional scripts under **external/shape_pretraining**.

## 📝 FAQ

**1.** I am getting ```no cuda GPUs available``` while running Colab.

- Ans: Make sure to follow these instructions to activate GPUs in Colab:

```
Make sure that you have enabled the GPU under Runtime -> Change runtime type!
```

**2.** I am getting ```RuntimeError: received 0 items of ancdata```

- Ans: Increase the ulimit to 2048 or 8096 via ```ulimit -n 2048```
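
A related workaround (not mentioned in the original README, but a common fix for this DataLoader error) is to switch PyTorch's worker sharing strategy from file descriptors to the file system:

```python
# Alternative fix for "received 0 items of ancdata": share tensors between
# DataLoader workers via the file system instead of file descriptors.
import torch.multiprocessing

torch.multiprocessing.set_sharing_strategy("file_system")
```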

**3.** I am getting ```RuntimeError: CUDA error: no kernel image is available for execution on the device``` or ```You requested GPUs: [0] But your machine only has: []```

- Ans: Check that your PyTorch installation matches your CUDA installation. Try the following:

1. Installing CUDA 10.2 and running the same script in requirements.txt

2. Installing the relevant PyTorch CUDA version, i.e. changing these lines in requirements.txt:

```
torch==1.7.1
torchvision==0.8.2
```

**4.** I am seeing zero val metrics in ***wandb***
- Ans: Make sure you threshold the metrics. PyTorch Lightning's first validation-check metric is very high, which makes all subsequent metrics look like zero; manually threshold out that outlier metric in wandb to see the actual values.

## Acknowledgments
* This code is built upon the implementation from [SimNet](https://github.com/ToyotaResearchInstitute/simnet)

## Licenses
* The source code is released under the [MIT license](https://opensource.org/licenses/MIT).
```
--checkpoint=../nocs_test_subset/checkpoint/centersnap_real.ckpt
--max_steps=380000
--model_file=models/panoptic_net.py
--model_name=res_fpn
--output=results/CenterSnap
--train_path=file://data/Real/train
--train_batch_size=32
--train_num_workers=10
--val_path=file://data/Real/test
--val_batch_size=32
--val_num_workers=10
--optim_learning_rate=0.0006
--optim_momentum=0.9
--optim_weight_decay=1e-4
--optim_poly_exp=0.9
--optim_warmup_epochs=1
--loss_seg_mult=1.0
--loss_depth_mult=1.0
--loss_vertex_mult=0.1
--loss_rotation_mult=0.1
--loss_heatmap_mult=100.0
--loss_latent_emb_mult=0.1
--loss_abs_pose_mult=0.1
--loss_z_centroid_mult=0.1
--wandb_name=NOCS_Inference_Real
```
```
--max_steps=380000
--model_file=models/panoptic_net.py
--model_name=res_fpn
--output=results/CenterSnap_TrainSynthetic
--train_path=file://data/CAMERA/train
--train_batch_size=32
--train_num_workers=10
--val_path=file://data/CAMERA/val_subset
--val_batch_size=32
--val_num_workers=10
--optim_learning_rate=0.0006
--optim_momentum=0.9
--optim_weight_decay=1e-4
--optim_poly_exp=0.9
--optim_warmup_epochs=1
--loss_seg_mult=1.0
--loss_depth_mult=1.0
--loss_vertex_mult=0.1
--loss_rotation_mult=0.1
--loss_heatmap_mult=100.0
--loss_latent_emb_mult=0.1
--loss_abs_pose_mult=0.1
--loss_z_centroid_mult=0.1
--wandb_name=NOCS_Train_Synthetic
```
```
--max_steps=240000
--finetune_real=True
--model_file=models/panoptic_net.py
--model_name=res_fpn
--output=results/CenterSnap_FinetuneReal
--train_path=file://data/Real/train
--train_batch_size=32
--train_num_workers=5
--val_path=file://data/Real/test
--val_batch_size=32
--val_num_workers=5
--optim_learning_rate=0.0006
--optim_momentum=0.9
--optim_weight_decay=1e-4
--optim_poly_exp=0.9
--optim_warmup_epochs=1
--loss_seg_mult=1.0
--loss_depth_mult=1.0
--loss_vertex_mult=0.1
--loss_rotation_mult=0.1
--loss_heatmap_mult=100.0
--loss_latent_emb_mult=0.1
--loss_abs_pose_mult=0.1
--loss_z_centroid_mult=0.1
--wandb_name=NOCS_Real_Finetune
```
```
*
*/
!.gitignore
```
## Shape Autoencoder Pre-training<br>
The shape pre-training code is adapted from [object-deformnet](https://github.com/mentian/object-deformnet).

### Install dependencies

```
conda activate ./env/
cd $CenterSnap_Repo
conda install -c bottler nvidiacub
conda install -c conda-forge -c fvcore -c iopath fvcore iopath
./env/bin/python -m pip install "git+https://github.com/facebookresearch/[email protected]"
```

### Dataset Preparation
1. Download the [object models](http://download.cs.stanford.edu/orion/nocs/obj_models.zip) provided by [NOCS](https://github.com/hughw19/NOCS_CVPR2019)

2. Download the NOCS [preprocessed data](https://www.dropbox.com/s/8im9fzopo71h6yw/nocs_preprocess.tar.gz?dl=1)

Unzip and organize these files in $CenterSnap/data as follows:
```
data
├── obj_models
│   ├── train
│   ├── val
│   ├── real_train
│   ├── real_test
│   └── mug_meta.pkl
```

3. Prepare the data:

```
./runner.sh external/shape_pretraining/shape_data.py --obj_model_dir /path/to/object-model/dir
```
A file named ***ShapeNetCore_2048.h5*** will be generated in the ***obj_models*** folder (a quick way to inspect it is sketched after these steps).

4. Train the shape auto-encoder:
```
cd external/shape_pretraining
./runner.sh external/shape_pretraining/train_ae.py --h5_file /path/to/h5_file
```
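
As a sketch for inspecting the generated file (assuming it is written under the `obj_models` folder and uses the `train`/`val` groups with `data`/`label` entries that the `ShapeDataset` loader below expects), you can open it with `h5py`:

```python
# Hypothetical inspection of ShapeNetCore_2048.h5; the path and the
# data/label layout are assumptions based on the ShapeDataset loader below.
import h5py

with h5py.File("data/obj_models/ShapeNetCore_2048.h5", "r") as f:
    for split in f.keys():                            # e.g. 'train', 'val'
        grp = f[split]
        print(split, "len =", grp.attrs.get("len"))
        print("  data :", grp["data"].shape)          # point clouds, e.g. (N, 2048, 3)
        print("  label:", grp["label"].shape)         # per-shape category labels
```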
```python
import h5py
import numpy as np
import torch.utils.data as data


class ShapeDataset(data.Dataset):
    def __init__(self, h5_file, mode, n_points=1024, augment=False):
        assert (mode == 'train' or mode == 'val'), 'Mode must be "train" or "val".'
        self.mode = mode
        self.n_points = n_points
        self.augment = augment
        # load data from h5py file
        with h5py.File(h5_file, 'r') as f:
            self.length = f[self.mode].attrs['len']
            self.data = f[self.mode]['data'][:]
            self.label = f[self.mode]['label'][:]
        # augmentation parameters
        self.sigma = 0.01
        self.clip = 0.02
        self.shift_range = 0.02

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        xyz = self.data[index]
        label = self.label[index] - 1  # data saved indexed from 1
        # randomly downsample
        np_data = xyz.shape[0]
        assert np_data >= self.n_points, 'Not enough points in shape.'
        idx = np.random.choice(np_data, self.n_points)
        xyz = xyz[idx, :]
        # data augmentation
        if self.augment:
            jitter = np.clip(self.sigma * np.random.randn(self.n_points, 3), -self.clip, self.clip)
            xyz[:, :3] += jitter
            shift = np.random.uniform(-self.shift_range, self.shift_range, (1, 3))
            xyz[:, :3] += shift
        return xyz, label
```
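
A usage sketch for the loader above (the H5 path, batch size, and worker count are placeholders, not values from the repo):

```python
# Hypothetical usage: wrap ShapeDataset in a standard PyTorch DataLoader.
from torch.utils.data import DataLoader

train_set = ShapeDataset("data/obj_models/ShapeNetCore_2048.h5",
                         mode="train", n_points=1024, augment=True)
train_loader = DataLoader(train_set, batch_size=32, shuffle=True, num_workers=4)

for xyz, label in train_loader:
    # xyz: (B, 1024, 3) point clouds, label: (B,) category indices
    print(xyz.shape, label.shape)
    break
```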