v1.2.1
- Added Zynq Ultrascale Plus Whole App examples
- Updated U50 XRT and shell to Xilinx-u50-gen3x4-xdma-2-202010.1-2902115
- Updated docker launch instructions
- Updated TRD makefile instructions
Showing 55 changed files with 4,122 additions and 348 deletions.
@@ -15,5 +15,6 @@
# */

# compress bitstream
set_property BITSTREAM.GENERAL.COMPRESS TRUE [current_design]
@@ -0,0 +1,126 @@
# Whole Application Acceleration: Accelerating ML Preprocessing for Classification and Detection Networks

## Introduction

This application demonstrates how Xilinx® [Vitis Vision library](https://github.com/Xilinx/Vitis_Libraries/tree/master/vision) functions can be integrated with a deep neural network (DNN) accelerator to achieve complete application acceleration. It focuses on accelerating the pre-processing involved in the inference of classification and detection networks.
## Background

Input images are pre-processed before being fed to a deep neural network for inference, and the pre-processing steps vary from network to network. For example, for classification networks like Resnet-50, the input image is resized to 224 x 224 and channel-wise mean subtraction is performed before feeding the data to the DNN accelerator. For detection networks like YOLO v3, the input image is resized to 256 x 512 using letterbox scaling before feeding the data to the DNN accelerator.
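As a point of reference, the following is a minimal sketch of this software pre-processing for a classification network using standard OpenCV; the per-channel mean values are illustrative placeholders, not values taken from this repository.

```
// Minimal sketch of the software-only pre-processing for a classification
// network: resize to 224 x 224, then channel-wise mean subtraction.
// The mean values below are illustrative placeholders.
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

cv::Mat preprocess_classification(const cv::Mat& bgr) {
  cv::Mat resized;
  cv::resize(bgr, resized, cv::Size(224, 224));       // fixed input size for Resnet-50
  cv::Mat f32;
  resized.convertTo(f32, CV_32FC3);                    // work in float for the subtraction
  cv::subtract(f32, cv::Scalar(104.0, 107.0, 123.0),  // per-channel means (placeholders)
               f32);
  return f32;                                          // ready for the DNN input buffer
}
```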
The [Vitis Vision library](https://github.com/Xilinx/Vitis_Libraries/tree/master/vision) provides functions optimized for FPGA devices that are drop-in replacements for standard OpenCV library functions. This application demonstrates how Vitis Vision library functions can be used to accelerate pre-processing.
## Resnet50

Currently, the application accelerating pre-processing for classification networks (Resnet-50) is provided and can only run on the ZCU102 board (device part xczu9eg-ffvb1156-2-e). In this application, a software JPEG decoder is used for loading the input images. Three processes are created: one for image loading, one for running the pre-processing kernel, and one for running the ML accelerator. The JPEG decoder transfers input image data to the pre-processing kernel over a queue, and the pre-processed data is transferred to the ML accelerator over another queue. The image below shows the inference pipeline.

<div align="center">
  <img width="75%" height="75%" src="./doc_images/block_dia_classification.PNG">
</div>
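The sketch below is a simplified model of this three-stage, queue-connected structure using plain C++ threads and a minimal blocking queue. It uses threads instead of separate processes for brevity, and its stage bodies are placeholders, so it is not code from this repository.

```
// Simplified model of the decode -> pre-process -> inference pipeline:
// three stages connected by blocking queues. Requires -std=c++17.
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <vector>

template <typename T>
class Channel {  // minimal thread-safe blocking queue
 public:
  void push(T v) {
    { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(v)); }
    cv_.notify_one();
  }
  void close() {
    { std::lock_guard<std::mutex> lk(m_); closed_ = true; }
    cv_.notify_all();
  }
  std::optional<T> pop() {  // empty optional means "no more data"
    std::unique_lock<std::mutex> lk(m_);
    cv_.wait(lk, [&] { return !q_.empty() || closed_; });
    if (q_.empty()) return std::nullopt;
    T v = std::move(q_.front()); q_.pop();
    return v;
  }
 private:
  std::queue<T> q_;
  std::mutex m_;
  std::condition_variable cv_;
  bool closed_ = false;
};

using Image = std::vector<unsigned char>;  // stand-in for decoded image data

int main() {
  Channel<Image> decoded, preprocessed;

  std::thread loader([&] {     // stage 1: JPEG decode (placeholder body)
    for (int i = 0; i < 8; ++i) decoded.push(Image(224 * 224 * 3));
    decoded.close();
  });
  std::thread preproc([&] {    // stage 2: pre-processing kernel (placeholder body)
    while (auto img = decoded.pop()) preprocessed.push(std::move(*img));
    preprocessed.close();
  });
  std::thread dpu([&] {        // stage 3: ML accelerator (placeholder body)
    int n = 0;
    while (preprocessed.pop()) ++n;
    std::cout << "ran inference on " << n << " images\n";
  });

  loader.join(); preproc.join(); dpu.join();
  return 0;
}
```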
## ADAS detection

The ADAS (Advanced Driver Assistance Systems) application using the YOLO-v3 network model is an example of object detection. The application accelerating pre-processing for YOLO-v3 is provided and can only run on the ZCU102 board (device part xczu9eg-ffvb1156-2-e). In this application, a software JPEG decoder is used for loading the input images. Three processes are created: one for image loading, one for running the pre-processing kernel, and one for running the ML accelerator. The JPEG decoder transfers input image data to the pre-processing kernel over a queue, and the pre-processed data is transferred to the ML accelerator over another queue. The image below shows the inference pipeline.

<div align="center">
  <img width="75%" height="75%" src="./doc_images/block_dia_adasdetection.PNG">
</div>
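For reference, the following is a minimal sketch of a software letterbox resize with standard OpenCV: the image is scaled to fit the target size while keeping its aspect ratio, and the remaining area is padded with a constant value. The padding value of 128 and the interpretation of 256 x 512 as rows x columns are illustrative assumptions, not values taken from this repository.

```
// Minimal sketch of a software letterbox resize: scale to fit the target
// while preserving aspect ratio, then pad with a constant value.
#include <algorithm>
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

cv::Mat letterbox(const cv::Mat& img, int out_h, int out_w) {
  const double scale =
      std::min(out_w / (double)img.cols, out_h / (double)img.rows);
  const int new_w = (int)(img.cols * scale);
  const int new_h = (int)(img.rows * scale);

  cv::Mat resized;
  cv::resize(img, resized, cv::Size(new_w, new_h));

  cv::Mat out(out_h, out_w, img.type(), cv::Scalar::all(128));  // constant padding
  resized.copyTo(out(cv::Rect((out_w - new_w) / 2, (out_h - new_h) / 2,
                              new_w, new_h)));
  return out;
}

// e.g. cv::Mat boxed = letterbox(input_image, 256, 512);  // assumed rows x cols
```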
## Running the Application
### Setting Up the Target
**To improve the user experience, the Vitis AI Runtime packages have been built into the board image. Therefore, users do not need to install the Vitis AI Runtime packages on the board separately.**
1. Install a board image.
   * Download the SD card system image file from the following link:

     [ZCU102](https://www.xilinx.com/bin/public/openDownload?filename=xilinx-zcu102-dpu-v2020.1-v1.2.0.img.gz)

     Note: The version of the board image should be 2020.1 or above.
   * Use Etcher software to burn the image file onto the SD card.
   * Insert the SD card with the image into the destination board.
   * Plug in the power and boot the board, using the serial port to operate on the system.
   * Set up the IP information of the board using the serial port.
     You can now operate on the board using SSH.
2. Update the system image files.
   * Download the [waa_system_v1.2.0.tar.gz](https://www.xilinx.com/bin/public/openDownload?filename=waa_system_v1.2.0.tar.gz).
   * Copy the `waa_system_v1.2.0.tar.gz` to the board using scp.
     ```
     scp waa_system_v1.2.0.tar.gz root@IP_OF_BOARD:~/
     ```
   * Update the system image files on the target side.
     ```
     cd ~
     tar -xzvf waa_system_v1.2.0.tar.gz
     cp waa_system_v1.2.0/sd_card/* /mnt/sd-mmcblk0p1/
     cp /mnt/sd-mmcblk0p1/dpu.xclbin /usr/lib/
     ln -s /usr/lib/dpu.xclbin /mnt/dpu.xclbin
     cp waa_system_v1.2.0/lib/* /usr/lib/
     reboot
     ```

   **Note that `waa_system_v1.2.0.tar.gz` can only be used for ZCU102.**
### Running The Examples
Before running the examples on the target, please copy the examples and images to the target.

1. Copy the examples to the board using scp.
   ```
   scp -r Vitis-AI/VART/Whole-App-Acceleration root@IP_OF_BOARD:~/
   ```
2. Prepare the images for the test.

   For the resnet50_mt_py_waa example, download the images from http://image-net.org/download-images and copy 1000 images to `Vitis-AI/VART/Whole-App-Acceleration/resnet50_mt_py_waa/images`.

   For the adas_detection_waa example, download the images from https://cocodataset.org/#download and copy the images to `Vitis-AI/VART/Whole-App-Acceleration/adas_detection_waa/data`.

3. Compile and run the program on the target.

   For the resnet50_mt_py_waa example, please refer to the [resnet50_mt_py_waa readme](./resnet50_mt_py_waa/readme).

   For the adas_detection_waa example, please refer to the [adas_detection_waa readme](./adas_detection_waa/readme).
### Performance
The table below shows a comparison of the throughput achieved by accelerating the pre-processing pipeline on the FPGA.
For `Resnet-50`, the performance numbers are achieved by running 1K images randomly picked from the ImageNet dataset.
For `YOLO v3`, the performance numbers are achieved by running 5K images randomly picked from the COCO dataset.

FPGA: ZCU102
<table style="undefined;table-layout: fixed; width: 534px">
<colgroup>
<col style="width: 119px">
<col style="width: 136px">
<col style="width: 145px">
<col style="width: 134px">
</colgroup>
<tr>
  <th rowspan="2">Network</th>
  <th colspan="2">E2E Throughput (fps)</th>
  <th rowspan="2"><span style="font-weight:bold">Percentage improvement in throughput</span></th>
</tr>
<tr>
  <td>with software pre-processing</td>
  <td>with hardware pre-processing</td>
</tr>
<tr>
  <td>Resnet-50</td>
  <td>52.60</td>
  <td>62.94</td>
  <td>19.66%</td>
</tr>
<tr>
  <td>YOLO v3</td>
  <td>7.6</td>
  <td>14.9</td>
  <td>96.05%</td>
</tr>
</table>
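The percentage improvement column is computed as (throughput with hardware pre-processing - throughput with software pre-processing) / (throughput with software pre-processing) x 100. For example, for Resnet-50 this is (62.94 - 52.60) / 52.60 ≈ 19.66%, and for YOLO v3 it is (14.9 - 7.6) / 7.6 ≈ 96.05%.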
@@ -0,0 +1,37 @@
#
# Copyright 2019 Xilinx Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

CXX=${CXX:-g++}
name=$(basename $PWD)
$CXX -O2 -w \
  -fno-inline \
  -I. \
  -o $name \
  -std=c++17 \
  src/main.cc \
  src/common.cpp \
  src/xcl2.cpp \
  -lvart-runner \
  -lopencv_videoio \
  -lopencv_imgcodecs \
  -lopencv_highgui \
  -lopencv_imgproc \
  -lopencv_core \
  -lpthread \
  -lxilinxopencl \
  -lglog \
  -lunilog \
  -lxir
@@ -0,0 +1,37 @@
/*
 * Copyright 2019 Xilinx Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

1. Build & run adas_detection_waa
./build.sh
export XILINX_XRT=/usr
mkdir output  # processed images will be written to this directory

# usage: ./adas_detection_waa yolov3_adas_pruned_0_9.elf <mode>
# <mode>: 0 = use software pre-processing, 1 = use hardware pre-processing
# e.g.

sample : ./adas_detection_waa yolov3_adas_pruned_0_9.elf 0
output :
Performance:7.6 FPS

sample : ./adas_detection_waa yolov3_adas_pruned_0_9.elf 1
output :
Found Platform
Platform Name: Xilinx
INFO: Reading /usr/lib/dpu.xclbin
Loading: '/usr/lib/dpu.xclbin'
Performance:14.9 FPS
VART/Whole-App-Acceleration/adas_detection_waa/src/common.cpp (91 additions, 0 deletions)
@@ -0,0 +1,91 @@
/*
 * Copyright 2019 Xilinx Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#include "common.h"

#include <cassert>
#include <numeric>

// Fills shapes->inTensorList / shapes->outTensorList with the height, width,
// channel, and per-batch element count of the runner's input and output tensors.
int getTensorShape(vart::Runner* runner, GraphInfo* shapes, int cntin,
                   int cntout) {
  auto outputTensors = runner->get_output_tensors();
  auto inputTensors = runner->get_input_tensors();
  if (shapes->output_mapping.empty()) {
    // Default mapping: output i of the runner maps to entry i of outTensorList.
    shapes->output_mapping.resize((unsigned)cntout);
    std::iota(shapes->output_mapping.begin(), shapes->output_mapping.end(), 0);
  }
  for (int i = 0; i < cntin; i++) {
    auto dim_num = inputTensors[i]->get_dim_num();
    if (dim_num == 4) {
      shapes->inTensorList[i].channel = inputTensors[i]->get_dim_size(3);
      shapes->inTensorList[i].width = inputTensors[i]->get_dim_size(2);
      shapes->inTensorList[i].height = inputTensors[i]->get_dim_size(1);
      shapes->inTensorList[i].size =
          inputTensors[i]->get_element_num() / inputTensors[0]->get_dim_size(0);
    } else if (dim_num == 2) {
      shapes->inTensorList[i].channel = inputTensors[i]->get_dim_size(1);
      shapes->inTensorList[i].width = 1;
      shapes->inTensorList[i].height = 1;
      shapes->inTensorList[i].size =
          inputTensors[i]->get_element_num() / inputTensors[0]->get_dim_size(0);
    }
  }
  for (int i = 0; i < cntout; i++) {
    auto dim_num = outputTensors[shapes->output_mapping[i]]->get_dim_num();
    if (dim_num == 4) {
      shapes->outTensorList[i].channel =
          outputTensors[shapes->output_mapping[i]]->get_dim_size(3);
      shapes->outTensorList[i].width =
          outputTensors[shapes->output_mapping[i]]->get_dim_size(2);
      shapes->outTensorList[i].height =
          outputTensors[shapes->output_mapping[i]]->get_dim_size(1);
      shapes->outTensorList[i].size =
          outputTensors[shapes->output_mapping[i]]->get_element_num() /
          outputTensors[shapes->output_mapping[0]]->get_dim_size(0);
    } else if (dim_num == 2) {
      shapes->outTensorList[i].channel =
          outputTensors[shapes->output_mapping[i]]->get_dim_size(1);
      shapes->outTensorList[i].width = 1;
      shapes->outTensorList[i].height = 1;
      shapes->outTensorList[i].size =
          outputTensors[shapes->output_mapping[i]]->get_element_num() /
          outputTensors[shapes->output_mapping[0]]->get_dim_size(0);
    }
  }
  return 0;
}

// Returns the index of the first tensor whose name contains `name`;
// asserts if no match is found.
static int find_tensor(std::vector<const xir::Tensor*> tensors,
                       const std::string& name) {
  int ret = -1;
  for (auto i = 0u; i < tensors.size(); ++i) {
    if (tensors[i]->get_name().find(name) != std::string::npos) {
      ret = (int)i;
      break;
    }
  }
  assert(ret != -1);
  return ret;
}

// Overload that builds output_mapping from output tensor names before
// delegating to the index-based getTensorShape above.
int getTensorShape(vart::Runner* runner, GraphInfo* shapes, int cntin,
                   std::vector<std::string> output_names) {
  for (auto i = 0u; i < output_names.size(); ++i) {
    auto idx = find_tensor(runner->get_output_tensors(), output_names[i]);
    shapes->output_mapping.push_back(idx);
  }
  getTensorShape(runner, shapes, cntin, (int)output_names.size());
  return 0;
}
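As a rough usage illustration (not part of this commit), `getTensorShape` might be called as in the sketch below. It assumes `GraphInfo` and `TensorShape` are defined in `common.h` with the fields used above, that `inTensorList`/`outTensorList` are plain pointers to caller-provided storage, and that a `vart::Runner` has already been created for the DPU subgraph elsewhere.

```
// Hypothetical usage sketch: query the runner's tensor geometry once,
// then size host buffers from it. The GraphInfo/TensorShape layout is
// assumed from common.h (not shown in this diff).
#include <cstdint>
#include <vector>
#include "common.h"

void setup_buffers(vart::Runner* runner) {
  TensorShape in_shape, out_shape;        // storage for one input and one output tensor
  GraphInfo shapes;
  shapes.inTensorList = &in_shape;        // assumed to be pointers to caller storage
  shapes.outTensorList = &out_shape;
  getTensorShape(runner, &shapes, 1, 1);  // one input tensor, one output tensor

  // Per-image host buffers sized from the queried shapes.
  std::vector<int8_t> input(in_shape.size);
  std::vector<int8_t> output(out_shape.size);
  // ... fill `input` with pre-processed data and submit it to the DPU runner ...
}
```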