SIMA_GENERIC_PREPROC
Description
The Generic Preproc graph performs the following operations. It uses FP16 precision with vectorization and supports batching (multiple input images processed at once).
1. Reads an input image -> Supports NV12, IYUV, RGB or GRAY image data.
a. Upsamples the chroma planes (U/V) by 2 using replication for NV12/IYUV inputs
b. Duplicates the single channel to 3 channels for the GRAY input
_______________ __
|xxxxxxxxxxxxxxx| ↑
|xxxxxxxxxxxxxxx|
|xxxxxxxxxxxxxxx| Hi
|xxxxxxxxxxxxxxx|
|xxxxxxxxxxxxxxx| ↓
――――――――――――――― ――
|← Wi →|
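The chroma replication and GRAY duplication in step 1 can be sketched as a per-pixel lookup. This is a minimal illustration, not the graph's vectorized FP16 code: the function names are hypothetical, and it assumes tightly packed planes (stride equal to width), even width and height, and the usual NV12 (one interleaved UV plane) and IYUV/I420 (separate U and V planes) layouts.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class YuvLayout { NV12, IYUV };

// Returns {Y, U, V} for pixel (x, y) of a w x h YUV 4:2:0 frame.
// Each chroma sample is reused for the 2x2 block of luma pixels it covers,
// i.e. the chroma planes are upsampled by 2 using replication.
// Assumes packed planes (stride == width) and even w and h.
std::array<uint8_t, 3> yuv_at(const std::vector<uint8_t> &img, int w, int h,
                              int x, int y, YuvLayout layout) {
  assert(w % 2 == 0 && h % 2 == 0);
  assert(img.size() >= static_cast<size_t>(w) * h * 3 / 2);
  const size_t luma_size = static_cast<size_t>(w) * h;
  const uint8_t Y = img[static_cast<size_t>(y) * w + x];
  const size_t cx = x / 2, cy = y / 2;  // chroma sample coordinates
  uint8_t U, V;
  if (layout == YuvLayout::NV12) {
    // NV12: Y plane, then interleaved U,V pairs (w bytes per chroma row)
    const size_t uv = luma_size + cy * w + 2 * cx;
    U = img[uv];
    V = img[uv + 1];
  } else {
    // IYUV (I420): Y plane, then a U plane and a V plane of (w/2 x h/2) each
    const size_t idx = cy * (w / 2) + cx;
    U = img[luma_size + idx];
    V = img[luma_size + luma_size / 4 + idx];
  }
  return {Y, U, V};
}

// GRAY input: the single channel is duplicated into all 3 channels.
std::array<uint8_t, 3> gray_to_3ch(uint8_t g) { return {g, g, g}; }
```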
2. Down/up-scales the input images -> Supports nearest neighbor, bilinear and inter-area interpolation methods
Wi -> input_width
Hi -> input_height
Wo -> output_width
Ho -> output_height
Ws -> scaled_width
Hs -> scaled_height
The final output preprocessed image dimension will be Ho x Wo x 3.
_______________ __
| | ↑
| |
| | Ho
| |
| | ↓
――――――――――――――― ――
|← Wo →|
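As an illustration of one of the supported methods, the sketch below resizes a single 8-bit channel bilinearly from Wi x Hi to Ws x Hs. It rests on assumptions: it uses the half-pixel-centre coordinate mapping common in OpenCV-style resizers and float arithmetic with round-to-nearest, whereas the graph computes in FP16 and its exact mapping is not documented here, so results may differ slightly.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bilinear resize of one 8-bit channel from wi x hi to ws x hs.
// Assumed mapping: src = (dst + 0.5) * scale - 0.5, clamped to the image.
std::vector<uint8_t> resize_bilinear(const std::vector<uint8_t> &src, int wi,
                                     int hi, int ws, int hs) {
  assert(wi > 0 && hi > 0 && ws > 0 && hs > 0);
  assert(src.size() == static_cast<size_t>(wi) * hi);
  std::vector<uint8_t> dst(static_cast<size_t>(ws) * hs);
  const float sx = static_cast<float>(wi) / ws;
  const float sy = static_cast<float>(hi) / hs;
  auto at = [&](int x, int y) {
    return static_cast<float>(src[static_cast<size_t>(y) * wi + x]);
  };
  for (int y = 0; y < hs; ++y) {
    // Clamping the source coordinate replicates the edge pixels
    const float fy = std::min(std::max((y + 0.5f) * sy - 0.5f, 0.0f),
                              static_cast<float>(hi - 1));
    const int y0 = static_cast<int>(fy);
    const int y1 = std::min(y0 + 1, hi - 1);
    const float wy = fy - y0;
    for (int x = 0; x < ws; ++x) {
      const float fx = std::min(std::max((x + 0.5f) * sx - 0.5f, 0.0f),
                                static_cast<float>(wi - 1));
      const int x0 = static_cast<int>(fx);
      const int x1 = std::min(x0 + 1, wi - 1);
      const float wx = fx - x0;
      // Interpolate horizontally on two rows, then vertically between them
      const float top = at(x0, y0) * (1 - wx) + at(x1, y0) * wx;
      const float bot = at(x0, y1) * (1 - wx) + at(x1, y1) * wx;
      dst[static_cast<size_t>(y) * ws + x] =
          static_cast<uint8_t>(std::lround(top * (1 - wy) + bot * wy));
    }
  }
  return dst;
}
```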
The image content has dimension Hs x Ws x 3, and the remaining area is padded.
Where the padding is placed depends on the config parameter padding_type.
For example, an output image with CENTER padding looks like this:
_______________ __
| | ↑ __
| xxxxxxxxxxx | ↑
| xxxxxxxxxxx | Ho Hs
| xxxxxxxxxxx | ↓
| | ↓ ――
――――――――――――――― ――
|← Wo →|
|← Ws →|
Hs and Ws are determined as follows.
If aspect_ratio is set to true, Hs and Ws are calculated with the formula below, and the user-configured scaled_width and scaled_height are ignored:
Diff = (Hi x Wo) - (Wi x Ho)
If Diff < 0 (i.e., letterboxed output):
    Ws = Wo
    Hs = ceil(Hi x Wo / Wi)
If Diff > 0 (i.e., pillarboxed output):
    Ws = ceil(Wi x Ho / Hi)
    Hs = Ho
For a letterboxed output with padding_type set to CENTER, the output looks like this, where x is an output pixel and p is a padded byte:
_______________ __
_ |ppppppppppppppp| ↑
↑ |xxxxxxxxxxxxxxx|
Hs |xxxxxxxxxxxxxxx| Ho
↓_ |xxxxxxxxxxxxxxx|
|ppppppppppppppp| ↓
――――――――――――――― ――
|← Wo = Ws →|
For a pillarboxed output with padding_type set to CENTER, the output looks like this, where x is an output pixel and p is a padded byte:
_______________ ____
|ppxxxxxxxxxxxpp| ↑
|ppxxxxxxxxxxxpp|
|ppxxxxxxxxxxxpp| Ho=Hs
|ppxxxxxxxxxxxpp|
|ppxxxxxxxxxxxpp| ↓
――――――――――――――― ____
|← Wo →|
|← Ws →|
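The aspect-ratio rule can be written as a small function. This sketch follows the Diff formula above, using integer ceiling division; the function and struct names are hypothetical, and the Diff == 0 case, which the text does not cover, is assumed to fill the output exactly (Ws = Wo, Hs = Ho).

```cpp
#include <cassert>
#include <cstdint>

struct ScaledDims {
  int ws;
  int hs;
};

// Ws/Hs when aspect_ratio is true. ceil(a / b) for positive a, b is
// computed as (a + b - 1) / b; 64-bit products avoid overflow.
ScaledDims compute_scaled_dims(int wi, int hi, int wo, int ho) {
  assert(wi > 0 && hi > 0 && wo > 0 && ho > 0);
  const int64_t diff = int64_t{hi} * wo - int64_t{wi} * ho;
  if (diff < 0) {  // letterboxed: full width, reduced height
    return {wo, static_cast<int>((int64_t{hi} * wo + wi - 1) / wi)};
  }
  if (diff > 0) {  // pillarboxed: full height, reduced width
    return {static_cast<int>((int64_t{wi} * ho + hi - 1) / hi), ho};
  }
  return {wo, ho};  // assumption: same aspect ratio fills the output
}
```

For the example config further down (1280x720 input, 512x512 output), it returns Ws = 512 and Hs = 288, which matches the scaled_width and scaled_height values in that config.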
If aspect_ratio is set to false, the Diff calculation is skipped and the user-configured scaled_width and scaled_height are used directly as Ws and Hs.
For example, with aspect_ratio set to false, padding_type set to CENTER, a user-configured scaled_width (Ws) < Wi and a user-configured scaled_height (Hs) < Hi, the output image looks like this, where x is an output pixel and p is a padded byte:
_______________ __
|ppppppppppppppp| ↑ __
|ppxxxxxxxxxxxpp| ↑
|ppxxxxxxxxxxxpp| Ho Hs
|ppxxxxxxxxxxxpp| ↓
|ppppppppppppppp| ↓ ――
――――――――――――――― ――
|← Wo →|
|← Ws →|
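Given Ws and Hs, padding_type decides where the image content sits inside the Wo x Ho output. The sketch below returns the top-left corner of the content for each padding_type code from the parameter table. Treat it as an assumption: it reads TopLeft as "content anchored at the top-left corner, padding on the right and bottom" (and likewise for the other corners), and it rounds an odd CENTER padding amount down on the top/left side; the graph's exact convention is not specified.

```cpp
#include <cassert>

// padding_type codes, as listed in the parameter table
enum PaddingType {
  TOP_LEFT = 0,
  TOP_RIGHT = 1,
  BOTTOM_LEFT = 2,
  BOTTOM_RIGHT = 3,
  CENTER = 4
};

// Top-left corner of the Ws x Hs content inside the Wo x Ho output.
struct Origin {
  int x;
  int y;
};

Origin content_origin(int wo, int ho, int ws, int hs, PaddingType type) {
  assert(ws <= wo && hs <= ho);  // content must fit inside the output
  const int pad_x = wo - ws;     // total horizontal padding
  const int pad_y = ho - hs;     // total vertical padding
  switch (type) {
    case TOP_LEFT:     return {0, 0};
    case TOP_RIGHT:    return {pad_x, 0};
    case BOTTOM_LEFT:  return {0, pad_y};
    case BOTTOM_RIGHT: return {pad_x, pad_y};
    case CENTER:
    default:           return {pad_x / 2, pad_y / 2};  // unknown codes: CENTER
  }
}
```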
3. YUV to RGB color space conversion using the ITU-R BT.601/BT.709 standards (applicable only to NV12 and IYUV inputs)
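A per-pixel sketch of the YUV to RGB conversion. The coefficients below are the widely used rounded values for limited-range ("video range") BT.601 and BT.709, with Y in [16, 235] and U/V centred at 128. Whether the graph assumes limited or full range, and which exact coefficients it uses, is an assumption here, not documented behaviour.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdint>

enum class ColorStandard { BT601, BT709 };

// Converts one Y/U/V sample to 8-bit RGB (assumed limited-range input).
std::array<uint8_t, 3> yuv_to_rgb(uint8_t y, uint8_t u, uint8_t v,
                                  ColorStandard standard) {
  const float c = 1.164f * (y - 16);  // scaled luma
  const float d = u - 128.0f;         // centred U (Cb)
  const float e = v - 128.0f;         // centred V (Cr)
  float r, g, b;
  if (standard == ColorStandard::BT601) {
    r = c + 1.596f * e;
    g = c - 0.391f * d - 0.813f * e;
    b = c + 2.018f * d;
  } else {  // BT.709
    r = c + 1.793f * e;
    g = c - 0.213f * d - 0.533f * e;
    b = c + 2.112f * d;
  }
  // Round to nearest and saturate to [0, 255]
  auto to_u8 = [](float x) {
    return static_cast<uint8_t>(std::lround(std::min(std::max(x, 0.0f), 255.0f)));
  };
  return {to_u8(r), to_u8(g), to_u8(b)};
}
```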
4. Normalization and quantization (if configured)
The norm_quant step uses the config parameters channel_mean_r/g/b, channel_stddev_r/g/b, q_zp and q_scale (mean_r/g/b, std_dev_r/g/b, quant_zp and quant_scale in the JSON config) to derive qOffset and qMultiplier for the r/g/b channels:
qOffset_r/g/b = channel_mean_r/g/b x 255.0
qMultiplier_r/g/b = q_scale / (channel_stddev_r/g/b x 255.0)
The final normalized, quantized pixel is computed as follows:
resized_pixel_r/g/b = resized_raw_pixel_r/g/b - qOffset_r/g/b
output_pixel_r/g/b = (resized_pixel_r/g/b x qMultiplier_r/g/b) + q_zp
output_pixel_r/g/b = clamp(output_pixel_r/g/b, -128, 127) -> for INT8 output
output_pixel_r/g/b = clamp(output_pixel_r/g/b, -32768, 32767) -> for INT16 output
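The normalization and quantization formulas translate directly into code. This per-channel sketch computes qOffset and qMultiplier as above and clamps to the INT8 or INT16 range. Rounding to nearest before clamping is an assumption, since the text does not state the rounding mode, and the graph itself computes in FP16.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// One channel of the norm_quant step (hypothetical helper name).
int32_t norm_quant(uint8_t raw, float channel_mean, float channel_stddev,
                   float q_scale, int32_t q_zp, bool int16_output) {
  assert(channel_stddev > 0.0f);  // a zero std. deviation would divide by zero
  const float q_offset = channel_mean * 255.0f;
  const float q_multiplier = q_scale / (channel_stddev * 255.0f);
  const float value = (static_cast<float>(raw) - q_offset) * q_multiplier + q_zp;
  // Clamp to the INT8 or INT16 range, then round to the nearest integer
  const float lo = int16_output ? -32768.0f : -128.0f;
  const float hi = int16_output ? 32767.0f : 127.0f;
  return static_cast<int32_t>(std::lround(std::min(std::max(value, lo), hi)));
}
```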
5. Tessellates the final resized, normalized, quantized RGB output (if configured)
Refer to the documentation of the atomic tessellate graph for the tessellation strategy.
The output consists of 2 buffers in contiguous memory, laid out as:
|-----tessellated buffer-----|-----resized buffer-----|
Note: The GStreamer simaaiprocessmla plugin uses only the tessellated buffer, based on the JSON configuration file provided.
Future iterations of this graph will provide only the tessellated buffer as output.
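The two-part output buffer is described by the offset and out_sz parameters: the resized image starts offset bytes into the buffer, and out_sz is offset plus the size of the resized image. A minimal sketch, assuming batch_size 1 and 1 byte per element for INT8 output (2 for INT16); the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct OutputLayout {
  size_t resized_size;  // bytes in the resized buffer
  size_t out_sz;        // total bytes: tessellated part + resized part
};

// offset = size of the tessellated buffer (from the model's *_mpk.json).
OutputLayout output_layout(size_t offset, size_t wo, size_t ho, size_t depth,
                           size_t bytes_per_elem) {
  assert(bytes_per_elem == 1 || bytes_per_elem == 2);
  const size_t resized = wo * ho * depth * bytes_per_elem;
  return {resized, offset + resized};
}

// The resized image begins right after the tessellated buffer.
const uint8_t *resized_view(const uint8_t *out_buf, size_t offset) {
  return out_buf + offset;
}
```

With the example config (offset 786432, a 512x512x3 output, 1 byte per element) this gives a resized size of 786432 bytes and out_sz = 1572864, matching the out_sz in that config.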
Supported Input-Output Combinations
===================================
+-------------------+------------------+------------------+---------+
| Scaling Type | Input Image Type | Output Data Type | Support |
+-------------------+------------------+------------------+---------+
| BILINEAR | IYUV | INT8 | Yes |
| | | INT16 | No |
| | NV12 | INT8 | Yes |
| | | INT16 | No |
| | RGB | INT8 | Yes |
| | | INT16 | Yes |
| | BGR | INT8 | Yes |
| | | INT16 | Yes |
| | GRAY | INT8 | Yes |
| | | INT16 | Yes |
+-------------------+------------------+------------------+---------+
| NEAREST_NEIGHBOR | IYUV | INT8 | Yes |
| | | INT16 | No |
| | NV12 | INT8 | Yes |
| | | INT16 | No |
| | RGB | INT8 | No |
| | | INT16 | No |
| | BGR | INT8 | No |
| | | INT16 | No |
| | GRAY | INT8 | No |
| | | INT16 | No |
+-------------------+------------------+------------------+---------+
| INTER_AREA | IYUV | INT8 | Yes |
| | | INT16 | No |
| | NV12 | INT8 | Yes |
| | | INT16 | No |
| | RGB | INT8 | Yes |
| | | INT16 | No |
| | BGR | INT8 | Yes |
| | | INT16 | No |
| | GRAY | INT8 | No |
| | | INT16 | No |
+-------------------+------------------+------------------+---------+
| BICUBIC | IYUV | INT8 | No |
| | | INT16 | No |
| | NV12 | INT8 | No |
| | | INT16 | No |
| | RGB | INT8 | No |
| | | INT16 | No |
| | BGR | INT8 | No |
| | | INT16 | No |
| | GRAY | INT8 | No |
| | | INT16 | No |
+-------------------+------------------+------------------+---------+
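To reject unsupported combinations before configuring the graph, the table above can be encoded as a lookup. The enums are local to this sketch (hypothetical names), and their numbering does not correspond to the scaling_type or input_type codes in the parameter table.

```cpp
#include <cstddef>

enum class Scaling { BILINEAR, NEAREST_NEIGHBOR, INTER_AREA, BICUBIC };
enum class InputImage { IYUV, NV12, RGB, BGR, GRAY };
enum class OutputDType { INT8, INT16 };

// kSupport[scaling][input][dtype], transcribed row by row from the table
constexpr bool kSupport[4][5][2] = {
    //  IYUV           NV12            RGB             BGR             GRAY
    {{true, false},  {true, false},  {true, true},   {true, true},   {true, true}},    // BILINEAR
    {{true, false},  {true, false},  {false, false}, {false, false}, {false, false}},  // NEAREST_NEIGHBOR
    {{true, false},  {true, false},  {true, false},  {true, false},  {false, false}},  // INTER_AREA
    {{false, false}, {false, false}, {false, false}, {false, false}, {false, false}},  // BICUBIC
};

constexpr bool is_supported(Scaling s, InputImage in, OutputDType d) {
  return kSupport[static_cast<size_t>(s)][static_cast<size_t>(in)]
                 [static_cast<size_t>(d)];
}
```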
Graph Info
Overview
Graph Name | SIMA_GENERIC_PREPROC |
---|---|
Graph ID | 200 |
Operations Supported | Resize, Normalize, Quantize, Tessellate |
Available Since Yocto Build | B684 |
Example Config
Below is an example JSON config for this graph. The EV74 graph must first be configured with such a config, which is done with a CVU Configuration Application written in C++.
{
"version": 0.1,
"node_name": "ev-gen-preproc",
"simaai__params": {
"params": 15,
"index": 0,
"cpu": 1,
"next_cpu": 2,
"graph_id": 200,
"no_of_outbuf": 2,
"ibufname": "allegrodec",
"out_sz": 1572864,
"img_height": 720,
"img_width": 1280,
"tile_width": 32,
"tile_height": 86,
"input_width": 1280,
"input_height": 720,
"output_width": 512,
"output_height": 512,
"scaled_width": 512,
"scaled_height": 288,
"batch_size": 1,
"normalize": 0,
"rgb_interleaved": 1,
"aspect_ratio": 1,
"input_depth": 3,
"output_depth": 3,
"quant_scale": 53.59502780503762,
"quant_zp": -14,
"mean_r": 0.485,
"mean_g": 0.456,
"mean_b": 0.406,
"std_dev_r": 0.229,
"std_dev_g": 0.224,
"std_dev_b": 0.225,
"input_type": 0,
"scaling_type": 1,
"output_type": 0,
"padding_type": 0,
"offset": 786432,
"debug": 0,
"dump_data": 1
}
}
Parameters
Parameter Name | Parameter Description | Data Type | Default | Min | Max |
---|---|---|---|---|---|
tile_width | Width of the slice/tile used for tessellation, from the ‘slice_width’ tessellation transform in the model tar.gz *_mpk.json | int32_t | 32 | 1 | 4096 |
tile_height | Height of the slice/tile used for tessellation, from the ‘slice_width’ tessellation transform in the model tar.gz *_mpk.json | int32_t | 16 | 1 | 4096 |
input_width | Width of the input image | int32_t | 1920 | 1 | 4096 |
input_height | Height of the input image | int32_t | 1080 | 1 | 4096 |
output_width | Width of the output image | int32_t | 640 | 1 | 4096 |
output_height | Height of the output image | int32_t | 640 | 1 | 4096 |
scaled_width | Width of the resized image content (Ws). If aspect_ratio is set to false, this value is used as is. If aspect_ratio is set to true, it is calculated by the graph and the configured value is ignored. | int32_t | 640 | 1 | 4096 |
scaled_height | Height of the resized image content (Hs). If aspect_ratio is set to false, this value is used as is. If aspect_ratio is set to true, it is calculated by the graph and the configured value is ignored. | int32_t | 360 | 1 | 4096 |
batch_size | Number of input images to be preprocessed at once | int32_t | 1 | 1 | 50 |
normalize | True (1) => the output image is normalized and quantized; False (0) => the output image is neither normalized nor quantized. | int32_t | 1 | 0 | 1 |
rgb_interleaved | Whether the output image is tessellated (0) or not (1). Set it to 1, because an explicit tessellation kernel is invoked in the graph. | int32_t | 1 | 0 | 1 |
aspect_ratio | True (1) => maintain the input aspect ratio in the resized output image by adding the necessary padding; False (0) => the resized image content uses the configured scaled_height and scaled_width values. | int32_t | 1 | 0 | 1 |
input_depth | Depth of the input image | int32_t | 3 | 1 | 3 |
output_depth | Depth of the output image; only 3 is currently supported | int32_t | 3 | 3 | 3 |
quant_scale | Quantization scale, from model tar.gz *_mpk.json ‘channel_params’[0] | float | 1.0 | 0.0 | 1000.0 |
quant_zp | Quantization zero point, from model tar.gz *_mpk.json ‘channel_params’[1] | int32_t | 0 | -128 | 127 |
mean_r | Dataset mean for channel R, used for normalization | float | 0.003921569 | 0.0 | 1.0 |
mean_g | Dataset mean for channel G, used for normalization | float | 0.003921569 | 0.0 | 1.0 |
mean_b | Dataset mean for channel B, used for normalization | float | 0.003921569 | 0.0 | 1.0 |
std_dev_r | Dataset standard deviation for channel R, used for normalization | float | 0.0 | 0.0 | 1.0 |
std_dev_g | Dataset standard deviation for channel G, used for normalization | float | 0.0 | 0.0 | 1.0 |
std_dev_b | Dataset standard deviation for channel B, used for normalization | float | 0.0 | 0.0 | 1.0 |
input_type | Input image type: 0 → IYUV420, 1 → NV12, 2 → RGB, 3 → BGR | int32_t | 1 | 0 | 3 |
scaling_type | Resize scaling algorithm: 0 → no_scaling, 1 → Nearest Neighbor, 2 → Bicubic, 3 → Bilinear | int32_t | 3 | 0 | 3 |
output_type | Output image type: 0 → RGB, 1 → BGR | int32_t | 0 | 0 | 1 |
padding_type | Padding to be used: 0 → TopLeft, 1 → TopRight, 2 → BottomLeft, 3 → BottomRight, 4 → Center (should be set if LetterBox, PillarBox or no padding is required) | int32_t | 4 | 0 | 4 |
out_sz | out_sz = tessellated output size + output_size, where output_size is the size of the expected output tensor (for example, a resize to 224x224x3 gives an output_size of 150528 bytes) | int32_t | N/A | N/A | N/A |
offset | Size of the tessellated output; can be extracted from the model tar.gz *_mpk.json {"plugins" -> "name": "*_tesselation_transform" -> "output_nodes" -> "size"}. The tessellated output size is often the same as output_size. | int32_t | N/A | N/A | N/A |
dump_data | Enable (1) or disable (0) dumping of output tensor to | int32_t | 0 | 0 | 1 |
debug | Enable more debug logs: 0 => disabled, 1 => additional logs, 2 => profile runtime of individual input tensors, 3 => profile overall graph runtime. | int32_t | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] | 0 | 3 |
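A configuration can be range-checked against the Min/Max columns above before it is sent to the CVU. This hypothetical helper covers a subset of the parameters, with field names matching the JSON keys; the graph may enforce further constraints that are not listed here.

```cpp
#include <string>
#include <vector>

// Subset of the graph configuration (field names follow the JSON keys)
struct PreprocConfig {
  int input_width, input_height, output_width, output_height;
  int scaled_width, scaled_height, batch_size, output_depth;
  int quant_zp, input_type, scaling_type, padding_type;
  float quant_scale;
};

// Returns one message per parameter outside its documented [Min, Max] range.
std::vector<std::string> validate(const PreprocConfig &c) {
  std::vector<std::string> errors;
  auto check = [&errors](const char *name, double v, double lo, double hi) {
    if (v < lo || v > hi) errors.push_back(std::string(name) + " out of range");
  };
  check("input_width", c.input_width, 1, 4096);
  check("input_height", c.input_height, 1, 4096);
  check("output_width", c.output_width, 1, 4096);
  check("output_height", c.output_height, 1, 4096);
  check("scaled_width", c.scaled_width, 1, 4096);
  check("scaled_height", c.scaled_height, 1, 4096);
  check("batch_size", c.batch_size, 1, 50);
  check("output_depth", c.output_depth, 3, 3);
  check("quant_zp", c.quant_zp, -128, 127);
  check("input_type", c.input_type, 0, 3);
  check("scaling_type", c.scaling_type, 0, 3);
  check("padding_type", c.padding_type, 0, 4);
  check("quant_scale", c.quant_scale, 0.0, 1000.0);
  return errors;
}
```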
CVU Configuration Application
Note
The need to write, build and execute a dependent application for the CVU will be removed in an upcoming release.
To configure any CVU graph, a C++ CVU Configuration Application must be cross-compiled and executed on the board before the CVU is used. Multiple graphs can be pre-programmed into the CVU before running any application. For each graph, this guide provides a pre-written C++ application that can be downloaded, cross-compiled on Palette, and executed on the board before running the simaaiprocesscvu GStreamer plugin. A pre-compiled version is also included for direct use. If you encounter issues, re-compile the application from the sources provided.
How to compile using the files below
Please refer to How to compile CVU Configuration Application? for more info.
Directory structure
.
├── CMakeLists.txt
├── cvu_cfg_graph.cpp
└── cvu_cfg_main.cpp
Code files
cvu_cfg_graph.cpp

```cpp
#include <simaai/ev_cfg_helper.h>
#include <simaai/parser.h>
#include <simaai/platform/simaevxxipc.h>

#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <iostream>

#define SIMA_IPC_GRAPH_NAME "SIMA_GENERIC_PREPROC"
#define SIMA_IPC_GRAPH_CODE (200)

// Parameter IDs sent to the CVU for this graph
#define INPUT_WIDTH (1)
#define INPUT_HEIGHT (2)
#define OUTPUT_WIDTH (3)
#define OUTPUT_HEIGHT (4)
#define SCALED_WIDTH (5)
#define SCALED_HEIGHT (6)
#define BATCH_SIZE (7)
#define NORMALIZE (8)
#define RGB_INTERLEAVED (9)
#define ASPECT_RATIO (10)
#define TILE_WIDTH (11)
#define TILE_HEIGHT (12)
#define INPUT_DEPTH (13)
#define OUTPUT_DEPTH (14)
#define QUANT_ZP (15)
#define QUANT_SCALE (16)
#define MEAN_R (17)
#define MEAN_G (18)
#define MEAN_B (19)
#define STD_DEV_R (20)
#define STD_DEV_G (21)
#define STD_DEV_B (22)
#define INPUT_TYPE (23)
#define OUTPUT_TYPE (24)
#define SCALING_TYPE (25)
#define OFFSET (26)
#define PADDING_TYPE (27)
#define INPUT_STRIDE (28)
#define OUTPUT_STRIDE (29)
#define OUTPUT_DTYPE (30)
#define DEBUG (31)

// Reads an integer key from the parsed JSON and sends it to the CVU.
// Assumes parser_get_int returns NULL for a missing key; such keys are
// reported and skipped instead of being dereferenced.
static void send_int(simaai_params_t *params, const char *key, int param_id,
                     uint8_t *buf) {
  int *val = (int *)parser_get_int(params, key);
  if (val == NULL) {
    std::cerr << "Missing int param '" << key << "', not sent\n";
    return;
  }
  send_i32_param(2, SIMA_IPC_GRAPH_CODE, param_id, buf, *val);
}

// Same as send_int, for floating-point keys.
static void send_float(simaai_params_t *params, const char *key, int param_id,
                       uint8_t *buf) {
  double *val = (double *)parser_get_double(params, key);
  if (val == NULL) {
    std::cerr << "Missing float param '" << key << "', not sent\n";
    return;
  }
  send_float_param(2, SIMA_IPC_GRAPH_CODE, param_id, buf, *val);
}

void configure_graph(const char *json_in) {
  simaai_params_t *params = parser_node_struct_init();
  if (params == NULL) {
    std::cerr << "Unable to create params\n";
    return;
  }
  if (parse_json_file(json_in, params) != PARSER_SUCCESS) {
    std::cerr << "Unable to parse " << json_in << "\n";
    return;
  }

  // 16-byte buffer reused by every send_*_param call
  uint8_t *buf = (uint8_t *)calloc(16, sizeof(uint8_t));
  if (buf == NULL) {
    std::cerr << "Unable to allocate buffer\n";
    return;
  }

  send_int(params, "input_width", INPUT_WIDTH, buf);
  send_int(params, "input_height", INPUT_HEIGHT, buf);
  send_int(params, "output_width", OUTPUT_WIDTH, buf);
  send_int(params, "output_height", OUTPUT_HEIGHT, buf);
  send_int(params, "scaled_width", SCALED_WIDTH, buf);
  send_int(params, "scaled_height", SCALED_HEIGHT, buf);
  send_int(params, "batch_size", BATCH_SIZE, buf);
  send_int(params, "normalize", NORMALIZE, buf);
  send_int(params, "rgb_interleaved", RGB_INTERLEAVED, buf);
  send_int(params, "aspect_ratio", ASPECT_RATIO, buf);
  send_int(params, "tile_width", TILE_WIDTH, buf);
  send_int(params, "tile_height", TILE_HEIGHT, buf);
  send_int(params, "input_depth", INPUT_DEPTH, buf);
  send_int(params, "output_depth", OUTPUT_DEPTH, buf);
  send_int(params, "quant_zp", QUANT_ZP, buf);

  send_float(params, "quant_scale", QUANT_SCALE, buf);
  send_float(params, "mean_r", MEAN_R, buf);
  send_float(params, "mean_g", MEAN_G, buf);
  send_float(params, "mean_b", MEAN_B, buf);
  send_float(params, "std_dev_r", STD_DEV_R, buf);
  send_float(params, "std_dev_g", STD_DEV_G, buf);
  send_float(params, "std_dev_b", STD_DEV_B, buf);

  send_int(params, "input_type", INPUT_TYPE, buf);
  send_int(params, "output_type", OUTPUT_TYPE, buf);
  send_int(params, "scaling_type", SCALING_TYPE, buf);
  send_int(params, "offset", OFFSET, buf);
  send_int(params, "padding_type", PADDING_TYPE, buf);
  // Note: input_stride, output_stride and output_dtype are not in the example config
  send_int(params, "input_stride", INPUT_STRIDE, buf);
  send_int(params, "output_stride", OUTPUT_STRIDE, buf);
  send_int(params, "output_dtype", OUTPUT_DTYPE, buf);

  // Must be sent last: it triggers configuration completion on the CVU side
  send_int(params, "debug", DEBUG, buf);

  std::cout << "Completed " << SIMA_IPC_GRAPH_NAME << " graph configure\n";
  free(buf);
}
```
cvu_cfg_main.cpp

```cpp
#include <sys/stat.h>

#include <iostream>

extern void configure_graph(const char *json_fpath);

// Returns true if the path exists on the filesystem
bool is_valid_path(const char *path) {
  struct stat buffer;
  return (stat(path, &buffer) == 0);
}

int main(int argc, char **argv) {
  if (argc < 2) {
    std::cerr << "Usage: " << argv[0] << " <config.json>" << std::endl;
    return 1;
  }
  const char *json_path = argv[1];

  if (!is_valid_path(json_path)) {
    std::cerr << "Invalid path: " << json_path << std::endl;
    return 1;
  }

  configure_graph(json_path);
  return 0;
}
```
CMakeLists.txt

```cmake
cmake_minimum_required(VERSION 3.16)

# set the project name
set(GRAPH_NAME "genpreproc_200")
set(PROJECT_NAME "CVU Graph Cfg. App.")

project("${PROJECT_NAME}"
  VERSION 0.1
  DESCRIPTION "CVU Graph Configuration Application"
  LANGUAGES C CXX)

set(PIPELINE_SOURCES
  cvu_cfg_graph.cpp)

# Get the name of the current git branch
execute_process(
  COMMAND git rev-parse --abbrev-ref HEAD
  WORKING_DIRECTORY ${CMAKE_SOURCE_DIR}
  OUTPUT_VARIABLE GIT_BRANCH
  OUTPUT_STRIP_TRAILING_WHITESPACE
)

# Get the latest abbreviated commit hash of the working branch
execute_process(
  COMMAND git log -1 --format=%h
  WORKING_DIRECTORY ${CMAKE_SOURCE_DIR}
  OUTPUT_VARIABLE GIT_COMMIT_HASH
  OUTPUT_STRIP_TRAILING_WHITESPACE
)

link_directories(${CMAKE_INSTALL_DIR}/core
  ${CMAKE_INSTALL_DIR}/gst
)

include(GNUInstallDirs)

# CVU configuration executable
set(EV_EXEC_NAME "${GRAPH_NAME}_cvu_cfg_app")

add_executable(${EV_EXEC_NAME}
  cvu_cfg_main.cpp
  cvu_cfg_graph.cpp)

target_link_libraries(${EV_EXEC_NAME}
  PUBLIC
  simaaiparser
  evhelpers)

install(TARGETS "${EV_EXEC_NAME}")
```