SIMA_GENERIC_PREPROC

Description

The generic preprocessing graph performs the following operations. It uses FP16 precision with vectorization and supports batching (multiple input images).

1. Reads an input image -> Supports NV12, IYUV, RGB, BGR or GRAY image data.
    a. Upsamples the chroma planes (U/V) by 2 using replication for NV12/IYUV inputs
    b. Duplicates the single channel to 3 channels for the GRAY input

    _______________  __
    |xxxxxxxxxxxxxxx| ↑
    |xxxxxxxxxxxxxxx|
    |xxxxxxxxxxxxxxx| Hi
    |xxxxxxxxxxxxxxx|
    |xxxxxxxxxxxxxxx| ↓
    ―――――――――――――――  ――
    |←     Wi      →|
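
The chroma replication in step 1a can be illustrated with a short C++ sketch for the NV12 case. This is not the graph's FP16 vectorized implementation. The names nv12_to_yuv444 and Yuv444 are hypothetical, and the sketch assumes even image dimensions and a row stride equal to the image width.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Full-resolution Y, U and V planes produced from one NV12 frame.
struct Yuv444 {
    std::vector<uint8_t> y, u, v;
};

// Upsamples the interleaved NV12 chroma plane by 2 in both directions by
// replicating each U/V sample over the 2x2 block of luma pixels it covers.
// Assumption: even width/height and row stride == width.
Yuv444 nv12_to_yuv444(const std::vector<uint8_t>& nv12, int width, int height) {
    const std::size_t luma_size = static_cast<std::size_t>(width) * height;
    assert(width % 2 == 0 && height % 2 == 0);
    assert(nv12.size() >= luma_size + luma_size / 2);

    Yuv444 out;
    out.y.assign(nv12.begin(), nv12.begin() + static_cast<std::ptrdiff_t>(luma_size));
    out.u.resize(luma_size);
    out.v.resize(luma_size);

    const uint8_t* uv = nv12.data() + luma_size;  // interleaved U0 V0 U1 V1 ...
    for (int row = 0; row < height; ++row) {
        for (int col = 0; col < width; ++col) {
            const std::size_t src =
                static_cast<std::size_t>(row / 2) * width + (col / 2) * 2;
            const std::size_t dst = static_cast<std::size_t>(row) * width + col;
            out.u[dst] = uv[src];
            out.v[dst] = uv[src + 1];
        }
    }
    return out;
}
```

For IYUV the same replication applies, except that U and V are read from two separate half-resolution planes instead of one interleaved plane.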

2. Scales the input images down or up -> Supports the nearest-neighbor, bilinear and inter-area interpolation methods

    Wi -> input_width
    Hi -> input_height
    Wo -> output_width
    Ho -> output_height
    Ws -> scaled_width
    Hs -> scaled_height

    The final output preprocessed image dimension will be Ho x Wo x 3.
    _______________  __
    |               | ↑
    |               |
    |               | Ho
    |               |
    |               | ↓
    ―――――――――――――――  ――
    |←     Wo      →|

    The image content will be of dimension Hs x Ws x 3, and the remaining area will be padded.
    Where the padding is placed depends on the config parameter padding_type.

    For example, an output image with CENTER padding looks like this:
        _______________  __
        |               | ↑    __
        |  xxxxxxxxxxx  |      ↑
        |  xxxxxxxxxxx  | Ho   Hs
        |  xxxxxxxxxxx  |      ↓
        |               | ↓    ――
        ―――――――――――――――  ――
        |←      Wo     →|
           |←   Ws  →|

        The values Hs and Ws are calculated as follows:

        If aspect_ratio is set to true, Hs and Ws are calculated with the formulas below
        and the user-configured scaled_height and scaled_width are ignored.

        Diff = (Hi x Wo) - (Wi x Ho)

        if Diff < 0 (i.e., letterboxed output), then
            Ws = Wo
            Hs = ceil(Hi x Wo / Wi)

        if Diff > 0 (i.e., pillarboxed output), then
            Ws = ceil(Wi x Ho / Hi)
            Hs = Ho
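
The Diff logic above can be written as a small C++ sketch with integer arithmetic. The names ScaledSize and compute_scaled_size are hypothetical. The Diff == 0 branch is an assumption: the text above does not list it, but both formulas reduce to Ws = Wo and Hs = Ho in that case.

```cpp
#include <cassert>
#include <cstdint>

struct ScaledSize {
    int32_t width;   // Ws
    int32_t height;  // Hs
};

// Integer ceil(a / b) for positive operands.
static int32_t ceil_div(int64_t a, int64_t b) {
    return static_cast<int32_t>((a + b - 1) / b);
}

// Computes Ws and Hs for aspect_ratio = true from the input (Wi, Hi)
// and output (Wo, Ho) dimensions.
ScaledSize compute_scaled_size(int32_t wi, int32_t hi, int32_t wo, int32_t ho) {
    assert(wi > 0 && hi > 0 && wo > 0 && ho > 0);
    const int64_t diff =
        static_cast<int64_t>(hi) * wo - static_cast<int64_t>(wi) * ho;
    if (diff < 0) {
        // Letterboxed: full output width, reduced height.
        return {wo, ceil_div(static_cast<int64_t>(hi) * wo, wi)};
    }
    if (diff > 0) {
        // Pillarboxed: full output height, reduced width.
        return {ceil_div(static_cast<int64_t>(wi) * ho, hi), ho};
    }
    // Assumption: equal aspect ratios need no padding.
    return {wo, ho};
}
```

For a 1280 x 720 input and a 512 x 512 output, Diff = 720 x 512 - 1280 x 512 < 0, so the output is letterboxed with Ws = 512 and Hs = ceil(720 x 512 / 1280) = 288.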

        For a letter_boxed output with padding_type chosen as CENTER, the output will be like below
            where x is the output pixel and p is the padded byte
            _______________  __
        _  |ppppppppppppppp| ↑
        ↑  |xxxxxxxxxxxxxxx|
        Hs |xxxxxxxxxxxxxxx| Ho
        ↓_ |xxxxxxxxxxxxxxx|
           |ppppppppppppppp| ↓
            ―――――――――――――――  ――
           |←   Wo = Ws   →|

        For pillar_boxed output with padding_type chosen as CENTER, the output will be like below
        where x is the output pixel and p is the padded byte
            _______________  ____
            |ppxxxxxxxxxxxpp|  ↑
            |ppxxxxxxxxxxxpp|
            |ppxxxxxxxxxxxpp| Ho=Hs
            |ppxxxxxxxxxxxpp|
            |ppxxxxxxxxxxxpp|  ↓
            ―――――――――――――――  ____
            |←     Wo      →|
              |←   Ws   →|

        If aspect_ratio is set to false, the Diff calculation is skipped and the user-configured scaled_width and scaled_height are used directly as Ws and Hs.

            For example, with aspect_ratio set to false, padding_type set to CENTER, user-configured scaled_width (Ws) < Wo and user-configured scaled_height (Hs) < Ho, the output image looks like this,
            where x is the output pixel and p is the padded byte
            _______________  __
            |ppppppppppppppp| ↑    __
            |ppxxxxxxxxxxxpp|      ↑
            |ppxxxxxxxxxxxpp| Ho   Hs
            |ppxxxxxxxxxxxpp|      ↓
            |ppppppppppppppp| ↓    ――
            ―――――――――――――――  ――
            |←      Wo     →|
               |←   Ws  →|
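
For CENTER padding, the amount of padding on each side of the Hs x Ws content can be sketched in C++ as follows. The names PadExtents and center_padding are hypothetical. How the graph distributes an odd leftover row or column is not stated in this document, so placing the extra one on the right/bottom is an assumption. The other padding_type values are not covered here.

```cpp
#include <cassert>
#include <cstdint>

// Number of padded columns/rows on each side of the scaled content.
struct PadExtents {
    int32_t left, top, right, bottom;
};

// Centers a Ws x Hs region inside a Wo x Ho output.
// Assumption: an odd leftover row/column goes to the right/bottom side.
PadExtents center_padding(int32_t wo, int32_t ho, int32_t ws, int32_t hs) {
    assert(ws > 0 && hs > 0 && ws <= wo && hs <= ho);
    const int32_t pad_w = wo - ws;
    const int32_t pad_h = ho - hs;
    const int32_t left = pad_w / 2;
    const int32_t top = pad_h / 2;
    return {left, top, pad_w - left, pad_h - top};
}
```

With Wo = Ho = 512, Ws = 512 and Hs = 288, this gives 112 padded rows above and below the content and no padded columns.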

3. YUV to RGB color space conversion using BT-601/709 ITU Standard (applicable only for NV12, IYUV inputs)
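
As an illustration of step 3, here is a per-pixel C++ sketch of a BT.601 conversion. This document does not state whether the graph uses limited-range or full-range coefficients, so the commonly quoted limited-range ("studio swing") BT.601 coefficients below are an assumption, as are the names Rgb, saturate_u8 and yuv_to_rgb_bt601. BT.709 uses a different coefficient set.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

struct Rgb {
    uint8_t r, g, b;
};

// Rounds to nearest and saturates to the 8-bit range [0, 255].
static uint8_t saturate_u8(double v) {
    const long rounded = std::lround(v);
    return static_cast<uint8_t>(std::min(255L, std::max(0L, rounded)));
}

// Converts one YUV pixel to RGB with limited-range BT.601 coefficients
// (Y in [16, 235], U/V centered at 128). Assumed, not confirmed, for this graph.
Rgb yuv_to_rgb_bt601(uint8_t y, uint8_t u, uint8_t v) {
    const double c = 1.164 * (static_cast<int>(y) - 16);
    const double d = static_cast<int>(u) - 128;
    const double e = static_cast<int>(v) - 128;
    return {saturate_u8(c + 1.596 * e),
            saturate_u8(c - 0.392 * d - 0.813 * e),
            saturate_u8(c + 2.017 * d)};
}
```

In this convention, (Y, U, V) = (16, 128, 128) maps to black and (235, 128, 128) maps to white.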

4. Normalization and quantization (if configured)

    The norm_quant process uses the input config params channel_mean_r/g/b, channel_stddev_r/g/b, q_zp and q_scale (mean_r/g/b, std_dev_r/g/b, quant_zp and quant_scale in the JSON config) to derive qOffset and qMultiplier for the r/g/b channels
        qOffset_r/g/b = channel_mean_r/g/b x 255.0
        qMultiplier_r/g/b = q_scale / (channel_stddev_r/g/b x 255.0)

    The final normalized quantized pixel is computed as below
        resized_pixel_r/g/b = resized_raw_pixel_r/g/b - qOffset_r/g/b
        output_pixel_r/g/b = (resized_pixel_r/g/b x qMultiplier_r/g/b) + q_zp
        output_pixel_r/g/b = clamp(output_pixel_r/g/b, -128, 127)     -> for INT8 output
        output_pixel_r/g/b = clamp(output_pixel_r/g/b, -32768, 32767) -> for INT16 output
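
The formulas above can be followed for a single channel with the C++ sketch below. The names ChannelQuant, derive_channel_quant and quantize_int8 are hypothetical, the sketch uses double precision rather than the graph's FP16 arithmetic, and rounding to nearest before clamping is an assumption because the rounding mode is not specified here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Per-channel constants derived from the config parameters.
struct ChannelQuant {
    double q_offset;      // channel_mean x 255.0
    double q_multiplier;  // q_scale / (channel_stddev x 255.0)
};

ChannelQuant derive_channel_quant(double mean, double stddev, double q_scale) {
    assert(stddev > 0.0);  // a zero std. deviation would divide by zero
    return {mean * 255.0, q_scale / (stddev * 255.0)};
}

// Normalizes and quantizes one resized raw pixel value to INT8.
// Assumption: round to nearest, then clamp to [-128, 127].
int8_t quantize_int8(double raw_pixel, const ChannelQuant& cq, int32_t q_zp) {
    const double value = (raw_pixel - cq.q_offset) * cq.q_multiplier + q_zp;
    const long rounded = std::lround(value);
    return static_cast<int8_t>(std::min(127L, std::max(-128L, rounded)));
}
```

With the R-channel values from the example config (mean 0.485, std. deviation 0.229, quant_scale 53.59502780503762, quant_zp -14), qOffset is 123.675 and a raw pixel value of 255 evaluates to about 106.53 before rounding and clamping.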

5. Tessellates the final resized, normalized, quantized RGB output (if configured)
Refer to the documentation of atomic tessellate graph for the tessellation strategy.

Output is 2 buffers in contiguous memory composed of:

    |-----tessellated buffer-----|-----resized buffer-----|

    Note: The GStreamer simaaiprocessmla plugin takes only the tessellated buffer, based on the JSON configuration file provided.
          Future iterations of this graph will provide only the tessellated buffer as an output.
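
A consumer that needs both parts of this output can split it with plain pointer arithmetic, as in the C++ sketch below. The names OutputViews and split_output are hypothetical. The sketch assumes that the offset parameter (size of the tessellated output) marks where the resized buffer begins and that out_sz is the total size, which is how the Parameters table describes them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Non-owning views into the two contiguous output buffers.
struct OutputViews {
    const uint8_t* tessellated;
    std::size_t tessellated_size;
    const uint8_t* resized;
    std::size_t resized_size;
};

// Splits the graph output into its tessellated and resized parts.
// Assumption: the resized buffer starts `offset` bytes after the base.
OutputViews split_output(const uint8_t* base, std::size_t out_sz, std::size_t offset) {
    assert(base != nullptr && offset <= out_sz);
    return {base, offset, base + offset, out_sz - offset};
}
```

With the example config values (out_sz = 1572864, offset = 786432), both views are 786432 bytes long.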

Supported Input-Output Combinations
===================================
+-------------------+------------------+------------------+---------+
| Scaling Type      | Input Image Type | Output Data Type | Support |
+-------------------+------------------+------------------+---------+
| BILINEAR          | IYUV             | INT8             | Yes     |
|                   |                  | INT16            | No      |
|                   | NV12             | INT8             | Yes     |
|                   |                  | INT16            | No      |
|                   | RGB              | INT8             | Yes     |
|                   |                  | INT16            | Yes     |
|                   | BGR              | INT8             | Yes     |
|                   |                  | INT16            | Yes     |
|                   | GRAY             | INT8             | Yes     |
|                   |                  | INT16            | Yes     |
+-------------------+------------------+------------------+---------+
| NEAREST_NEIGHBOR  | IYUV             | INT8             | Yes     |
|                   |                  | INT16            | No      |
|                   | NV12             | INT8             | Yes     |
|                   |                  | INT16            | No      |
|                   | RGB              | INT8             | No      |
|                   |                  | INT16            | No      |
|                   | BGR              | INT8             | No      |
|                   |                  | INT16            | No      |
|                   | GRAY             | INT8             | No      |
|                   |                  | INT16            | No      |
+-------------------+------------------+------------------+---------+
| INTER_AREA        | IYUV             | INT8             | Yes     |
|                   |                  | INT16            | No      |
|                   | NV12             | INT8             | Yes     |
|                   |                  | INT16            | No      |
|                   | RGB              | INT8             | Yes     |
|                   |                  | INT16            | No      |
|                   | BGR              | INT8             | Yes     |
|                   |                  | INT16            | No      |
|                   | GRAY             | INT8             | No      |
|                   |                  | INT16            | No      |
+-------------------+------------------+------------------+---------+
| BICUBIC           | IYUV             | INT8             | No      |
|                   |                  | INT16            | No      |
|                   | NV12             | INT8             | No      |
|                   |                  | INT16            | No      |
|                   | RGB              | INT8             | No      |
|                   |                  | INT16            | No      |
|                   | BGR              | INT8             | No      |
|                   |                  | INT16            | No      |
|                   | GRAY             | INT8             | No      |
|                   |                  | INT16            | No      |
+-------------------+------------------+------------------+---------+
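
The table can be encoded as a small lookup for validating a configuration before it is sent to the CVU. This C++ sketch is only a transcription of the table above. The enum and function names are hypothetical and do not correspond to the integer codes of the scaling_type, input_type or output_dtype parameters.

```cpp
#include <cassert>

enum class Scaling { Bilinear, NearestNeighbor, InterArea, Bicubic };
enum class InputType { IYUV, NV12, RGB, BGR, GRAY };
enum class OutputDType { INT8, INT16 };

// Returns true when the combination is marked "Yes" in the support table.
bool is_supported(Scaling scaling, InputType input, OutputDType output) {
    const bool yuv = (input == InputType::IYUV || input == InputType::NV12);
    switch (scaling) {
        case Scaling::Bilinear:
            // YUV inputs: INT8 only. RGB/BGR/GRAY: INT8 and INT16.
            return yuv ? (output == OutputDType::INT8) : true;
        case Scaling::NearestNeighbor:
            // Only YUV inputs with INT8 output.
            return yuv && output == OutputDType::INT8;
        case Scaling::InterArea:
            // Every input except GRAY, INT8 output only.
            return input != InputType::GRAY && output == OutputDType::INT8;
        case Scaling::Bicubic:
            return false;  // no combination is supported
    }
    return false;
}
```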

Graph Info

Overview

SIMA_GENERIC_PREPROC
Graph Name	SIMA_GENERIC_PREPROC
Graph ID	200
Operations Supported	Resize Normalize Quantize Tessellate
Available Since Yocto Build	B684

Example Config

Below is an example JSON config for this graph. The EV74 graph must first be configured with a config like this one, which requires a CVU Configuration Application written in C++.

{
  "version": 0.1,
  "node_name": "ev-gen-preproc",
  "simaai__params": {
      "params": 15,
      "index": 0,
      "cpu": 1,
      "next_cpu": 2,
      "graph_id": 200,
      "no_of_outbuf": 2,
      "ibufname": "allegrodec",
      "out_sz": 1572864,
      "img_height": 720,
      "img_width": 1280,
      "tile_width": 32,
      "tile_height": 86,
      "input_width": 1280,
      "input_height": 720,
      "output_width": 512,
      "output_height": 512,
      "scaled_width": 512,
      "scaled_height": 288,
      "batch_size": 1,
      "normalize": 0,
      "rgb_interleaved": 1,
      "aspect_ratio": 1,
      "input_depth": 3,
      "output_depth": 3,
      "quant_scale": 53.59502780503762,
      "quant_zp": -14,
      "mean_r": 0.485,
      "mean_g": 0.456,
      "mean_b": 0.406,
      "std_dev_r": 0.229,
      "std_dev_g": 0.224,
      "std_dev_b": 0.225,
      "input_type": 0,
      "scaling_type": 1,
      "output_type": 0,
      "padding_type": 0,
      "offset": 786432,
      "debug": 0,
      "dump_data": 1
  }
}
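
The out_sz and offset values in this example can be cross-checked with a little arithmetic, sketched in C++ below. The function names are hypothetical. One byte per element is an assumption that holds for an INT8 output, and the tessellated size itself must still be taken from the model's *_mpk.json as described in the Parameters table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes in the resized (non-tessellated) output tensor: Ho x Wo x depth elements.
std::size_t resized_output_bytes(int32_t output_width, int32_t output_height,
                                 int32_t output_depth, std::size_t bytes_per_element) {
    assert(output_width > 0 && output_height > 0 && output_depth > 0);
    return static_cast<std::size_t>(output_width) * output_height *
           output_depth * bytes_per_element;
}

// out_sz = tessellated output size (the `offset` parameter) + resized output size.
std::size_t expected_out_sz(std::size_t offset, std::size_t resized_bytes) {
    return offset + resized_bytes;
}
```

For the config above, 512 x 512 x 3 x 1 = 786432 bytes for the resized output, and 786432 (offset) + 786432 = 1572864, which matches out_sz.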

Parameters

SIMA_GENERIC_PREPROC Params
Parameter Name	Parameter Description	Data Type	Default	Min	Max
tile_width	Width of the Slice/Tile for tessellation from model tar.gz *_mpk.json ‘slice_width’ tesselation transform	int32_t	32	1	4096
tile_height	Height of the Slice/Tile for tessellation from model tar.gz *_mpk.json ‘slice_height’ tesselation transform	int32_t	16	1	4096
input_width	Width of the input image	int32_t	1920	1	4096
input_height	Height of the input image	int32_t	1080	1	4096
output_width	Width of the output image	int32_t	640	1	4096
output_height	Height of the output image	int32_t	640	1	4096
scaled_width	Width of the output image content that maintains the aspect ratio of the input image. If the aspect_ratio flag is set to false, this value is used as configured. If the aspect_ratio flag is set to true, this value is calculated automatically in the graph.	int32_t	640	1	4096
scaled_height	Height of the output image content that maintains the aspect ratio of the input image. If the aspect_ratio flag is set to false, this value is used as configured. If the aspect_ratio flag is set to true, this value is calculated automatically in the graph.	int32_t	360	1	4096
batch_size	Number of input images to be preprocessed at once	int32_t	1	1	50
normalize	True (1) => the output image will be normalized and quantized, False (0) => the output image will be neither normalized nor quantized.	int32_t	1	0	1
rgb_interleaved	Output image should be tessellated(0) or not(1). Set it to 1 as explicit tessellation kernel is invoked in the graph.	int32_t	1	0	1
aspect_ratio	True (1) => Maintain input aspect ratio in resized output image by adding necessary padding, False (0) => Output image height and width will be same as scaled_height & scaled_width values.	int32_t	1	0	1
input_depth	Depth of the input image	int32_t	3	1	3
output_depth	Depth of the output image - only 3 is supported at the moment	int32_t	3	3	3
quant_scale	Quantization scale from model tar.gz *_mpk.json ‘channel_params’[0]	float	1.0	0.0	1000.0
quant_zp	Quantization zero point from model tar.gz *_mpk.json ‘channel_params’[1]	int32_t	0	-128	127
mean_r	Dataset mean for Channel R to be used for normalization	float	0.003921569	0.0	1.0
mean_g	Dataset mean for Channel G to be used for normalization	float	0.003921569	0.0	1.0
mean_b	Dataset mean for Channel B to be used for normalization	float	0.003921569	0.0	1.0
std_dev_r	Dataset std. deviation for Channel R to be used for normalization	float	0.0	0.0	1.0
std_dev_g	Dataset std. deviation for Channel G to be used for normalization	float	0.0	0.0	1.0
std_dev_b	Dataset std. deviation for Channel B to be used for normalization	float	0.0	0.0	1.0
input_type	Input Image type 0 → IYUV420, 1 → NV12, 2 → RGB, 3 → BGR	int32_t	1	0	3
scaling_type	Resize Scaling Algo to be used : 0 → no_scaling, 1 → Nearest Neighbor, 2 → Bicubic, 3 → Bilinear	int32_t	3	0	3
output_type	Output Image type : 0 → RGB, 1 → BGR	int32_t	0	0	1
padding_type	Padding to be used : 0 → TopLeft, 1 → TopRight, 2 → BottomLeft, 3 → BottomRight, 4 → Center (Should be set if LetterBox or PillarBox or No Padding is required)	int32_t	4	0	4
out_sz	out_sz = tesselated output + output_size where output_size is the expected tensor shape (for example, resized to 224x224x3 would be an output size of 150528 bytes)	int32_t	N/A	N/A	N/A
offset	Size of tesselated output, can be extracted from model tar.gz _mpk.json {"plugins" -> "name": "_tesselation_transform" -> "output_nodes" -> "size"}. It is not uncommon for the tesselated_output size to be the same as the output_size.	int32_t	N/A	N/A	N/A
dump_data	Enable (1) or disable (0) dumping of output tensor to `/tmp` directory on device with the name `{node_name}-###.out`. The sequence number `###` will increment with each output dump (e.g., -001.out, -002.out, …).	int32_t	0	0	1
debug	Enable more debug logs: 0 => disable, 1 => additional logs, 2 => profile runtime of individual input tensors, 3 => profile overall graph runtime.	int32_t	0	0	3

CVU Configuration Application

Note

The need to write, build and execute a dependent application for the CVU will be removed in an upcoming release.

To configure any CVU graph, a C++ CVU Configuration Application must be cross-compiled and executed on the board before using the CVU. Multiple graphs can be pre-programmed into the CVU before running any application. This guide provides a pre-written C++ application for each graph, available for download, that can be cross-compiled on Palette and executed on the board before running the simaaiprocesscvu GStreamer plugin. A pre-compiled version is also included for direct use. If you encounter issues, please re-compile the application from the sources provided.

How to compile using the files below

Please refer to How to compile CVU Configuration Application? for more info.

Directory structure

.
├── CMakeLists.txt
├── cvu_cfg_graph.cpp
└── cvu_cfg_main.cpp

Code files

cvu_cfg_graph.cpp

 #include <simaai/ev_cfg_helper.h>
 #include <simaai/parser.h>
 #include <simaai/platform/simaevxxipc.h>
 #include <string.h>

 #include <cstdint>   // uint8_t
 #include <cstdlib>   // calloc, free
 #include <iostream>  // std::cout

 #define SIMA_IPC_GRAPH_NAME "SIMA_GENERIC_PREPROC"
 #define SIMA_IPC_GRAPH_CODE (200)

 #define INPUT_WIDTH (1)
 #define INPUT_HEIGHT (2)
 #define OUTPUT_WIDTH (3)
 #define OUTPUT_HEIGHT (4)
 #define SCALED_WIDTH (5)
 #define SCALED_HEIGHT (6)
 #define BATCH_SIZE (7)
 #define NORMALIZE (8)
 #define RGB_INTERLEAVED (9)
 #define ASPECT_RATIO (10)
 #define TILE_WIDTH (11)
 #define TILE_HEIGHT (12)
 #define INPUT_DEPTH (13)
 #define OUTPUT_DEPTH (14)
 #define QUANT_ZP (15)
 #define QUANT_SCALE (16)
 #define MEAN_R (17)
 #define MEAN_G (18)
 #define MEAN_B (19)
 #define STD_DEV_R (20)
 #define STD_DEV_G (21)
 #define STD_DEV_B (22)
 #define INPUT_TYPE (23)
 #define OUTPUT_TYPE (24)
 #define SCALING_TYPE (25)
 #define OFFSET (26)
 #define PADDING_TYPE (27)
 #define INPUT_STRIDE (28)
 #define OUTPUT_STRIDE (29)
 #define OUTPUT_DTYPE (30)
 #define DEBUG (31)

 void configure_graph(const char *json_in) {
   simaai_params_t *params = parser_node_struct_init();
   if(params == NULL) {
     std::cout << "Unable to create params \n";
     return;
   }
   if((parse_json_file(json_in, params) != PARSER_SUCCESS)) {
     std::cout << "Unable to start parser \n";
     return;
   }

   // Scratch buffer passed to every send_*_param call
   uint8_t *buf = (uint8_t *)calloc(1, sizeof(uint8_t) * 16);
   if(buf == NULL) {
     std::cout << "Unable to allocate parameter buffer \n";
     return;
   }

   int val = *((int *)parser_get_int(params, "input_width"));
   send_i32_param(2, SIMA_IPC_GRAPH_CODE, INPUT_WIDTH, buf, val);

   val = *((int *)parser_get_int(params, "input_height"));
   send_i32_param(2, SIMA_IPC_GRAPH_CODE, INPUT_HEIGHT, buf, val);

   val = *((int *)parser_get_int(params, "output_width"));
   send_i32_param(2, SIMA_IPC_GRAPH_CODE, OUTPUT_WIDTH, buf, val);

   val = *((int *)parser_get_int(params, "output_height"));
   send_i32_param(2, SIMA_IPC_GRAPH_CODE, OUTPUT_HEIGHT, buf, val);

   val = *((int *)parser_get_int(params, "scaled_width"));
   send_i32_param(2, SIMA_IPC_GRAPH_CODE, SCALED_WIDTH, buf, val);

   val = *((int *)parser_get_int(params, "scaled_height"));
   send_i32_param(2, SIMA_IPC_GRAPH_CODE, SCALED_HEIGHT, buf, val);

   val = *((int *)parser_get_int(params, "batch_size"));
   send_i32_param(2, SIMA_IPC_GRAPH_CODE, BATCH_SIZE, buf, val);

   val = *((int *)parser_get_int(params, "normalize"));
   send_i32_param(2, SIMA_IPC_GRAPH_CODE, NORMALIZE, buf, val);

   val = *((int *)parser_get_int(params, "rgb_interleaved"));
   send_i32_param(2, SIMA_IPC_GRAPH_CODE, RGB_INTERLEAVED, buf, val);

   val = *((int *)parser_get_int(params, "aspect_ratio"));
   send_i32_param(2, SIMA_IPC_GRAPH_CODE, ASPECT_RATIO, buf, val);

   val = *((int *)parser_get_int(params, "tile_width"));
   send_i32_param(2, SIMA_IPC_GRAPH_CODE, TILE_WIDTH, buf, val);

   val = *((int *)parser_get_int(params, "tile_height"));
   send_i32_param(2, SIMA_IPC_GRAPH_CODE, TILE_HEIGHT, buf, val);

   val = *((int *)parser_get_int(params, "input_depth"));
   send_i32_param(2, SIMA_IPC_GRAPH_CODE, INPUT_DEPTH, buf, val);

   val = *((int *)parser_get_int(params, "output_depth"));
   send_i32_param(2, SIMA_IPC_GRAPH_CODE, OUTPUT_DEPTH, buf, val);

   val = *((int *)parser_get_int(params, "quant_zp"));
   send_i32_param(2, SIMA_IPC_GRAPH_CODE, QUANT_ZP, buf, val);

   double val_f = *((double *)parser_get_double(params, "quant_scale"));
   send_float_param(2, SIMA_IPC_GRAPH_CODE, QUANT_SCALE, buf, val_f);

   val_f = *((double *)parser_get_double(params, "mean_r"));
   send_float_param(2, SIMA_IPC_GRAPH_CODE, MEAN_R, buf, val_f);

   val_f = *((double *)parser_get_double(params, "mean_g"));
   send_float_param(2, SIMA_IPC_GRAPH_CODE, MEAN_G, buf, val_f);

   val_f = *((double *)parser_get_double(params, "mean_b"));
   send_float_param(2, SIMA_IPC_GRAPH_CODE, MEAN_B, buf, val_f);

   val_f = *((double *)parser_get_double(params, "std_dev_r"));
   send_float_param(2, SIMA_IPC_GRAPH_CODE, STD_DEV_R, buf, val_f);

   val_f = *((double *)parser_get_double(params, "std_dev_g"));
   send_float_param(2, SIMA_IPC_GRAPH_CODE, STD_DEV_G, buf, val_f);

   val_f = *((double *)parser_get_double(params, "std_dev_b"));
   send_float_param(2, SIMA_IPC_GRAPH_CODE, STD_DEV_B, buf, val_f);

   val = *((int *)parser_get_int(params, "input_type"));
   send_i32_param(2, SIMA_IPC_GRAPH_CODE, INPUT_TYPE, buf, val);

   val = *((int *)parser_get_int(params, "output_type"));
   send_i32_param(2, SIMA_IPC_GRAPH_CODE, OUTPUT_TYPE, buf, val);

   val = *((int *)parser_get_int(params, "scaling_type"));
   send_i32_param(2, SIMA_IPC_GRAPH_CODE, SCALING_TYPE, buf, val);

   val = *((int *)parser_get_int(params, "offset"));
   send_i32_param(2, SIMA_IPC_GRAPH_CODE, OFFSET, buf, val);

   val = *((int *)parser_get_int(params, "padding_type"));
   send_i32_param(2, SIMA_IPC_GRAPH_CODE, PADDING_TYPE, buf, val);

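   // Note: input_stride, output_stride and output_dtype are not listed in the
   // example config or the parameter table above; the JSON passed to this
   // application should define them, since the lookups below are dereferenced
   // without a NULL check.
   val = *((int *)parser_get_int(params, "input_stride"));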
   send_i32_param(2, SIMA_IPC_GRAPH_CODE, INPUT_STRIDE, buf, val);

   val = *((int *)parser_get_int(params, "output_stride"));
   send_i32_param(2, SIMA_IPC_GRAPH_CODE, OUTPUT_STRIDE, buf, val);

   val = *((int *)parser_get_int(params, "output_dtype"));
   send_i32_param(2, SIMA_IPC_GRAPH_CODE, OUTPUT_DTYPE, buf, val);

   // Should be sent last to trigger the configuration completion at CVU side
   val = *((int *)parser_get_int(params, "debug"));
   send_i32_param(2, SIMA_IPC_GRAPH_CODE, DEBUG, buf, val);

   std::cout << "Completed " << SIMA_IPC_GRAPH_NAME << " graph configure \n";
   free(buf);
 }

cvu_cfg_main.cpp

 #include <getopt.h>
 #include <sys/stat.h>
 #include <unistd.h>

 #include <cstring>
 #include <iostream>

 extern void configure_graph(const char *json_fpath);

 bool is_valid_path(const char *path) {
   struct stat buffer;
   return (stat(path, &buffer) == 0);
 }

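 int main(int argc, char **argv) {
   // The JSON config path is a required argument
   if(argc < 2) {
     std::cerr << "Usage: " << argv[0] << " <config.json>" << std::endl;
     return 1;
   }
   const char *json_path = argv[1];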

   if(is_valid_path(json_path)) {
     configure_graph(json_path);
   } else {
     std::cerr << "Invalid path: " << json_path << std::endl;
     return 1;
   }

   return 0;
 }

CMakeLists.txt

 cmake_minimum_required(VERSION 3.16)

 # set the project name
 set(GRAPH_NAME "genpreproc_200")
 set(PROJECT_NAME "CVU Graph Cfg. App.")

 project("${PROJECT_NAME}"
     VERSION 0.1
     DESCRIPTION "CVU Graph Configuration Application"
     LANGUAGES C CXX)

 set(PIPELINE_SOURCES
     cvu_cfg_graph.cpp)

 execute_process(
     COMMAND git rev-parse --abbrev-ref HEAD
     WORKING_DIRECTORY ${CMAKE_SOURCE_DIR}
     OUTPUT_VARIABLE GIT_BRANCH
     OUTPUT_STRIP_TRAILING_WHITESPACE
 )

 # Get the latest abbreviated commit hash of the working branch
 execute_process(
     COMMAND git log -1 --format=%h
     WORKING_DIRECTORY ${CMAKE_SOURCE_DIR}
     OUTPUT_VARIABLE GIT_COMMIT_HASH
     OUTPUT_STRIP_TRAILING_WHITESPACE
 )

 link_directories(${CMAKE_INSTALL_DIR}/core
     ${CMAKE_INSTALL_DIR}/gst
 )

 include(GNUInstallDirs)

 # ev-configuration generation executable
 set(EV_EXEC_NAME "${GRAPH_NAME}_cvu_cfg_app")

 add_executable(${EV_EXEC_NAME}
     cvu_cfg_main.cpp
     cvu_cfg_graph.cpp)

 target_link_libraries(${EV_EXEC_NAME}
     PUBLIC
     simaaiparser
     evhelpers)

 INSTALL(TARGETS "${EV_EXEC_NAME}")