.. _ev74_graph_200_sima_generic_preproc:

|graph|
=======================

Description
-----------

The Generic Preproc graph performs the operations listed below. It uses FP16 precision with vectorization and supports batching (multiple input images).

.. code-block::

    1. Reads an input image -> Supports NV12, IYUV, RGB or GRAY image data.
        a. Upsamples the chroma planes (U/V) by 2 using replication for NV12/IYUV inputs
        b. Duplicates the single channel to 3 channels for GRAY input

         _______________   __
        |xxxxxxxxxxxxxxx|  ↑
        |xxxxxxxxxxxxxxx|
        |xxxxxxxxxxxxxxx|  Hi
        |xxxxxxxxxxxxxxx|
        |xxxxxxxxxxxxxxx|  ↓
         ―――――――――――――――   ――
        |←     Wi      →|

    2. Down/Up scales the input images -> Supports Nearest neighbor, Bi-linear,
       Inter-area interpolation methods

        Wi -> input_width
        Hi -> input_height
        Wo -> output_width
        Ho -> output_height
        Ws -> scaled_width
        Hs -> scaled_height

        The final preprocessed output image has dimension Ho x Wo x 3.

         _______________   __
        |               |  ↑
        |               |
        |               |  Ho
        |               |
        |               |  ↓
         ―――――――――――――――   ――
        |←     Wo      →|

        The image content has dimension Hs x Ws x 3, and the remaining area is padded.
        The padding layout depends on the config parameter padding_type.
        For example, an output image with CENTER padding looks like this:

         _______________   __
        |               |  ↑   __
        |  xxxxxxxxxxx  |      ↑
        |  xxxxxxxxxxx  |  Ho  Hs
        |  xxxxxxxxxxx  |      ↓
        |               |  ↓   ――
         ―――――――――――――――   ――
        |←     Wo      →|
          |←   Ws    →|

        The values Hs and Ws are calculated as follows:

        If aspect_ratio is set to true, Hs and Ws are calculated with the formula
        below and the user-configured Hs and Ws are ignored.

            Diff = (Hi x Wo) - (Wi x Ho)

            if Diff < 0 (i.e. letterboxed output), then
                Ws = Wo
                Hs = ceil(Hi x Wo / Wi)

            if Diff > 0 (i.e. pillarboxed output), then
                Ws = ceil(Wi x Ho / Hi)
                Hs = Ho

        For a letterboxed output with padding_type set to CENTER, the output looks
        like this, where x is an output pixel and p is a padded byte:

         _______________   __
        |ppppppppppppppp|  ↑   __
        |xxxxxxxxxxxxxxx|      ↑
        |xxxxxxxxxxxxxxx|  Ho  Hs
        |xxxxxxxxxxxxxxx|      ↓
        |ppppppppppppppp|  ↓   ――
         ―――――――――――――――   ――
        |←   Wo = Ws   →|

        For a pillarboxed output with padding_type set to CENTER, the output looks
        like this, where x is an output pixel and p is a padded byte:

         _______________   ____
        |ppxxxxxxxxxxxpp|    ↑
        |ppxxxxxxxxxxxpp|
        |ppxxxxxxxxxxxpp|  Ho=Hs
        |ppxxxxxxxxxxxpp|
        |ppxxxxxxxxxxxpp|    ↓
         ―――――――――――――――   ――――
        |←     Wo      →|
          |←   Ws    →|

        If aspect_ratio is set to false, the Diff calculation is skipped and the
        user-configured scaled_width and scaled_height are used directly as Ws and Hs.

        For example, with aspect_ratio set to false, padding set to CENTER,
        user-configured scaled_width (Ws) < Wi and user-configured scaled_height (Hs) < Hi,
        the output image looks like this, where x is an output pixel and p is a padded byte:

         _______________   __
        |ppppppppppppppp|  ↑   __
        |ppxxxxxxxxxxxpp|      ↑
        |ppxxxxxxxxxxxpp|  Ho  Hs
        |ppxxxxxxxxxxxpp|      ↓
        |ppppppppppppppp|  ↓   ――
         ―――――――――――――――   ――
        |←     Wo      →|
          |←   Ws    →|

    3. YUV to RGB color space conversion using the BT-601/709 ITU standard
       (applicable only to NV12 and IYUV inputs)

    4. Normalization and quantization (if configured)

        The norm_quant process uses the input config params channel_mean_r/g/b,
        channel_stddev_r/g/b, q_zp and q_scale to derive qOffset and qMultiplier
        for the r/g/b channels:

            qOffset_r/g/b     = channel_mean_r/g/b x 255.0
            qMultiplier_r/g/b = q_scale / (channel_stddev_r/g/b x 255.0)

        The final normalized, quantized pixel is computed as:

            resized_pixel_r/g/b = resized_raw_pixel_r/g/b - qOffset_r/g/b
            output_pixel_r/g/b  = (resized_pixel_r/g/b x qMultiplier_r/g/b) + q_zp
            output_pixel_r/g/b  = clamp(output_pixel_r/g/b, -128, 127)     -> for INT8 output
            output_pixel_r/g/b  = clamp(output_pixel_r/g/b, -32768, 32767) -> for INT16 output

    5. Tessellates the final resized, normalized, quantized RGB output (if configured)

        Refer to the documentation of the atomic tessellate graph for the tessellation strategy.

    Output is 2 buffers in contiguous memory, composed of:

        |-----tessellated buffer-----|-----resized buffer-----|

    Note: The GStreamer simaaiprocessmla plugin takes only the tessellated buffer,
    based on the JSON configuration file provided. Future iterations of this graph
    will provide only the tessellated buffer as an output.
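The scaled-size rule from step 2 and the normalization/quantization arithmetic from step 4 can be sketched on the host as shown below. This is a minimal illustration rather than the graph's implementation: the function names ``compute_scaled_size`` and ``normalize_quantize`` are hypothetical, the math is done in 32-bit float instead of FP16, the result is rounded to the nearest integer (the rounding mode is not stated above), and ``Diff == 0`` is assumed to fill the whole output.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Size of the image content (Ws x Hs) placed inside the Wo x Ho output.
struct ScaledSize {
    int32_t width;   // Ws
    int32_t height;  // Hs
};

// Hypothetical helper mirroring the Diff rule for aspect_ratio = true.
// Assumption: Diff == 0 (equal aspect ratios) means no padding at all.
inline ScaledSize compute_scaled_size(int32_t wi, int32_t hi, int32_t wo, int32_t ho) {
    assert(wi > 0 && hi > 0 && wo > 0 && ho > 0);
    const int64_t diff = static_cast<int64_t>(hi) * wo - static_cast<int64_t>(wi) * ho;
    if (diff < 0) {
        // Letterboxed: Ws = Wo, Hs = ceil(Hi x Wo / Wi), using integer ceiling division.
        const int64_t hs = (static_cast<int64_t>(hi) * wo + wi - 1) / wi;
        return {wo, static_cast<int32_t>(hs)};
    }
    if (diff > 0) {
        // Pillarboxed: Ws = ceil(Wi x Ho / Hi), Hs = Ho.
        const int64_t ws = (static_cast<int64_t>(wi) * ho + hi - 1) / hi;
        return {static_cast<int32_t>(ws), ho};
    }
    return {wo, ho};
}

// Hypothetical helper mirroring the norm_quant formulas for one channel value.
// Assumptions: float math instead of FP16, round-to-nearest after clamping.
inline int32_t normalize_quantize(float raw_pixel, float channel_mean, float channel_stddev,
                                  float q_scale, int32_t q_zp, bool int16_output) {
    assert(channel_stddev > 0.0f);
    const float q_offset = channel_mean * 255.0f;
    const float q_multiplier = q_scale / (channel_stddev * 255.0f);
    const float value = (raw_pixel - q_offset) * q_multiplier + static_cast<float>(q_zp);
    const float lo = int16_output ? -32768.0f : -128.0f;
    const float hi = int16_output ? 32767.0f : 127.0f;
    const float clamped = std::min(std::max(value, lo), hi);
    return static_cast<int32_t>(std::lround(clamped));
}
```

For a 1280x720 input and a 512x512 output, ``compute_scaled_size(1280, 720, 512, 512)`` yields Ws = 512 and Hs = 288, which is a letterboxed result.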
Supported Input-Output Combinations
===================================

+-------------------+------------------+------------------+---------+
| Scaling Type      | Input Image Type | Output Data Type | Support |
+-------------------+------------------+------------------+---------+
| BILINEAR          | IYUV             | INT8             | Yes     |
|                   |                  | INT16            | No      |
|                   | NV12             | INT8             | Yes     |
|                   |                  | INT16            | No      |
|                   | RGB              | INT8             | Yes     |
|                   |                  | INT16            | Yes     |
|                   | BGR              | INT8             | Yes     |
|                   |                  | INT16            | Yes     |
|                   | GRAY             | INT8             | Yes     |
|                   |                  | INT16            | Yes     |
+-------------------+------------------+------------------+---------+
| NEAREST_NEIGHBOR  | IYUV             | INT8             | Yes     |
|                   |                  | INT16            | No      |
|                   | NV12             | INT8             | Yes     |
|                   |                  | INT16            | No      |
|                   | RGB              | INT8             | No      |
|                   |                  | INT16            | No      |
|                   | BGR              | INT8             | No      |
|                   |                  | INT16            | No      |
|                   | GRAY             | INT8             | No      |
|                   |                  | INT16            | No      |
+-------------------+------------------+------------------+---------+
| INTER_AREA        | IYUV             | INT8             | Yes     |
|                   |                  | INT16            | No      |
|                   | NV12             | INT8             | Yes     |
|                   |                  | INT16            | No      |
|                   | RGB              | INT8             | Yes     |
|                   |                  | INT16            | No      |
|                   | BGR              | INT8             | Yes     |
|                   |                  | INT16            | No      |
|                   | GRAY             | INT8             | No      |
|                   |                  | INT16            | No      |
+-------------------+------------------+------------------+---------+
| BICUBIC           | IYUV             | INT8             | No      |
|                   |                  | INT16            | No      |
|                   | NV12             | INT8             | No      |
|                   |                  | INT16            | No      |
|                   | RGB              | INT8             | No      |
|                   |                  | INT16            | No      |
|                   | BGR              | INT8             | No      |
|                   |                  | INT16            | No      |
|                   | GRAY             | INT8             | No      |
|                   |                  | INT16            | No      |
+-------------------+------------------+------------------+---------+

Graph Info
----------

Overview
********

.. list-table:: |graph|
    :widths: 12 20
    :stub-columns: 1

    * - Graph Name
      - |graph|
    * - Graph ID
      - 200
    * - Operations Supported
      - Resize, Normalize, Quantize, Tessellate
    * - Available Since Yocto Build
      - B684

Example Config
--------------

|ev74_example_config_text|

.. code-block:: json

    {
        "version": 0.1,
        "node_name": "ev-gen-preproc",
        "simaai__params": {
            "params": 15,
            "index": 0,
            "cpu": 1,
            "next_cpu": 2,
            "graph_id": 200,
            "no_of_outbuf": 2,
            "ibufname": "allegrodec",
            "out_sz": 1572864,
            "img_height": 720,
            "img_width": 1280,
            "tile_width": 32,
            "tile_height": 86,
            "input_width": 1280,
            "input_height": 720,
            "output_width": 512,
            "output_height": 512,
            "scaled_width": 512,
            "scaled_height": 288,
            "batch_size": 1,
            "normalize": 0,
            "rgb_interleaved": 1,
            "aspect_ratio": 1,
            "input_depth": 3,
            "output_depth": 3,
            "quant_scale": 53.59502780503762,
            "quant_zp": -14,
            "mean_r": 0.485,
            "mean_g": 0.456,
            "mean_b": 0.406,
            "std_dev_r": 0.229,
            "std_dev_g": 0.224,
            "std_dev_b": 0.225,
            "input_type": 0,
            "scaling_type": 1,
            "output_type": 0,
            "padding_type": 0,
            "offset": 786432,
            "debug": 0,
            "dump_data": 1
        }
    }

Parameters
**********

.. list-table:: |graph| Params
    :widths: 10 50 10 10 10 10
    :header-rows: 1

    * - Parameter Name
      - Parameter Description
      - Data Type
      - Default
      - Min
      - Max
    * - tile_width
      - Width of the slice/tile for tessellation, from model tar.gz \*_mpk.json 'slice_width' tesselation transform
      - int32_t
      - 32
      - 1
      - 4096
    * - tile_height
      - Height of the slice/tile for tessellation, from model tar.gz \*_mpk.json 'slice_height' tesselation transform
      - int32_t
      - 16
      - 1
      - 4096
    * - input_width
      - Width of the input image
      - int32_t
      - 1920
      - 1
      - 4096
    * - input_height
      - Height of the input image
      - int32_t
      - 1080
      - 1
      - 4096
    * - output_width
      - Width of the output image
      - int32_t
      - 640
      - 1
      - 4096
    * - output_height
      - Height of the output image
      - int32_t
      - 640
      - 1
      - 4096
    * - scaled_width
      - Width of the image content within the output. If the aspect_ratio flag is set to false, this value is used as configured. If the aspect_ratio flag is set to true, this value is calculated automatically in the graph so that the input aspect ratio is maintained.
      - int32_t
      - 640
      - 1
      - 4096
    * - scaled_height
      - Height of the image content within the output. If the aspect_ratio flag is set to false, this value is used as configured. If the aspect_ratio flag is set to true, this value is calculated automatically in the graph so that the input aspect ratio is maintained.
      - int32_t
      - 360
      - 1
      - 4096
    * - batch_size
      - Number of input images to be preprocessed at once
      - int32_t
      - 1
      - 1
      - 50
    * - normalize
      - True (1) => the output image will be normalized and quantized, False (0) => the output image will be neither normalized nor quantized.
      - int32_t
      - 1
      - 0
      - 1
    * - rgb_interleaved
      - Whether the output image should be tessellated (0) or not (1). Set it to 1, as an explicit tessellation kernel is invoked in the graph.
      - int32_t
      - 1
      - 0
      - 1
    * - aspect_ratio
      - True (1) => Maintain the input aspect ratio in the resized output image by adding the necessary padding, False (0) => Output image content height and width will be the same as the scaled_height and scaled_width values.
      - int32_t
      - 1
      - 0
      - 1
    * - input_depth
      - Depth of the input image
      - int32_t
      - 3
      - 1
      - 3
    * - output_depth
      - Depth of the output image (only 3 is supported at the moment)
      - int32_t
      - 3
      - 3
      - 3
    * - quant_scale
      - Quantization scale from model tar.gz \*_mpk.json 'channel_params'[0]
      - float
      - 1.0
      - 0.0
      - 1000.0
    * - quant_zp
      - Quantization zero point from model tar.gz \*_mpk.json 'channel_params'[1]
      - int32_t
      - 0
      - -128
      - 127
    * - mean_r
      - Dataset mean for channel R, used for normalization
      - float
      - 0.003921569
      - 0.0
      - 1.0
    * - mean_g
      - Dataset mean for channel G, used for normalization
      - float
      - 0.003921569
      - 0.0
      - 1.0
    * - mean_b
      - Dataset mean for channel B, used for normalization
      - float
      - 0.003921569
      - 0.0
      - 1.0
    * - std_dev_r
      - Dataset standard deviation for channel R, used for normalization
      - float
      - 0.0
      - 0.0
      - 1.0
    * - std_dev_g
      - Dataset standard deviation for channel G, used for normalization
      - float
      - 0.0
      - 0.0
      - 1.0
    * - std_dev_b
      - Dataset standard deviation for channel B, used for normalization
      - float
      - 0.0
      - 0.0
      - 1.0
    * - input_type
      - Input image type: 0 → IYUV420, 1 → NV12, 2 → RGB, 3 → BGR
      - int32_t
      - 1
      - 0
      - 3
    * - scaling_type
      - Resize scaling algorithm to be used: 0 → no_scaling, 1 → Nearest Neighbor, 2 → Bicubic, 3 → Bilinear
      - int32_t
      - 3
      - 0
      - 3
    * - output_type
      - Output image type: 0 → RGB, 1 → BGR
      - int32_t
      - 0
      - 0
      - 1
    * - padding_type
      - Padding to be used: 0 → TopLeft, 1 → TopRight, 2 → BottomLeft, 3 → BottomRight, 4 → Center (should be set if LetterBox, PillarBox or No Padding is required)
      - int32_t
      - 4
      - 0
      - 4
    * - out_sz
      - ``out_sz = tesselated output + output_size``, where ``output_size`` is the size of the expected tensor shape (for example, an output resized to 224x224x3 has an output size of 150528 bytes)
      - int32_t
      - N/A
      - N/A
      - N/A
    * - offset
      - Size of the tessellated output. It can be extracted from model tar.gz \*_mpk.json ``"plugins" -> "name": "*_tesselation_transform" -> "output_nodes" -> "size"``. It is not uncommon for the tessellated output size to be the same as the output_size.
      - int32_t
      - N/A
      - N/A
      - N/A
    * - dump_data
      - Enable (1) or disable (0) dumping of the output tensor to the ``/tmp`` directory on the device with the name ``{node_name}-###.out``. The sequence number ``###`` increments with each output dump (e.g., -001.out, -002.out, ...).
      - int32_t
      - 0
      - 0
      - 1
    * - debug
      - Enable more debug logs: 0 => disable, 1 => additional logs, 2 => profile runtime of individual input tensors, 3 => profile overall graph runtime.
      - int32_t
      - 0
      - 0
      - 3

|dependent_app|
---------------

.. note::

    * The need to write, build and execute a dependent application for the CVU will be removed in an upcoming release.

|ev74_dependent_app_brief|

.. _cvu_generic_preproc_cvu_cfg_graph_cpp:

How to compile using the files below
************************************

|ev74_dependent_app_footer|

Directory structure
*******************

.. code:: shell

    .
    ├── CMakeLists.txt
    ├── cvu_cfg_graph.cpp
    └── cvu_cfg_main.cpp

Code files
**********

.. code-block:: cpp
    :caption: cvu_cfg_graph.cpp
    :linenos:

    #include <cstdint>
    #include <cstdlib>
    #include <iostream>
    // The SiMa headers that declare the parser_*() and send_*_param() helpers
    // must also be included here.

    #define SIMA_IPC_GRAPH_NAME "SIMA_GENERIC_PREPROC"
    #define SIMA_IPC_GRAPH_CODE (200)

    // Parameter IDs understood by the graph on the CVU side
    #define INPUT_WIDTH     (1)
    #define INPUT_HEIGHT    (2)
    #define OUTPUT_WIDTH    (3)
    #define OUTPUT_HEIGHT   (4)
    #define SCALED_WIDTH    (5)
    #define SCALED_HEIGHT   (6)
    #define BATCH_SIZE      (7)
    #define NORMALIZE       (8)
    #define RGB_INTERLEAVED (9)
    #define ASPECT_RATIO    (10)
    #define TILE_WIDTH      (11)
    #define TILE_HEIGHT     (12)
    #define INPUT_DEPTH     (13)
    #define OUTPUT_DEPTH    (14)
    #define QUANT_ZP        (15)
    #define QUANT_SCALE     (16)
    #define MEAN_R          (17)
    #define MEAN_G          (18)
    #define MEAN_B          (19)
    #define STD_DEV_R       (20)
    #define STD_DEV_G       (21)
    #define STD_DEV_B       (22)
    #define INPUT_TYPE      (23)
    #define OUTPUT_TYPE     (24)
    #define SCALING_TYPE    (25)
    #define OFFSET          (26)
    #define PADDING_TYPE    (27)
    #define INPUT_STRIDE    (28)
    #define OUTPUT_STRIDE   (29)
    #define OUTPUT_DTYPE    (30)
    #define DEBUG           (31)

    void configure_graph(const char *json_in) {
        simaai_params_t *params = parser_node_struct_init();
        if (params == NULL) {
            std::cout << "Unable to create params \n";
            return;  // nothing can be parsed without a params struct
        }

        if ((parse_json_file(json_in, params) != PARSER_SUCCESS)) {
            std::cout << "Unable to start parser \n";
            return;  // do not send values from a config that failed to parse
        }

        // Scratch buffer used by the send_*_param helpers
        uint8_t *buf = (uint8_t *)calloc(1, sizeof(uint8_t) * 16);
        if (buf == NULL) {
            std::cout << "Unable to allocate parameter buffer \n";
            return;
        }

        int val = *((int *)parser_get_int(params, "input_width"));
        send_i32_param(2, SIMA_IPC_GRAPH_CODE, INPUT_WIDTH, buf, val);
        val = *((int *)parser_get_int(params, "input_height"));
        send_i32_param(2, SIMA_IPC_GRAPH_CODE, INPUT_HEIGHT, buf, val);
        val = *((int *)parser_get_int(params, "output_width"));
        send_i32_param(2, SIMA_IPC_GRAPH_CODE, OUTPUT_WIDTH, buf, val);
        val = *((int *)parser_get_int(params, "output_height"));
        send_i32_param(2, SIMA_IPC_GRAPH_CODE, OUTPUT_HEIGHT, buf, val);
        val = *((int *)parser_get_int(params, "scaled_width"));
        send_i32_param(2, SIMA_IPC_GRAPH_CODE, SCALED_WIDTH, buf, val);
        val = *((int *)parser_get_int(params, "scaled_height"));
        send_i32_param(2, SIMA_IPC_GRAPH_CODE, SCALED_HEIGHT, buf, val);
        val = *((int *)parser_get_int(params, "batch_size"));
        send_i32_param(2, SIMA_IPC_GRAPH_CODE, BATCH_SIZE, buf, val);
        val = *((int *)parser_get_int(params, "normalize"));
        send_i32_param(2, SIMA_IPC_GRAPH_CODE, NORMALIZE, buf, val);
        val = *((int *)parser_get_int(params, "rgb_interleaved"));
        send_i32_param(2, SIMA_IPC_GRAPH_CODE, RGB_INTERLEAVED, buf, val);
        val = *((int *)parser_get_int(params, "aspect_ratio"));
        send_i32_param(2, SIMA_IPC_GRAPH_CODE, ASPECT_RATIO, buf, val);
        val = *((int *)parser_get_int(params, "tile_width"));
        send_i32_param(2, SIMA_IPC_GRAPH_CODE, TILE_WIDTH, buf, val);
        val = *((int *)parser_get_int(params, "tile_height"));
        send_i32_param(2, SIMA_IPC_GRAPH_CODE, TILE_HEIGHT, buf, val);
        val = *((int *)parser_get_int(params, "input_depth"));
        send_i32_param(2, SIMA_IPC_GRAPH_CODE, INPUT_DEPTH, buf, val);
        val = *((int *)parser_get_int(params, "output_depth"));
        send_i32_param(2, SIMA_IPC_GRAPH_CODE, OUTPUT_DEPTH, buf, val);
        val = *((int *)parser_get_int(params, "quant_zp"));
        send_i32_param(2, SIMA_IPC_GRAPH_CODE, QUANT_ZP, buf, val);

        // Floating-point parameters
        double val_f = *((double *)parser_get_double(params, "quant_scale"));
        send_float_param(2, SIMA_IPC_GRAPH_CODE, QUANT_SCALE, buf, val_f);
        val_f = *((double *)parser_get_double(params, "mean_r"));
        send_float_param(2, SIMA_IPC_GRAPH_CODE, MEAN_R, buf, val_f);
        val_f = *((double *)parser_get_double(params, "mean_g"));
        send_float_param(2, SIMA_IPC_GRAPH_CODE, MEAN_G, buf, val_f);
        val_f = *((double *)parser_get_double(params, "mean_b"));
        send_float_param(2, SIMA_IPC_GRAPH_CODE, MEAN_B, buf, val_f);
        val_f = *((double *)parser_get_double(params, "std_dev_r"));
        send_float_param(2, SIMA_IPC_GRAPH_CODE, STD_DEV_R, buf, val_f);
        val_f = *((double *)parser_get_double(params, "std_dev_g"));
        send_float_param(2, SIMA_IPC_GRAPH_CODE, STD_DEV_G, buf, val_f);
        val_f = *((double *)parser_get_double(params, "std_dev_b"));
        send_float_param(2, SIMA_IPC_GRAPH_CODE, STD_DEV_B, buf, val_f);

        val = *((int *)parser_get_int(params, "input_type"));
        send_i32_param(2, SIMA_IPC_GRAPH_CODE, INPUT_TYPE, buf, val);
        val = *((int *)parser_get_int(params, "output_type"));
        send_i32_param(2, SIMA_IPC_GRAPH_CODE, OUTPUT_TYPE, buf, val);
        val = *((int *)parser_get_int(params, "scaling_type"));
        send_i32_param(2, SIMA_IPC_GRAPH_CODE, SCALING_TYPE, buf, val);
        val = *((int *)parser_get_int(params, "offset"));
        send_i32_param(2, SIMA_IPC_GRAPH_CODE, OFFSET, buf, val);
        val = *((int *)parser_get_int(params, "padding_type"));
        send_i32_param(2, SIMA_IPC_GRAPH_CODE, PADDING_TYPE, buf, val);
        val = *((int *)parser_get_int(params, "input_stride"));
        send_i32_param(2, SIMA_IPC_GRAPH_CODE, INPUT_STRIDE, buf, val);
        val = *((int *)parser_get_int(params, "output_stride"));
        send_i32_param(2, SIMA_IPC_GRAPH_CODE, OUTPUT_STRIDE, buf, val);
        val = *((int *)parser_get_int(params, "output_dtype"));
        send_i32_param(2, SIMA_IPC_GRAPH_CODE, OUTPUT_DTYPE, buf, val);

        // Should be sent last to trigger the configuration completion at CVU side
        val = *((int *)parser_get_int(params, "debug"));
        send_i32_param(2, SIMA_IPC_GRAPH_CODE, DEBUG, buf, val);

        std::cout << "Completed " << SIMA_IPC_GRAPH_NAME << " graph configure \n";
        free(buf);
    }

.. _cvu_generic_preproc_cvu_cfg_main_cpp:

.. code-block:: cpp
    :caption: cvu_cfg_main.cpp
    :linenos:

    #include <iostream>
    #include <sys/stat.h>

    extern void configure_graph(const char *json_fpath);

    // Returns true if the path exists and can be stat'ed
    bool is_valid_path(const char *path) {
        struct stat buffer;
        return (stat(path, &buffer) == 0);
    }

    int main(int argc, char **argv) {
        // Guard against a missing argument before reading argv[1]
        if (argc < 2) {
            std::cerr << "Usage: " << argv[0] << " <config.json>" << std::endl;
            return 1;
        }

        const char *json_path = argv[1];
        if (is_valid_path(json_path)) {
            configure_graph(json_path);
        } else {
            std::cerr << "Invalid path: " << json_path << std::endl;
            return 1;
        }
        return 0;
    }

.. _cvu_generic_preproc_cmakelists:

.. code-block:: cmake
    :caption: CMakeLists.txt
    :linenos:

    cmake_minimum_required(VERSION 3.16)

    # set the project name
    set(GRAPH_NAME "genpreproc_200")
    set(PROJECT_NAME "CVU Graph Cfg. App.")

    project("${PROJECT_NAME}"
        VERSION 0.1
        DESCRIPTION "CVU Graph Configuration Application"
        LANGUAGES C CXX)

    set(PIPELINE_SOURCES cvu_cfg_graph.cpp)

    # Get the name of the working branch
    execute_process(
        COMMAND git rev-parse --abbrev-ref HEAD
        WORKING_DIRECTORY ${CMAKE_SOURCE_DIR}
        OUTPUT_VARIABLE GIT_BRANCH
        OUTPUT_STRIP_TRAILING_WHITESPACE
    )

    # Get the latest abbreviated commit hash of the working branch
    execute_process(
        COMMAND git log -1 --format=%h
        WORKING_DIRECTORY ${CMAKE_SOURCE_DIR}
        OUTPUT_VARIABLE GIT_COMMIT_HASH
        OUTPUT_STRIP_TRAILING_WHITESPACE
    )

    link_directories(${CMAKE_INSTALL_DIR}/core
        ${CMAKE_INSTALL_DIR}/gst
    )

    include(GNUInstallDirs)

    # ev-configuration generation executable
    set(EV_EXEC_NAME "${GRAPH_NAME}_cvu_cfg_app")
    add_executable(${EV_EXEC_NAME} cvu_cfg_main.cpp cvu_cfg_graph.cpp)
    target_link_libraries(${EV_EXEC_NAME} PUBLIC simaaiparser evhelpers)

    INSTALL(TARGETS "${EV_EXEC_NAME}")

.. |graph| replace:: SIMA_GENERIC_PREPROC
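The relationship between ``out_sz``, ``offset`` and the output dimensions from the Parameters table can be checked on the host before a config is deployed. The sketch below is only an illustration: ``resized_buffer_bytes`` and ``expected_out_sz`` are hypothetical helper names, and the element size (1 byte for INT8, 2 bytes for INT16) is an assumption that is not spelled out in the table.

```cpp
#include <cassert>
#include <cstdint>

// Bytes in the resized (non-tessellated) buffer: Wo x Ho x depth elements.
// Assumption: bytes_per_element is 1 for INT8 output and 2 for INT16 output.
inline int64_t resized_buffer_bytes(int32_t output_width, int32_t output_height,
                                    int32_t output_depth, int32_t bytes_per_element) {
    assert(output_width > 0 && output_height > 0 && output_depth > 0 && bytes_per_element > 0);
    return static_cast<int64_t>(output_width) * output_height * output_depth * bytes_per_element;
}

// out_sz = tessellated output size (the "offset" parameter) + resized output size.
inline int64_t expected_out_sz(int64_t tessellated_bytes, int64_t resized_bytes) {
    return tessellated_bytes + resized_bytes;
}
```

With the values from the example config (``output_width`` and ``output_height`` of 512, ``output_depth`` of 3, ``offset`` of 786432) and an assumed 1 byte per element, the resized buffer is 786432 bytes and the expected ``out_sz`` is 1572864, which matches the configured value.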