afe.apis.model
==============

.. py:module:: afe.apis.model


Classes
-------

.. autoapisummary::

   afe.apis.model.Model


Module Contents
---------------

.. py:class:: Model(net: afe.ir.net.AwesomeNet, fp32_net: Optional[afe.ir.net.AwesomeNet] = None)


   .. py:method:: execute(inputs: afe.apis.defines.InputValues, *, fast_mode: bool = False, log_level: int | None = logging.NOTSET, keep_layer_outputs: list[afe.ir.defines.NodeName] | str | None = None, output_file_path: str | None = None) -> List[numpy.ndarray]

      Run input data through the quantized model.

      :param inputs: Dictionary mapping placeholder node names (str) to the input data.
      :param fast_mode: If True, use a fast implementation of operators. If False,
          use an implementation that exactly matches execution on the MLA.
      :param log_level: Logging level.
      :param keep_layer_outputs: List of quantized model layer output names that
          should be saved. Each element of the list must be a valid name of a
          model layer output. If 'all', all intermediate results are saved.
      :param output_file_path: Location where the layer outputs should be saved.
          If given, the keep_layer_outputs argument must also be provided.
      :return: Outputs of the quantized model. The requested intermediate results
          are also saved to the output_file_path location.


   .. py:method:: save(model_name: str, output_directory: str = '', *, log_level: Optional[int] = logging.NOTSET) -> None

      Save the quantized model and its floating-point counterpart (if available)
      to the specified directory. Defaults to the current working directory if no
      output directory is provided.

      :param model_name: Name for the saved quantized model file with a `.sima` extension.
      :type model_name: str
      :param output_directory: Directory to save the model files. Defaults to the
          current working directory.
      :type output_directory: str, optional
      :param log_level: Logging level for the operation. Defaults to `logging.NOTSET`.
      :type log_level: Optional[int], optional

      :raises UserFacingException: If an error occurs during the save process.

      Example
      ~~~~~~~

      >>> model = Model(quantized_net, fp32_net)
      >>> model.save("my_model", output_directory="models/")


   .. py:method:: load(model_name: str, network_directory: str = '', *, log_level: Optional[int] = logging.NOTSET) -> Model
      :staticmethod:


   .. py:method:: compile(output_path: str, batch_size: int = 1, compress: bool = True, log_level: Optional[int] = logging.NOTSET, tessellate_parameters: Optional[afe.backends.mpk.interface.TessellateParameters] = None, l2_caching_mode: afe.backends.mpk.interface.L2CachingMode = L2CachingMode.NONE, **kwargs) -> None

      Compile the quantized model into a `.tar.gz` package for deployment in an
      MPK package. The compiled package includes the binary model and a JSON
      structure file, saved in `output_path` as `_mpk.tar.gz`.

      Batch size can be specified, though compiler optimizations may adjust it
      for optimal performance. The first dimension of input tensors must
      represent the batch size.

      :param output_path: Directory to save the compiled package. Created if it
          doesn't exist.
      :type output_path: str
      :param batch_size: Batch size for compilation. Defaults to `1`.
      :type batch_size: int, optional
      :param compress: Enable DRAM data compression for the `.lm` file. Defaults to `True`.
      :type compress: bool, optional
      :param log_level: Logging level. Defaults to `logging.NOTSET`.
      :type log_level: Optional[int], optional
      :param tessellate_parameters: Internal use for MLA tessellation parameters.
      :type tessellate_parameters: Optional[TessellateParameters], optional
      :param l2_caching_mode: Internal use for the N2A compiler's L2 caching.
          Defaults to `L2CachingMode.NONE`.
      :type l2_caching_mode: L2CachingMode, optional
      :param \*\*kwargs: Additional internal options, including:

          - retained_temporary_directory_name (str): Path to retain intermediate files.
          - use_power_limits (bool): Enable power limits during compilation.
          - max_mla_power (float): Set maximum MLA power consumption.
          - layer_norm_use_fp32_intermediates (bool): Use FP32 intermediates for
            layer normalization.
          - rms_norm_use_fp32_intermediates (bool): Use FP32 intermediates for
            RMS normalization.

      :raises UserFacingException: If compilation fails due to invalid parameters or errors.

      Example
      ~~~~~~~

      >>> model = Model(quantized_net)
      >>> model.compile(output_path="compiled_models/", batch_size=4, compress=True)


   .. py:method:: create_auxiliary_network(transforms: List[afe.apis.transform.Transform], input_types: Dict[afe.ir.defines.InputName, afe.ir.tensor_type.TensorType], *, target: sima_utils.common.Platform = gen1_target, log_level: Optional[int] = logging.NOTSET) -> Model
      :staticmethod:


   .. py:method:: compose(nets: List[Model], combined_model_name: str = 'main', log_level: Optional[int] = logging.NOTSET) -> Model
      :staticmethod:


   .. py:method:: evaluate(evaluation_data: Iterable[Tuple[afe.apis.defines.InputValues, afe.apis.compilation_job_base.GroundTruth]], criterion: afe.apis.statistic.Statistic[Tuple[List[numpy.ndarray], afe.apis.compilation_job_base.GroundTruth], str], *, fast_mode: bool = False, log_level: Optional[int] = logging.NOTSET) -> str

      Evaluate the model using the provided evaluation data and criterion.

      This method runs the model on the given dataset and computes an aggregate
      result using the specified criterion. It supports a fast execution mode
      for quicker evaluations and customizable logging levels for diagnostic
      purposes.

      :param evaluation_data: An iterable of tuples, each containing input values
          and the corresponding ground truth for evaluation.
      :param criterion: A statistical function used to compute the evaluation
          metric based on the model's outputs and ground truth.
      :param fast_mode: Optional; if set to True, the evaluation will execute in
          a faster but potentially less thorough manner. Defaults to False.
      :param log_level: Optional; specifies the logging level for evaluation.
          Defaults to logging.NOTSET.
      :return: A string representing the final result of the evaluation based
          on the criterion.
      :raises Exception: If an error occurs during model execution, it is
          sanitized and re-raised.


   .. py:method:: analyze_quantization_error(evaluation_data: Iterable[afe.apis.defines.InputValues], error_metric: afe.core.graph_analyzer.utils.Metric, *, local_feed: bool, log_level: Optional[int] = logging.NOTSET)


   .. py:method:: get_performance_metrics(output_kpi_path: str, *, log_level: Optional[int] = logging.NOTSET)


   .. py:method:: generate_elf_and_reference_files(input_data: Iterable[afe.apis.defines.InputValues], output_dir: str, *, batch_size: int = 1, compress: bool = True, tessellate_parameters: Optional[afe.backends.mpk.interface.TessellateParameters] = None, log_level: Optional[int] = logging.NOTSET, l2_caching_mode: afe.backends.mpk.interface.L2CachingMode = L2CachingMode.NONE) -> None


   .. py:method:: execute_in_accelerator_mode(input_data: Iterable[afe.apis.defines.InputValues], devkit: str, *, username: str = cp.DEFAULT_USERNAME, password: str = '', batch_size: int = 1, compress: bool = True, tessellate_parameters: Optional[afe.backends.mpk.interface.TessellateParameters] = None, log_level: Optional[int] = logging.NOTSET, l2_caching_mode: afe.backends.mpk.interface.L2CachingMode = L2CachingMode.NONE) -> List[numpy.ndarray]
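
Example Usage
-------------

A minimal end-to-end sketch combining the methods documented above: loading a
saved model, running inference, evaluating, and compiling for deployment. The
model name, directory names, and the ``input_data``, ``evaluation_data``, and
``criterion`` objects are illustrative placeholders assumed to be prepared by
the user, not values defined by this module.

>>> model = Model.load("my_model", network_directory="models/")
>>> outputs = model.execute(input_data, fast_mode=True)
>>> result = model.evaluate(evaluation_data, criterion)
>>> model.compile(output_path="compiled_models/", batch_size=1)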