afe.apis.model
Classes
Module Contents
- class afe.apis.model.Model(net: afe.ir.net.AwesomeNet, fp32_net: afe.ir.net.AwesomeNet | None = None)
- execute(inputs: afe.apis.defines.InputValues, *, fast_mode: bool = False, log_level: int | None = logging.NOTSET, keep_layer_outputs: list[afe.ir.defines.NodeName] | str | None = None, output_file_path: str | None = None) → List[numpy.ndarray]
Run input data through the quantized model.
- Parameters:
inputs – Dictionary mapping placeholder node names (str) to the input data.
fast_mode – If True, use a fast implementation of operators. If False, use an implementation that exactly matches execution on the MLA.
log_level – Logging level.
keep_layer_outputs – List of quantized model layer output names that should be saved. Each element must be a valid layer output name of the model. If "all", all intermediate results are saved.
output_file_path – Location where the layer outputs should be saved. If given, the keep_layer_outputs argument must also be provided.
- Returns: Outputs of the quantized model.
The requested intermediate results are also saved to the output_file_path location.
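A minimal sketch of calling execute(). The placeholder name "input" and the tensor shape are assumptions for illustration; the real names come from the model's own input placeholders. Running the model requires the SiMa SDK and a quantized Model instance, so that call is shown commented out.

```python
import numpy as np

# Hypothetical placeholder name and input shape; substitute the names of
# your model's actual input placeholders.
inputs = {"input": np.zeros((1, 224, 224, 3), dtype=np.float32)}

# Requires the SiMa SDK and a loaded Model instance named `model`:
# outputs = model.execute(
#     inputs,
#     fast_mode=True,                      # fast operator implementations
#     keep_layer_outputs="all",            # save every intermediate result
#     output_file_path="layer_outputs/",   # where to write them
# )
```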
- save(model_name: str, output_directory: str = '', *, log_level: int | None = logging.NOTSET) → None
Save the quantized model and its floating-point counterpart (if available) to the specified directory.
Defaults to the current working directory if no output directory is provided.
- Parameters:
model_name (str) – Name for the saved quantized model file with a .sima extension.
output_directory (str, optional) – Directory to save the model files. Defaults to the current working directory.
log_level (Optional[int], optional) – Logging level for the operation. Defaults to logging.NOTSET.
- Raises:
UserFacingException – If an error occurs during the save process.
Example
>>> model = Model(quantized_net, fp32_net)
>>> model.save("my_model", output_directory="models/")
- static load(model_name: str, network_directory: str = '', *, log_level: int | None = logging.NOTSET) → Model
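A sketch of loading a model saved by save() above. Based on the save() documentation, the model is assumed to be stored as <model_name>.sima inside network_directory; the name and directory here are hypothetical. The load() call itself requires the SiMa SDK, so it is shown commented out.

```python
import os

# Hypothetical model name and directory, mirroring the save() example.
model_name = "my_model"
network_directory = "models/"

# save() writes the model with a .sima extension, so load() is expected to
# find a file like this (an assumption based on the save() documentation):
model_path = os.path.join(network_directory, model_name + ".sima")

# Requires the SiMa SDK:
# model = Model.load(model_name, network_directory=network_directory)
```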
- compile(output_path: str, batch_size: int = 1, compress: bool = True, log_level: int | None = logging.NOTSET, tessellate_parameters: afe.backends.mpk.interface.TessellateParameters | None = None, l2_caching_mode: afe.backends.mpk.interface.L2CachingMode = L2CachingMode.NONE, **kwargs) → None
Compile the quantized model into a .tar.gz package for deployment in an MPK package.
The compiled package includes the binary model and a JSON structure file, saved in output_path as <model_name>_mpk.tar.gz. Batch size can be specified, though compiler optimizations may adjust it for optimal performance. The first dimension of input tensors must represent batch size.
- Parameters:
output_path (str) – Directory to save the compiled package. Created if it doesn't exist.
batch_size (int, optional) – Batch size for compilation. Defaults to 1.
compress (bool, optional) – Enable DRAM data compression for the .lm file. Defaults to True.
log_level (Optional[int], optional) – Logging level. Defaults to logging.NOTSET.
tessellate_parameters (Optional[TessellateParameters], optional) – Internal use for MLA tessellation parameters.
l2_caching_mode (L2CachingMode, optional) – Internal use for the N2A compiler's L2 caching. Defaults to L2CachingMode.NONE.
**kwargs –
Additional internal options, including:
retained_temporary_directory_name (str): Path to retain intermediate files.
use_power_limits (bool): Enable power limits during compilation.
max_mla_power (float): Set maximum MLA power consumption.
layer_norm_use_fp32_intermediates (bool): Use FP32 intermediates for layer normalization.
rms_norm_use_fp32_intermediates (bool): Use FP32 intermediates for RMS normalization.
- Raises:
UserFacingException – If compilation fails due to invalid parameters or errors.
Example
>>> model = Model(quantized_net)
>>> model.compile(output_path="compiled_models/", batch_size=4, compress=True)
- static create_auxiliary_network(transforms: List[afe.apis.transform.Transform], input_types: Dict[afe.ir.defines.InputName, afe.ir.tensor_type.TensorType], *, target: sima_utils.common.Platform = gen1_target, log_level: int | None = logging.NOTSET) → Model
- static compose(nets: List[Model], combined_model_name: str = 'main', log_level: int | None = logging.NOTSET) → Model
- evaluate(evaluation_data: Iterable[Tuple[afe.apis.defines.InputValues, afe.apis.compilation_job_base.GroundTruth]], criterion: afe.apis.statistic.Statistic[Tuple[List[numpy.ndarray], afe.apis.compilation_job_base.GroundTruth], str], *, fast_mode: bool = False, log_level: int | None = logging.NOTSET) → str
Evaluate the model using the provided evaluation data and criterion.
This method runs the model on the given dataset and computes an aggregate result using the specified criterion. It supports a fast execution mode for quicker evaluations and customizable logging levels for diagnostic purposes.
- Parameters:
evaluation_data – An iterable of tuples, where each tuple contains input values and the corresponding ground truth for evaluation.
criterion – A statistical function used to compute the evaluation metric from the model's outputs and the ground truth.
fast_mode – Optional; if True, evaluation runs faster but may be less thorough. Defaults to False.
log_level – Optional; specifies the logging level for evaluation. Defaults to logging.NOTSET.
- Returns:
A string representing the final result of the evaluation based on the criterion.
- Raises:
Exception – If an error occurs during model execution, it is sanitized and re-raised.
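A sketch of preparing evaluation_data for evaluate(). The placeholder name, shapes, and labels are assumptions for illustration; the criterion must be an afe.apis.statistic.Statistic, and constructing one along with running the evaluation requires the SiMa SDK, so that call is shown commented out.

```python
import numpy as np

# Hypothetical evaluation set: pairs of (input dict, ground-truth label).
evaluation_data = [
    ({"input": np.random.rand(1, 32, 32, 3).astype(np.float32)}, label)
    for label in [0, 1, 0, 1]
]

# Requires the SiMa SDK, a Model instance `model`, and a Statistic `criterion`:
# result = model.evaluate(evaluation_data, criterion, fast_mode=True)
```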
- analyze_quantization_error(evaluation_data: Iterable[afe.apis.defines.InputValues], error_metric: afe.core.graph_analyzer.utils.Metric, *, local_feed: bool, log_level: int | None = logging.NOTSET)
- get_performance_metrics(output_kpi_path: str, *, log_level: int | None = logging.NOTSET)
- generate_elf_and_reference_files(input_data: Iterable[afe.apis.defines.InputValues], output_dir: str, *, batch_size: int = 1, compress: bool = True, tessellate_parameters: afe.backends.mpk.interface.TessellateParameters | None = None, log_level: int | None = logging.NOTSET, l2_caching_mode: afe.backends.mpk.interface.L2CachingMode = L2CachingMode.NONE) → None
- execute_in_accelerator_mode(input_data: Iterable[afe.apis.defines.InputValues], devkit: str, *, username: str = cp.DEFAULT_USERNAME, password: str = '', batch_size: int = 1, compress: bool = True, tessellate_parameters: afe.backends.mpk.interface.TessellateParameters | None = None, log_level: int | None = logging.NOTSET, l2_caching_mode: afe.backends.mpk.interface.L2CachingMode = L2CachingMode.NONE) → List[numpy.ndarray]
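A sketch of preparing input_data for execute_in_accelerator_mode(). The placeholder name, shapes, and the devkit hostname are assumptions for illustration; running on a devkit requires the SiMa SDK and a reachable board, so that call is shown commented out.

```python
import numpy as np

# Hypothetical inputs: the method consumes an iterable of input
# dictionaries, one per inference.
input_data = [
    {"input": np.random.rand(1, 224, 224, 3).astype(np.float32)}
    for _ in range(8)
]

# Requires the SiMa SDK and a devkit; "davinci.local" is a placeholder:
# outputs = model.execute_in_accelerator_mode(
#     input_data,
#     devkit="davinci.local",
#     batch_size=4,
# )
```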