Run Inference

run_inference(self, model, in_data: Union[np.ndarray, bytes], do_async=False, callback_f=None)

Description

This method runs inference on the input data using the specified model. It runs synchronously by default, or asynchronously when the do_async flag is set; asynchronous calls require a callback function (callback_f). The method enforces a timeout, but if inference takes much longer than expected, recover by resetting the device as described below.

  • Possible failures:
    • The input tensor does not match what the model expects.
      • Check that the input tensor size matches what was configured in the API.

    • The MLSoC does not respond to the frames being sent, returns an error, or takes longer than the expected inference time.
      • First perform a soft reset of the model: call unload_model followed by load_model, then retry the inference.

    • If that fails, reset the MLSoC by invoking reset(self, device). The reset API is a blocking call; once it returns, re-enumerate the devices to confirm the GUID is found after reboot, connect using the same GUID, reload the model, and retry the inference. A recovery sketch follows this list.

    • If this does not work, power cycle the machine.
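
The escalating recovery sequence above can be scripted. The following is a minimal sketch, not a definitive implementation: run_inference, unload_model, load_model, and reset are the API calls named above, while enumerate_devices, connect, the device handle attribute, and the load_model argument are assumptions made for illustration.

def run_with_recovery(device_interface, model_ref, input_data, guid, model_path):
    # Attempt 1: plain inference; run_inference returns 0 on success.
    if device_interface.run_inference(model_ref, input_data) == 0:
        return 0

    # Attempt 2: soft reset -- unload and reload the model, then retry.
    device_interface.unload_model(model_ref)
    model_ref = device_interface.load_model(model_path)  # assumed load_model signature
    if device_interface.run_inference(model_ref, input_data) == 0:
        return 0

    # Attempt 3: hard reset of the MLSoC; reset() blocks until it returns.
    device_interface.reset(device_interface.device)       # assumed device handle attribute
    if guid not in device_interface.enumerate_devices():  # hypothetical re-enumeration helper
        raise RuntimeError("GUID not found after reset; power cycle the machine.")
    device_interface.connect(guid)                        # hypothetical reconnect helper
    model_ref = device_interface.load_model(model_path)
    return device_interface.run_inference(model_ref, input_data)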

Parameters

  • model (ModelReference): The model reference

  • in_data (Union[np.ndarray, bytes]): Input data for inference

  • do_async (bool, optional): Flag to run inference asynchronously. Default is False.

  • callback_f (function, optional): Callback function for asynchronous inference. Default is None.

Returns

  • int: Error code on failure, otherwise 0

Raises

  • Exception: If the model reference is mismatched (does not correspond to a loaded model)

Usage

import numpy as np

# Synchronous inference on a random 1x224x224x3 float32 input tensor.
input_data = np.random.rand(1, 224, 224, 3).astype(np.float32)
device_interface.run_inference(model_ref, input_data)
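
For asynchronous inference, pass do_async=True and supply a callback. The exact callback signature is not documented here, so the single result argument taken by on_result below is an assumption.

# Asynchronous inference: the call returns immediately and the callback
# runs when inference completes. The callback signature is an assumption.
def on_result(result):
    print("Inference completed:", result)

device_interface.run_inference(model_ref, input_data, do_async=True, callback_f=on_result)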