Run Inference
run_inference(self, model, in_data: Union[np.ndarray, bytes], do_async=False, callback_f=None)
Description
This method runs inference on the input data using the specified model. It runs synchronously by default, or asynchronously when the do_async flag is set, in which case a callback function is required. The call enforces a timeout; if inference takes much longer than expected, recover and reset the device as described below.
- Possible failures:
- Passing an input tensor that does not match what the model expects.
Check that the input tensor size matches what is set in the API.
- The MLSoC does not respond to the frames being sent, returns an error, or takes longer than the expected inference time.
First perform a soft reset of the model: call unload_model followed by load_model, then retry the inference.
If that fails, reset the MLSoC by invoking reset(self, device). The reset API is a blocking call; once the function returns, re-enumerate to ensure the GUID is found after reboot, connect using the same GUID, reload the model, and retry the inference. A sketch of this escalating recovery follows this list.
If this still does not work, power cycle the machine.
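A minimal sketch of that escalating recovery, assuming a device_interface object that exposes the unload_model, load_model, reset, and connect calls named above; the enumerate helper, the device.guid attribute, and all signatures shown are assumptions and may differ in your SDK.

import time

def run_inference_with_recovery(device_interface, model_ref, model_path, in_data, device):
    # First attempt at inference; 0 indicates success.
    if device_interface.run_inference(model_ref, in_data) == 0:
        return 0

    # Soft reset: unload and reload the model, then retry.
    device_interface.unload_model(model_ref)
    model_ref = device_interface.load_model(model_path)  # load_model arguments are an assumption
    if device_interface.run_inference(model_ref, in_data) == 0:
        return 0

    # Hard reset: reset() blocks until the MLSoC has rebooted.
    device_interface.reset(device)
    # Re-enumerate until the same GUID is visible again, then reconnect.
    while device.guid not in device_interface.enumerate():
        time.sleep(1)
    device_interface.connect(device.guid)
    model_ref = device_interface.load_model(model_path)

    # Final retry; if this still fails, power cycle the machine.
    return device_interface.run_inference(model_ref, in_data)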
Parameters
model (ModelReference): Reference to the model on which to run inference
in_data (Union[np.ndarray, bytes]): Input data for inference
do_async (bool, optional): Flag to run inference asynchronously. Default is False.
callback_f (function, optional): Callback function invoked when asynchronous inference completes; required when do_async is True. Default is None.
Returns
int: 0 on success, otherwise an error code
Raises
Exception: If the model reference is mismatched
Usage
import numpy as np

# Random input tensor matching the model's expected input shape.
input_data = np.random.rand(1, 224, 224, 3).astype(np.float32)

# Synchronous inference; returns 0 on success, an error code otherwise.
result = device_interface.run_inference(model_ref, input_data)
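For asynchronous inference, set do_async=True and supply a callback. A minimal sketch, assuming the callback receives the error code; the actual arguments passed to callback_f (for example, the output data) may differ in your SDK.

def on_inference_done(result):
    # Hypothetical callback signature; adjust to what your SDK passes.
    if result == 0:
        print("Inference completed")
    else:
        print(f"Inference failed with error code {result}")

device_interface.run_inference(model_ref, input_data,
                               do_async=True, callback_f=on_inference_done)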