Run Inference

run_inference(self, model, in_data: Union[np.ndarray, bytes], do_async=False, callback_f=None)

Description

This method runs inference on the input data using the specified model. It runs synchronously by default, or asynchronously when the do_async flag is set; asynchronous calls require a callback function (callback_f). The method enforces a timeout, but if inference takes much longer than expected, recover by resetting the device as described below.

  • Possible failures:
    • The input tensor does not match what the model expects.
      • Check that the input tensor size matches what was configured in the API.

    • The MLSoC does not respond to the frames being sent, returns an error, or takes longer than the expected inference time.
      • First perform a soft reset of the model: call unload_model followed by load_model, then retry the inference.

    • If that fails, reset the MLSoC by invoking reset(self, device). The reset API is a blocking call; once it returns, re-enumerate the devices to confirm the GUID is found after reboot, connect using the same GUID, reload the model, and retry the inference. A recovery sketch follows this list.

    • If this does not work, power cycle the machine.
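
The escalating recovery sequence above can be scripted. The following is a minimal sketch, not a definitive implementation: run_inference, unload_model, load_model, and reset are the API calls named above, while enumerate_devices, connect, the device handle attribute, and the load_model argument are assumptions made for illustration.

def run_with_recovery(device_interface, model_ref, input_data, guid, model_path):
    # Attempt 1: plain inference; run_inference returns 0 on success.
    if device_interface.run_inference(model_ref, input_data) == 0:
        return 0

    # Attempt 2: soft reset -- unload and reload the model, then retry.
    device_interface.unload_model(model_ref)
    model_ref = device_interface.load_model(model_path)  # assumed load_model signature
    if device_interface.run_inference(model_ref, input_data) == 0:
        return 0

    # Attempt 3: hard reset of the MLSoC; reset() blocks until it returns.
    device_interface.reset(device_interface.device)       # assumed device handle attribute
    if guid not in device_interface.enumerate_devices():  # hypothetical re-enumeration helper
        raise RuntimeError("GUID not found after reset; power cycle the machine.")
    device_interface.connect(guid)                        # hypothetical reconnect helper
    model_ref = device_interface.load_model(model_path)
    return device_interface.run_inference(model_ref, input_data)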

Parameters

  • model (ModelReference): The model reference

  • in_data (Union[np.ndarray, bytes]): Input data for inference

  • do_async (bool, optional): Flag to run inference asynchronously. Default is False.

  • callback_f (function, optional): Callback function for asynchronous inference. Default is None.

Returns

  • int: Error code on failure, otherwise 0

Raises

  • Exception: If the model reference is mismatched (does not correspond to a loaded model)

Usage

import numpy as np

# Synchronous inference on a random 1x224x224x3 float32 input tensor.
input_data = np.random.rand(1, 224, 224, 3).astype(np.float32)
device_interface.run_inference(model_ref, input_data)
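
For asynchronous inference, pass do_async=True and supply a callback. The exact callback signature is not documented here, so the single result argument taken by on_result below is an assumption.

# Asynchronous inference: the call returns immediately and the callback
# runs when inference completes. The callback signature is an assumption.
def on_result(result):
    print("Inference completed:", result)

device_interface.run_inference(model_ref, input_data, do_async=True, callback_f=on_result)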