.. _run_inference:

Run Inference
=============

.. code-block:: python

    run_inference(self, model, in_data: Union[np.ndarray, bytes], do_async=False, callback_f=None)

**Description**

This method runs inference on the input data using the specified model. It runs synchronously or asynchronously depending on the `do_async` flag; asynchronous calls require a callback function. The method has a timeout, but if inference takes much longer than expected, attempt to recover by resetting the device.

* Possible failures:

  - The input tensor differs from what the model expects.

    - Check that the input tensor size matches what was set in the API.

  - The MLSoC does not respond to the frames being sent, returns an error, or takes longer than the expected inference time.

    - First perform a soft reset of the model: call `unload_model` followed by `load_model`, then retry the inference.
    - If that fails, reset the MLSoC by invoking `reset(self, device)`. The reset API is a blocking call; once it returns, re-enumerate to confirm the GUID is found after reboot, connect using the same GUID, reload the model, and run inference again (a scripted version of this recovery sequence is sketched under **Usage** below).
    - If this does not work, power cycle the machine.

**Parameters**

* `model (ModelReference)`: The model reference
* `in_data (Union[np.ndarray, bytes])`: Input data for inference
* `do_async (bool, optional)`: Flag to run inference asynchronously. Default is False.
* `callback_f (function, optional)`: Callback function for asynchronous inference. Default is None.

**Returns**

* `int`: Error code on failure, otherwise 0

**Raises**

* `Exception`: If the model reference is mismatched

**Usage**

.. code-block:: python
    :linenos:

    input_data = np.random.rand(1, 224, 224, 3).astype(np.float32)
    device_interface.run_inference(model_ref, input_data)
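
When `do_async` is True, a callback must be supplied via `callback_f`. The sketch below shows one way this could look; the document does not specify what arguments the callback receives, so the single `output` parameter used here is an assumption to verify against the SDK reference.

.. code-block:: python

    import numpy as np

    # Hypothetical callback. The callback signature (a single result
    # argument) is an assumption; confirm it against the SDK reference.
    def on_inference_done(output):
        print("Asynchronous inference finished:", type(output))

    input_data = np.random.rand(1, 224, 224, 3).astype(np.float32)

    # The call returns immediately; the callback fires when results arrive.
    rc = device_interface.run_inference(model_ref, input_data,
                                        do_async=True,
                                        callback_f=on_inference_done)
    if rc != 0:
        print("run_inference returned error code:", rc)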
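
The recovery ladder described under *Possible failures* can also be scripted. The following is a minimal sketch, not a definitive implementation: `run_inference`, `unload_model`, `load_model`, and `reset` are the calls named in this document, while `enumerate_devices`, `connect`, `device`, `guid`, and `model_path` are hypothetical placeholders for the SDK's actual enumeration, connection, and model-loading interface.

.. code-block:: python

    def run_with_recovery(device_interface, device, guid, model_path, model_ref, input_data):
        # 1. Plain attempt.
        if device_interface.run_inference(model_ref, input_data) == 0:
            return 0

        # 2. Soft reset: unload and reload the model, then retry.
        device_interface.unload_model(model_ref)
        model_ref = device_interface.load_model(model_path)
        if device_interface.run_inference(model_ref, input_data) == 0:
            return 0

        # 3. Hard reset: reset() blocks until the MLSoC has rebooted.
        device_interface.reset(device)

        # Re-enumerate and reconnect with the same GUID
        # (enumerate_devices/connect are hypothetical helper names).
        if guid not in device_interface.enumerate_devices():
            raise RuntimeError(f"Device {guid} not found after reset")
        device_interface.connect(guid)

        # Reload the model and retry one final time.
        model_ref = device_interface.load_model(model_path)
        rc = device_interface.run_inference(model_ref, input_data)
        if rc != 0:
            # At this point the remaining option is to power cycle the machine.
            raise RuntimeError(f"Inference still failing after reset (rc={rc})")
        return rc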