.. _run_inference:

Run Inference
=============

.. code-block:: python

    run_inference(self, model, in_data: Union[np.ndarray, bytes], do_async=False, callback_f=None)

**Description**

This method runs inference on the input data using the specified model. It runs synchronously or asynchronously depending on the `do_async` flag; asynchronous calls require a callback function. The method has a timeout, but if inference takes much longer than expected, attempt to recover by resetting the device.

* Possible failures:

  - The input tensor differs from what the model expects.

    - Check that the input tensor size matches what was set in the API.

  - The MLSoC does not respond to the frames being sent, returns an error, or takes longer than the expected inference time.

    - First perform a soft reset of the model: call `unload_model` followed by `load_model`, then retry the inference.
    - If that fails, reset the MLSoC by invoking `reset(self, device)`. The reset API is a blocking call; once it returns, re-enumerate to confirm the GUID is found after reboot, connect using the same GUID, reload the model, and run inference again (a scripted version of this recovery sequence is sketched under **Usage** below).
    - If this does not work, power cycle the machine.

**Parameters**

* `model (ModelReference)`: The model reference
* `in_data (Union[np.ndarray, bytes])`: Input data for inference
* `do_async (bool, optional)`: Flag to run inference asynchronously. Default is False.
* `callback_f (function, optional)`: Callback function for asynchronous inference. Default is None.

**Returns**

* `int`: Error code on failure, otherwise 0

**Raises**

* `Exception`: If the model reference is mismatched

**Usage**

.. code-block:: python
    :linenos:

    input_data = np.random.rand(1, 224, 224, 3).astype(np.float32)
    device_interface.run_inference(model_ref, input_data)
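
When `do_async` is True, a callback must be supplied via `callback_f`. The sketch below shows one way this could look; the document does not specify what arguments the callback receives, so the single `output` parameter used here is an assumption to verify against the SDK reference.

.. code-block:: python

    import numpy as np

    # Hypothetical callback. The callback signature (a single result
    # argument) is an assumption; confirm it against the SDK reference.
    def on_inference_done(output):
        print("Asynchronous inference finished:", type(output))

    input_data = np.random.rand(1, 224, 224, 3).astype(np.float32)

    # The call returns immediately; the callback fires when results arrive.
    rc = device_interface.run_inference(model_ref, input_data,
                                        do_async=True,
                                        callback_f=on_inference_done)
    if rc != 0:
        print("run_inference returned error code:", rc)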
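
The recovery ladder described under *Possible failures* can also be scripted. The following is a minimal sketch, not a definitive implementation: `run_inference`, `unload_model`, `load_model`, and `reset` are the calls named in this document, while `enumerate_devices`, `connect`, `device`, `guid`, and `model_path` are hypothetical placeholders for the SDK's actual enumeration, connection, and model-loading interface.

.. code-block:: python

    def run_with_recovery(device_interface, device, guid, model_path, model_ref, input_data):
        # 1. Plain attempt.
        if device_interface.run_inference(model_ref, input_data) == 0:
            return 0

        # 2. Soft reset: unload and reload the model, then retry.
        device_interface.unload_model(model_ref)
        model_ref = device_interface.load_model(model_path)
        if device_interface.run_inference(model_ref, input_data) == 0:
            return 0

        # 3. Hard reset: reset() blocks until the MLSoC has rebooted.
        device_interface.reset(device)

        # Re-enumerate and reconnect with the same GUID
        # (enumerate_devices/connect are hypothetical helper names).
        if guid not in device_interface.enumerate_devices():
            raise RuntimeError(f"Device {guid} not found after reset")
        device_interface.connect(guid)

        # Reload the model and retry one final time.
        model_ref = device_interface.load_model(model_path)
        rc = device_interface.run_inference(model_ref, input_data)
        if rc != 0:
            # At this point the remaining option is to power cycle the machine.
            raise RuntimeError(f"Inference still failing after reset (rc={rc})")
        return rc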