sima_utils.transformer.preproc.whisper_preproc
==============================================

.. py:module:: sima_utils.transformer.preproc.whisper_preproc


Attributes
----------

.. autoapisummary::

   sima_utils.transformer.preproc.whisper_preproc.SAMPLE_RATE
   sima_utils.transformer.preproc.whisper_preproc.N_FFT
   sima_utils.transformer.preproc.whisper_preproc.HOP_LENGTH
   sima_utils.transformer.preproc.whisper_preproc.CHUNK_LENGTH
   sima_utils.transformer.preproc.whisper_preproc.N_SAMPLES
   sima_utils.transformer.preproc.whisper_preproc.N_FRAMES
   sima_utils.transformer.preproc.whisper_preproc.stft_window


Functions
---------

.. autoapisummary::

   sima_utils.transformer.preproc.whisper_preproc.get_ffmpeg_file
   sima_utils.transformer.preproc.whisper_preproc.load_audio
   sima_utils.transformer.preproc.whisper_preproc.pad_or_trim
   sima_utils.transformer.preproc.whisper_preproc.mel_filters
   sima_utils.transformer.preproc.whisper_preproc.log_mel_spectrogram
   sima_utils.transformer.preproc.whisper_preproc.stft_numpy
   sima_utils.transformer.preproc.whisper_preproc.preprocess_audio
   sima_utils.transformer.preproc.whisper_preproc.load_and_preprocess_numpy


Module Contents
---------------

.. py:data:: SAMPLE_RATE
   :value: 16000


.. py:data:: N_FFT
   :value: 400


.. py:data:: HOP_LENGTH
   :value: 160


.. py:data:: CHUNK_LENGTH
   :value: 30


.. py:data:: N_SAMPLES
   :value: 480000


.. py:data:: N_FRAMES
   :value: 3000

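   ``N_SAMPLES`` and ``N_FRAMES`` follow from the other constants. A minimal sketch of the arithmetic, assuming the usual Whisper conventions (30-second chunks, one spectrogram frame per hop); the values are restated locally rather than imported from the module:

   .. code-block:: python

      SAMPLE_RATE = 16000   # samples per second
      N_FFT = 400           # STFT window length in samples (25 ms at 16 kHz)
      HOP_LENGTH = 160      # samples between successive frames (10 ms at 16 kHz)
      CHUNK_LENGTH = 30     # seconds of audio per chunk

      N_SAMPLES = CHUNK_LENGTH * SAMPLE_RATE   # 480000 samples in one chunk
      N_FRAMES = N_SAMPLES // HOP_LENGTH       # 3000 frames in one chunk
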

.. py:function:: get_ffmpeg_file() -> pathlib.Path

.. py:function:: load_audio(file: str, sr: int = SAMPLE_RATE) -> numpy.ndarray

   Open an audio file and read it as a mono waveform, resampling as necessary.

   :param file: The audio file to open.
   :param sr: The sample rate to resample the audio to, if necessary.

   :returns: A NumPy array containing the audio waveform, in float32 dtype.

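   The presence of ``get_ffmpeg_file`` suggests that decoding goes through ffmpeg, as in the reference Whisper code. The sketch below illustrates that approach under stated assumptions: the flag set is the one reference Whisper uses and is not confirmed for this module, and ``build_ffmpeg_command`` and ``pcm16_to_float32`` are hypothetical helper names.

   .. code-block:: python

      import numpy as np


      def build_ffmpeg_command(file: str, sr: int = 16000, ffmpeg: str = "ffmpeg") -> list[str]:
          """Hypothetical helper: decode `file` to mono 16-bit PCM at `sr` Hz on stdout."""
          return [
              ffmpeg, "-nostdin", "-threads", "0",
              "-i", file,
              "-f", "s16le",           # raw little-endian 16-bit samples
              "-ac", "1",              # downmix to mono
              "-acodec", "pcm_s16le",
              "-ar", str(sr),          # resample to the target rate
              "-",                     # write to stdout
          ]


      def pcm16_to_float32(raw: bytes) -> np.ndarray:
          """Hypothetical helper: scale little-endian int16 PCM bytes to float32 in [-1.0, 1.0)."""
          return np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0

   The raw bytes would typically come from ``subprocess.run(cmd, capture_output=True, check=True).stdout``, where ``cmd`` is the list built above.
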

.. py:function:: pad_or_trim(array: numpy.ndarray, length: int = N_SAMPLES, *, axis: int = -1) -> numpy.ndarray

   Pad or trim ``array`` along ``axis`` to ``length`` samples (``N_SAMPLES`` by default), as expected by the encoder.

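   A minimal NumPy sketch of the pad-or-trim idea, assuming zero padding at the end of the axis (as in reference Whisper). ``pad_or_trim_sketch`` is an illustrative stand-in, not the module's implementation:

   .. code-block:: python

      import numpy as np

      N_SAMPLES = 480000  # restated from the module constants


      def pad_or_trim_sketch(array: np.ndarray, length: int = N_SAMPLES, *, axis: int = -1) -> np.ndarray:
          """Return `array` with exactly `length` entries along `axis`."""
          if array.shape[axis] > length:
              # Too long: keep only the first `length` entries along `axis`.
              array = array.take(indices=np.arange(length), axis=axis)
          if array.shape[axis] < length:
              # Too short: append zeros along `axis` and leave other axes untouched.
              pad_widths = [(0, 0)] * array.ndim
              pad_widths[axis] = (0, length - array.shape[axis])
              array = np.pad(array, pad_widths)
          return array

   For example, ``pad_or_trim_sketch(np.ones(5), 8)`` returns an array of shape ``(8,)`` whose last three entries are zero.
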

.. py:function:: mel_filters(n_mels: int, hf_preprocessor_config_json_file: pathlib.Path | None = None) -> numpy.ndarray

   Load the Mel filterbank matrix for projecting an STFT into a Mel spectrogram.
   This decouples the module from the librosa dependency; the filterbank was saved using::

       np.savez_compressed(
           "mel_filters.npz",
           mel_80=librosa.filters.mel(sr=16000, n_fft=400, n_mels=80),
           mel_128=librosa.filters.mel(sr=16000, n_fft=400, n_mels=128),
       )


.. py:data:: stft_window

.. py:function:: log_mel_spectrogram(audio: numpy.ndarray, n_mels: int = 80, hf_preprocessor_config_json_file: pathlib.Path | None = None, stft_style: str = 'scipy')

   Compute the log-Mel spectrogram of an audio waveform.

   :param audio: A NumPy array containing the audio waveform, sampled at 16 kHz.
   :param n_mels: The number of Mel-frequency filters, only 80 is supported.

   :returns: A Tensor that contains the Mel spectrogram.

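   The Mel projection and log scaling can be sketched as follows. This assumes the module follows the scaling used by reference Whisper (floor at ``1e-10``, clamp to 8 decades below the maximum, then shift and divide by 4), which is not confirmed by this page. ``log_mel_from_power`` is a hypothetical helper name.

   .. code-block:: python

      import numpy as np


      def log_mel_from_power(power: np.ndarray, filters: np.ndarray) -> np.ndarray:
          """Project a power spectrogram onto Mel bands and log-scale it.

          power:   (n_freqs, n_frames) squared STFT magnitudes
          filters: (n_mels, n_freqs) Mel filterbank
          """
          mel = filters @ power
          log_spec = np.log10(np.maximum(mel, 1e-10))             # floor avoids log(0)
          log_spec = np.maximum(log_spec, log_spec.max() - 8.0)   # limit dynamic range to 8 decades
          return (log_spec + 4.0) / 4.0                            # shift and rescale

   With ``N_FFT = 400`` the one-sided spectrum has ``400 // 2 + 1 = 201`` frequency bins, so an 80-band filterbank has shape ``(80, 201)``. After the clamp, the output spans at most ``8 / 4 = 2`` units.
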

.. py:function:: stft_numpy(signal: numpy.ndarray, window_size: int = 400, hop_size: int = 160, pad_mode: str = 'reflect', window_type: str = 'hann', center: bool = True) -> numpy.ndarray

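   A minimal sketch of a framed STFT in NumPy, mirroring the parameters in the signature above. It assumes a periodic Hann window and an output layout of ``(n_freqs, n_frames)``; neither is confirmed by this page, and ``stft_sketch`` is an illustrative stand-in rather than the module's implementation.

   .. code-block:: python

      import numpy as np


      def stft_sketch(signal: np.ndarray, window_size: int = 400, hop_size: int = 160,
                      pad_mode: str = "reflect", center: bool = True) -> np.ndarray:
          """Windowed one-sided FFT of overlapping frames, shape (n_freqs, n_frames)."""
          if center:
              # Pad half a window on each side so frame t is centered on sample t * hop_size.
              signal = np.pad(signal, window_size // 2, mode=pad_mode)
          # Periodic Hann window (an assumption about the window the module uses).
          window = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(window_size) / window_size)
          n_frames = 1 + (len(signal) - window_size) // hop_size
          # Row t of `index` holds the sample positions of frame t.
          index = hop_size * np.arange(n_frames)[:, None] + np.arange(window_size)[None, :]
          frames = signal[index] * window
          # One-sided FFT per frame, transposed to (n_freqs, n_frames).
          return np.fft.rfft(frames, axis=-1).T

   For a full 480000-sample chunk with these defaults the sketch yields 3001 frames; reference Whisper drops the last frame to arrive at ``N_FRAMES = 3000``.
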
.. py:function:: preprocess_audio(audio: numpy.ndarray, hf_preprocessor_config_json_file: pathlib.Path | None) -> numpy.ndarray

.. py:function:: load_and_preprocess_numpy(audio_file: pathlib.Path, hf_preprocessor_config_json_file: pathlib.Path | None, out_dtype: type | str = np.float32) -> numpy.ndarray