sima_utils.transformer.preproc.whisper_preproc
==============================================

.. py:module:: sima_utils.transformer.preproc.whisper_preproc


Attributes
----------

.. autoapisummary::

   sima_utils.transformer.preproc.whisper_preproc.SAMPLE_RATE
   sima_utils.transformer.preproc.whisper_preproc.N_FFT
   sima_utils.transformer.preproc.whisper_preproc.HOP_LENGTH
   sima_utils.transformer.preproc.whisper_preproc.CHUNK_LENGTH
   sima_utils.transformer.preproc.whisper_preproc.N_SAMPLES
   sima_utils.transformer.preproc.whisper_preproc.N_FRAMES
   sima_utils.transformer.preproc.whisper_preproc.stft_window


Functions
---------

.. autoapisummary::

   sima_utils.transformer.preproc.whisper_preproc.get_ffmpeg_file
   sima_utils.transformer.preproc.whisper_preproc.load_audio
   sima_utils.transformer.preproc.whisper_preproc.pad_or_trim
   sima_utils.transformer.preproc.whisper_preproc.mel_filters
   sima_utils.transformer.preproc.whisper_preproc.log_mel_spectrogram
   sima_utils.transformer.preproc.whisper_preproc.stft_numpy
   sima_utils.transformer.preproc.whisper_preproc.preprocess_audio
   sima_utils.transformer.preproc.whisper_preproc.load_and_preprocess_numpy


Module Contents
---------------

.. py:data:: SAMPLE_RATE
   :value: 16000


.. py:data:: N_FFT
   :value: 400


.. py:data:: HOP_LENGTH
   :value: 160


.. py:data:: CHUNK_LENGTH
   :value: 30


.. py:data:: N_SAMPLES
   :value: 480000


.. py:data:: N_FRAMES
   :value: 3000


.. py:function:: get_ffmpeg_file() -> pathlib.Path

.. py:function:: load_audio(file: str, sr: int = SAMPLE_RATE) -> numpy.ndarray

   Open an audio file and read it as a mono waveform, resampling as necessary.

   :param file: The audio file to open.
   :param sr: The sample rate to resample the audio to, if necessary.

   :returns: A NumPy array containing the audio waveform, in float32 dtype.


.. py:function:: pad_or_trim(array: numpy.ndarray, length: int = N_SAMPLES, *, axis: int = -1) -> numpy.ndarray

   Pad or trim the audio array to N_SAMPLES, as expected by the encoder.


.. py:function:: mel_filters(n_mels: int, hf_preprocessor_config_json_file: pathlib.Path | None = None) -> numpy.ndarray

   Load the mel filterbank matrix for projecting STFT into a Mel spectrogram.
   Allows decoupling librosa dependency; saved using::

       np.savez_compressed(
           "mel_filters.npz",
           mel_80=librosa.filters.mel(sr=16000, n_fft=400, n_mels=80),
           mel_128=librosa.filters.mel(sr=16000, n_fft=400, n_mels=128),
       )


.. py:data:: stft_window

.. py:function:: log_mel_spectrogram(audio: numpy.ndarray, n_mels: int = 80, hf_preprocessor_config_json_file: pathlib.Path | None = None, stft_style: str = 'scipy')

   Compute the log-Mel spectrogram of the audio waveform.

   :param audio: A NumPy array containing the audio waveform, sampled at 16 kHz.
   :param n_mels: The number of Mel-frequency filters; only 80 is supported.

   :returns: A Tensor that contains the Mel spectrogram.


.. py:function:: stft_numpy(signal: numpy.ndarray, window_size: int = 400, hop_size: int = 160, pad_mode: str = 'reflect', window_type: str = 'hann', center: bool = True) -> numpy.ndarray

.. py:function:: preprocess_audio(audio: numpy.ndarray, hf_preprocessor_config_json_file: pathlib.Path | None) -> numpy.ndarray

.. py:function:: load_and_preprocess_numpy(audio_file: pathlib.Path, hf_preprocessor_config_json_file: pathlib.Path | None, out_dtype: type | str = np.float32) -> numpy.ndarray
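
The documented constants are related by simple arithmetic, and the padding step can be illustrated with plain NumPy. The following is a minimal sketch, not the module's implementation: ``pad_or_trim_sketch`` is a hypothetical stand-in, and it assumes that ``pad_or_trim`` zero-pads at the end of the chosen axis and keeps the leading samples when trimming, which this page does not confirm.

```python
import numpy as np

# Constants as documented on this page.
SAMPLE_RATE = 16000   # samples per second
N_FFT = 400           # STFT window size in samples (25 ms at 16 kHz)
HOP_LENGTH = 160      # STFT hop in samples (10 ms at 16 kHz)
CHUNK_LENGTH = 30     # seconds per chunk

# Derived values; these reproduce the documented N_SAMPLES and N_FRAMES.
N_SAMPLES = CHUNK_LENGTH * SAMPLE_RATE   # 480000 samples in a 30-second chunk
N_FRAMES = N_SAMPLES // HOP_LENGTH       # 3000 frames per chunk


def pad_or_trim_sketch(array, length=N_SAMPLES, *, axis=-1):
    """Hypothetical stand-in: zero-pad at the end or truncate along `axis`."""
    array = np.asarray(array)
    current = array.shape[axis]
    if current > length:
        # Assumption: trimming keeps the first `length` entries.
        return np.take(array, np.arange(length), axis=axis)
    if current < length:
        # Assumption: padding appends zeros after the existing samples.
        pad_widths = [(0, 0)] * array.ndim
        pad_widths[axis] = (0, length - current)
        return np.pad(array, pad_widths)
    return array


short = np.ones(SAMPLE_RATE, dtype=np.float32)        # 1 second of audio
long = np.ones(40 * SAMPLE_RATE, dtype=np.float32)    # 40 seconds of audio
print(pad_or_trim_sketch(short).shape, pad_or_trim_sketch(long).shape)
```

Both inputs come out with ``N_SAMPLES`` entries along the last axis, which is the fixed-length input the encoder is described as expecting.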