sima_utils.transformer.preproc.whisper_preproc
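Attributes
- SAMPLE_RATE
- N_FFT
- HOP_LENGTH
- CHUNK_LENGTH
- N_SAMPLES
- N_FRAMES
- stft_window
Functions
- get_ffmpeg_file()
- load_audio(file[, sr]): Open an audio file and read it as a mono waveform, resampling as necessary.
- pad_or_trim(array[, length, axis]): Pad or trim the audio array to N_SAMPLES, as expected by the encoder.
- mel_filters(n_mels[, hf_preprocessor_config_json_file]): Load the mel filterbank matrix for projecting an STFT into a Mel spectrogram.
- log_mel_spectrogram(audio[, n_mels, hf_preprocessor_config_json_file, stft_style]): Compute the log-Mel spectrogram of the input audio.
- stft_numpy(signal[, window_size, hop_size, pad_mode, window_type, center])
- preprocess_audio(audio, hf_preprocessor_config_json_file)
- load_and_preprocess_numpy(audio_file, hf_preprocessor_config_json_file[, out_dtype])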
Module Contents
- sima_utils.transformer.preproc.whisper_preproc.SAMPLE_RATE = 16000
- sima_utils.transformer.preproc.whisper_preproc.N_FFT = 400
- sima_utils.transformer.preproc.whisper_preproc.HOP_LENGTH = 160
- sima_utils.transformer.preproc.whisper_preproc.CHUNK_LENGTH = 30
- sima_utils.transformer.preproc.whisper_preproc.N_SAMPLES = 480000
- sima_utils.transformer.preproc.whisper_preproc.N_FRAMES = 3000
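The constants above are related by simple arithmetic: a 30-second chunk at 16 kHz holds 480000 samples, and stepping through it with a hop of 160 samples gives 3000 frames. The short check below only restates the listed values as local variables (nothing is imported from sima_utils); the number of frequency bins, `N_FFT // 2 + 1`, is the standard size of a real-input FFT and is not itself a documented module attribute.

```python
# Local copies of the documented constants (not imported from the module).
SAMPLE_RATE = 16000   # samples per second
N_FFT = 400           # STFT window length in samples (25 ms at 16 kHz)
HOP_LENGTH = 160      # step between successive frames (10 ms at 16 kHz)
CHUNK_LENGTH = 30     # seconds of audio per chunk

n_samples = CHUNK_LENGTH * SAMPLE_RATE   # samples in one 30 s chunk
n_frames = n_samples // HOP_LENGTH       # spectrogram frames per chunk
n_freq_bins = N_FFT // 2 + 1             # bins returned by a real-input FFT

print(n_samples, n_frames, n_freq_bins)  # 480000 3000 201
```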
- sima_utils.transformer.preproc.whisper_preproc.get_ffmpeg_file() -> pathlib.Path
- sima_utils.transformer.preproc.whisper_preproc.load_audio(file: str, sr: int = SAMPLE_RATE) -> numpy.ndarray
Open an audio file and read it as a mono waveform, resampling as necessary.
- Parameters:
file – The audio file to open.
sr – The sample rate to resample the audio to, if necessary.
- Returns:
A NumPy array containing the audio waveform, in float32 dtype.
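The presence of get_ffmpeg_file() suggests that decoding is delegated to ffmpeg, although this page does not confirm it. Assuming the decoder emits raw 16-bit signed mono PCM (as the upstream Whisper loader does), the final conversion to a float32 waveform in the range [-1, 1) could look like the sketch below. The helper name `pcm16_to_float32_sketch` is hypothetical and is not part of the module.

```python
import numpy as np


def pcm16_to_float32_sketch(raw: bytes) -> np.ndarray:
    """Hypothetical helper: convert raw int16 PCM bytes to a float32 waveform."""
    # Interpret the byte buffer as native-endian 16-bit signed samples.
    samples = np.frombuffer(raw, dtype=np.int16)
    # Scale by 2**15 so that the values fall in [-1.0, 1.0).
    return samples.astype(np.float32) / 32768.0


# Three hand-picked samples: silence, half scale, and negative full scale.
raw = np.array([0, 16384, -32768], dtype=np.int16).tobytes()
waveform = pcm16_to_float32_sketch(raw)
```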
- sima_utils.transformer.preproc.whisper_preproc.pad_or_trim(array: numpy.ndarray, length: int = N_SAMPLES, *, axis: int = -1) -> numpy.ndarray
Pad or trim the audio array to N_SAMPLES, as expected by the encoder.
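A minimal sketch of the pad-or-trim behaviour follows. It assumes that short inputs are zero-padded at the end and long inputs are truncated from the end, which matches upstream Whisper but is not stated on this page. `pad_or_trim_sketch` is a hypothetical stand-in, not the module's function.

```python
import numpy as np

N_SAMPLES = 480000  # documented default target length


def pad_or_trim_sketch(array: np.ndarray, length: int = N_SAMPLES, *, axis: int = -1) -> np.ndarray:
    """Hypothetical stand-in: force `array` to `length` samples along `axis`."""
    current = array.shape[axis]
    if current > length:
        # Keep only the first `length` samples along the chosen axis.
        index = [slice(None)] * array.ndim
        index[axis] = slice(0, length)
        return array[tuple(index)]
    if current < length:
        # Assumption: pad with zeros after the existing samples.
        pad_widths = [(0, 0)] * array.ndim
        pad_widths[axis] = (0, length - current)
        return np.pad(array, pad_widths)
    return array


short = pad_or_trim_sketch(np.ones(10, dtype=np.float32), length=16)
long = pad_or_trim_sketch(np.ones((2, 20), dtype=np.float32), length=16)
```

Both results have 16 samples along the last axis: the short input gains six trailing zeros, and the long input loses its last four samples per row.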
- sima_utils.transformer.preproc.whisper_preproc.mel_filters(n_mels: int, hf_preprocessor_config_json_file: pathlib.Path | None = None) -> numpy.ndarray
Load the mel filterbank matrix for projecting an STFT into a Mel spectrogram. Loading a saved matrix decouples the module from the librosa dependency; the matrix was saved using:
    np.savez_compressed(
        "mel_filters.npz",
        mel_80=librosa.filters.mel(sr=16000, n_fft=400, n_mels=80),
        mel_128=librosa.filters.mel(sr=16000, n_fft=400, n_mels=128),
    )
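To illustrate how an archive saved this way can be read back without librosa, the sketch below writes placeholder matrices under the same keys and loads one by name. The zero-filled matrices only stand in for real filterbanks with the shape `(n_mels, n_fft // 2 + 1)` that `librosa.filters.mel` returns; the helper `load_mel_filters_sketch` and the temporary path are hypothetical, and the module's actual lookup logic (including how it uses `hf_preprocessor_config_json_file`) is not documented here.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_mel_filters_sketch(npz_path: Path, n_mels: int) -> np.ndarray:
    """Hypothetical loader: fetch the matrix stored under the key mel_<n_mels>."""
    with np.load(npz_path) as archive:
        return archive[f"mel_{n_mels}"]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "mel_filters.npz"
    # Placeholders with the shapes a real filterbank would have for n_fft=400.
    np.savez_compressed(
        path,
        mel_80=np.zeros((80, 201), dtype=np.float32),
        mel_128=np.zeros((128, 201), dtype=np.float32),
    )
    filters = load_mel_filters_sketch(path, 80)
```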
- sima_utils.transformer.preproc.whisper_preproc.stft_window
- sima_utils.transformer.preproc.whisper_preproc.log_mel_spectrogram(audio: numpy.ndarray, n_mels: int = 80, hf_preprocessor_config_json_file: pathlib.Path | None = None, stft_style: str = 'scipy')
Compute the log-Mel spectrogram of the input audio.
- Parameters:
audio – A NumPy array containing the audio waveform, sampled at 16 kHz.
n_mels – The number of Mel-frequency filters; only 80 is supported.
- Returns:
A Tensor that contains the Mel spectrogram.
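The step from a power spectrogram to a log-Mel spectrogram can be sketched as below. The clamping and scaling constants (floor of 1e-10, an 8-decade dynamic range, then `(x + 4) / 4`) follow upstream Whisper and are assumed here rather than confirmed for this module. `log_mel_from_power_sketch` is a hypothetical name, and the random inputs are placeholders for a real power spectrogram and filterbank.

```python
import numpy as np


def log_mel_from_power_sketch(power: np.ndarray, filters: np.ndarray) -> np.ndarray:
    """Hypothetical helper: project power (n_freq, n_frames) onto mel filters and take a scaled log."""
    mel = filters @ power                                   # (n_mels, n_frames)
    log_spec = np.log10(np.clip(mel, 1e-10, None))          # avoid log10(0)
    log_spec = np.maximum(log_spec, log_spec.max() - 8.0)   # limit dynamic range to 8 decades
    return (log_spec + 4.0) / 4.0                           # rescale as in upstream Whisper


rng = np.random.default_rng(0)
power = rng.random((201, 3000)).astype(np.float32)   # placeholder |STFT|**2
filters = rng.random((80, 201)).astype(np.float32)   # placeholder mel filterbank
log_mel = log_mel_from_power_sketch(power, filters)
```

Because the dynamic range is capped at 8 decades before the division by 4, the output values span at most 2.0 from minimum to maximum.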
- sima_utils.transformer.preproc.whisper_preproc.stft_numpy(signal: numpy.ndarray, window_size: int = 400, hop_size: int = 160, pad_mode: str = 'reflect', window_type: str = 'hann', center: bool = True) -> numpy.ndarray
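As an illustration of what a centered, Hann-windowed STFT with these defaults computes, here is a minimal NumPy sketch. It assumes a periodic Hann window and a `(n_freq, n_frames)` output layout, neither of which is documented on this page; `stft_sketch` is a hypothetical function and not the module's implementation.

```python
import numpy as np


def stft_sketch(signal: np.ndarray, window_size: int = 400, hop_size: int = 160) -> np.ndarray:
    """Hypothetical centered STFT with reflect padding; returns (n_freq, n_frames)."""
    half = window_size // 2
    # center=True behaviour: pad so that frame t is centered on sample t * hop_size.
    padded = np.pad(signal, (half, half), mode="reflect")
    # Periodic Hann window (assumption; a symmetric window would differ slightly).
    n = np.arange(window_size)
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * n / window_size)
    # Slice overlapping frames: shape (n_frames, window_size).
    frames = np.lib.stride_tricks.sliding_window_view(padded, window_size)[::hop_size]
    # Real-input FFT per frame, then transpose to (n_freq, n_frames).
    return np.fft.rfft(frames * window, axis=-1).T


# 0.1 s of a 440 Hz tone at 16 kHz (1600 samples).
t = np.arange(1600) / 16000.0
signal = np.sin(2.0 * np.pi * 440.0 * t).astype(np.float32)
spec = stft_sketch(signal)
```

With a 400-sample window at 16 kHz each bin is 40 Hz wide, so the 440 Hz tone peaks in bin 11, and the 1600-sample input yields 1 + 1600 / 160 = 11 frames.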
- sima_utils.transformer.preproc.whisper_preproc.preprocess_audio(audio: numpy.ndarray, hf_preprocessor_config_json_file: pathlib.Path | None) -> numpy.ndarray
- sima_utils.transformer.preproc.whisper_preproc.load_and_preprocess_numpy(audio_file: pathlib.Path, hf_preprocessor_config_json_file: pathlib.Path | None, out_dtype: type | str = np.float32) -> numpy.ndarray