sima_utils.transformer.vision_preprocessor

Attributes

PILImageResampling

IMAGE_RGB_STATS

vm_type

Classes

ImageProcessor

Image processor for CLIP and SigLIP vision models.

Module Contents

sima_utils.transformer.vision_preprocessor.PILImageResampling
sima_utils.transformer.vision_preprocessor.IMAGE_RGB_STATS
class sima_utils.transformer.vision_preprocessor.ImageProcessor(model_type: str, target_size: int)

Image processor for CLIP and SigLIP vision models.

model_type

The type of vision model, "clip" or "siglip".

image_size

The target image size for the vision model.

keep_aspect

If true, preserve the aspect ratio by expanding the image to a square before resizing.

image_mean

The mean of RGB images used in model training.

image_std

The standard deviation of RGB images used in model training.

resample

The method of resampling used to resize an image.

model_type: str
image_size: tuple[int, int]
keep_aspect: bool
image_mean: list[float]
image_std: list[float]
resample: PILImageResampling
load_image_from_file(image_files: list[str])
expand2square(pil_img: PIL.Image.Image)
preprocess(images: list[PIL.Image.Image], channel_first: bool = True) → list[numpy.ndarray]

Preprocess a list of images as input to a vision model.

Parameters:
  • images – A list of RGB images.

  • channel_first – A flag to output CHW if true, or HWC if false.

Returns:

A list of processed images as numpy arrays.

sima_utils.transformer.vision_preprocessor.vm_type