Utils

Utility functions for multimodal modules.

combine_strings(texts)

Combine multiple strings into a single string with newline separators.

This function takes a list of strings and returns a single string where each string from the list is on a new line. It filters out any empty or whitespace-only strings from the list before joining them.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| texts | list[str] | A list of strings to combine. | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| str | str | A single string containing all valid strings, where each string is on a new line. |
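The filtering and joining described above can be approximated in a few lines. This is a minimal sketch rather than the library's implementation: the name `combine_strings_sketch` is hypothetical, and it assumes the kept strings are joined unchanged (whether the real function also strips surrounding whitespace is not stated here).

```python
def combine_strings_sketch(texts: list[str]) -> str:
    """Join the non-blank strings in ``texts`` with newlines (illustrative only)."""
    # str.strip() returns "" for empty and whitespace-only strings,
    # so the truthiness check drops both kinds.
    return "\n".join(text for text in texts if text.strip())


print(combine_strings_sketch(["first line", "", "   ", "second line"]))
```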

get_file_from_file_path(source)

Read an image file and return its binary data if the file is valid.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| source | str | Path to the image file. | required |

Returns:

| Type | Description |
| --- | --- |
| bytes \| None | Binary data of the image file if valid, None otherwise. |
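The "bytes if valid, None otherwise" contract can be illustrated as follows. This is a sketch under stated assumptions: `read_image_file_sketch` is a hypothetical name, and validity is approximated by comparing the leading bytes against a few well-known image signatures, since the actual validation used by `get_file_from_file_path` is not documented on this page.

```python
from pathlib import Path
from typing import Optional

# Leading bytes of a few common image formats. Assumption: the real
# validation is not described here and may be stricter than this check.
_IMAGE_SIGNATURES = (
    b"\x89PNG\r\n\x1a\n",  # PNG
    b"\xff\xd8\xff",       # JPEG
    b"GIF87a",             # GIF (1987 variant)
    b"GIF89a",             # GIF (1989 variant)
)


def read_image_file_sketch(source: str) -> Optional[bytes]:
    """Return the file's bytes if it looks like an image, else None."""
    try:
        data = Path(source).read_bytes()
    except OSError:
        # Missing file, directory, permission error, and so on.
        return None
    return data if data.startswith(_IMAGE_SIGNATURES) else None
```

Returning None for both unreadable and non-image files mirrors the documented return type, so callers only need a single `is None` check.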

get_file_from_gdrive(gdrive_source)

Download a file from Google Drive given a file ID or URL.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| gdrive_source | str | Google Drive file ID or full URL. | required |

Returns:

| Type | Description |
| --- | --- |
| bytes \| None | The file's binary content if successful, None otherwise. |
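Because `gdrive_source` may be either a bare file ID or a full URL, some normalization step is needed before downloading. The sketch below shows one way this could work; `extract_gdrive_file_id` is a hypothetical helper, not part of the documented API, and it only recognizes the common `/file/d/<ID>/` and `?id=<ID>` URL shapes. It does no network access.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_gdrive_file_id(gdrive_source: str) -> Optional[str]:
    """Return the file ID from a Drive URL, or the input itself if it is a bare ID."""
    source = gdrive_source.strip()
    if not source.startswith(("http://", "https://")):
        # Assumption: anything that is not a URL is already a file ID.
        return source or None
    parsed = urlparse(source)
    # Shape 1: https://drive.google.com/file/d/<ID>/view
    match = re.search(r"/d/([\w-]+)", parsed.path)
    if match:
        return match.group(1)
    # Shape 2: https://drive.google.com/open?id=<ID> (also /uc?id=<ID>)
    ids = parse_qs(parsed.query).get("id")
    return ids[0] if ids else None
```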

get_file_from_s3(url)

Get file from S3 bucket.

This function attempts to get a file from an S3 bucket using:

1. Default credentials (from AWS CLI or instance profile).
2. Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY).
3. Session token if available.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| url | str | The S3 URL to get the file from. Can be in either format: s3://bucket/key or https://bucket.s3.amazonaws.com/key. | required |

Returns:

| Type | Description |
| --- | --- |
| bytes \| None | The file contents if successful, None otherwise. |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If AWS credentials are not found or invalid. |
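Both accepted URL formats carry the same two pieces of information, a bucket and a key. The following sketch shows how they might be separated with the standard library; `parse_s3_url` is a hypothetical helper for illustration, and returning None for unrecognized URLs is this sketch's own choice rather than documented behavior. It covers only the two formats listed above and performs no AWS calls.

```python
from typing import Optional
from urllib.parse import urlparse

_VIRTUAL_HOST_SUFFIX = ".s3.amazonaws.com"


def parse_s3_url(url: str) -> Optional[tuple[str, str]]:
    """Split an S3 URL into (bucket, key), or return None if unrecognized."""
    parsed = urlparse(url)
    key = parsed.path.lstrip("/")
    if parsed.scheme == "s3":
        # s3://bucket/key
        bucket = parsed.netloc
    elif parsed.scheme == "https" and parsed.netloc.endswith(_VIRTUAL_HOST_SUFFIX):
        # https://bucket.s3.amazonaws.com/key
        bucket = parsed.netloc[: -len(_VIRTUAL_HOST_SUFFIX)]
    else:
        return None
    # Both parts are needed to address an object.
    return (bucket, key) if bucket and key else None
```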

get_file_from_url(file_source, timeout=30, session=None) async

Asynchronously download and validate an image from a URL.

This function performs the following steps:

1. Attempts to download the content from the provided URL.
2. Validates that the downloaded content is a valid image.
3. Returns the binary data if both steps succeed.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| file_source | str | The URL of the file to download. Supports HTTP and HTTPS protocols. | required |
| timeout | int | The timeout for the HTTP request in seconds. Defaults to 30 seconds. | 30 |
| session | Optional[ClientSession] | An existing aiohttp session to use. If None, a new session will be created. Defaults to None. | None |

Returns:

| Type | Description |
| --- | --- |
| bytes \| None | The downloaded image binary data if successful and valid, None if the download fails or the content is not a valid image. |
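The download, validate, return flow can be sketched with plain `asyncio`. To keep the example runnable without a network or aiohttp, the HTTP call and the image check are passed in as callables; `download_image_sketch`, `fetch`, and `is_valid_image` are hypothetical names introduced for illustration and are not part of the documented signature.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def download_image_sketch(
    file_source: str,
    fetch: Callable[[str], Awaitable[bytes]],
    is_valid_image: Callable[[bytes], bool],
    timeout: float = 30,
) -> Optional[bytes]:
    """Download, validate, and return image bytes, or None on any failure."""
    try:
        # Step 1: download, giving up after `timeout` seconds.
        data = await asyncio.wait_for(fetch(file_source), timeout=timeout)
    except Exception:
        # Network errors and timeouts both map to None.
        return None
    # Steps 2 and 3: hand back the bytes only if they pass validation.
    return data if is_valid_image(data) else None


async def _fake_fetch(url: str) -> bytes:
    # Stand-in for a real HTTP GET; returns bytes starting with the PNG signature.
    return b"\x89PNG\r\n\x1a\n" + b"payload"


result = asyncio.run(
    download_image_sketch(
        "https://example.com/cat.png",
        _fake_fetch,
        lambda data: data.startswith(b"\x89PNG"),
    )
)
print(result is not None)
```

Collapsing every failure into None matches the documented return type; a caller that needs to distinguish a timeout from an invalid image would need a different contract.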

get_image_metadata(image_binary)

Extract metadata from image binary data.

This function extracts metadata from the image, including GPS coordinates (latitude/longitude) if available in the EXIF data.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| image_binary | bytes | The binary data of the image. | required |

Returns:

| Type | Description |
| --- | --- |
| dict[str, Any] | Dictionary containing image metadata. |
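EXIF stores each GPS coordinate as degrees, minutes, and seconds plus a hemisphere reference (N/S for latitude, E/W for longitude), so producing latitude/longitude values typically involves a conversion like the one below. This is a sketch of that arithmetic only: `dms_to_decimal` is a hypothetical helper, and the keys and structure of the dictionary actually returned by `get_image_metadata` are not specified on this page.

```python
def dms_to_decimal(dms: tuple[float, float, float], ref: str) -> float:
    """Convert EXIF-style (degrees, minutes, seconds) to signed decimal degrees."""
    degrees, minutes, seconds = (float(part) for part in dms)
    decimal = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative by convention.
    return -decimal if ref.upper() in ("S", "W") else decimal


latitude = dms_to_decimal((6, 12, 0), "S")
longitude = dms_to_decimal((106, 49, 30), "E")
print(latitude, longitude)
```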

get_unique_non_empty_strings(texts)

Get unique non-empty strings from a list of strings and remove whitespace.

This function takes a list of strings and returns a list containing only the strings that are neither empty nor whitespace-only, with duplicates removed.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| texts | list[str] | A list of strings to filter and deduplicate. | required |

Returns:

| Type | Description |
| --- | --- |
| list[str] | A list of unique strings, none of which is empty or whitespace-only. |
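The strip, filter, and deduplicate behavior can be sketched as follows. The name `unique_non_empty_sketch` is hypothetical, and two details are assumptions not confirmed by this page: that "remove whitespace" means stripping leading and trailing whitespace, and that the first occurrence of each string keeps its original position.

```python
def unique_non_empty_sketch(texts: list[str]) -> list[str]:
    """Strip each string, drop blanks, and remove duplicates (illustrative only)."""
    stripped = (text.strip() for text in texts)
    # dict.fromkeys keeps the first occurrence of each key in insertion order.
    return list(dict.fromkeys(text for text in stripped if text))


print(unique_non_empty_sketch([" cat ", "cat", "", "dog", "   "]))
```

Stripping before deduplicating means " cat " and "cat" count as the same entry; deduplicating first would keep both.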