Utils
Utility functions for multimodal modules.
combine_strings(texts)
Combine multiple strings into a single string with newline separators.
This function takes a list of strings, filters out any that are empty or whitespace-only, and joins the remaining strings so that each one appears on its own line.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
texts | list[str] | A list of strings to combine. | required |
Returns:
Name | Type | Description |
---|---|---|
str | str | A single string containing all valid strings, where each string is on a new line. |
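The behaviour described above can be approximated with a short sketch. This is an illustrative re-implementation, not the library's source: the name `combine_strings_sketch` is hypothetical, and it assumes that kept strings are joined unchanged (the reference does not say whether they are also stripped).

```python
def combine_strings_sketch(texts: list[str]) -> str:
    """Join non-blank strings with newline separators.

    Assumption: strings that survive the filter are kept exactly as given.
    """
    # Drop entries that are empty or contain only whitespace, then join.
    return "\n".join(text for text in texts if text.strip())


if __name__ == "__main__":
    print(combine_strings_sketch(["first line", "", "   ", "second line"]))
```

With the input shown, the two blank entries are skipped and the result contains two lines.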
get_file_from_file_path(source)
Read image file and return its binary data if valid.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
source | str | Path to the image file. | required |
Returns:
Type | Description |
---|---|
bytes \| None | Binary data of the image file if valid, None otherwise. |
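The "read, validate, return bytes or None" contract can be sketched with the standard library alone. The reference does not say how validity is checked, so this sketch assumes a simple check of leading signature bytes for PNG, JPEG, and GIF; the name `read_image_file_sketch` and the signature list are assumptions, not the library's behaviour.

```python
from pathlib import Path
from typing import Optional

# Leading signature bytes of a few common image formats (assumed, not exhaustive).
IMAGE_SIGNATURES = (
    b"\x89PNG\r\n\x1a\n",  # PNG
    b"\xff\xd8\xff",       # JPEG
    b"GIF87a",             # GIF (1987 variant)
    b"GIF89a",             # GIF (1989 variant)
)


def read_image_file_sketch(source: str) -> Optional[bytes]:
    """Return the file's bytes if it looks like an image, otherwise None."""
    path = Path(source)
    if not path.is_file():
        return None
    try:
        data = path.read_bytes()
    except OSError:
        # Unreadable file (permissions, I/O error): treat as "not valid".
        return None
    return data if data.startswith(IMAGE_SIGNATURES) else None
```

Returning None instead of raising keeps the caller's error handling to a single `is None` check, which matches the return type documented above.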
get_file_from_gdrive(gdrive_source)
Download a file from Google Drive given a file ID or URL.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
gdrive_source | str | Google Drive file ID or full URL. | required |
Returns:
Type | Description |
---|---|
bytes \| None | The file's binary content if successful, None otherwise. |
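Because the argument may be either a bare file ID or a full URL, some normalisation step is needed before downloading. The sketch below shows one way to do that for common Google Drive link shapes (`/file/d/<id>/...` and `?id=<id>`). The helper name `extract_gdrive_file_id` and the accepted host names are assumptions; how the library itself parses its input is not documented here.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Characters assumed to be legal in a Drive file ID.
_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def extract_gdrive_file_id(gdrive_source: str) -> Optional[str]:
    """Return the file ID from a bare ID or a Google Drive URL, else None."""
    source = gdrive_source.strip()
    if _ID_PATTERN.fullmatch(source):
        return source  # Already a bare ID.

    parsed = urlparse(source)
    if parsed.netloc not in ("drive.google.com", "docs.google.com"):
        return None

    # Path form: /file/d/<id>/view
    match = re.search(r"/d/([A-Za-z0-9_-]+)", parsed.path)
    if match:
        return match.group(1)

    # Query form: /open?id=<id> or /uc?id=<id>
    ids = parse_qs(parsed.query).get("id")
    return ids[0] if ids else None
```

The IDs used when trying this out should be placeholders; the sketch only parses strings and performs no network access.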
get_file_from_s3(url)
Get file from S3 bucket.
This function attempts to get a file from an S3 bucket using:
1. Default credentials (from AWS CLI or instance profile).
2. Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY).
3. Session token if available.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
url | str | The S3 URL to get the file from. Accepted formats: s3://bucket/key or https://bucket.s3.amazonaws.com/key. | required |
Returns:
Type | Description |
---|---|
bytes \| None | The file contents if successful, None otherwise. |
Raises:
Type | Description |
---|---|
ValueError | If AWS credentials are not found or invalid. |
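Both accepted URL formats carry the same two pieces of information, a bucket and a key. The following sketch shows how they could be separated using only `urllib.parse`. The function name `parse_s3_url` is hypothetical, the sketch handles only the two formats listed above, and it does not attempt the credential lookup or the download itself.

```python
from typing import Optional
from urllib.parse import unquote, urlparse

_VIRTUAL_HOST_SUFFIX = ".s3.amazonaws.com"


def parse_s3_url(url: str) -> Optional[tuple[str, str]]:
    """Split an S3 URL into (bucket, key); return None for other URLs."""
    parsed = urlparse(url)
    key = parsed.path.lstrip("/")

    if parsed.scheme == "s3":
        # s3://bucket/key
        bucket = parsed.netloc
    elif parsed.scheme == "https" and parsed.netloc.endswith(_VIRTUAL_HOST_SUFFIX):
        # https://bucket.s3.amazonaws.com/key (keys are percent-encoded in HTTPS URLs)
        bucket = parsed.netloc[: -len(_VIRTUAL_HOST_SUFFIX)]
        key = unquote(key)
    else:
        return None

    return (bucket, key) if bucket and key else None
```

The resulting pair is the shape that an S3 client typically expects for a "get object" request.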
get_file_from_url(file_source, timeout=30, session=None)
async
Asynchronously download and validate an image from a URL.
This function performs the following steps:
1. Attempts to download the content from the provided URL.
2. Validates that the downloaded content is a valid image.
3. Returns the binary data if both steps succeed.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_source | str | The URL of the file to download. Supports HTTP and HTTPS protocols. | required |
timeout | int | The timeout for the HTTP request in seconds. Defaults to 30 seconds. | 30 |
session | Optional[ClientSession] | An existing aiohttp session to use. If None, a new session will be created. Defaults to None. | None |
Returns:
Type | Description |
---|---|
bytes \| None | The downloaded image binary data if successful and valid, None if the download fails or the content is not a valid image. |
get_image_metadata(image_binary)
Extract metadata from image binary data.
This function extracts metadata from the image, including GPS coordinates (latitude/longitude) when they are available in the EXIF data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
image_binary | bytes | The binary data of the image. | required |
Returns:
Type | Description |
---|---|
dict[str, Any] | Dictionary containing image metadata. |
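EXIF stores GPS positions as degrees, minutes, and seconds plus a hemisphere reference, so producing latitude/longitude values involves a small conversion. The sketch below shows that arithmetic on already-decoded EXIF GPS tags. The function names and the output keys `latitude` and `longitude` are assumptions; the structure of the dictionary the library returns is not specified in this reference.

```python
from typing import Any, Optional, Sequence


def dms_to_decimal(dms: Sequence[float], ref: str) -> float:
    """Convert (degrees, minutes, seconds) and a hemisphere letter to decimal degrees."""
    degrees, minutes, seconds = (float(part) for part in dms)
    decimal = degrees + minutes / 60 + seconds / 3600
    # South and West are negative by convention.
    return -decimal if ref.upper() in ("S", "W") else decimal


def gps_metadata_sketch(gps: Optional[dict]) -> dict[str, Any]:
    """Build a metadata dict from decoded EXIF GPS tags; empty if incomplete."""
    required = ("GPSLatitude", "GPSLatitudeRef", "GPSLongitude", "GPSLongitudeRef")
    if not gps or any(tag not in gps for tag in required):
        return {}
    return {
        "latitude": dms_to_decimal(gps["GPSLatitude"], gps["GPSLatitudeRef"]),
        "longitude": dms_to_decimal(gps["GPSLongitude"], gps["GPSLongitudeRef"]),
    }
```

Returning an empty dictionary when GPS tags are missing matches the "if available" wording above, although the library may handle that case differently.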
get_unique_non_empty_strings(texts)
Get the unique, non-empty strings from a list of strings, with whitespace removed.
This function takes a list of strings, drops any that are empty or whitespace-only, and removes duplicates from the result.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
texts | list[str] | A list of strings to filter and deduplicate. | required |
Returns:
Type | Description |
---|---|
list[str] | A list of strings where each string is not empty or whitespace-only. |
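A compact way to express "strip, drop blanks, deduplicate" is shown below. This is a sketch under two assumptions that the reference does not confirm: that "remove whitespace" means stripping leading and trailing whitespace, and that the first-seen order of the strings is preserved. The name `unique_non_empty_sketch` is hypothetical.

```python
def unique_non_empty_sketch(texts: list[str]) -> list[str]:
    """Return stripped, non-empty strings with duplicates removed.

    Assumptions: whitespace is stripped from both ends, and the order of
    first appearance is kept.
    """
    stripped = (text.strip() for text in texts)
    # dict.fromkeys keeps insertion order, so it deduplicates without reordering.
    return list(dict.fromkeys(text for text in stripped if text))


if __name__ == "__main__":
    print(unique_non_empty_sketch([" cat ", "cat", "", "dog", "   "]))
```

Stripping before deduplicating means that " cat " and "cat" count as the same string.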