Cloud Storage
Cloud storage module for GLLM training.
This module provides functionality for storing model artifacts in various cloud storage providers such as AWS S3, Google Cloud Storage, and Azure Blob Storage.
Reviewer
- Muhammad Afif Al Hawari (muhammad.a.a.hawari@gdplabs.id)
CloudStorageClient(config)
Bases: ABC
Abstract base class for cloud storage clients.
This class defines the interface that all cloud storage implementations must follow. It provides methods for uploading and downloading files from cloud storage services.
Initialize the cloud storage client with provider-specific configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `Dict[str, str]` | Configuration parameters for the storage client. The exact keys will depend on the specific cloud provider implementation. | required |
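To illustrate the contract described above, here is a minimal sketch of a concrete client. Both classes are assumptions made for illustration: `CloudStorageClient` below is a stand-in that only mirrors the documented method signatures (it is not the module's real class), and `LocalDirStorageClient` with its `root_dir` config key is hypothetical. It treats a local directory as the "bucket" so the example runs without any cloud credentials.

```python
import shutil
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, Union


class CloudStorageClient(ABC):
    """Stand-in mirroring the documented interface (not the real class)."""

    def __init__(self, config: Dict[str, str]) -> None:
        self.config = config

    @abstractmethod
    def upload_file(self, local_path: Union[str, Path], remote_path: str) -> str: ...

    @abstractmethod
    def download_file(self, remote_path: str, local_path: Union[str, Path]) -> Path: ...

    @abstractmethod
    def delete_file(self, remote_path: str) -> bool: ...


class LocalDirStorageClient(CloudStorageClient):
    """Hypothetical client that stores 'remote' files under a local directory."""

    def __init__(self, config: Dict[str, str]) -> None:
        super().__init__(config)
        if "root_dir" not in config:  # hypothetical config key
            raise ValueError("config must contain 'root_dir'")
        self.root = Path(config["root_dir"])

    def upload_file(self, local_path: Union[str, Path], remote_path: str) -> str:
        src = Path(local_path)
        if not src.is_file():
            raise FileNotFoundError(f"Local file not found: {src}")
        dest = self.root / remote_path
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dest)
        return str(dest)  # identifier of the stored file

    def download_file(self, remote_path: str, local_path: Union[str, Path]) -> Path:
        src = self.root / remote_path
        if not src.is_file():
            raise FileNotFoundError(f"Remote file not found: {remote_path}")
        dest = Path(local_path)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dest)
        return dest

    def delete_file(self, remote_path: str) -> bool:
        target = self.root / remote_path
        if not target.is_file():
            raise FileNotFoundError(f"Remote file not found: {remote_path}")
        target.unlink()
        return True
```

A real provider implementation would presumably wrap lower-level I/O failures in `RuntimeError`, as the method tables describe; the sketch omits that for brevity.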
delete_file(remote_path)
abstractmethod
Delete a file from cloud storage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `remote_path` | `str` | Path to the file in cloud storage. | required |

Returns:
| Name | Type | Description |
|---|---|---|
| `bool` | `bool` | True if the file was deleted successfully, False otherwise. |

Raises:
| Type | Description |
|---|---|
| `FileNotFoundError` | If the remote file does not exist. |
| `RuntimeError` | If the deletion fails. |
download_file(remote_path, local_path)
abstractmethod
Download a file from cloud storage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `remote_path` | `str` | Path to the file in cloud storage. | required |
| `local_path` | `Union[str, Path]` | Path where the downloaded file should be saved. | required |

Returns:
| Name | Type | Description |
|---|---|---|
| `Path` | `Path` | Path to the downloaded file. |

Raises:
| Type | Description |
|---|---|
| `FileNotFoundError` | If the remote file does not exist. |
| `RuntimeError` | If the download fails. |
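Because `download_file` distinguishes a missing object (`FileNotFoundError`) from a failed transfer (`RuntimeError`), callers can treat the two cases differently. The helper below is a hypothetical caller-side sketch, not part of the module: `download_if_exists` and the `SupportsDownload` protocol are names introduced here for illustration. It returns `None` for a missing file and lets transfer failures propagate.

```python
from pathlib import Path
from typing import Optional, Protocol, Union


class SupportsDownload(Protocol):
    """Anything exposing the documented download_file signature."""

    def download_file(self, remote_path: str, local_path: Union[str, Path]) -> Path: ...


def download_if_exists(
    client: SupportsDownload,
    remote_path: str,
    local_path: Union[str, Path],
) -> Optional[Path]:
    """Return the downloaded path, or None if the remote file is absent.

    RuntimeError (a failed download) is deliberately not caught, so the
    caller still sees genuine transfer errors.
    """
    try:
        return client.download_file(remote_path, local_path)
    except FileNotFoundError:
        return None
```

This pattern suits optional artifacts, such as a checkpoint that may not have been written yet, where absence is expected but a network failure is not.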
get_storage_client(provider, config)
staticmethod
Factory method to get the appropriate storage client.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `provider` | `str` | The storage provider to use (e.g., "s3", "gcs", "azure"). | required |
| `config` | `Dict[str, str]` | Configuration for the storage client. | required |

Returns:
| Name | Type | Description |
|---|---|---|
| `CloudStorageClient` | `CloudStorageClient` | The storage client for the specified provider. |

Raises:
| Type | Description |
|---|---|
| `ValueError` | If the provider is not supported. |
upload_file(local_path, remote_path)
abstractmethod
Upload a file to cloud storage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `local_path` | `Union[str, Path]` | Path to the local file to upload. | required |
| `remote_path` | `str` | Path in the cloud storage where the file should be saved. | required |

Returns:
| Name | Type | Description |
|---|---|---|
| `str` | `str` | URL or identifier of the uploaded file in the cloud storage. |

Raises:
| Type | Description |
|---|---|
| `FileNotFoundError` | If the local file does not exist. |
| `RuntimeError` | If the upload fails. |
CloudStorageRegistry
Registry for available cloud storage clients.
get_client(provider, config)
classmethod
Get an instantiated storage client by provider name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `provider` | `str` | Name of the storage provider. | required |
| `config` | `Dict[str, str]` | Configuration parameters for the storage client. | required |

Returns:
| Name | Type | Description |
|---|---|---|
| `CloudStorageClient` | `CloudStorageClient` | An instance of the storage client. |

Raises:
| Type | Description |
|---|---|
| `ValueError` | If the provider name is not recognized. |
list_available_providers()
classmethod
List all registered storage provider names.
Returns:
| Type | Description |
|---|---|
| `list[str]` | A list of storage provider names. |
register(provider, client_class)
classmethod
Register a cloud storage client class.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `provider` | `str` | Identifier for the storage provider (e.g., "s3", "gcs", "azure"). | required |
| `client_class` | `Type[CloudStorageClient]` | The storage client class. | required |
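The three registry classmethods fit together as a simple name-to-class mapping. The following is an illustrative re-implementation under stated assumptions, not the module's actual code: the `_clients` dictionary, the sorted ordering of provider names, the error message text, and the `FakeS3Client` class are all invented for this sketch, and `CloudStorageClient` is a minimal stand-in.

```python
from typing import Dict, List, Type


class CloudStorageClient:
    """Minimal stand-in for the documented base class."""

    def __init__(self, config: Dict[str, str]) -> None:
        self.config = config


class CloudStorageRegistry:
    """Illustrative registry; the real class may differ internally."""

    # Assumed storage: provider name -> client class.
    _clients: Dict[str, Type[CloudStorageClient]] = {}

    @classmethod
    def register(cls, provider: str, client_class: Type[CloudStorageClient]) -> None:
        cls._clients[provider] = client_class

    @classmethod
    def get_client(cls, provider: str, config: Dict[str, str]) -> CloudStorageClient:
        try:
            client_class = cls._clients[provider]
        except KeyError:
            known = ", ".join(sorted(cls._clients)) or "none"
            raise ValueError(
                f"Unknown storage provider {provider!r}; registered: {known}"
            ) from None
        return client_class(config)

    @classmethod
    def list_available_providers(cls) -> List[str]:
        # Sorting is an assumption; the documented API does not specify order.
        return sorted(cls._clients)


class FakeS3Client(CloudStorageClient):
    """Hypothetical client used only to demonstrate registration."""


CloudStorageRegistry.register("s3", FakeS3Client)
client = CloudStorageRegistry.get_client("s3", {"bucket_name": "example-bucket"})
```

Keeping the mapping on the class rather than on instances means a provider registered once at import time is visible everywhere, which is presumably why the documented methods are classmethods.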
S3StorageClient(config)
Bases: CloudStorageClient
AWS S3 and S3-compatible storage client implementation.
This class implements the CloudStorageClient interface for AWS S3 and other S3-compatible storage providers like MinIO.
Initialize the S3 storage client.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `Dict[str, str]` | S3 configuration parameters: `bucket_name` (name of the S3 bucket); `prefix` (optional prefix, or folder, within the bucket); `access_key_id` (AWS access key, required); `secret_access_key` (AWS secret key, required); `region` (AWS region, required); `endpoint_url` (optional endpoint URL for S3-compatible services). | required |

Raises:
| Type | Description |
|---|---|
| `ValueError` | If required configuration parameters are missing. |
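As a rough sketch of what such a configuration and its validation might look like: the key names below come from the parameter description above, while every value is a placeholder and the helper `validate_s3_config` is hypothetical. Only the three keys the description marks as required are checked; whether the real client also enforces `bucket_name` is not stated here.

```python
from typing import Dict, List, Sequence

# Keys the parameter description marks as REQUIRED.
REQUIRED_KEYS = ("access_key_id", "secret_access_key", "region")


def find_missing_keys(
    config: Dict[str, str], required: Sequence[str] = REQUIRED_KEYS
) -> List[str]:
    """Return required keys that are absent or empty in the config."""
    return [key for key in required if not config.get(key)]


def validate_s3_config(config: Dict[str, str]) -> None:
    """Raise ValueError listing every missing required key (hypothetical helper)."""
    missing = find_missing_keys(config)
    if missing:
        raise ValueError(
            "Missing required S3 configuration: " + ", ".join(missing)
        )


# Placeholder values only; real credentials should come from a secret store.
example_config = {
    "bucket_name": "example-bucket",
    "prefix": "models/",
    "access_key_id": "EXAMPLE_ACCESS_KEY",
    "secret_access_key": "EXAMPLE_SECRET_KEY",
    "region": "us-east-1",
    # Only needed for S3-compatible services such as MinIO.
    "endpoint_url": "http://localhost:9000",
}

validate_s3_config(example_config)  # passes silently
```

Reporting all missing keys in one error, rather than failing on the first, tends to shorten the fix-and-retry loop when a configuration file is incomplete.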
delete_file(remote_path)
Delete a file from S3 storage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `remote_path` | `str` | Path to the file in S3. | required |

Returns:
| Name | Type | Description |
|---|---|---|
| `bool` | `bool` | True if the file was deleted successfully. |

Raises:
| Type | Description |
|---|---|
| `FileNotFoundError` | If the remote file does not exist. |
| `RuntimeError` | If the deletion fails. |
download_file(remote_path, local_path)
Download a file from S3 storage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `remote_path` | `str` | Path to the file in S3. | required |
| `local_path` | `Union[str, Path]` | Path where the downloaded file should be saved. | required |

Returns:
| Name | Type | Description |
|---|---|---|
| `Path` | `Path` | Path to the downloaded file. |

Raises:
| Type | Description |
|---|---|
| `FileNotFoundError` | If the remote file does not exist. |
| `RuntimeError` | If the download fails. |
upload_file(local_path, remote_path)
Upload a file to S3 storage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `local_path` | `Union[str, Path]` | Path to the local file to upload. | required |
| `remote_path` | `str` | Path in S3 where the file should be saved. If a relative path is provided, it will be appended to the prefix. | required |

Returns:
| Name | Type | Description |
|---|---|---|
| `str` | `str` | S3 URI for the uploaded file (s3://{bucket_name}/{full_remote_path}). |

Raises:
| Type | Description |
|---|---|
| `FileNotFoundError` | If the local file does not exist. |
| `RuntimeError` | If the upload fails. |
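The relationship between the bucket, the prefix, and `remote_path` in the returned URI can be sketched as follows. The function `build_s3_uri` is hypothetical, and its handling of absolute paths is an assumption: the documentation only says that relative paths are appended to the prefix, so this sketch guesses that a path starting with `/` bypasses the prefix.

```python
import posixpath


def build_s3_uri(bucket_name: str, remote_path: str, prefix: str = "") -> str:
    """Compose s3://{bucket_name}/{full_remote_path} (illustrative only)."""
    clean_prefix = prefix.strip("/")
    if remote_path.startswith("/"):
        # Assumption: an absolute path is used as-is, ignoring the prefix.
        key = remote_path.lstrip("/")
    elif clean_prefix:
        # Documented behaviour: a relative path is appended to the prefix.
        key = posixpath.join(clean_prefix, remote_path)
    else:
        key = remote_path
    return f"s3://{bucket_name}/{key}"


uri = build_s3_uri("example-bucket", "run-1/model.bin", prefix="models/")
```

`posixpath` is used instead of `os.path` so that object keys always use forward slashes, regardless of the operating system running the code.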
get_storage_client(provider, config)
Get a storage client instance by provider type.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `provider` | `str` | The type of storage provider. | required |
| `config` | `Dict[str, str]` | Configuration parameters for the storage client. | required |

Returns:
| Name | Type | Description |
|---|---|---|
| `CloudStorageClient` | `CloudStorageClient` | A storage client instance. |