
Cloud Storage

Cloud storage module for GLLM training.

This module provides functionality for storing model artifacts in various cloud storage providers such as AWS S3, Google Cloud Storage, and Azure Blob Storage.

Authors
  • Alfan Dinda Rahmawan (alfan.d.rahmawan@gdplabs.id)
Reviewer
  • Muhammad Afif Al Hawari (muhammad.a.a.hawari@gdplabs.id)

CloudStorageClient(config)

Bases: ABC

Abstract base class for cloud storage clients.

This class defines the interface that all cloud storage implementations must follow. It declares methods for uploading, downloading, and deleting files in a cloud storage service.

Initialize the cloud storage client with provider-specific configuration.

Parameters:

Name Type Description Default
config Dict[str, str]

Configuration parameters for the storage client. The exact keys will depend on the specific cloud provider implementation.

required

delete_file(remote_path) abstractmethod

Delete a file from cloud storage.

Parameters:

Name Type Description Default
remote_path str

Path to the file in cloud storage.

required

Returns:

Name Type Description
bool bool

True if the file was deleted successfully, False otherwise.

Raises:

Type Description
FileNotFoundError

If the remote file does not exist.

RuntimeError

If the deletion fails.

download_file(remote_path, local_path) abstractmethod

Download a file from cloud storage.

Parameters:

Name Type Description Default
remote_path str

Path to the file in cloud storage.

required
local_path Union[str, Path]

Path where the downloaded file should be saved.

required

Returns:

Name Type Description
Path Path

Path to the downloaded file.

Raises:

Type Description
FileNotFoundError

If the remote file does not exist.

RuntimeError

If the download fails.

get_storage_client(provider, config) staticmethod

Factory method to get the appropriate storage client.

Parameters:

Name Type Description Default
provider str

The storage provider to use (e.g., "s3", "gcs", "azure").

required
config Dict[str, str]

Configuration for the storage client.

required

Returns:

Name Type Description
CloudStorageClient CloudStorageClient

The storage client for the specified provider.

Raises:

Type Description
ValueError

If the provider is not supported.

upload_file(local_path, remote_path) abstractmethod

Upload a file to cloud storage.

Parameters:

Name Type Description Default
local_path Union[str, Path]

Path to the local file to upload.

required
remote_path str

Path in the cloud storage where the file should be saved.

required

Returns:

Name Type Description
str str

URL or identifier of the uploaded file in the cloud storage.

Raises:

Type Description
FileNotFoundError

If the local file does not exist.

RuntimeError

If the upload fails.
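
As a rough illustration of the contract above, the sketch below implements the three abstract methods against a local directory. `StorageClientSketch` is a stand-in for `CloudStorageClient` (this page does not show the real import path), and `LocalDirClient`, its `root` config key, and `round_trip_demo` are hypothetical names used only for this example.

```python
import shutil
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, Union


class StorageClientSketch(ABC):
    """Stand-in for CloudStorageClient; mirrors the documented signatures."""

    def __init__(self, config: Dict[str, str]) -> None:
        self.config = config

    @abstractmethod
    def upload_file(self, local_path: Union[str, Path], remote_path: str) -> str: ...

    @abstractmethod
    def download_file(self, remote_path: str, local_path: Union[str, Path]) -> Path: ...

    @abstractmethod
    def delete_file(self, remote_path: str) -> bool: ...


class LocalDirClient(StorageClientSketch):
    """Hypothetical provider that treats a local directory as the remote store."""

    def __init__(self, config: Dict[str, str]) -> None:
        super().__init__(config)
        self.root = Path(config["root"])  # "root" is an assumed config key

    def upload_file(self, local_path, remote_path):
        src = Path(local_path)
        if not src.is_file():
            raise FileNotFoundError(f"Local file not found: {src}")
        dest = self.root / remote_path
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dest)
        return str(dest)  # identifier of the stored file

    def download_file(self, remote_path, local_path):
        src = self.root / remote_path
        if not src.is_file():
            raise FileNotFoundError(f"Remote file not found: {remote_path}")
        dest = Path(local_path)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dest)
        return dest

    def delete_file(self, remote_path):
        target = self.root / remote_path
        if not target.is_file():
            raise FileNotFoundError(f"Remote file not found: {remote_path}")
        target.unlink()
        return True


def round_trip_demo() -> str:
    """Upload, download, and delete one file; return the downloaded text."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        client = LocalDirClient({"root": str(tmp_path / "bucket")})
        source = tmp_path / "model.bin"
        source.write_text("weights")
        client.upload_file(source, "runs/1/model.bin")
        copy = client.download_file("runs/1/model.bin", tmp_path / "copy.bin")
        text = copy.read_text()
        client.delete_file("runs/1/model.bin")
        return text
```

A real provider would likely wrap SDK errors in `RuntimeError`, as the Raises sections above describe. This sketch omits that step because plain file copies have no SDK layer to wrap.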

CloudStorageRegistry

Registry for available cloud storage clients.

get_client(provider, config) classmethod

Get an instantiated storage client by provider name.

Parameters:

Name Type Description Default
provider str

Name of the storage provider.

required
config Dict[str, str]

Configuration parameters for the storage client.

required

Returns:

Name Type Description
CloudStorageClient CloudStorageClient

An instance of the storage client.

Raises:

Type Description
ValueError

If the provider name is not recognized.

list_available_providers() classmethod

List all registered storage provider names.

Returns:

Type Description
list[str]

A list of storage provider names.

register(provider, client_class) classmethod

Register a cloud storage client class.

Parameters:

Name Type Description Default
provider str

Identifier for the storage provider (e.g., "s3", "gcs", "azure").

required
client_class Type[CloudStorageClient]

The storage client class.

required
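
The three classmethods above describe a name-to-class registry. One way such a registry could work is sketched below. `RegistrySketch` and `DummyClient` are hypothetical stand-ins rather than the module's actual classes, and the sorted provider list and the exact error message are assumptions.

```python
from typing import Dict, List


class RegistrySketch:
    """Stand-in for CloudStorageRegistry: maps provider names to client classes."""

    _clients: Dict[str, type] = {}

    @classmethod
    def register(cls, provider: str, client_class: type) -> None:
        # Later registrations under the same name overwrite earlier ones (assumption).
        cls._clients[provider] = client_class

    @classmethod
    def get_client(cls, provider: str, config: Dict[str, str]):
        try:
            client_class = cls._clients[provider]
        except KeyError:
            raise ValueError(f"Unknown storage provider: {provider!r}") from None
        return client_class(config)

    @classmethod
    def list_available_providers(cls) -> List[str]:
        return sorted(cls._clients)


class DummyClient:
    """Hypothetical client used only to exercise the registry."""

    def __init__(self, config: Dict[str, str]) -> None:
        self.config = config


RegistrySketch.register("dummy", DummyClient)
client = RegistrySketch.get_client("dummy", {"bucket_name": "demo"})
```

Keeping the lookup in one place lets a factory such as `get_storage_client` delegate to the registry, so a new provider can be added without editing the factory.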

S3StorageClient(config)

Bases: CloudStorageClient

AWS S3 and S3-compatible storage client implementation.

This class implements the CloudStorageClient interface for AWS S3 and other S3-compatible storage providers like MinIO.

Initialize the S3 storage client.

Parameters:

Name Type Description Default
config Dict[str, str]

S3 configuration parameters:
  • bucket_name: Name of the S3 bucket.
  • prefix: Optional prefix (folder) within the bucket.
  • access_key_id: AWS access key (required).
  • secret_access_key: AWS secret key (required).
  • region: AWS region (required).
  • endpoint_url: Optional endpoint URL for S3-compatible services.

required

Raises:

Type Description
ValueError

If required configuration parameters are missing.
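
The constructor's validation step might look roughly like the sketch below. It checks only the three keys the parameter description marks as required. Whether `bucket_name` is also enforced is not stated here, so it is left out. `validate_s3_config` and `REQUIRED_KEYS` are hypothetical names.

```python
from typing import Dict

# Keys marked as required in the parameter description above.
REQUIRED_KEYS = ("access_key_id", "secret_access_key", "region")


def validate_s3_config(config: Dict[str, str]) -> None:
    """Raise ValueError naming every required key that is missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError("Missing required S3 configuration: " + ", ".join(missing))
```

Reporting all missing keys at once, rather than stopping at the first, saves a round trip when several values are absent.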

delete_file(remote_path)

Delete a file from S3 storage.

Parameters:

Name Type Description Default
remote_path str

Path to the file in S3.

required

Returns:

Name Type Description
bool bool

True if the file was deleted successfully.

Raises:

Type Description
FileNotFoundError

If the remote file does not exist.

RuntimeError

If the deletion fails.

download_file(remote_path, local_path)

Download a file from S3 storage.

Parameters:

Name Type Description Default
remote_path str

Path to the file in S3.

required
local_path Union[str, Path]

Path where the downloaded file should be saved.

required

Returns:

Name Type Description
Path Path

Path to the downloaded file.

Raises:

Type Description
FileNotFoundError

If the remote file does not exist.

RuntimeError

If the download fails.

upload_file(local_path, remote_path)

Upload a file to S3 storage.

Parameters:

Name Type Description Default
local_path Union[str, Path]

Path to the local file to upload.

required
remote_path str

Path in S3 where the file should be saved. If a relative path is provided, it will be appended to the prefix.

required

Returns:

Name Type Description
str str

S3 URI for the uploaded file (s3://{bucket_name}/{full_remote_path}).

Raises:

Type Description
FileNotFoundError

If the local file does not exist.

RuntimeError

If the upload fails.
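
To make the prefix rule and the returned URI concrete, here is a minimal sketch of how the object key and S3 URI could be assembled. `build_s3_uri` is a hypothetical helper. In particular, treating a leading `/` as "skip the prefix" is an assumption, since the documentation only says that relative paths are appended to the prefix.

```python
import posixpath


def build_s3_uri(bucket_name: str, remote_path: str, prefix: str = "") -> str:
    """Return s3://{bucket_name}/{full_remote_path} for the given remote path."""
    if remote_path.startswith("/"):
        # Assumption: an absolute path is taken relative to the bucket root.
        key = remote_path.lstrip("/")
    elif prefix:
        # Relative path: append it to the configured prefix.
        key = posixpath.join(prefix.strip("/"), remote_path)
    else:
        key = remote_path
    return f"s3://{bucket_name}/{key}"


uri = build_s3_uri("models", "run-1/model.bin", prefix="gllm/")
```

`posixpath` is used instead of `os.path` so that keys always use forward slashes, regardless of the local operating system.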

get_storage_client(provider, config)

Get a storage client instance by provider type.

Parameters:

Name Type Description Default
provider str

The type of storage provider.

required
config Dict[str, str]

Configuration parameters for the storage client.

required

Returns:

Name Type Description
CloudStorageClient CloudStorageClient

A storage client instance.