
Schema

Schema package for gllm_training.

This module re-exports all schema components from various modules for backward compatibility.

Authors
  • Alfan Dinda Rahmawan (alfan.d.rahmawan@gdplabs.id)
Reviewer
  • Muhammad Afif Al Hawari (muhammad.a.a.hawari@gdplabs.id)

ColumnMappingConfig

Bases: BaseModel

Defines the structure for column mapping configuration.

Attributes:

  • input_columns (Dict[str, str]): Mapping of template variables to DataFrame column names.
  • output_columns (List[str]): List of column names for output.
  • label_columns (Optional[str]): Column name for labels.
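
As a rough illustration of how `input_columns` might connect template variables to data columns, the sketch below uses a plain dict and a list of row dicts in place of `ColumnMappingConfig` and a DataFrame. The template text, the column names, and the `render_prompt` helper are hypothetical and are not part of gllm_training.

```python
# Plain-dict stand-in for ColumnMappingConfig (field names from the list above,
# values are made up for illustration).
column_mapping = {
    "input_columns": {"question": "user_query", "context": "passage"},
    "output_columns": ["answer"],
    "label_columns": None,
}

# A list of dicts stands in for DataFrame rows.
rows = [
    {
        "user_query": "What is LoRA?",
        "passage": "LoRA adds low-rank adapters.",
        "answer": "A fine-tuning method.",
    },
]

# Hypothetical prompt template that uses the template variables.
template = "Context: {context}\nQuestion: {question}"


def render_prompt(template, input_columns, row):
    """Fill each template variable with the value of its mapped column."""
    values = {variable: row[column] for variable, column in input_columns.items()}
    return template.format(**values)


prompts = [render_prompt(template, column_mapping["input_columns"], row) for row in rows]
targets = [[row[column] for column in column_mapping["output_columns"]] for row in rows]
```

Keeping the mapping separate from the template means the same template can be reused with datasets whose columns are named differently.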

ExperimentConfig

Bases: BaseModel

Defines the configuration for a single fine-tuning experiment.

Attributes:

  • experiment_id (str): The ID of the experiment.
  • hyperparameters (Hyperparameters): The hyperparameters for the experiment.
  • hyperparameters_id (str): The ID of the hyperparameters.
  • topic (str): The topic of the experiment.
  • model_name (str): The name of the model.
  • framework (FinetuningLibraryTypes): The fine-tuning framework to use.
  • multimodal (bool): Whether the model is multimodal.
  • datasets_path (str | None): The path to the datasets directory.
  • train_filename (str | None): CSV filename for training data.
  • validation_filename (str | None): CSV filename for validation data.
  • prompt_filename (str | None): CSV filename for prompt templates.
  • spreadsheet_id (str | None): The ID of the Google Sheets spreadsheet.
  • google_sheets_client_email (str | None): Client email for Google Sheets.
  • google_sheets_private_key (str | None): Private key for Google Sheets.
  • google_scopes (list[str] | None): Google scopes for the service account.
  • google_token_uri (str | None): Google token URI for the service account.
  • train_sheet (str): The name of the training sheet.
  • validation_sheet (str | None): The name of the validation sheet.
  • prompt_sheet (str): The name of the prompt sheet.
  • prompt_name (str | None): The name of the prompt.
  • dataset_text_field (str): The name of the text field in the dataset.
  • column_mapping_config (dict[str, Any] | None): Custom configuration for column mappings.
  • save_processed_dataset (bool): Whether to save the processed dataset.
  • output_processed_dir (str): The output directory for the processed dataset.
  • model_path (str | None): The path to the fine-tuned adapter or full model.
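
The CSV-related fields above suggest that each filename is resolved relative to `datasets_path`. The sketch below shows one way that could work, using a plain dict instead of `ExperimentConfig`. The join behavior, the `resolve_csv_paths` helper, and all values are assumptions for illustration; the library's actual loading logic is not documented here.

```python
from pathlib import PurePosixPath

# Hypothetical values; only the key names come from the attribute list above.
experiment = {
    "datasets_path": "data/experiments",
    "train_filename": "train.csv",
    "validation_filename": None,  # fields typed `str | None` may be unset
    "prompt_filename": "prompts.csv",
}


def resolve_csv_paths(config):
    """Join datasets_path with every CSV filename that is set (assumed behavior)."""
    base = config.get("datasets_path")
    if base is None:
        return {}
    paths = {}
    for key in ("train_filename", "validation_filename", "prompt_filename"):
        filename = config.get(key)
        if filename is not None:
            paths[key] = str(PurePosixPath(base) / filename)
    return paths


csv_paths = resolve_csv_paths(experiment)
```

Fields left as `None` are simply skipped, which mirrors the `str | None` typing of the filename attributes.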

FinetunedHyperparameters

Bases: BaseModel

Defines the hyperparameters for a fine-tuning experiment.

Attributes:

  • hyperparameters_id (str): The ID of the hyperparameters.
  • model_settings (ModelConfig): The model configuration.
  • lora_config (LoraConfig): The LoRA configuration.
  • training_config (TrainingConfig): The training configuration.
  • storage_config (StorageConfig): The storage configuration.

FrameworkType

Bases: StrEnum

Defines valid fine-tuning frameworks.

GoogleSheetsAuthentication

Bases: BaseModel

Defines valid authentication parameters for Google Sheets API interaction.

Attributes:

  • spreadsheet_id (str): The ID of the Google Sheets spreadsheet.
  • client_email (str): The client email for the Google Sheets API.
  • private_key (str): The private key for the Google Sheets API.
  • google_token_uri (str): The Google token URI for the Google Sheets API.
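
The four fields above line up with the Google Sheets fields documented on `ExperimentConfig`. The sketch below shows how one set might be derived from the other, using a standard-library dataclass as a stand-in for the real Pydantic model. The `SheetsAuth` class, the `auth_from_experiment` helper, the field correspondence, and the placeholder credentials are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class SheetsAuth:
    """Stand-in that mirrors the documented GoogleSheetsAuthentication fields."""

    spreadsheet_id: str
    client_email: str
    private_key: str
    google_token_uri: str


# Assumed correspondence: auth field -> ExperimentConfig field.
FIELD_SOURCES = {
    "spreadsheet_id": "spreadsheet_id",
    "client_email": "google_sheets_client_email",
    "private_key": "google_sheets_private_key",
    "google_token_uri": "google_token_uri",
}


def auth_from_experiment(config):
    """Build the auth object, failing early if any required value is unset."""
    missing = [source for source in FIELD_SOURCES.values() if config.get(source) is None]
    if missing:
        raise ValueError(f"Missing Google Sheets settings: {', '.join(missing)}")
    return SheetsAuth(**{target: config[source] for target, source in FIELD_SOURCES.items()})


# Placeholder values only; never hard-code real credentials.
experiment = {
    "spreadsheet_id": "<spreadsheet-id>",
    "google_sheets_client_email": "trainer@example-project.iam.gserviceaccount.com",
    "google_sheets_private_key": "<private-key>",
    "google_token_uri": "https://oauth2.googleapis.com/token",
}
auth = auth_from_experiment(experiment)
```

Validating up front matters because the Sheets fields on `ExperimentConfig` are optional (`str | None`), while every field here is a required `str`.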

StorageConfig

Bases: BaseModel

Defines the storage configuration for model uploads.

Attributes:

  • bucket_name (str): The name of the bucket.
  • upload_to_cloud (bool): Whether to upload the model to the cloud.
  • object_prefix (str): The object prefix in the bucket.
  • endpoint_url (str | None): The complete URL to use for the storage client.
  • access_key_id (str | None): The access key ID for the cloud storage account.
  • secret_access_key (str | None): The secret access key for the cloud storage account.
  • region (str | None): The region to use when creating the storage client.
  • provider (str): The cloud provider.
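
To show how the optional connection fields might be consumed, the sketch below turns a `StorageConfig`-like dict into keyword arguments for an S3-compatible client. The target names (`endpoint_url`, `aws_access_key_id`, `aws_secret_access_key`, `region_name`) match the parameters of `boto3.client`, but whether gllm_training uses boto3 is an assumption. The `storage_client_kwargs` and `object_key` helpers and all values, including the `provider` string, are hypothetical.

```python
# Plain-dict stand-in for StorageConfig with placeholder values.
storage = {
    "bucket_name": "example-models",
    "upload_to_cloud": True,
    "object_prefix": "models/exp-001/",
    "endpoint_url": None,  # None here, so no custom endpoint is passed
    "access_key_id": "<access-key-id>",
    "secret_access_key": "<secret-access-key>",
    "region": "ap-southeast-1",
    "provider": "s3",  # hypothetical value
}

# StorageConfig field -> keyword argument name (boto3.client naming).
CLIENT_KWARGS = {
    "endpoint_url": "endpoint_url",
    "access_key_id": "aws_access_key_id",
    "secret_access_key": "aws_secret_access_key",
    "region": "region_name",
}


def storage_client_kwargs(config):
    """Keep only the connection settings that are actually set."""
    return {
        kwarg: config[field]
        for field, kwarg in CLIENT_KWARGS.items()
        if config.get(field) is not None
    }


def object_key(config, filename):
    """Place a file under object_prefix without doubling the slash."""
    return f"{config['object_prefix'].rstrip('/')}/{filename}"


kwargs = storage_client_kwargs(storage)
key = object_key(storage, "adapter.safetensors")
```

Dropping unset values rather than passing `None` lets the client fall back to its own defaults, such as credentials from the environment.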