DPO Trainer
Direct Preference Optimization (DPO) trainer module for GLLM training.
This module provides classes and utilities for direct preference optimization (DPO) of language models, including training orchestration and trainer abstractions for preference-based learning.
DPOTrainer(model_name, datasets_path=None, train_filename=None, validation_filename=None, prompt_filename=None, spreadsheet_id=None, train_sheet=None, validation_sheet=None, prompt_sheet=None, prompt_name=None, experiment_id=None, hyperparameters=None, storage_config=None, resume_from_checkpoint=None, topic=None, framework=None, multimodal=None, dpo_column_mapping_config=None, save_processed_dataset=True, output_processed_dir=None)
Bases: BaseTrainer
A class to orchestrate Direct Preference Optimization (DPO) training.
This class provides an end-to-end workflow for Direct Preference Optimization (DPO) fine-tuning, including data loading, model initialization, training execution, and model artifact management. It supports both CSV and Google Sheets data sources, handles checkpoint resumption, and provides cloud storage integration for saving trained models.
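For orientation, a minimal end-to-end sketch follows. The import path, model name, file names, and experiment ID are illustrative assumptions, not guarantees of this API:

```python
# Minimal end-to-end sketch. The import path and all values below are
# illustrative assumptions, not part of the documented API surface.
from gllm.trainers import DPOTrainer  # hypothetical import path

trainer = DPOTrainer(
    model_name="meta-llama/Llama-3.1-8B-Instruct",  # any supported model name
    datasets_path="data/preference",                # directory containing CSV files
    train_filename="train.csv",
    validation_filename="validation.csv",
    experiment_id="dpo-demo-001",
)

results = trainer.train()                         # run the full DPO fine-tuning loop
saved_path = trainer.save_model(results=results)  # persist the trained artifacts
print(f"Model saved to {saved_path}")
```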
Attributes:

| Name | Type | Description |
|---|---|---|
| components | DPOComponents | DPOComponents instance containing model, tokenizer, and trainer. |
| validator | ComponentsValidator | ComponentsValidator instance for validating training components. |
| experiment_args | ExperimentConfig | ExperimentConfig instance containing experiment configuration. |
| hyperparameters | FinetunedHyperparameters | FinetunedHyperparameters instance containing training hyperparameters. |
| storage_args | StorageConfig | StorageConfig instance containing storage configuration. |
| dpo_column_mapping_config | DPOColumnMappingConfig | DPOColumnMappingConfig instance for column mapping. |
| resume_from_checkpoint | bool \| None | Whether to resume training from a checkpoint. |
Initialize the DPOTrainer.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| model_name | str | Name of the model to use for training. | required |
| datasets_path | str \| None | Path to the datasets directory. Defaults to None. | None |
| train_filename | str \| None | Filename for training data. Defaults to None. | None |
| validation_filename | str \| None | Filename for validation data. Defaults to None. | None |
| prompt_filename | str \| None | Filename for prompt data. Defaults to None. | None |
| spreadsheet_id | str \| None | Google Sheets spreadsheet ID. Defaults to None. | None |
| train_sheet | str \| None | Name of the training sheet. Defaults to None. | None |
| validation_sheet | str \| None | Name of the validation sheet. Defaults to None. | None |
| prompt_sheet | str \| None | Name of the prompt sheet. Defaults to None. | None |
| prompt_name | str \| None | Name of the prompt template. Defaults to None. | None |
| experiment_id | str \| None | Unique identifier for the experiment. Defaults to None. | None |
| hyperparameters | FinetunedHyperparameters \| None | Hyperparameters configuration. Defaults to None. | None |
| storage_config | StorageConfig \| None | Storage configuration. Defaults to None. | None |
| resume_from_checkpoint | bool \| None | Whether to resume from checkpoint. Defaults to None. | None |
| topic | str \| None | Topic of the experiment. Defaults to None. | None |
| framework | str \| None | Fine-tuning framework to use. Defaults to None. | None |
| multimodal | bool \| None | Whether the model is multimodal. Defaults to None. | None |
| dpo_column_mapping_config | DPOColumnMappingConfig \| None | DPO column mapping configuration. Defaults to None. | None |
| save_processed_dataset | bool | Whether to save the processed dataset. Defaults to True. | True |
| output_processed_dir | str \| None | Output directory for the processed dataset. Defaults to None. | None |
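As the table shows, data can come either from CSV files (datasets_path plus the filename parameters) or from Google Sheets (spreadsheet_id plus the sheet parameters). A sketch of the Sheets variant with checkpoint resumption; the spreadsheet ID and sheet names are placeholders:

```python
# Sketch of constructing the trainer against a Google Sheets data source.
# The spreadsheet ID and sheet names below are placeholders.
trainer = DPOTrainer(
    model_name="meta-llama/Llama-3.1-8B-Instruct",
    spreadsheet_id="your-spreadsheet-id",  # placeholder Google Sheets ID
    train_sheet="train",
    validation_sheet="validation",
    prompt_sheet="prompts",
    prompt_name="default",
    resume_from_checkpoint=True,           # continue from the last checkpoint
)
```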
save_existing_model(model_path)
Save an existing model.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| model_path | str | Path to the model directory. | required |
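For example, to upload a model trained elsewhere (the path is illustrative):

```python
# Upload a previously trained model from a local directory (illustrative path).
trainer.save_existing_model(model_path="outputs/dpo-demo-001/final")
```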
save_model(results=None, model_path=None)
Saves model artifacts, handling both trained and existing models.
This method provides a unified interface for:

1. Saving a newly trained model (when results is provided)
2. Uploading an existing model (when model_path is provided)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| results | dict[str, Any] \| None | Training results for newly trained models. Defaults to None. | None |
| model_path | str \| None | Path to the model directory. Defaults to None. | None |
Returns:

| Type | Description |
|---|---|
| str | Path where the model was saved or uploaded. |
Raises:

| Type | Description |
|---|---|
| ValueError | If neither model_path nor results is provided, or if the configuration is invalid. |
| RuntimeError | If the save operation fails. |
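A sketch of both modes, assuming a trainer constructed as in the examples above; the model path is illustrative:

```python
# Mode 1: save a model just trained by this trainer instance.
results = trainer.train()
saved_path = trainer.save_model(results=results)

# Mode 2: upload an existing model directory instead (illustrative path).
saved_path = trainer.save_model(model_path="outputs/dpo-demo-001/final")

# Calling save_model() with neither argument raises ValueError.
```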
train()
Runs the complete DPO fine-tuning process.
This method orchestrates the entire workflow, initializing components, running the DPO training loop, and saving the results.
Returns:

| Type | Description |
|---|---|
| dict[str, Any] | A dictionary containing DPO training metrics and results, such as training loss, evaluation loss, and runtime. |
Raises:

| Type | Description |
|---|---|
| ValueError | If required training parameters are missing or invalid. |
| RuntimeError | If components fail to initialize, if the model or trainer is not set up correctly, or if the training process itself fails for any reason. |
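A sketch of running training and inspecting the returned metrics; the exact metric keys are an assumption based on the description above, so check results.keys() in practice:

```python
results = trainer.train()

# Key names are assumptions based on the description above (training loss,
# evaluation loss, runtime); inspect results.keys() to confirm the schema.
print(results.get("train_loss"))
print(results.get("eval_loss"))
print(results.get("train_runtime"))
```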