DPO Trainer

Direct Preference Optimization (DPO) trainer module for GLLM training.

This module provides classes and utilities for direct preference optimization (DPO) of language models, including training orchestration and trainer abstractions for preference-based learning.

DPOTrainer(model_name, datasets_path=None, train_filename=None, validation_filename=None, prompt_filename=None, spreadsheet_id=None, train_sheet=None, validation_sheet=None, prompt_sheet=None, prompt_name=None, experiment_id=None, hyperparameters=None, storage_config=None, resume_from_checkpoint=None, topic=None, framework=None, multimodal=None, dpo_column_mapping_config=None, save_processed_dataset=True, output_processed_dir=None)

Bases: BaseTrainer

A class to orchestrate Direct Preference Optimization (DPO) training.

This class provides an end-to-end workflow for Direct Preference Optimization (DPO) fine-tuning, including data loading, model initialization, training execution, and model artifact management. It supports both CSV and Google Sheets data sources, handles checkpoint resumption, and provides cloud storage integration for saving trained models.

Attributes:

| Name | Description |
| --- | --- |
| components | DPOComponents instance containing model, tokenizer, and trainer. |
| validator | ComponentsValidator instance for validating training components. |
| experiment_args | ExperimentConfig instance containing experiment configuration. |
| hyperparameters | FinetunedHyperparameters instance containing training hyperparameters. |
| storage_args | StorageConfig instance containing storage configuration. |
| dpo_column_mapping_config | DPOColumnMappingConfig instance for column mapping. |
| resume_from_checkpoint | Whether to resume training from a checkpoint. |

Initialize the DPOTrainer.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model_name | str | Name of the model to use for training. | required |
| datasets_path | str \| None | Path to the datasets directory. | None |
| train_filename | str \| None | Filename for training data. | None |
| validation_filename | str \| None | Filename for validation data. | None |
| prompt_filename | str \| None | Filename for prompt data. | None |
| spreadsheet_id | str \| None | Google Sheets spreadsheet ID. | None |
| train_sheet | str \| None | Name of the training sheet. | None |
| validation_sheet | str \| None | Name of the validation sheet. | None |
| prompt_sheet | str \| None | Name of the prompt sheet. | None |
| prompt_name | str \| None | Name of the prompt template. | None |
| experiment_id | str \| None | Unique identifier for the experiment. | None |
| hyperparameters | FinetunedHyperparameters \| None | Hyperparameters configuration. | None |
| storage_config | StorageConfig \| None | Storage configuration. | None |
| resume_from_checkpoint | bool \| None | Whether to resume from a checkpoint. | None |
| topic | str \| None | Topic of the experiment. | None |
| framework | str \| None | Fine-tuning framework to use. | None |
| multimodal | bool \| None | Whether the model is multimodal. | None |
| dpo_column_mapping_config | DPOColumnMappingConfig \| None | DPO column mapping configuration. | None |
| save_processed_dataset | bool | Whether to save the processed dataset. | True |
| output_processed_dir | str \| None | Output directory for the processed dataset. | None |
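
The constructor takes two groups of data-source arguments: file-based ones (datasets_path, train_filename, validation_filename) and Google Sheets ones (spreadsheet_id, train_sheet, validation_sheet). The sketch below shows one way to assemble keyword arguments for a single source. The helper build_dpo_trainer_kwargs is hypothetical and not part of the library, the rule that the two groups should not be mixed is an assumption rather than documented behavior, and the model name and file names are placeholders.

```python
from typing import Any, Dict, Optional


def build_dpo_trainer_kwargs(
    model_name: str,
    *,
    datasets_path: Optional[str] = None,
    train_filename: Optional[str] = None,
    validation_filename: Optional[str] = None,
    spreadsheet_id: Optional[str] = None,
    train_sheet: Optional[str] = None,
    validation_sheet: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: collect constructor kwargs for one data source."""
    file_args = {
        "datasets_path": datasets_path,
        "train_filename": train_filename,
        "validation_filename": validation_filename,
    }
    sheet_args = {
        "spreadsheet_id": spreadsheet_id,
        "train_sheet": train_sheet,
        "validation_sheet": validation_sheet,
    }
    # Keep only the arguments the caller actually supplied.
    file_given = {k: v for k, v in file_args.items() if v is not None}
    sheet_given = {k: v for k, v in sheet_args.items() if v is not None}

    # Assumption (not documented behavior): one run reads from files or
    # from Google Sheets, never both.
    if file_given and sheet_given:
        raise ValueError("Pass file-based or Google Sheets arguments, not both.")
    if not file_given and not sheet_given:
        raise ValueError("No data source arguments were provided.")

    return {"model_name": model_name, **file_given, **sheet_given}


# Placeholder values; with the library installed these would be passed
# as DPOTrainer(**kwargs).
kwargs = build_dpo_trainer_kwargs(
    "example-org/example-model",
    datasets_path="data/",
    train_filename="train.csv",
    validation_filename="validation.csv",
)
print(sorted(kwargs))
```

Building the arguments as a dictionary first keeps the data-source choice in one place and lets it be checked before any model is loaded.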

save_existing_model(model_path)

Save an existing model.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model_path | str | Path to the model directory. | required |

save_model(results=None, model_path=None)

Saves model artifacts, handling both trained and existing models.

This method provides a unified interface for:

1. Saving a newly trained model (when results is provided).
2. Uploading an existing model (when model_path is provided).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| results | dict[str, Any] \| None | Training results for newly trained models. | None |
| model_path | str \| None | Path to the model directory. | None |

Returns:

| Type | Description |
| --- | --- |
| str | Path where the model was saved or uploaded. |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If neither model_path nor results is provided, or if configuration is invalid. |
| RuntimeError | If the save operation fails. |
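
The two-way behavior of save_model described above can be pictured as a small dispatch on its arguments. This is a sketch of the documented contract only, not the library's implementation. The function choose_save_mode and its return labels are hypothetical, and giving results precedence when both arguments are supplied is an assumption, since the documentation does not cover that case.

```python
from typing import Any, Dict, Optional


def choose_save_mode(
    results: Optional[Dict[str, Any]] = None,
    model_path: Optional[str] = None,
) -> str:
    """Hypothetical sketch: decide which save_model branch would apply."""
    # Assumption: results wins if both arguments are supplied.
    if results is not None:
        return "trained"
    if model_path is not None:
        return "existing"
    # Documented: a ValueError is raised when neither argument is provided.
    raise ValueError("Either results or model_path must be provided.")


print(choose_save_mode(model_path="models/example"))
```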

train()

Runs the complete DPO fine-tuning process.

This method orchestrates the entire workflow, initializing components, running the DPO training loop, and saving the results.

Returns:

| Type | Description |
| --- | --- |
| dict[str, Any] | A dictionary containing DPO training metrics and results, such as DPO training loss, evaluation loss, and runtime. |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If required training parameters are missing or invalid. |
| RuntimeError | If components fail to initialize, the model or trainer is not set up correctly, or if the training process itself fails for any reason. |
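
Because train() documents two exception types, a caller may want to tell configuration problems apart from runtime failures. The sketch below wraps any object exposing a train() method. The run_training function, the TrainerLike protocol, and the _FakeTrainer stand-in are hypothetical names used only for illustration, and the metric key in the stand-in is a placeholder rather than a key the library is known to return.

```python
import logging
from typing import Any, Dict, Optional, Protocol

logger = logging.getLogger(__name__)


class TrainerLike(Protocol):
    """Structural type for anything with the documented train() method."""

    def train(self) -> Dict[str, Any]: ...


def run_training(trainer: TrainerLike) -> Optional[Dict[str, Any]]:
    """Run training and return the metrics, or None if training failed."""
    try:
        return trainer.train()
    except ValueError as exc:
        # Documented: missing or invalid training parameters.
        logger.error("Invalid training configuration: %s", exc)
    except RuntimeError as exc:
        # Documented: component initialization or training loop failure.
        logger.error("Training failed: %s", exc)
    return None


class _FakeTrainer:
    """Stand-in used here instead of a real DPOTrainer instance."""

    def train(self) -> Dict[str, Any]:
        return {"placeholder_metric": 0.0}


metrics = run_training(_FakeTrainer())
print(metrics)
```

With the library installed, a DPOTrainer instance could be passed in place of the stand-in, since the wrapper relies only on the train() method.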