DPO Trainer
Direct Preference Optimization (DPO) trainer module for GLLM training.
This module provides classes and utilities for direct preference optimization (DPO) of language models, including training orchestration and trainer abstractions for preference-based learning.
DPOTrainer(model_name, datasets_path=None, train_filename=None, validation_filename=None, prompt_filename=None, spreadsheet_id=None, train_sheet=None, validation_sheet=None, prompt_sheet=None, prompt_name=None, experiment_id=None, hyperparameters=None, storage_config=None, resume_from_checkpoint=None, topic=None, framework=None, multimodal=None, dpo_column_mapping_config=None, save_processed_dataset=True, output_processed_dir=None)
Bases: BaseTrainer
A class to orchestrate Direct Preference Optimization (DPO) training.
This class provides an end-to-end workflow for Direct Preference Optimization (DPO) fine-tuning, including data loading, model initialization, training execution, and model artifact management. It supports both CSV and Google Sheets data sources, handles checkpoint resumption, and provides cloud storage integration for saving trained models.
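For orientation, a minimal end-to-end sketch follows. The import path, model name, file names, and experiment ID are illustrative assumptions, not guarantees of this API:

```python
# Minimal end-to-end sketch. The import path and all values below are
# illustrative assumptions, not part of the documented API surface.
from gllm.trainers import DPOTrainer  # hypothetical import path

trainer = DPOTrainer(
    model_name="meta-llama/Llama-3.1-8B-Instruct",  # any supported model name
    datasets_path="data/preference",                # directory containing CSV files
    train_filename="train.csv",
    validation_filename="validation.csv",
    experiment_id="dpo-demo-001",
)

results = trainer.train()                         # run the full DPO fine-tuning loop
saved_path = trainer.save_model(results=results)  # persist the trained artifacts
print(f"Model saved to {saved_path}")
```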
Attributes:

| Name | Type | Description |
|---|---|---|
| components | DPOComponents | DPOComponents instance containing model, tokenizer, and trainer. |
| validator | ComponentsValidator | ComponentsValidator instance for validating training components. |
| experiment_args | ExperimentConfig | ExperimentConfig instance containing experiment configuration. |
| hyperparameters | FinetunedHyperparameters | FinetunedHyperparameters instance containing training hyperparameters. |
| storage_args | StorageConfig | StorageConfig instance containing storage configuration. |
| dpo_column_mapping_config | DPOColumnMappingConfig | DPOColumnMappingConfig instance for column mapping. |
| resume_from_checkpoint | bool \| None | Whether to resume training from a checkpoint. |
Initialize the DPOTrainer.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| model_name | str | Name of the model to use for training. | required |
| datasets_path | str \| None | Path to the datasets directory. Defaults to None. | None |
| train_filename | str \| None | Filename for training data. Defaults to None. | None |
| validation_filename | str \| None | Filename for validation data. Defaults to None. | None |
| prompt_filename | str \| None | Filename for prompt data. Defaults to None. | None |
| spreadsheet_id | str \| None | Google Sheets spreadsheet ID. Defaults to None. | None |
| train_sheet | str \| None | Name of the training sheet. Defaults to None. | None |
| validation_sheet | str \| None | Name of the validation sheet. Defaults to None. | None |
| prompt_sheet | str \| None | Name of the prompt sheet. Defaults to None. | None |
| prompt_name | str \| None | Name of the prompt template. Defaults to None. | None |
| experiment_id | str \| None | Unique identifier for the experiment. Defaults to None. | None |
| hyperparameters | FinetunedHyperparameters \| None | Hyperparameters configuration. Defaults to None. | None |
| storage_config | StorageConfig \| None | Storage configuration. Defaults to None. | None |
| resume_from_checkpoint | bool \| None | Whether to resume from checkpoint. Defaults to None. | None |
| topic | str \| None | Topic of the experiment. Defaults to None. | None |
| framework | str \| None | Fine-tuning framework to use. Defaults to None. | None |
| multimodal | bool \| None | Whether the model is multimodal. Defaults to None. | None |
| dpo_column_mapping_config | DPOColumnMappingConfig \| None | DPO column mapping configuration. Defaults to None. | None |
| save_processed_dataset | bool | Whether to save the processed dataset. Defaults to True. | True |
| output_processed_dir | str \| None | Output directory for the processed dataset. Defaults to None. | None |
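As the table shows, data can come either from CSV files (datasets_path plus the filename parameters) or from Google Sheets (spreadsheet_id plus the sheet parameters). A sketch of the Sheets variant with checkpoint resumption; the spreadsheet ID and sheet names are placeholders:

```python
# Sketch of constructing the trainer against a Google Sheets data source.
# The spreadsheet ID and sheet names below are placeholders.
trainer = DPOTrainer(
    model_name="meta-llama/Llama-3.1-8B-Instruct",
    spreadsheet_id="your-spreadsheet-id",  # placeholder Google Sheets ID
    train_sheet="train",
    validation_sheet="validation",
    prompt_sheet="prompts",
    prompt_name="default",
    resume_from_checkpoint=True,           # continue from the last checkpoint
)
```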
save_existing_model(model_path)
Save an existing model.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| model_path | str | Path to the model directory. | required |
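For example, to upload a model trained elsewhere (the path is illustrative):

```python
# Upload a previously trained model from a local directory (illustrative path).
trainer.save_existing_model(model_path="outputs/dpo-demo-001/final")
```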
save_model(results=None, model_path=None)
Saves model artifacts, handling both trained and existing models.
This method provides a unified interface for:

1. Saving a newly trained model (when results is provided)
2. Uploading an existing model (when model_path is provided)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| results | dict[str, Any] \| None | Training results for newly trained models. Defaults to None. | None |
| model_path | str \| None | Path to the model directory. Defaults to None. | None |
Returns:

| Type | Description |
|---|---|
| str | Path where the model was saved or uploaded. |
Raises:

| Type | Description |
|---|---|
| ValueError | If neither model_path nor results is provided, or if the configuration is invalid. |
| RuntimeError | If the save operation fails. |
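A sketch of both modes, assuming a trainer constructed as in the examples above; the model path is illustrative:

```python
# Mode 1: save a model just trained by this trainer instance.
results = trainer.train()
saved_path = trainer.save_model(results=results)

# Mode 2: upload an existing model directory instead (illustrative path).
saved_path = trainer.save_model(model_path="outputs/dpo-demo-001/final")

# Calling save_model() with neither argument raises ValueError.
```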
train()
Runs the complete DPO fine-tuning process.
This method orchestrates the entire workflow, initializing components, running the DPO training loop, and saving the results.
Returns:

| Type | Description |
|---|---|
| dict[str, Any] | A dictionary containing DPO training metrics and results, such as training loss, evaluation loss, and runtime. |
Raises:

| Type | Description |
|---|---|
| ValueError | If required training parameters are missing or invalid. |
| RuntimeError | If components fail to initialize, if the model or trainer is not set up correctly, or if the training process itself fails for any reason. |
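A sketch of running training and inspecting the returned metrics; the exact metric keys are an assumption based on the description above, so check results.keys() in practice:

```python
results = trainer.train()

# Key names are assumptions based on the description above (training loss,
# evaluation loss, runtime); inspect results.keys() to confirm the schema.
print(results.get("train_loss"))
print(results.get("eval_loss"))
print(results.get("train_runtime"))
```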