GRPO Trainer

Group Relative Policy Optimization (GRPO) trainer module for GLLM training.

This module contains the classes and utilities that orchestrate GRPO training within the GLLM training framework.

GRPOTrainer(model_name, datasets_path=None, train_filename=None, validation_filename=None, prompt_filename=None, spreadsheet_id=None, train_sheet=None, validation_sheet=None, prompt_sheet=None, prompt_name=None, reward_functions=None, experiment_id=None, hyperparameters=None, storage_config=None, resume_from_checkpoint=None, topic=None, grpo_ability=None, framework=None, multimodal=None, grpo_column_mapping_config=None, save_processed_dataset=True, output_processed_dir=None, dataset_text_field=None, vllm_sampling_params=None)

Bases: BaseTrainer

A class to orchestrate Group Relative Policy Optimization (GRPO).

This class contains the main GRPO orchestration logic, along with supporting functionality for creating and managing the GRPO process.
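For readers new to GRPO, the "group relative" part refers to scoring each sampled completion against the other completions generated for the same prompt. The sketch below illustrates that normalization in isolation. It is not taken from this module: the function name `group_relative_advantages`, the `eps` term, and the use of the population standard deviation are assumptions made for illustration, and implementations differ on these details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std over the group
    # eps avoids division by zero when every reward in the group is equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Rewards for four completions sampled from the same prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that score above the group mean receive a positive advantage, and those below it receive a negative one.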

Initialize the GRPOTrainer.

Parameters:

Name Type Description Default
model_name str

Name of the model to use for training.

required
datasets_path str | None

Path to the datasets directory. Defaults to None.

None
train_filename str | None

Filename for training data. Defaults to None.

None
validation_filename str | None

Filename for validation data. Defaults to None.

None
prompt_filename str | None

Filename for prompt data. Defaults to None.

None
spreadsheet_id str | None

Google Sheets spreadsheet ID. Defaults to None.

None
train_sheet str | None

Name of the training sheet. Defaults to None.

None
validation_sheet str | None

Name of the validation sheet. Defaults to None.

None
prompt_sheet str | None

Name of the prompt sheet. Defaults to None.

None
prompt_name str | None

Name of the prompt template. Defaults to None.

None
reward_functions list[Callable[[list[Any], Any], list[float]]] | None

List of reward function callables, each returning a list of float scores. Defaults to None.

None
experiment_id str | None

Unique identifier for the experiment. Defaults to None.

None
hyperparameters FinetunedHyperparameters | None

Hyperparameters configuration. Defaults to None.

None
storage_config StorageConfig | None

Storage configuration. Defaults to None.

None
resume_from_checkpoint bool | None

Whether to resume from checkpoint. Defaults to None.

None
topic str | None

Topic of the experiment. Defaults to None.

None
grpo_ability str | None

GRPO ability configuration. Defaults to None.

None
framework str | None

Fine-tuning framework to use. Defaults to None.

None
multimodal bool | None

Whether the model is multimodal. Defaults to None.

None
grpo_column_mapping_config GRPOColumnMappingConfig | None

Column mapping configuration. Defaults to None.

None
save_processed_dataset bool

Whether to save processed dataset. Defaults to True.

True
output_processed_dir str | None

Output directory for processed dataset. Defaults to None.

None
dataset_text_field str | None

Text field name in dataset. Defaults to None.

None
vllm_sampling_params VllmSamplingParamsConfig | None

vLLM sampling parameters. Defaults to None.

None
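As a rough illustration of the `reward_functions` parameter, the sketch below defines one callable that matches the documented type `Callable[[list[Any], Any], list[float]]` and collects a few constructor arguments by name. The meaning of the second argument is not documented on this page, so treating it as a reference value is an assumption. The function `length_reward`, the model name, and the file names are all hypothetical.

```python
from typing import Any


def length_reward(completions: list[Any], reference: Any) -> list[float]:
    """Score each completion in (0, 1], favoring answers of at most 50 words."""
    # Assumption: `reference` is extra context passed by the trainer; unused here.
    scores = []
    for completion in completions:
        n_words = len(str(completion).split())
        scores.append(1.0 if n_words <= 50 else 50.0 / n_words)
    return scores


# Keyword arguments named after the documented constructor parameters.
trainer_kwargs = {
    "model_name": "my-org/my-base-model",  # hypothetical model name
    "datasets_path": "data/",              # hypothetical directory
    "train_filename": "train.csv",         # hypothetical file
    "reward_functions": [length_reward],
}
# trainer = GRPOTrainer(**trainer_kwargs)  # import path is not shown on this page
```

Each reward function returns one score per completion, so the output list has the same length as the input list.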

save_existing_model(model_path)

Save an existing model.

Parameters:

Name Type Description Default
model_path str

Path to the model directory.

required

save_model(results=None, model_path=None)

Saves model artifacts, handling both trained and existing models.

This method provides a unified interface for:

1. Saving a newly trained model (when results is provided)
2. Uploading an existing model (when model_path is provided)

Parameters:

Name Type Description Default
results dict

Training results for newly trained models.

None
model_path str

Path to the model directory.

None

Returns:

Name Type Description
str

Path where the model was saved or uploaded.

Raises:

Type Description
ValueError

If neither model_path nor results is provided, or if configuration is invalid.

RuntimeError

If the save operation fails.
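The argument contract described for save_model can be sketched as a small standalone function. This is not the library's implementation: `pick_save_mode` and its return strings are hypothetical, and the choice to prefer `results` when both arguments are supplied is an assumption, since the page does not say what happens in that case.

```python
from typing import Any, Optional


def pick_save_mode(
    results: Optional[dict[str, Any]] = None,
    model_path: Optional[str] = None,
) -> str:
    """Return which documented save path applies to the given arguments."""
    if results is not None:
        # Assumption: results takes precedence when both arguments are given.
        return "save_trained_model"
    if model_path is not None:
        return "upload_existing_model"
    # Documented behavior: providing neither argument raises ValueError.
    raise ValueError("Provide either results or model_path.")
```

A caller can rely on the same rule: pass `results` after a training run, or pass `model_path` to upload a model that already exists on disk.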

train()

Runs the complete GRPO fine-tuning process.

This method orchestrates the entire workflow: initializing components, running the GRPO training loop, and saving the results.

Returns:

Type Description
dict[str, Any]

A dictionary containing GRPO training metrics and results, such as training loss, evaluation loss, and runtime.

Raises:

Type Description
ValueError

If required training parameters are missing or invalid.

RuntimeError

If components fail to initialize, the model or trainer is not set up correctly, or if the training process itself fails for any reason.
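One way to call train() while handling the two documented exception types is sketched below. The wrapper `run_training`, the `SupportsTrain` protocol, and the `_FakeTrainer` stand-in are hypothetical and exist only so the example runs without the library installed. The metric key in the stand-in is also made up.

```python
import logging
from typing import Any, Optional, Protocol

logger = logging.getLogger(__name__)


class SupportsTrain(Protocol):
    """Anything exposing train() as documented for GRPOTrainer."""

    def train(self) -> dict[str, Any]: ...


def run_training(trainer: SupportsTrain) -> Optional[dict[str, Any]]:
    """Run training and return the metrics, or None if a documented error occurs."""
    try:
        return trainer.train()
    except ValueError as exc:
        # Documented: missing or invalid training parameters.
        logger.error("Invalid training parameters: %s", exc)
    except RuntimeError as exc:
        # Documented: initialization, setup, or training failure.
        logger.error("Training failed: %s", exc)
    return None


class _FakeTrainer:
    """Stand-in for a real trainer, used only to demonstrate the wrapper."""

    def train(self) -> dict[str, Any]:
        return {"train_loss": 0.25}  # hypothetical metric name


metrics = run_training(_FakeTrainer())
```

Returning None on failure keeps the caller's control flow simple. Re-raising after logging is an equally reasonable choice when the failure should stop a larger pipeline.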