GRPO Trainer

Group Relative Policy Optimization (GRPO) Trainer module for GLLM training.

This module contains classes and utilities for orchestrating GRPO training within the GLLM training framework.

GRPOTrainer(model_name, datasets_path=None, train_filename=None, validation_filename=None, prompt_filename=None, spreadsheet_id=None, train_sheet=None, validation_sheet=None, prompt_sheet=None, prompt_name=None, reward_functions=None, experiment_id=None, hyperparameters=None, storage_config=None, resume_from_checkpoint=None, topic=None, grpo_ability=None, framework=None, multimodal=None, grpo_column_mapping_config=None, save_processed_dataset=True, output_processed_dir=None, dataset_text_field=None, vllm_sampling_params=None)

Bases: BaseTrainer

A class to orchestrate Group Relative Policy Optimization (GRPO).

It contains the main orchestration logic for GRPO, along with supporting functionality for creating and managing the GRPO training process.
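For orientation, the core idea of GRPO is that rewards are normalized within each group of completions sampled for the same prompt, so each completion is scored relative to its siblings rather than against a learned value model. The sketch below shows only that advantage step using the standard library. It is a general illustration of the technique, not this class's implementation; the function name `group_relative_advantages` is hypothetical, and it uses the population standard deviation (some implementations use the sample standard deviation instead).

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Hypothetical helper: normalize the rewards of one prompt's completion group.

    Each advantage is the reward minus the group mean, divided by the
    group standard deviation plus eps (eps avoids division by zero when
    every completion in the group received the same reward).
    """
    if not rewards:
        raise ValueError("a group must contain at least one completion")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions sampled for the same prompt, each already scored by a reward function.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
print([round(a, 3) for a in advantages])  # → [1.414, -1.414, 0.0, 0.0]
```

Completions that beat their group's average get a positive advantage and are reinforced; those below average are discouraged.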

Initialize the GRPOTrainer.

Parameters:

Name	Type	Description	Default
`model_name`	`str`	Name of the model to train.	required
`datasets_path`	`str | None`	Path to the datasets directory.	`None`
`train_filename`	`str | None`	Filename of the training data.	`None`
`validation_filename`	`str | None`	Filename of the validation data.	`None`
`prompt_filename`	`str | None`	Filename of the prompt data.	`None`
`spreadsheet_id`	`str | None`	Google Sheets spreadsheet ID.	`None`
`train_sheet`	`str | None`	Name of the training sheet.	`None`
`validation_sheet`	`str | None`	Name of the validation sheet.	`None`
`prompt_sheet`	`str | None`	Name of the prompt sheet.	`None`
`prompt_name`	`str | None`	Name of the prompt template.	`None`
`reward_functions`	`list[Callable[[list[Any], Any], list[float]]] | None`	Reward functions used to score generated completions; each returns a list of float rewards.	`None`
`experiment_id`	`str | None`	Unique identifier for the experiment.	`None`
`hyperparameters`	`FinetunedHyperparameters | None`	Hyperparameter configuration.	`None`
`storage_config`	`StorageConfig | None`	Storage configuration.	`None`
`resume_from_checkpoint`	`bool | None`	Whether to resume training from a checkpoint.	`None`
`topic`	`str | None`	Topic of the experiment.	`None`
`grpo_ability`	`str | None`	GRPO ability configuration.	`None`
`framework`	`str | None`	Fine-tuning framework to use.	`None`
`multimodal`	`bool | None`	Whether the model is multimodal.	`None`
`grpo_column_mapping_config`	`GRPOColumnMappingConfig | None`	Dataset column mapping configuration.	`None`
`save_processed_dataset`	`bool`	Whether to save the processed dataset.	`True`
`output_processed_dir`	`str | None`	Output directory for the processed dataset.	`None`
`dataset_text_field`	`str | None`	Name of the text field in the dataset.	`None`
`vllm_sampling_params`	`VllmSamplingParamsConfig | None`	vLLM sampling parameters.	`None`
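The `reward_functions` parameter is typed `list[Callable[[list[Any], Any], list[float]]]`: each callable takes a list plus one further argument and returns a list of floats. The sketch below assumes the list holds generated completions as strings, that the second argument carries a reference answer, and that one score is returned per completion. None of these semantics is stated in this reference, so check how the trainer actually invokes reward functions before relying on them. Both reward functions are hypothetical examples.

```python
from typing import Any, Callable

# Alias mirroring the documented element type of `reward_functions`.
RewardFn = Callable[[list[Any], Any], list[float]]


def exact_match_reward(completions: list[Any], reference: Any) -> list[float]:
    """Hypothetical reward: 1.0 if a completion matches the reference answer, else 0.0.

    Assumes completions are strings and `reference` is the expected answer;
    surrounding whitespace is ignored on both sides.
    """
    target = str(reference).strip()
    return [1.0 if str(c).strip() == target else 0.0 for c in completions]


def brevity_reward(completions: list[Any], reference: Any) -> list[float]:
    """Hypothetical reward favoring shorter completions; `reference` is unused."""
    return [1.0 / (1 + len(str(c))) for c in completions]


reward_functions: list[RewardFn] = [exact_match_reward, brevity_reward]

# Score three candidate completions against the reference answer "42".
scores = [fn(["42", " 42 ", "41"], "42") for fn in reward_functions]
print(scores[0])  # → [1.0, 1.0, 0.0]
```

Under those assumptions, the list would be passed as `reward_functions=reward_functions` when constructing `GRPOTrainer`.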

`save_existing_model(model_path)`

Save an existing model directory or zip artifact.

Parameters:

Name	Type	Description	Default
`model_path`	`str`	Path to the existing model directory or zip artifact.	required

Returns:

Type	Description
`str`	Local path or remote URL where the artifact was saved/uploaded.

Raises:

Type	Description
`FileNotFoundError`	If the model path does not exist.
`ValueError`	If the path is not a valid adapter directory or zip artifact.
`PermissionError`	If the artifact is not readable.
`FileExistsError`	If the remote artifact already exists.
`RuntimeError`	If the upload fails.
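The first three exceptions in the table above describe input validation. The sketch below shows one way such checks could be written with the standard library. The helper name `validate_model_artifact` is hypothetical, and treating a directory as a valid adapter only if it contains `adapter_config.json` (the file PEFT writes for LoRA adapters) is an assumption, not this module's documented rule. The upload-related errors (`FileExistsError`, `RuntimeError`) are out of scope here.

```python
import os
import zipfile
from pathlib import Path


def validate_model_artifact(model_path: str) -> Path:
    """Hypothetical pre-upload checks mirroring save_existing_model's documented exceptions."""
    path = Path(model_path)
    if not path.exists():
        raise FileNotFoundError(f"Model path does not exist: {model_path}")
    if not os.access(path, os.R_OK):
        raise PermissionError(f"Model artifact is not readable: {model_path}")
    if path.is_dir():
        # Assumption: an adapter directory is recognized by its adapter_config.json.
        if not (path / "adapter_config.json").is_file():
            raise ValueError(f"Not a valid adapter directory: {model_path}")
    elif not zipfile.is_zipfile(path):
        # Any regular file must be a readable zip archive.
        raise ValueError(f"Not a valid adapter directory or zip artifact: {model_path}")
    return path
```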

`save_model(results=None, model_path=None)`

Save model artifacts for either a newly trained model or an existing one.

This method provides a unified interface for:

1. Saving a newly trained model (when `results` is provided).
2. Uploading an existing model (when `model_path` is provided).

Parameters:

Name	Type	Description	Default
`results`	`dict`	Training results for newly trained models.	`None`
`model_path`	`str`	Path to the model directory.	`None`

Returns:

Type	Description
`str`	Path where the model was saved or uploaded.

Raises:

Type	Description
`ValueError`	If neither model_path nor results is provided, or if configuration is invalid.
`RuntimeError`	If the save operation fails.
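The two modes of `save_model` amount to a dispatch on which argument is supplied. The sketch below illustrates that contract under two labeled assumptions: `results` takes precedence when both arguments are given (this reference does not say), and the hypothetical callbacks `save_trained` and `upload_existing` stand in for the trainer's internal save and upload steps.

```python
from typing import Any, Callable, Optional


def dispatch_save(
    results: Optional[dict[str, Any]],
    model_path: Optional[str],
    save_trained: Callable[[dict[str, Any]], str],
    upload_existing: Callable[[str], str],
) -> str:
    """Hypothetical dispatcher mirroring save_model's documented contract."""
    if results is None and model_path is None:
        raise ValueError("Either results or model_path must be provided.")
    try:
        if results is not None:
            # Mode 1 (assumed to win if both are given): persist a freshly trained model.
            return save_trained(results)
        # Mode 2: upload an artifact that already exists on disk.
        return upload_existing(model_path)
    except ValueError:
        raise  # configuration problems keep their documented type
    except Exception as exc:
        # Any other failure is surfaced as RuntimeError, as the table above states.
        raise RuntimeError("Model save failed") from exc
```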

`train()`

Run the complete GRPO fine-tuning process.

This method orchestrates the entire workflow: initializing components, running the GRPO training loop, and saving the results.

Returns:

Type	Description
`dict[str, Any]`	A dictionary of GRPO training metrics and results, such as training loss, evaluation loss, and runtime.

Raises:

Type	Description
`ValueError`	If required training parameters are missing or invalid.
`RuntimeError`	If components fail to initialize, the model or trainer is not set up correctly, or if the training process itself fails for any reason.
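The workflow described for `train()` follows a validate, set up, train, save sequence in which invalid input surfaces as `ValueError` and any other failure as `RuntimeError`. The sketch below illustrates only that error contract; the function `run_training_workflow`, its stage callables, and the `saved_to` key are hypothetical and do not reflect this class's internals.

```python
from typing import Any, Callable


def run_training_workflow(
    params: dict[str, Any],
    setup: Callable[[dict[str, Any]], Any],
    train_loop: Callable[[Any], dict[str, Any]],
    save: Callable[[dict[str, Any]], str],
) -> dict[str, Any]:
    """Hypothetical orchestration mirroring train()'s documented contract."""
    # Missing required parameters are reported as ValueError before any work starts.
    if not params.get("model_name"):
        raise ValueError("model_name is required")
    try:
        trainer = setup(params)          # initialize model, data, and trainer
        metrics = train_loop(trainer)    # run the GRPO training loop
        saved_to = save(metrics)         # persist the trained artifacts
    except Exception as exc:
        # Any failure during setup, training, or saving becomes a RuntimeError.
        raise RuntimeError(f"GRPO training failed: {exc}") from exc
    return {**metrics, "saved_to": saved_to}
```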