Preprocessor
Preprocessor module for data preprocessing and template processing.
This module contains classes and utilities for preprocessing training data, including template formatting and prompt generation for language model training.
Reviewer
- Muhammad Afif Al Hawari (muhammad.a.a.hawari@gdplabs.id)
PromptPreprocessor(experiment_args)
Bases: BasePreprocessor
Preprocessor for handling and formatting prompt templates.
This class handles preprocessing of prompt templates, including variable substitution, format validation, and preparation for training. It works with system and user prompts and ensures they meet the requirements for model training.
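As an illustration of the variable substitution and format validation described above, the following is a minimal sketch using only the Python standard library. It is not the `PromptPreprocessor` implementation: the helper names `find_placeholders` and `render_prompt`, the `{name}` placeholder syntax, and the example template are all assumptions made for this example.

```python
from string import Formatter


def find_placeholders(template: str) -> set[str]:
    """Return the names of all `{name}` placeholders found in a template."""
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None or empty where there is no named placeholder.
    return {field for _, field, _, _ in Formatter().parse(template) if field}


def render_prompt(template: str, row: dict[str, str]) -> str:
    """Check that every placeholder has a value, then substitute the values."""
    missing = find_placeholders(template) - set(row)
    if missing:
        raise ValueError(f"Missing values for placeholders: {sorted(missing)}")
    return template.format(**row)


# Hypothetical user prompt template and one row of training data.
user_prompt = "Question: {question}\nContext: {context}"
row = {"question": "What is 2 + 2?", "context": "Basic arithmetic."}
print(render_prompt(user_prompt, row))
```

Validating placeholders before calling `str.format` turns a bare `KeyError` into a message that names every missing variable at once, which tends to be easier to debug when a template and a dataset drift apart.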
Initialize the prompt preprocessor.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| experiment_args | ExperimentConfig | Configuration for experiment. | required |
| column_mapping | | Configuration for mapping input columns to output columns. Defaults to None. | required |
| system_prompt | | Template for system prompt. Defaults to None. | required |
| user_prompt | | Template for user prompt. Defaults to None. | required |
Raises:
| Type | Description |
|---|---|
| ValueError | If experiment_args is not provided. |
preprocess(df_prompt=None)
Load system and user prompts from a DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df_prompt | DataFrame | DataFrame containing prompt templates. Defaults to None. | None |
Returns:
| Type | Description |
|---|---|
| tuple[str, str] | Tuple of (system_prompt, user_prompt). |
Raises:
| Type | Description |
|---|---|
| PromptTemplateError | If prompt name is not found or required columns are missing. |
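The lookup and error conditions documented for `preprocess` can be sketched roughly as follows. This is an assumption-heavy illustration, not the library's code: the column names `name`, `system`, and `user`, the `load_prompts` helper, and the local `PromptTemplateError` stand-in are all hypothetical, and plain dictionaries are used in place of a DataFrame (rows of the kind `df_prompt.to_dict("records")` would produce) so the example runs without pandas.

```python
class PromptTemplateError(Exception):
    """Local stand-in for the library's PromptTemplateError (definition assumed)."""


# Hypothetical column names; the real schema is not documented in this section.
REQUIRED_COLUMNS = ("name", "system", "user")


def load_prompts(records: list[dict[str, str]], prompt_name: str) -> tuple[str, str]:
    """Return (system_prompt, user_prompt) for the row whose name matches."""
    for record in records:
        # Fail early if a row lacks any of the assumed required columns.
        missing = [col for col in REQUIRED_COLUMNS if col not in record]
        if missing:
            raise PromptTemplateError(f"Missing required columns: {missing}")
        if record["name"] == prompt_name:
            return record["system"], record["user"]
    raise PromptTemplateError(f"Prompt name not found: {prompt_name!r}")


# Rows shaped like the output of df_prompt.to_dict("records").
records = [
    {"name": "qa", "system": "You are a helpful assistant.", "user": "Question: {question}"},
]
system_prompt, user_prompt = load_prompts(records, "qa")
print(system_prompt)
print(user_prompt)
```

Both failure modes raise the same exception type here, mirroring the single `PromptTemplateError` entry in the table above; the message text is what distinguishes a missing prompt name from a missing column.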