Preprocessor
Base classes and utilities for data preprocessing.
This module provides the foundational components for preprocessing data from various sources. It contains abstract base classes, common exceptions, and utility functions that are shared across different preprocessor implementations.
Reviewer
- Muhammad Afif Al Hawari (muhammad.a.a.hawari@gdplabs.id)
BasePreprocessor
Bases: ABC
Base class for data preprocessors.
This class defines the common interface that all preprocessors must implement. Each concrete preprocessor should provide an implementation that transforms data according to its specific requirements (text cleaning, tokenization, augmentation, etc.).
The interface is designed to be consistent across different preprocessor types, making it easy to switch between implementations, add new ones, and maintain the factory pattern architecture.
All preprocessors must implement:
1. Factory methods for creating instances from experiment arguments
2. Configuration detection methods
3. Specific preprocessing methods for different data types or sources
4. A generic preprocessing method for custom transformation pipelines
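The contract above can be sketched as follows. This is a minimal illustration, not the module's actual code: only `preprocess` is confirmed by this page, the `BasePreprocessor` below is a local stand-in, and the factory method name `from_args`, its `Namespace` argument, and both subclasses are hypothetical.

```python
from abc import ABC, abstractmethod
from argparse import Namespace


class BasePreprocessor(ABC):
    """Local stand-in for the documented base class."""

    @classmethod
    def from_args(cls, args: Namespace) -> "BasePreprocessor":
        # Hypothetical factory method: the real name and signature
        # are not shown in this documentation.
        return cls()

    @abstractmethod
    def preprocess(self, data):
        """Transform `data` and return the result."""


class IncompletePreprocessor(BasePreprocessor):
    # Hypothetical subclass that does not override `preprocess`,
    # so it remains abstract.
    pass


class IdentityPreprocessor(BasePreprocessor):
    # Hypothetical subclass with a trivial implementation.
    def preprocess(self, data):
        return data


try:
    IncompletePreprocessor()
    instantiated = True
except TypeError:
    # ABC refuses to instantiate a class with unimplemented abstract methods.
    instantiated = False

preprocessor = IdentityPreprocessor.from_args(Namespace())
```

Because the base class derives from `ABC`, a subclass that omits a required method fails at instantiation time rather than later, when the method is first called.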
preprocess(data)
abstractmethod
Preprocess the provided data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `DataFrame` | The data to preprocess. | required |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | The preprocessed data. |
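As a rough illustration of a concrete `preprocess` implementation, the sketch below assumes that `DataFrame` refers to `pandas.DataFrame`. The `BasePreprocessor` here is a local stand-in, and `TextCleaningPreprocessor` and its `column` parameter are hypothetical names that are not part of this module.

```python
from abc import ABC, abstractmethod

import pandas as pd


class BasePreprocessor(ABC):
    """Local stand-in for the documented base class."""

    @abstractmethod
    def preprocess(self, data: pd.DataFrame) -> pd.DataFrame:
        """Preprocess the provided data."""


class TextCleaningPreprocessor(BasePreprocessor):
    """Hypothetical preprocessor that normalizes one text column."""

    def __init__(self, column: str = "text") -> None:
        self.column = column

    def preprocess(self, data: pd.DataFrame) -> pd.DataFrame:
        # Work on a copy so the caller's DataFrame is left unchanged.
        cleaned = data.copy()
        # Trim surrounding whitespace, then lowercase each value.
        cleaned[self.column] = cleaned[self.column].str.strip().str.lower()
        return cleaned


raw = pd.DataFrame({"text": ["  Hello World ", "FOO  "]})
result = TextCleaningPreprocessor().preprocess(raw)
```

Returning a new DataFrame instead of mutating the input is a design choice of this sketch. The documentation does not say whether real implementations modify `data` in place.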
PreprocessorError
Bases: Exception
Base exception for all preprocessor errors.
This exception serves as the base class for all preprocessing-related errors. It provides a common interface for handling the failures that can occur during preprocessing operations, such as data transformation issues, validation failures, or formatting problems.
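A shared base exception lets callers handle every preprocessing failure with one `except` clause. The sketch below shows that pattern under stated assumptions: `PreprocessorError` is a local stand-in, while `ValidationError` and `validate_columns` are hypothetical names invented for illustration.

```python
class PreprocessorError(Exception):
    """Local stand-in for the documented base exception."""


class ValidationError(PreprocessorError):
    """Hypothetical subclass for validation failures."""


def validate_columns(columns, required):
    # Hypothetical helper: raise if any required column is absent.
    missing = sorted(set(required) - set(columns))
    if missing:
        raise ValidationError(f"missing required columns: {missing}")


try:
    validate_columns(["text"], ["text", "label"])
    message = None
except PreprocessorError as exc:
    # Catching the base class also catches every subclass.
    message = str(exc)
```

Code that needs finer control can catch a specific subclass first and fall back to `PreprocessorError` for everything else.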