Preprocessor

Base classes and utilities for data preprocessing.

This module provides the foundational components for preprocessing data from various sources. It contains abstract base classes, common exceptions, and utility functions that are shared across different preprocessor implementations.

Authors
  • Alfan Dinda Rahmawan (alfan.d.rahmawan@gdplabs.id)
Reviewer
  • Muhammad Afif Al Hawari (muhammad.a.a.hawari@gdplabs.id)

BasePreprocessor

Bases: ABC

Base class for data preprocessors.

This class defines the common interface that all preprocessors must implement. Each concrete preprocessor should provide an implementation that transforms data according to its specific requirements (text cleaning, tokenization, augmentation, etc.).

The interface is designed to be consistent across different preprocessor types, making it easy to switch between implementations, add new ones, and maintain the factory pattern architecture.

All preprocessors must implement:
  1. Factory methods for creating instances from experiment arguments
  2. Configuration detection methods
  3. Specific preprocessing methods for different data types or sources
  4. Generic preprocessing method for custom transformation pipelines

preprocess(data) abstractmethod

Preprocess the provided data.

Parameters:

Name  Type       Description              Default
data  DataFrame  The data to preprocess.  required

Returns:

Type       Description
DataFrame  The preprocessed data.
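
As an illustration of the interface, here is a minimal sketch of how a concrete preprocessor might implement preprocess. It makes several assumptions that the page does not confirm: DataFrame is taken to be pandas.DataFrame, the BasePreprocessor below is a local stand-in rather than the module's actual source, and WhitespaceStripper and its column argument are hypothetical names invented for the example.

```python
from abc import ABC, abstractmethod

import pandas as pd


class BasePreprocessor(ABC):
    """Local stand-in for the documented base class (assumption, not the real source)."""

    @abstractmethod
    def preprocess(self, data: pd.DataFrame) -> pd.DataFrame:
        """Preprocess the provided data."""


class WhitespaceStripper(BasePreprocessor):
    """Hypothetical preprocessor that trims whitespace in one text column."""

    def __init__(self, column: str = "text") -> None:
        self.column = column

    def preprocess(self, data: pd.DataFrame) -> pd.DataFrame:
        # Work on a copy so the caller's DataFrame is left unchanged.
        result = data.copy()
        result[self.column] = result[self.column].str.strip()
        return result


raw = pd.DataFrame({"text": ["  hello ", "world  "]})
cleaned = WhitespaceStripper().preprocess(raw)
```

Because preprocess is abstract, instantiating BasePreprocessor directly raises TypeError; only subclasses that override the method can be created. Returning a new DataFrame instead of mutating the input is a design choice of this sketch, not something the page requires.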

PreprocessorError

Bases: Exception

Base exception for all preprocessor errors.

This exception serves as the base class for all preprocessing-related errors. It provides a common interface for handling the failures that can occur during preprocessing operations, such as data transformation issues, validation failures, or formatting problems.
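
The following sketch shows one way a shared base exception can be used, so that callers catch every preprocessing failure with a single except clause. The PreprocessorError class here is a local stand-in for the documented one; MissingColumnError and require_columns are hypothetical names introduced only for this example and are not part of the documented module.

```python
class PreprocessorError(Exception):
    """Local stand-in for the documented base exception (assumption)."""


class MissingColumnError(PreprocessorError):
    """Hypothetical subclass representing a validation failure."""


def require_columns(columns, required):
    """Raise MissingColumnError if any required column name is absent."""
    missing = sorted(set(required) - set(columns))
    if missing:
        raise MissingColumnError(f"missing columns: {missing}")


handled = None
try:
    require_columns(["id"], ["id", "text"])
except PreprocessorError as exc:
    # Catching the base class also catches every subclass.
    handled = str(exc)
```

Code that needs to react to one specific failure can still catch the subclass first and fall back to PreprocessorError for everything else.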