
Loader

Document Processing Orchestrator Loader Package.

Modules:

Name Description
BaseLoader

Abstract base class for document loaders.

PipelineLoader

Module for loading a document with several loaders arranged in a pipeline.

Authors

Devita (devita1@gdplabs.id)

Reviewers

Timotius Nugroho Chandra (timotius.n.chandra@gdplabs.id)

BaseLoader

Bases: ABC

An abstract base class for document loaders.

This class defines the structure for loading and processing documents to retrieve required values. Subclasses are expected to implement the 'load' method to handle document loading from a given source.

Methods:

Name Description
load

Abstract method to load a document.

load(source, loaded_elements=None, **kwargs) abstractmethod

Load and process a document.

This method is abstract and must be implemented in subclasses. It defines the process of loading a document using its source.

Parameters:

Name Type Description Default
source str

The document source: a file path, a URL, or the content itself.

required
loaded_elements Any

The elements loaded by previous loaders, ideally formatted as List[Dict].

None
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Name Type Description
Any Any

The loaded document, ideally formatted as List[Dict]. Each dictionary in the list should follow the structure of the 'Element' model, to ensure consistency and ease of use across the Document Processing Orchestrator.
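The contract described above can be sketched as follows. This is a minimal illustration, not the packaged implementation: the `BaseLoader` class here is a local stand-in that only mirrors the documented signature, `PlainTextLoader` is a hypothetical subclass, and the `text` and `metadata` keys are assumed for illustration because the 'Element' model's fields are not listed on this page.

```python
from abc import ABC, abstractmethod
from typing import Any


class BaseLoader(ABC):
    """Local stand-in mirroring the documented signature (not the packaged class)."""

    @abstractmethod
    def load(self, source: str, loaded_elements: Any = None, **kwargs: Any) -> Any:
        """Load and process a document."""


class PlainTextLoader(BaseLoader):
    """Hypothetical loader that treats `source` as the content itself."""

    def load(
        self, source: str, loaded_elements: Any = None, **kwargs: Any
    ) -> list[dict[str, Any]]:
        # Keep whatever earlier loaders produced, then add one element per non-empty line.
        elements = list(loaded_elements or [])
        for line in source.splitlines():
            if line.strip():
                # "text" and "metadata" are illustrative keys, not the real Element schema.
                elements.append(
                    {"text": line.strip(), "metadata": {"loader": "PlainTextLoader"}}
                )
        return elements


elements = PlainTextLoader().load("first line\n\nsecond line")
```

Because `load` is abstract, instantiating the stand-in `BaseLoader` directly raises `TypeError`; only subclasses that implement `load` can be created.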

PipelineLoader(cache_data_store=None)

A pipeline loader for loading documents.

This class loads a document by running several loaders in sequence as a pipeline.

Methods:

Name Description
add_loader

Add a loader to the pipeline loader.

load

Load the document from the given source.

Initialize the PipelineLoader.

Parameters:

Name Type Description Default
cache_data_store BaseHybridCache

The cache data store to be used. Defaults to None.

None

add_loader(loader)

Add a loader to the pipeline loader.

This method defines the process of adding a loader to the pipeline loader.

Parameters:

Name Type Description Default
loader BaseLoader

The loader to be added.

required

load(source, **kwargs)

Load the document from the given source.

This method defines the process of loading the document using loaders.

Parameters:

Name Type Description Default
source str

The document source: a file path, a URL, or the content itself.

required
**kwargs Any

Additional keyword arguments.

{}
Kwargs

ttl (int, optional): The TTL of the cache. Defaults to None.

Returns:

Type Description
list[dict[str, Any]]

A list of dictionaries containing loaded content and metadata.
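One plausible reading of the pipeline behaviour is sketched below, assuming that loaders run in the order they were added and that each loader receives the previous loader's output as `loaded_elements`. `SketchPipelineLoader`, `LineLoader`, and `IndexTagger` are hypothetical stand-ins written for this example; the sketch omits the cache data store and the `ttl` keyword entirely, and the element keys are illustrative.

```python
from typing import Any, Protocol


class Loader(Protocol):
    """Structural type matching the documented load signature."""

    def load(self, source: str, loaded_elements: Any = None, **kwargs: Any) -> Any: ...


class SketchPipelineLoader:
    """Illustrative stand-in, not the packaged PipelineLoader (no caching)."""

    def __init__(self) -> None:
        self._loaders: list[Loader] = []

    def add_loader(self, loader: Loader) -> None:
        # Assumption: loaders run in the order they were added.
        self._loaders.append(loader)

    def load(self, source: str, **kwargs: Any) -> list[dict[str, Any]]:
        loaded_elements: Any = None
        for loader in self._loaders:
            # Assumption: each loader sees the output of the one before it.
            loaded_elements = loader.load(source, loaded_elements, **kwargs)
        return loaded_elements or []


class LineLoader:
    """Hypothetical first stage: one element per non-empty line of the source."""

    def load(self, source: str, loaded_elements: Any = None, **kwargs: Any) -> Any:
        return [{"text": line, "metadata": {}} for line in source.splitlines() if line]


class IndexTagger:
    """Hypothetical second stage: adds a position index to each element's metadata."""

    def load(self, source: str, loaded_elements: Any = None, **kwargs: Any) -> Any:
        return [
            {**element, "metadata": {**element["metadata"], "index": i}}
            for i, element in enumerate(loaded_elements or [])
        ]


pipeline = SketchPipelineLoader()
pipeline.add_loader(LineLoader())
pipeline.add_loader(IndexTagger())
result = pipeline.load("alpha\nbeta")
```

In this sketch the second loader never re-reads `source`; it only enriches what the first loader produced, which is the kind of hand-off the `loaded_elements` parameter on `BaseLoader.load` appears designed for.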