Loader
Document Processing Orchestrator Loader Package.
Modules:
Name | Description |
---|---|
BaseLoader | Abstract base class for document loaders. |
PipelineLoader | Module for loading a document with several loaders in a pipeline. |
Reviewers
Timotius Nugroho Chandra (timotius.n.chandra@gdplabs.id)
BaseLoader
Bases: ABC
An abstract base class for document loaders.
This class defines the structure for loading and processing documents to retrieve required values. Subclasses are expected to implement the 'load' method to handle document loading from a given source.
Methods:
Name | Description |
---|---|
load | Abstract method to load a document. |
load(source, loaded_elements=None, **kwargs)
abstractmethod
Load and process a document.
This method is abstract and must be implemented in subclasses. It defines the process of loading a document using its source.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
source | str | May be a file path, a URL, or the content itself. | required |
loaded_elements | Any | The loaded elements from previous loaders, ideally formatted as List[Dict]. | None |
**kwargs | Any | Additional keyword arguments for customization. | {} |
Returns:
Name | Type | Description |
---|---|---|
Any | Any | The loaded document, ideally formatted as List[Dict]. Each dictionary in the list should follow the structure of the 'Element' model, to ensure consistency and ease of use across the Document Processing Orchestrator. |
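As an illustration of the contract described above, here is a minimal sketch of a subclass. It is self-contained, so `BaseLoader` below is a local stand-in that only mirrors the documented signature, not the package's own class. `ParagraphLoader` and the `"text"` and `"metadata"` keys are hypothetical; the actual fields of the 'Element' model are not specified on this page.

```python
from abc import ABC, abstractmethod
from typing import Any


class BaseLoader(ABC):
    """Local stand-in mirroring the documented interface (not the package's class)."""

    @abstractmethod
    def load(self, source: str, loaded_elements: Any = None, **kwargs: Any) -> Any:
        """Load and process a document."""


class ParagraphLoader(BaseLoader):
    """Hypothetical loader that treats `source` as the content itself."""

    def load(
        self, source: str, loaded_elements: Any = None, **kwargs: Any
    ) -> list[dict[str, Any]]:
        # Start from whatever earlier loaders produced, if anything.
        elements = list(loaded_elements or [])
        # Split on blank lines; the "text"/"metadata" keys are illustrative only.
        for index, paragraph in enumerate(p.strip() for p in source.split("\n\n")):
            if paragraph:
                elements.append({"text": paragraph, "metadata": {"index": index}})
        return elements


elements = ParagraphLoader().load("First paragraph.\n\nSecond paragraph.")
```

Returning a list of dictionaries, rather than a custom object, keeps the output usable as `loaded_elements` for a later loader.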
PipelineLoader(cache_data_store=None)
A pipeline loader for loading documents.
This class serves as the pipeline loader for loading documents. It defines the structure for loading a document with several loaders in a pipeline.
Methods:
Name | Description |
---|---|
add_loader | Add a loader to the pipeline loader. |
load | Load the document from the given source. |
Initialize the PipelineLoader.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
cache_data_store | BaseHybridCache | The cache data store to be used. Defaults to None. | None |
add_loader(loader)
Add a loader to the pipeline loader.
This method defines the process of adding a loader to the pipeline loader.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
loader | BaseLoader | The loader to be added. | required |
load(source, **kwargs)
Load the document from the given source.
This method defines the process of loading the document using the loaders in the pipeline.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
source |
str
|
Might be file path, URL, the content itself. |
required |
**kwargs |
Any
|
Additional keyword arguments. |
{}
|
Kwargs
ttl (int, optional): The TTL of the cache. Defaults to None.
Returns:
Type | Description |
---|---|
list[dict[str, Any]] | A list of dictionaries containing loaded content and metadata. |
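The chaining that `add_loader` and `load` describe can be sketched roughly as follows. This is not the package's implementation: `SketchPipelineLoader`, `LineLoader`, and `LengthTagger` are hypothetical, `BaseLoader` is a local stand-in for the documented interface, and caching (`cache_data_store`, `ttl`) is left out entirely. It assumes loaders run in the order they were added and that each receives the previous loader's output as `loaded_elements`, which the parameter description suggests but does not confirm.

```python
from abc import ABC, abstractmethod
from typing import Any


class BaseLoader(ABC):
    """Local stand-in mirroring the documented interface (not the package's class)."""

    @abstractmethod
    def load(self, source: str, loaded_elements: Any = None, **kwargs: Any) -> Any:
        """Load and process a document."""


class SketchPipelineLoader:
    """Hypothetical re-creation of the chaining behaviour; caching is omitted."""

    def __init__(self) -> None:
        self._loaders: list[BaseLoader] = []

    def add_loader(self, loader: BaseLoader) -> None:
        # Assumption: loaders run in the order they were added.
        self._loaders.append(loader)

    def load(self, source: str, **kwargs: Any) -> list[dict[str, Any]]:
        elements: Any = None
        for loader in self._loaders:
            # Assumption: each loader sees the previous loader's output.
            elements = loader.load(source, loaded_elements=elements, **kwargs)
        return elements or []


class LineLoader(BaseLoader):
    """Hypothetical first stage: one element per non-empty line."""

    def load(self, source: str, loaded_elements: Any = None, **kwargs: Any) -> Any:
        return [{"text": line} for line in source.splitlines() if line.strip()]


class LengthTagger(BaseLoader):
    """Hypothetical second stage: adds metadata to the elements it receives."""

    def load(self, source: str, loaded_elements: Any = None, **kwargs: Any) -> Any:
        return [
            {**element, "metadata": {"length": len(element["text"])}}
            for element in loaded_elements or []
        ]


pipeline = SketchPipelineLoader()
pipeline.add_loader(LineLoader())
pipeline.add_loader(LengthTagger())
result = pipeline.load("alpha\nbeta")
```

In this sketch the first stage parses the source and the second only enriches what it was handed, which is the division of labour that the `loaded_elements` parameter makes possible.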