Parser

Document Processing Orchestrator Parser Package.

Modules:

Name Description
BaseParser

Abstract base class for document parsers.

PipelineParser

Module for parsing a document with several parsers in a pipeline.

Authors

Devita (devita1@gdplabs.id)

BaseParser

Bases: ABC

Base class for document parsers.

This class serves as the base for all document parsers, which define the structure of each piece of content in a document.

Methods:

Name Description
parse

Abstract method to parse a document.

parse(loaded_elements, **kwargs) abstractmethod

Parse loaded elements to get element structure.

This method is abstract and must be implemented in subclasses. It defines the process of parsing a document using loaded elements.

Parameters:

Name Type Description Default
loaded_elements Any

The loaded elements from the loader, ideally formatted as List[Dict].

required
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Name Type Description
Any Any

The parsed document, ideally formatted as List[Dict]. Each dictionary in the list should follow the structure of the 'Element' model, to ensure consistency and ease of use across the Document Processing Orchestrator.
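The contract described above can be illustrated with a minimal sketch of a subclass. The `BaseParser` defined here is a stand-in that mirrors the documented signature so the example runs on its own; it is not the package's actual class. `HeadingParser`, the `text` and `structure` keys, and the `max_heading_length` option are all hypothetical, since the fields of the 'Element' model are not described in this section.

```python
from abc import ABC, abstractmethod
from typing import Any


class BaseParser(ABC):
    """Stand-in mirroring the documented interface (not the real package class)."""

    @abstractmethod
    def parse(self, loaded_elements: Any, **kwargs: Any) -> Any:
        """Parse loaded elements to get element structure."""


class HeadingParser(BaseParser):
    """Hypothetical parser that labels short, all-uppercase lines as headings."""

    def parse(self, loaded_elements: Any, **kwargs: Any) -> Any:
        # 'max_heading_length' is a made-up option passed through **kwargs.
        max_len = kwargs.get("max_heading_length", 40)
        parsed = []
        for element in loaded_elements:
            # 'text' and 'structure' are assumed keys, not the real Element schema.
            text = element.get("text", "")
            is_heading = text.isupper() and len(text) <= max_len
            label = "heading" if is_heading else "paragraph"
            # Copy the element so the loader's output is left untouched.
            parsed.append({**element, "structure": label})
        return parsed


elements = [{"text": "INTRODUCTION"}, {"text": "This report covers parsing."}]
result = HeadingParser().parse(elements)
```

Because `parse` is abstract, instantiating `BaseParser` directly raises `TypeError`; only subclasses that implement `parse` can be created.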

PipelineParser()

Pipeline parser for parsing documents.

This class defines the structure for parsing documents with several parsers arranged in a pipeline.

Methods:

Name Description
add_parser

Add a parser to the pipeline parser.

parse

Parse the elements using parsers.

Initialize the pipeline parser.

add_parser(parser)

Add a parser to the pipeline parser.

This method defines the process of adding a parser to the pipeline parser.

Parameters:

Name Type Description Default
parser BaseParser

The parser to be added.

required

parse(elements, **kwargs)

Parse the elements using the pipeline parser.

This method defines the process of parsing the elements using parsers.

Parameters:

Name Type Description Default
elements list[dict[str, Any]]

A list of dictionaries containing elements.

required
**kwargs Any

Additional keyword arguments.

{}
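
As a rough illustration of how `add_parser` and `parse` might fit together, the sketch below uses stand-in classes that mirror the documented signatures. It assumes parsers run in the order they were added and that each parser receives the previous parser's output; this section does not confirm that behavior, nor what `parse` returns. `StripParser`, `DropEmptyParser`, and the `text` key are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any


class BaseParser(ABC):
    """Stand-in mirroring the documented interface (not the real package class)."""

    @abstractmethod
    def parse(self, loaded_elements: Any, **kwargs: Any) -> Any:
        """Parse loaded elements to get element structure."""


class PipelineParser:
    """Stand-in sketch; sequential chaining is an assumption."""

    def __init__(self) -> None:
        self.parsers: list[BaseParser] = []

    def add_parser(self, parser: BaseParser) -> None:
        self.parsers.append(parser)

    def parse(self, elements: list[dict[str, Any]], **kwargs: Any) -> list[dict[str, Any]]:
        # Assumed behavior: feed each parser the output of the previous one.
        for parser in self.parsers:
            elements = parser.parse(elements, **kwargs)
        return elements


class StripParser(BaseParser):
    """Hypothetical parser that trims whitespace from the assumed 'text' key."""

    def parse(self, loaded_elements: Any, **kwargs: Any) -> Any:
        return [{**e, "text": e["text"].strip()} for e in loaded_elements]


class DropEmptyParser(BaseParser):
    """Hypothetical parser that removes elements whose text is empty."""

    def parse(self, loaded_elements: Any, **kwargs: Any) -> Any:
        return [e for e in loaded_elements if e["text"]]


pipeline = PipelineParser()
pipeline.add_parser(StripParser())
pipeline.add_parser(DropEmptyParser())
output = pipeline.parse([{"text": "  Title  "}, {"text": "   "}])
```

Under these assumptions the order of `add_parser` calls matters: here the whitespace-only element is dropped only because `StripParser` runs before `DropEmptyParser`.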