Parser
Document Processing Orchestrator Parser Package.
Modules:
Name | Description |
---|---|
BaseParser | Abstract base class for document parsers. |
PipelineParser | Module for parsing a document with several parsers in a pipeline. |
BaseParser
Bases: ABC
Base class for document parsers.
This class serves as the base for document parsers, which define the structure of each piece of content in a document.
Methods:
Name | Description |
---|---|
parse | Abstract method to parse a document. |
parse(loaded_elements, **kwargs)
abstractmethod
Parse loaded elements to get element structure.
This method is abstract and must be implemented in subclasses. It defines the process of parsing a document using loaded elements.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
loaded_elements | Any | The loaded elements from the loader, ideally formatted as List[Dict]. | required |
**kwargs | Any | Additional keyword arguments for customization. | {} |
Returns:
Name | Type | Description |
---|---|---|
Any | Any | The parsed document, ideally formatted as List[Dict]. Each dictionary in the list should follow the structure of the 'Element' model to ensure consistency and ease of use across Document Processing Orchestrator. |
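A subclass only needs to implement `parse`. The following is a minimal sketch, not the package's own code: the `BaseParser` defined here is a local stand-in that mirrors the documented signature, and `StripTextParser`, the `text_key` keyword, and the `"text"` and `"structure"` fields are hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any


class BaseParser(ABC):
    """Local stand-in mirroring the documented BaseParser (assumed shape)."""

    @abstractmethod
    def parse(self, loaded_elements: Any, **kwargs: Any) -> Any:
        """Parse loaded elements to get element structure."""


class StripTextParser(BaseParser):
    """Hypothetical parser that trims whitespace from each element's text."""

    def parse(self, loaded_elements: Any, **kwargs: Any) -> Any:
        # "text" is an assumed field name, not taken from the Element model.
        key = kwargs.get("text_key", "text")
        parsed = []
        for element in loaded_elements:
            updated = dict(element)  # copy so the loader output is not mutated
            if isinstance(updated.get(key), str):
                updated[key] = updated[key].strip()
            parsed.append(updated)
        return parsed


elements = [{"text": "  Hello  ", "structure": "paragraph"}]
result = StripTextParser().parse(elements)
```

Returning new dictionaries rather than editing the input in place keeps the loader output reusable, which may matter if the same elements are passed to more than one parser.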
PipelineParser()
Pipeline parser for parsing documents.
This class defines the structure for parsing documents with several parsers in a pipeline.
Methods:
Name | Description |
---|---|
add_parser | Add a parser to the pipeline parser. |
parse | Parse the elements using parsers. |
Initialize the pipeline parser.
add_parser(parser)
Add a parser to the pipeline parser.
This method defines the process of adding a parser to the pipeline parser.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
parser | BaseParser | The parser to be added. | required |
parse(elements, **kwargs)
Parse the elements using pipeline parser.
This method defines the process of parsing the elements using parsers.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
elements | list[dict[str, Any]] | A list of dictionaries containing elements. | required |
**kwargs | Any | Additional keyword arguments. | {} |
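The sketch below illustrates how `add_parser` and `parse` might fit together. It is written under stated assumptions rather than taken from the package: the `BaseParser` and `PipelineParser` classes are local stand-ins, the pipeline is assumed to run parsers in the order they were added and to pass each parser's output to the next, and `LowercaseParser` and `DropEmptyParser` are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any


class BaseParser(ABC):
    """Local stand-in mirroring the documented BaseParser (assumed shape)."""

    @abstractmethod
    def parse(self, loaded_elements: Any, **kwargs: Any) -> Any:
        """Parse loaded elements to get element structure."""


class PipelineParser:
    """Local stand-in; sequential chaining is an assumption, not confirmed."""

    def __init__(self) -> None:
        self.parsers: list[BaseParser] = []

    def add_parser(self, parser: BaseParser) -> None:
        self.parsers.append(parser)

    def parse(self, elements: list[dict[str, Any]], **kwargs: Any) -> list[dict[str, Any]]:
        # Feed the output of each parser into the next one, in insertion order.
        for parser in self.parsers:
            elements = parser.parse(elements, **kwargs)
        return elements


class LowercaseParser(BaseParser):
    """Hypothetical parser that lowercases the assumed "text" field."""

    def parse(self, loaded_elements: Any, **kwargs: Any) -> Any:
        return [{**e, "text": e.get("text", "").lower()} for e in loaded_elements]


class DropEmptyParser(BaseParser):
    """Hypothetical parser that removes elements with empty text."""

    def parse(self, loaded_elements: Any, **kwargs: Any) -> Any:
        return [e for e in loaded_elements if e.get("text")]


pipeline = PipelineParser()
pipeline.add_parser(LowercaseParser())
pipeline.add_parser(DropEmptyParser())
output = pipeline.parse([{"text": "TITLE"}, {"text": ""}])
```

Under the chaining assumption, the order of `add_parser` calls determines the order in which parsers run, so parsers that filter elements are usually best added after those that normalize them.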