Base parser
Defines an abstract base class for parsing files by defining the document structure.
Reviewers
Timotius Nugroho Chandra (timotius.n.chandra@gdplabs.id)
BaseParser
Bases: ABC
Base class for document parsers.
This class serves as the base for all document parsers. Each parser defines the structure of every piece of content in a document.
Methods:
Name | Description |
---|---|
parse | Abstract method to parse a document. |
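The contract implied by `Bases: ABC` and the abstract `parse` method can be illustrated with a minimal sketch. The `BaseParser` below is a local stand-in that only mirrors the documented signature, not the library's actual source, and `PassthroughParser` and `try_instantiate_base` are hypothetical names used for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any


class BaseParser(ABC):
    """Local stand-in mirroring the documented interface (an assumption)."""

    @abstractmethod
    def parse(self, loaded_elements: Any, **kwargs: Any) -> Any:
        """Parse loaded elements to get element structure."""


class PassthroughParser(BaseParser):
    """Hypothetical subclass that returns the loaded elements unchanged."""

    def parse(self, loaded_elements: Any, **kwargs: Any) -> Any:
        return loaded_elements


def try_instantiate_base() -> str:
    """Report what happens when the abstract class is instantiated directly."""
    try:
        BaseParser()
    except TypeError:
        # Python refuses to instantiate a class with unimplemented abstract methods.
        return "TypeError"
    return "ok"
```

Because `parse` is abstract, only subclasses that implement it can be instantiated.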
parse(loaded_elements, **kwargs)
abstractmethod
Parse loaded elements to get element structure.
This method is abstract and must be implemented in subclasses. It defines the process of parsing a document using loaded elements.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
loaded_elements | Any | The loaded elements from the loader, ideally formatted as List[Dict]. | required |
**kwargs | Any | Additional keyword arguments for customization. | {} |
Returns:
Name | Type | Description |
---|---|---|
Any | Any | The parsed document, ideally formatted as List[Dict]. It is recommended that each dictionary in the list follow the structure of the 'Element' model, to ensure consistency and ease of use across the Document Processing Orchestrator. |
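As a rough sketch of a concrete `parse` implementation, the example below takes a List[Dict] from a loader and returns a List[Dict] with a structure label added to each element. Several parts are assumptions: the `BaseParser` stand-in, the `HeadingParser` class, the `text` and `structure` keys, and the `max_heading_length` keyword argument are all hypothetical. The actual fields of the 'Element' model are not shown here.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class BaseParser(ABC):
    """Local stand-in for the documented base class (an assumption)."""

    @abstractmethod
    def parse(self, loaded_elements: Any, **kwargs: Any) -> Any:
        ...


class HeadingParser(BaseParser):
    """Hypothetical parser that tags each element as a heading or a paragraph."""

    def parse(
        self, loaded_elements: List[Dict[str, Any]], **kwargs: Any
    ) -> List[Dict[str, Any]]:
        # `max_heading_length` is an illustrative option, not a documented one.
        max_heading_length = kwargs.get("max_heading_length", 40)
        parsed = []
        for element in loaded_elements:
            text = str(element.get("text", "")).strip()
            # Simple heuristic: short, non-empty text without a final period.
            is_heading = (
                bool(text)
                and len(text) <= max_heading_length
                and not text.endswith(".")
            )
            label = "heading" if is_heading else "paragraph"
            # Build a new dict so the loader's output is not mutated.
            parsed.append({**element, "text": text, "structure": label})
        return parsed


loaded = [
    {"text": "Introduction"},
    {"text": "This parser assigns a structure label to each element."},
]
result = HeadingParser().parse(loaded)
```

Returning new dictionaries rather than modifying the input keeps the loader output reusable by other parsers.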