Base parser

Defines an abstract base class for parsing files by defining the document structure.

Authors

Devita (devita1@gdplabs.id)

Reviewers

Timotius Nugroho Chandra (timotius.n.chandra@gdplabs.id)

BaseParser

Bases: ABC

Base class for document parsers.

This class serves as the base for document parsers, which define the structure of each piece of content in a document.

Methods:

Name    Description
parse   Abstract method to parse a document.

parse(loaded_elements, **kwargs) abstractmethod

Parse loaded elements to get element structure.

This method is abstract and must be implemented in subclasses. It defines the process of parsing a document using loaded elements.

Parameters:

loaded_elements (Any): The loaded elements from the loader, ideally formatted as List[Dict]. Required.

**kwargs (Any): Additional keyword arguments for customization. Defaults to {}.

Returns:

Any: The parsed document, ideally formatted as List[Dict]. Each dictionary in the list should follow the structure of the 'Element' model to ensure consistency and ease of use across the Document Processing Orchestrator.
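The contract described above can be illustrated with a minimal sketch of a concrete parser. The `BaseParser` class below is a stand-in written from this page's description, not the library's source, and its real import path is not shown here. `ParagraphParser`, the `max_heading_length` keyword argument, and the `text`, `structure`, and `metadata` keys are hypothetical; the actual fields of the 'Element' model may differ.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class BaseParser(ABC):
    """Stand-in for the documented base class (assumed shape, not the library source)."""

    @abstractmethod
    def parse(self, loaded_elements: Any, **kwargs: Any) -> Any:
        """Parse loaded elements to get element structure."""


class ParagraphParser(BaseParser):
    """Hypothetical parser that labels each loaded element as a heading or a paragraph."""

    def parse(self, loaded_elements: List[Dict[str, Any]], **kwargs: Any) -> List[Dict[str, Any]]:
        # `max_heading_length` is an illustrative keyword argument, not a documented one.
        max_heading_length = kwargs.get("max_heading_length", 40)
        parsed: List[Dict[str, Any]] = []
        for element in loaded_elements:
            text = element.get("text", "").strip()
            if not text:
                continue  # skip elements with no visible text
            # Naive heuristic: short text without a trailing period is treated as a heading.
            is_heading = len(text) <= max_heading_length and not text.endswith(".")
            parsed.append(
                {
                    "text": text,
                    "structure": "heading" if is_heading else "paragraph",  # assumed key
                    "metadata": element.get("metadata", {}),  # assumed key
                }
            )
        return parsed


loaded = [
    {"text": "Introduction", "metadata": {"page": 1}},
    {"text": "This document describes the parser interface.", "metadata": {"page": 1}},
    {"text": "   ", "metadata": {"page": 2}},
]
result = ParagraphParser().parse(loaded)
```

Because `parse` is declared with `abstractmethod`, instantiating `BaseParser` directly raises `TypeError`; only subclasses that implement `parse` can be created. Returning a list of dictionaries with a consistent set of keys keeps the output usable by downstream steps regardless of which parser produced it.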