
Model

Document Processing Orchestrator Model Package.

Modules:

Name Description
Element

An Element model.

ElementMetadata

An ElementMetadata model.

LoaderType

A LoaderType model containing the loader type enum.

Media

A Media model containing media information.

Authors

Devita (devita1@gdplabs.id)

Reviewers

Timotius Nugroho Chandra (timotius.n.chandra@gdplabs.id)

Element

Bases: BaseModel

An Element model.

This class serves as the Element model for storing element text, structure, and metadata.

Attributes:

Name Type Description
text str

The element text.

structure str

The element structure.

metadata dict

The element metadata.

from_list_dict(elements) staticmethod

Convert a list of dictionaries to a list of Element objects.

to_list_dict(elements) staticmethod

Convert a list of Element objects to a list of dictionaries.
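
The two static methods above amount to a round trip between plain dictionaries and model instances. The following is a minimal sketch of that idea, not the package's source: `ElementSketch` is a hypothetical stand-in that declares only the three documented attributes, and the `structure` values (`"title"`, `"paragraph"`) are illustrative assumptions.

```python
from typing import Any, Dict, List

from pydantic import BaseModel


class ElementSketch(BaseModel):
    """Hypothetical stand-in for Element with the three documented attributes."""

    text: str
    structure: str
    metadata: Dict[str, Any]

    @staticmethod
    def from_list_dict(elements: List[Dict[str, Any]]) -> List["ElementSketch"]:
        # Build one model instance per dictionary; Pydantic validates the fields.
        return [ElementSketch(**item) for item in elements]

    @staticmethod
    def to_list_dict(elements: List["ElementSketch"]) -> List[Dict[str, Any]]:
        # model_dump() is the Pydantic v2 spelling; dict() is the v1 spelling.
        return [
            e.model_dump() if hasattr(e, "model_dump") else e.dict()
            for e in elements
        ]


# Illustrative input; the values are made up for this sketch.
raw = [
    {"text": "Quarterly Report", "structure": "title", "metadata": {"source": "report.pdf"}},
    {"text": "Revenue grew.", "structure": "paragraph", "metadata": {"source": "report.pdf"}},
]

elements = ElementSketch.from_list_dict(raw)
round_tripped = ElementSketch.to_list_dict(elements)
```

Because both methods are static, they are called on the class rather than on an instance, which suits batch conversion at the boundary between loaders and downstream steps.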

ElementMetadata

Bases: BaseModel

Element metadata model.

This class serves as the Element metadata model for storing element metadata.

Mandatory Attributes

source (str): The source of the element.

source_type (str): The source type of the element.

loaded_datetime (datetime): The datetime when the element is loaded.

Config

Pydantic model configuration.

This class defines the Pydantic model configuration for the ElementMetadata model.

Attributes:

Name Type Description
extra str

Allow extra fields.
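
To illustrate what allowing extra fields means in practice, here is a hedged sketch of a Pydantic model with the three mandatory attributes listed above. `ElementMetadataSketch` and the extra `page_number` field are hypothetical; the class-based `Config` mirrors the documented layout, which newer Pydantic versions still accept but flag as deprecated.

```python
from datetime import datetime

from pydantic import BaseModel


class ElementMetadataSketch(BaseModel):
    """Hypothetical stand-in for ElementMetadata; not the package's source."""

    source: str
    source_type: str
    loaded_datetime: datetime

    class Config:
        # Keep fields that are not declared on the model instead of rejecting them.
        extra = "allow"


meta = ElementMetadataSketch(
    source="report.pdf",
    source_type="pdf",
    loaded_datetime=datetime(2024, 1, 1, 12, 0, 0),
    page_number=3,  # assumed extra field, retained because extra = "allow"
)
```

With this configuration, loader-specific keys can travel alongside the mandatory ones without a schema change, while the three mandatory attributes are still validated.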

LoaderType

Bases: StrEnum

Loader Type Enum.

This enum defines the different loader types.

Media

Bases: BaseModel

Media model which contains media information.

This class serves as the base model for storing media information in element metadata. An element with media (image, audio, video, or youtube) stores that media in its metadata as a list of dicts, each following the Media model schema.

Attributes:

Name Type Description
media_id str

Unique identifier for the media, automatically generated from media_type and media_content.

media_type MediaType

Type of media (image, audio, video, youtube).

media_content str

Base64 encoded string or URL pointing to the media content.

media_content_type MediaSourceType

Type of content source (base64 or url).
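
As a rough illustration of the list-of-dict layout described above, the sketch below builds one media entry and attaches it to an element's metadata. It is dependency-free and assumes several things the page does not state: the enum member names, the helper `make_media_entry`, the metadata key `media`, and the example URL. Only the enum values come from the attribute descriptions.

```python
from enum import Enum
from typing import Dict


class MediaType(str, Enum):
    # Values taken from the documented media types; member names are assumed.
    IMAGE = "image"
    AUDIO = "audio"
    VIDEO = "video"
    YOUTUBE = "youtube"


class MediaSourceType(str, Enum):
    # Values taken from the documented content source types.
    BASE64 = "base64"
    URL = "url"


def make_media_entry(
    media_type: str, media_content: str, media_content_type: str
) -> Dict[str, str]:
    """Hypothetical helper that builds one dict shaped like the Media schema."""
    # Enum lookup raises ValueError for values outside the documented sets.
    return {
        "media_type": MediaType(media_type).value,
        "media_content": media_content,
        "media_content_type": MediaSourceType(media_content_type).value,
    }


element_metadata = {
    "source": "slides.pptx",
    # Assumed key name for the media list inside element metadata.
    "media": [make_media_entry("image", "https://example.com/figure-1.png", "url")],
}
```

The entry omits `media_id` because the model derives it from the other fields rather than accepting it as input.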

media_id: str property

Generate a standardized media ID.

This property generates a standardized media ID in the format: {media_type}_{sha256_from_media_content_16_digit}

Returns:

Name Type Description
str str

The generated media ID.
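
One plausible reading of the format above is the media type, an underscore, and the first 16 hexadecimal characters of the SHA-256 digest of the content. The sketch below follows that reading. The function name `sketch_media_id`, the UTF-8 encoding, and the choice of the first 16 characters are assumptions; the actual property may differ in these details.

```python
import hashlib


def sketch_media_id(media_type: str, media_content: str) -> str:
    """Hypothetical re-creation of the documented media ID format."""
    # Hash the content string (Base64 payload or URL), assuming UTF-8 encoding.
    digest = hashlib.sha256(media_content.encode("utf-8")).hexdigest()
    # Assumption: "16 digit" means the first 16 hex characters of the digest.
    return f"{media_type}_{digest[:16]}"


first = sketch_media_id("image", "https://example.com/figure-1.png")
second = sketch_media_id("image", "https://example.com/figure-1.png")
other = sketch_media_id("image", "https://example.com/figure-2.png")
```

Deriving the ID from the type and content makes it deterministic: the same media yields the same ID wherever it is loaded, which helps with deduplication.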

Config

Pydantic model configuration.

This class defines the Pydantic model configuration for the Media model.

Attributes:

Name Type Description
extra str

Allow extra fields.

ParserType

Bases: StrEnum

Parser Type Enum.

This enum defines the different parser types.