Model
Document Processing Orchestrator Model Package.
Modules:
Name | Description |
---|---|
Element | An Element model. |
ElementMetadata | An ElementMetadata model. |
LoaderType | A LoaderType model containing the loader type enum. |
Media | A Media model containing media information. |
Reviewers
Timotius Nugroho Chandra (timotius.n.chandra@gdplabs.id)
Element
Bases: BaseModel
An Element model.
This class serves as the Element model for storing element text, structure, and metadata.
Attributes:
Name | Type | Description |
---|---|---|
text | str | The element text. |
structure | str | The element structure. |
metadata | dict | The element metadata. |
from_list_dict(elements)
staticmethod
Convert a list of dictionaries to a list of Element objects.
to_list_dict(elements)
staticmethod
Convert a list of Element objects to a list of dictionaries.
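The two static methods above can be illustrated with a minimal sketch. It uses a standard-library dataclass as a dependency-free stand-in for the Pydantic-based Element, so it is not the package's implementation; the field names come from the attribute table, while the sample values (such as the "title" and "paragraph" structures) are hypothetical.

```python
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class Element:
    """Stand-in for the documented Element model (assumption: plain dataclass)."""

    text: str
    structure: str
    metadata: dict[str, Any] = field(default_factory=dict)

    @staticmethod
    def from_list_dict(elements: list[dict[str, Any]]) -> list["Element"]:
        # Build one Element per dictionary, using the keys as field names.
        return [Element(**item) for item in elements]

    @staticmethod
    def to_list_dict(elements: list["Element"]) -> list[dict[str, Any]]:
        # Turn each Element back into a plain dictionary.
        return [asdict(element) for element in elements]


# Hypothetical sample data, not taken from the package.
raw = [
    {"text": "Quarterly report", "structure": "title", "metadata": {"source": "report.pdf"}},
    {"text": "Revenue grew.", "structure": "paragraph", "metadata": {"source": "report.pdf"}},
]

elements = Element.from_list_dict(raw)
round_tripped = Element.to_list_dict(elements)
```

In this sketch the two conversions are inverses of each other, which is convenient when elements are passed between processing steps as plain dictionaries.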
ElementMetadata
Bases: BaseModel
Element metadata model.
This class serves as the Element metadata model for storing element metadata.
Mandatory Attributes
Name | Type | Description |
---|---|---|
source | str | The source of the element. |
source_type | str | The source type of the element. |
loaded_datetime | datetime | The datetime when the element is loaded. |
Config
Pydantic model configuration.
This class defines the Pydantic model configuration for the ElementMetadata model.
Attributes:
Name | Type | Description |
---|---|---|
extra | str | Allow extra fields. |
LoaderType
Bases: StrEnum
Loader Type Enum.
This enum defines the different loader types.
Media
Bases: BaseModel
Media model which contains media information.
This class serves as the base model for storing media information in element metadata.
An element that has media (image, audio, video, or YouTube) carries that media in its metadata as a list of dicts.
Each dict follows the Media model schema.
Attributes:
Name | Type | Description |
---|---|---|
media_id | str | Unique identifier for the media, automatically generated from media_type and media_content. |
media_type | MediaType | Type of media (image, audio, video, youtube). |
media_content | str | Base64 encoded string or URL pointing to the media content. |
media_content_type | MediaSourceType | Type of content source (base64 or url). |
media_id: str
property
Generate a standardized media ID.
This property generates a standardized media ID in the format: {media_type}_{sha256_from_media_content_16_digit}
Returns:
Name | Type | Description |
---|---|---|
str | str | The generated media ID. |
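The documented ID format can be sketched as follows. This is an illustration under stated assumptions, not the package's code: the helper name build_media_id is hypothetical, and the sketch assumes that the "16 digit" part means the first 16 hexadecimal characters of the SHA-256 digest of the UTF-8 encoded media_content.

```python
import hashlib


def build_media_id(media_type: str, media_content: str) -> str:
    """Hypothetical helper: return '{media_type}_{first 16 hex chars of SHA-256}'."""
    # Assumption: the content string (base64 data or URL) is hashed as UTF-8 bytes.
    digest = hashlib.sha256(media_content.encode("utf-8")).hexdigest()
    # Assumption: the ID keeps only the first 16 hex characters of the digest.
    return f"{media_type}_{digest[:16]}"


# Hypothetical inputs: an image referenced by URL.
media_id = build_media_id("image", "https://example.com/cat.png")
same_media_id = build_media_id("image", "https://example.com/cat.png")
other_media_id = build_media_id("image", "https://example.com/dog.png")
```

Because the ID depends only on media_type and media_content, the same media always produces the same ID, which makes it usable for deduplication or cross-referencing.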
Config
Pydantic model configuration.
This class defines the Pydantic model configuration for the Media model.
Attributes:
Name | Type | Description |
---|---|---|
extra | str | Allow extra fields. |
ParserType
Bases: StrEnum
Parser Type Enum.
This enum defines the different parser types.