DPO Router

Defines a router to route the input into different processing pipelines based on certain criteria.

Authors

Hubert Michael Sanyoto (hubert.m.sanyoto@gdplabs.id)

BaseDPORouter

Bases: ABC

Base class for routing in document processing.

route(*args, **kwargs) abstractmethod

Routes the input into different processing pipelines based on certain criteria.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `*args` | `Any` | Variable length argument list for routing parameters. | `()` |
| `**kwargs` | `Any` | Arbitrary keyword arguments for additional routing configuration. | `{}` |

Returns:

| Type | Description |
| --- | --- |
| `Any` | The result of the routing process. |
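As a minimal sketch of how a concrete router might implement this interface: the `BaseDPORouter` below is a local stand-in (the real import path is not shown on this page), and `SuffixRouter` with its `"pipeline"` key is purely hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any


class BaseDPORouter(ABC):
    """Local stand-in for the documented base class (real import path not shown)."""

    @abstractmethod
    def route(self, *args: Any, **kwargs: Any) -> Any:
        """Route the input into a processing pipeline."""


class SuffixRouter(BaseDPORouter):
    """Hypothetical router that picks a pipeline name from the file suffix."""

    def route(self, source: str, *args: Any, **kwargs: Any) -> dict[str, Any]:
        # The criterion here (a ".pdf" suffix) is an illustrative assumption.
        pipeline = "pdf" if source.lower().endswith(".pdf") else "uncategorized"
        return {"pipeline": pipeline}


router = SuffixRouter()
result = router.route("report.PDF")
```

Because `route` is abstract, instantiating `BaseDPORouter` directly raises `TypeError`; only subclasses that override `route` can be created.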

LoaderRouter()

Bases: BaseDPORouter

Loader Router class.

This router determines the appropriate loader type based on the input source. Returns a dict with the loader type information.

Initialize the LoaderRouter.

is_html_from_json(source)

Check if the source file contains valid HTML metadata.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `source` | `str` | The file path to check. | required |

Returns:

| Type | Description |
| --- | --- |
| `bool` | True if the file is a valid HTML metadata file, False otherwise. |

route(source, *args, **kwargs)

Route the input source to the appropriate loader type.

This method determines the appropriate loader type based on the input source:

1. If the source is a file, the loader type is determined from the file extension or content.
2. If the source is a YouTube URL, the audio loader type is returned.
3. If the source is neither a file nor a YouTube URL, the uncategorized loader type is returned.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `source` | `str` | The input source, either a file path or a YouTube URL. | required |
| `*args` | `Any` | Additional arguments. | `()` |
| `**kwargs` | `Any` | Additional keyword arguments. | `{}` |

Returns:

| Type | Description |
| --- | --- |
| `dict[str, Any]` | A dictionary containing the loader type information. Example: `{LoaderType.KEY: LoaderType.PDF_LOADER}` |
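The routing order described for this method can be sketched as follows. This is not the library's implementation: `LoaderType` is a local stand-in whose string values and `AUDIO_LOADER` / `UNCATEGORIZED` names are assumptions (only `KEY` and `PDF_LOADER` appear in the documentation), and the extension table, the YouTube host list, and the fallback for unknown extensions are illustrative. The content-based check mentioned above is omitted.

```python
import os
from typing import Any
from urllib.parse import urlparse


class LoaderType:
    """Stand-in constants; only KEY and PDF_LOADER are named in the docs."""

    KEY = "loader_type"              # assumed value
    PDF_LOADER = "pdf_loader"        # assumed value
    AUDIO_LOADER = "audio_loader"    # assumed name and value
    UNCATEGORIZED = "uncategorized"  # assumed name and value


# Assumed extension table; the real router may also inspect file content.
EXTENSION_TO_LOADER = {
    ".pdf": LoaderType.PDF_LOADER,
    ".mp3": LoaderType.AUDIO_LOADER,
}

# Assumed set of hosts treated as YouTube.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def is_youtube_url(source: str) -> bool:
    """Return True for http(s) URLs whose host is a known YouTube host."""
    parsed = urlparse(source)
    return parsed.scheme in ("http", "https") and parsed.hostname in YOUTUBE_HOSTS


def route_loader(source: str) -> dict[str, Any]:
    """Apply the three documented cases in order: file, YouTube URL, other."""
    if os.path.isfile(source):
        extension = os.path.splitext(source)[1].lower()
        # Assumption: files with an unknown extension fall back to uncategorized.
        loader = EXTENSION_TO_LOADER.get(extension, LoaderType.UNCATEGORIZED)
    elif is_youtube_url(source):
        loader = LoaderType.AUDIO_LOADER
    else:
        loader = LoaderType.UNCATEGORIZED
    return {LoaderType.KEY: loader}
```

Checking for an existing file first means a local path is never mistaken for a URL, which matches the order of the documented cases.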

ParserRouter()

Bases: BaseDPORouter

Parser Router class.

This router determines the appropriate parser type based on the input source. Returns a dict with the parser type information.

Initialize the ParserRouter.

route(source, *args, **kwargs)

Determine the parser type from the input source.

The input source can be:

- A string path to a JSON file containing loaded elements.
- A list of loaded element dictionaries (in memory).

This method reads the input, extracts the source_type from the first element’s metadata, and returns the appropriate parser type. If loading fails or metadata is missing, the parser type will be set as UNCATEGORIZED.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `source` | `str \| list[dict[str, Any]]` | The input source: either a `str` path to a JSON file containing loaded elements, or a `list[dict[str, Any]]` of loaded elements. | required |
| `*args` | `Any` | Additional arguments. | `()` |
| `**kwargs` | `Any` | Additional keyword arguments. | `{}` |

Returns:

| Type | Description |
| --- | --- |
| `dict[str, Any]` | A dictionary containing the parser type information. Example: `{ParserType.KEY: ParserType.PDF_PARSER}` |
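The behavior described for this method (read the elements, take `source_type` from the first element's metadata, fall back to UNCATEGORIZED on any failure) might look roughly like the sketch below. `ParserType` here is a local stand-in with assumed string values, the `"metadata"` key layout and the `"pdf"` source type value are assumptions, and `route_parser` is a hypothetical free function rather than the library's method.

```python
import json
from typing import Any, Union


class ParserType:
    """Stand-in constants; the string values are assumptions."""

    KEY = "parser_type"
    PDF_PARSER = "pdf_parser"
    UNCATEGORIZED = "uncategorized"


# Assumed mapping from metadata source_type values to parser types.
SOURCE_TYPE_TO_PARSER = {"pdf": ParserType.PDF_PARSER}


def route_parser(source: Union[str, list[dict[str, Any]]]) -> dict[str, Any]:
    """Pick a parser type from the first loaded element's metadata."""
    try:
        if isinstance(source, str):
            # A string is treated as a path to a JSON file of loaded elements.
            with open(source, encoding="utf-8") as handle:
                elements = json.load(handle)
        else:
            elements = source
        # Assumed layout: each element is a dict with a "metadata" dict inside.
        source_type = elements[0]["metadata"]["source_type"]
        parser = SOURCE_TYPE_TO_PARSER.get(source_type, ParserType.UNCATEGORIZED)
    except (OSError, ValueError, KeyError, IndexError, TypeError):
        # Unreadable file, invalid JSON, empty list, or missing metadata.
        parser = ParserType.UNCATEGORIZED
    return {ParserType.KEY: parser}


in_memory = [{"metadata": {"source_type": "pdf"}, "text": "hello"}]
in_memory_result = route_parser(in_memory)
```

Catching the failure cases in one place keeps the return shape constant, so callers can always read `ParserType.KEY` from the result without checking for errors.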