DPO Router
Defines routers that send the input to different processing pipelines based on properties of the input, such as its file type or source metadata.
BaseDPORouter
Bases: ABC
Base class for routing in document processing.
route(*args, **kwargs)
abstractmethod
Routes the input into different processing pipelines based on certain criteria.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*args | Any | Variable length argument list for routing parameters. | () |
**kwargs | Any | Arbitrary keyword arguments for additional routing configuration. | {} |
Returns:
Type | Description |
---|---|
Any | The result of the routing process. |
LoaderRouter()
Bases: BaseDPORouter
Loader Router class.
This router determines the appropriate loader type based on the input source. Returns a dict with the loader type information.
Initialize the LoaderRouter.
is_html_from_json(source)
Check if the source file contains valid HTML metadata.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
source | str | The file path to check. | required |
Returns:
Type | Description |
---|---|
bool | True if the file is a valid HTML metadata file, False otherwise. |
route(source, *args, **kwargs)
Route the input source to the appropriate loader type.
This method determines the appropriate loader type based on the input source by checking whether the source is a file or a YouTube URL:
1. If it is a file, the loader type is determined from the file extension or content.
2. If it is a YouTube URL, the audio loader type is returned.
3. If it is neither a file nor a YouTube URL, the uncategorized loader type is returned.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
source | str | The input source, either a file path or a YouTube URL. | required |
*args | Any | Additional arguments. | () |
**kwargs | Any | Additional keyword arguments. | {} |
Returns:
Type | Description |
---|---|
dict[str, Any] | A dictionary containing the loader type information, for example {LoaderType.KEY: LoaderType.PDF_LOADER}. |
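The three routing cases of LoaderRouter.route can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: LoaderType is a stand-in whose string values are invented, DOCX_LOADER and the extension table are hypothetical, only extensions are inspected (the real router may also inspect content, for example via is_html_from_json), and route_to_loader is a hypothetical free function rather than the LoaderRouter method.

```python
import os
import re
import tempfile
from typing import Any, Dict


class LoaderType:
    """Stand-in constants. KEY and PDF_LOADER mirror the documented example; values are assumed."""

    KEY = "loader_type"
    PDF_LOADER = "pdf_loader"
    DOCX_LOADER = "docx_loader"  # hypothetical
    AUDIO_LOADER = "audio_loader"
    UNCATEGORIZED = "uncategorized"


# Hypothetical extension table; the real router may support more types and content checks.
EXTENSION_TO_LOADER = {
    ".pdf": LoaderType.PDF_LOADER,
    ".docx": LoaderType.DOCX_LOADER,
}

# Matches common YouTube URL shapes: watch pages and youtu.be short links.
YOUTUBE_URL = re.compile(r"^https?://(www\.)?(youtube\.com/watch\?v=|youtu\.be/)[\w-]+")


def route_to_loader(source: str) -> Dict[str, Any]:
    """Return {LoaderType.KEY: <loader type>} following the three documented cases."""
    if os.path.isfile(source):
        # Case 1: an existing file, routed by extension only (assumed fallback: uncategorized).
        extension = os.path.splitext(source)[1].lower()
        return {LoaderType.KEY: EXTENSION_TO_LOADER.get(extension, LoaderType.UNCATEGORIZED)}
    if YOUTUBE_URL.match(source):
        # Case 2: a YouTube URL goes to the audio loader.
        return {LoaderType.KEY: LoaderType.AUDIO_LOADER}
    # Case 3: neither a file nor a YouTube URL.
    return {LoaderType.KEY: LoaderType.UNCATEGORIZED}


print(route_to_loader("https://www.youtube.com/watch?v=dQw4w9WgXcQ"))  # {'loader_type': 'audio_loader'}
with tempfile.NamedTemporaryFile(suffix=".pdf") as tmp:
    print(route_to_loader(tmp.name))  # {'loader_type': 'pdf_loader'}
```

Checking for a local file first means a path that happens to look like a URL is still treated as a file, which matches the order of the documented cases.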
ParserRouter()
Bases: BaseDPORouter
Parser Router class.
This router determines the appropriate parser type based on the input source. Returns a dict with the parser type information.
Initialize the ParserRouter.
route(source, *args, **kwargs)
Determine the parser type from the input source.
The input source can be:
- A string path to a JSON file containing loaded elements.
- A list of loaded element dictionaries (in memory).
This method reads the input, extracts the source_type from the first element’s metadata, and returns the appropriate parser type. If loading fails or the metadata is missing, the parser type is set to UNCATEGORIZED.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
source | str \| list[dict[str, Any]] | The input source: either a path (str) to a JSON file containing loaded elements, or the loaded elements themselves (list[dict[str, Any]]). | required |
*args | Any | Additional arguments. | () |
**kwargs | Any | Additional keyword arguments. | {} |
Returns:
Type | Description |
---|---|
dict[str, Any] | A dictionary containing the parser type information, for example {ParserType.KEY: ParserType.PDF_PARSER}. |
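The lookup that ParserRouter.route performs (read the input, take source_type from the first element's metadata, fall back to UNCATEGORIZED) can be sketched as a small function. It is a minimal illustration, not the library's code: ParserType is a stand-in with invented string values, DOCX_PARSER and the source_type mapping are hypothetical, and route_to_parser is a hypothetical free function standing in for the method.

```python
import json
from typing import Any, Dict, List, Union


class ParserType:
    """Stand-in constants. KEY and PDF_PARSER mirror the documented example; values are assumed."""

    KEY = "parser_type"
    PDF_PARSER = "pdf_parser"
    DOCX_PARSER = "docx_parser"  # hypothetical
    UNCATEGORIZED = "uncategorized"


# Hypothetical mapping from metadata["source_type"] values to parser types.
SOURCE_TYPE_TO_PARSER = {
    "pdf": ParserType.PDF_PARSER,
    "docx": ParserType.DOCX_PARSER,
}


def route_to_parser(source: Union[str, List[Dict[str, Any]]]) -> Dict[str, Any]:
    """Pick a parser from the first loaded element's metadata["source_type"]."""
    try:
        if isinstance(source, str):
            # A string is treated as a path to a JSON file of loaded elements.
            with open(source, encoding="utf-8") as handle:
                elements = json.load(handle)
        else:
            elements = source
        source_type = elements[0]["metadata"]["source_type"]
        parser = SOURCE_TYPE_TO_PARSER.get(source_type, ParserType.UNCATEGORIZED)
    except (OSError, ValueError, LookupError, TypeError):
        # Missing/unreadable file, bad JSON, empty list, or missing/odd metadata.
        return {ParserType.KEY: ParserType.UNCATEGORIZED}
    return {ParserType.KEY: parser}


elements = [{"text": "Hello", "metadata": {"source_type": "pdf"}}]
print(route_to_parser(elements))  # {'parser_type': 'pdf_parser'}
print(route_to_parser([]))        # {'parser_type': 'uncategorized'}
```

Catching LookupError covers both an empty list (IndexError) and absent metadata keys (KeyError), which together implement the documented "loading fails or metadata is missing" fallback.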