Downloader
Document Processing Orchestrator Downloader Package.
Modules:
Name | Description |
---|---|
BaseDownloader | Abstract base class for document downloaders. |
DirectFileURLDownloader | Downloader for direct file URLs. |
BaseDownloader
Bases: ABC
Base class for document downloaders.
download(source, output, **kwargs)
abstractmethod
Download source to the output directory.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
source | str | The source to be downloaded. | required |
output | str | The output directory where the downloaded source will be saved. | required |
**kwargs | Any | Additional keyword arguments. | {} |
Returns:
Type | Description |
---|---|
list[str] \| None | A list of file paths of successfully downloaded files. If no files are downloaded, an empty list should be returned. Returning None is supported only for backward compatibility and should be avoided in new implementations. |
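A minimal sketch of a concrete downloader that follows the contract above. The package's import path is not shown in this section, so the sketch declares a stand-in `BaseDownloader` with the documented signature. `LocalCopyDownloader` is a hypothetical subclass invented for illustration and is not part of the package.

```python
from __future__ import annotations

import shutil
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any


class BaseDownloader(ABC):
    """Stand-in mirroring the documented interface (assumption, not the package's code)."""

    @abstractmethod
    def download(self, source: str, output: str, **kwargs: Any) -> list[str] | None:
        """Download source to the output directory."""


class LocalCopyDownloader(BaseDownloader):
    """Hypothetical downloader that copies a local file into the output directory."""

    def download(self, source: str, output: str, **kwargs: Any) -> list[str]:
        src = Path(source)
        if not src.is_file():
            # Nothing was downloaded: return an empty list rather than None.
            return []
        out_dir = Path(output)
        out_dir.mkdir(parents=True, exist_ok=True)
        dest = out_dir / src.name
        shutil.copy2(src, dest)
        # Report the path of every file that was written.
        return [str(dest)]
```

The subclass narrows the return type to `list[str]`, which matches the recommendation to avoid returning None in new implementations.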
DirectFileURLDownloader(stream_buffer_size=65536, max_retries=3, timeout=None)
Bases: BaseDownloader
Downloads files from a direct file URL to the specified output directory.
Initialize the DirectFileURLDownloader.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
stream_buffer_size | int | The size of the buffer for streaming downloads, in bytes. Defaults to 64 KiB (65536 bytes). | 65536 |
max_retries | int | The maximum number of retries for failed downloads. Defaults to 3. | 3 |
timeout | int \| None | The timeout for the download request, in seconds. Defaults to None. | None |
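How a buffer size and a retry limit might interact can be sketched with the standard library alone. This is an illustration under stated assumptions, not the class's actual implementation: `copy_stream` and `with_retries` are hypothetical helpers, and `max_retries` is assumed to count retries after the first attempt, which the source does not specify. `timeout` is not modeled here; it would typically be passed to the underlying HTTP request.

```python
import io
from typing import BinaryIO, Callable


def copy_stream(src: BinaryIO, dst: BinaryIO, stream_buffer_size: int = 65536) -> int:
    """Copy src to dst in chunks of at most stream_buffer_size bytes.

    Returns the total number of bytes written.
    """
    total = 0
    while True:
        chunk = src.read(stream_buffer_size)
        if not chunk:
            break  # End of stream.
        dst.write(chunk)
        total += len(chunk)
    return total


def with_retries(action: Callable[[], int], max_retries: int = 3) -> int:
    """Run action once, then retry up to max_retries times on OSError (assumed semantics)."""
    retries_used = 0
    while True:
        try:
            return action()
        except OSError:
            if retries_used >= max_retries:
                raise  # Retry budget exhausted: surface the last error.
            retries_used += 1


# In-memory streams stand in for an HTTP response body and a destination file.
src = io.BytesIO(b"x" * 150_000)
dst = io.BytesIO()
written = with_retries(lambda: copy_stream(src, dst, stream_buffer_size=65536), max_retries=3)
```

A larger buffer means fewer read calls at the cost of more memory per download, which is the usual trade-off behind a setting like `stream_buffer_size`.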
download(source, output, **kwargs)
Download source to the output directory.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
source | str | The source to be downloaded. | required |
output | str | The output directory where the downloaded source will be saved. | required |
**kwargs | Any | Additional keyword arguments. | {} |
Returns:
Type | Description |
---|---|
list[str] | A list of file paths of successfully downloaded files. |
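A usage sketch for collecting results across several sources. The package's import path does not appear in this section, so the helper takes an already constructed downloader as an argument. `download_many` and `SupportsDownload` are hypothetical names, and the `None` handling reflects the backward-compatibility note on `BaseDownloader.download`.

```python
from __future__ import annotations

from typing import Any, Iterable, Protocol


class SupportsDownload(Protocol):
    """Structural type for any object exposing the documented download method."""

    def download(self, source: str, output: str, **kwargs: Any) -> list[str] | None: ...


def download_many(downloader: SupportsDownload, sources: Iterable[str], output: str) -> list[str]:
    """Download each source in turn and return all resulting file paths."""
    paths: list[str] = []
    for source in sources:
        result = downloader.download(source, output)
        # Older implementations may return None; treat it like an empty list.
        paths.extend(result or [])
    return paths
```

With the real class, a call might look like `download_many(DirectFileURLDownloader(timeout=30), urls, "downloads")`, where the timeout value, `urls`, and the directory name are placeholders chosen for illustration.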