
Downloader

Document Processing Orchestrator Downloader Package.

Modules:

| Name | Description |
| --- | --- |
| BaseDownloader | Abstract base class for document downloaders. |
| DirectFileURLDownloader | Downloader for direct file URLs. |

BaseDownloader

Bases: ABC

Base class for document downloaders.

download(source, output, **kwargs) abstractmethod

Download source to the output directory.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| source | str | The source to be downloaded. | required |
| output | str | The output directory where the downloaded source will be saved. | required |
| **kwargs | Any | Additional keyword arguments. | {} |

Returns:

| Type | Description |
| --- | --- |
| list[str] \| None | A list of file paths of the successfully downloaded files. If no files are downloaded, an empty list should be returned. Returning None is supported only for backward compatibility and should be avoided in new implementations. |
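
The contract above can be illustrated with a minimal sketch of a subclass. The package's import path is not shown on this page, so the `BaseDownloader` below is a local stand-in that only mirrors the documented signature, and `LocalCopyDownloader` is a hypothetical name invented for this example. It copies a local file instead of fetching anything over the network, and it returns an empty list rather than None when nothing is downloaded.

```python
import shutil
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Optional


class BaseDownloader(ABC):
    """Local stand-in mirroring the documented interface (not the real class)."""

    @abstractmethod
    def download(self, source: str, output: str, **kwargs: Any) -> Optional[list[str]]:
        """Download source to the output directory."""


class LocalCopyDownloader(BaseDownloader):
    """Hypothetical downloader that copies a local file into the output directory."""

    def download(self, source: str, output: str, **kwargs: Any) -> list[str]:
        src = Path(source)
        if not src.is_file():
            # Nothing was downloaded: return an empty list, never None.
            return []
        out_dir = Path(output)
        out_dir.mkdir(parents=True, exist_ok=True)
        target = out_dir / src.name
        shutil.copyfile(src, target)
        # Report the path of every file that was written.
        return [str(target)]
```

Because `download` is declared with `abstractmethod`, instantiating the base class directly raises `TypeError`; only subclasses that implement `download` can be created.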

DirectFileURLDownloader(stream_buffer_size=65536, max_retries=3, timeout=None)

Bases: BaseDownloader

A class for downloading files from a direct file URL to the defined output directory.

Initialize the DirectFileURLDownloader.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| stream_buffer_size | int | The size of the buffer for streaming downloads, in bytes. Defaults to 64 KiB (65536 bytes). | 65536 |
| max_retries | int | The maximum number of retries for failed downloads. Defaults to 3. | 3 |
| timeout | int \| None | The timeout for the download request, in seconds. Defaults to None. | None |

download(source, output, **kwargs)

Download source to the output directory.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| source | str | The source to be downloaded. | required |
| output | str | The output directory where the downloaded source will be saved. | required |
| **kwargs | Any | Additional keyword arguments. | {} |

Returns:

| Type | Description |
| --- | --- |
| list[str] | A list of file paths of the successfully downloaded files. |
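
To make the roles of `stream_buffer_size`, `max_retries`, and `timeout` concrete, here is a rough standard-library sketch of a streamed download with retries. It is not the implementation of `DirectFileURLDownloader`: the helper name `fetch_direct_url` is hypothetical, and the choices to derive the file name from the URL path, to count retries in addition to the first attempt, and to return an empty list on failure are assumptions made for this example.

```python
import os
import urllib.request
from pathlib import Path
from typing import Optional
from urllib.parse import unquote, urlparse


def fetch_direct_url(
    source: str,
    output: str,
    stream_buffer_size: int = 65536,
    max_retries: int = 3,
    timeout: Optional[int] = None,
) -> list[str]:
    """Hypothetical helper illustrating the documented parameters."""
    out_dir = Path(output)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: name the file after the last segment of the URL path.
    name = os.path.basename(unquote(urlparse(source).path)) or "download"
    target = out_dir / name

    # Assumption: max_retries counts retries after the first attempt.
    for _ in range(max_retries + 1):
        try:
            with urllib.request.urlopen(source, timeout=timeout) as response:
                with open(target, "wb") as fh:
                    # Read fixed-size chunks so the file never sits fully in memory.
                    while chunk := response.read(stream_buffer_size):
                        fh.write(chunk)
            return [str(target)]
        except OSError:
            # URLError and socket timeouts are OSError subclasses; try again.
            continue

    # Every attempt failed: remove any partial file and report no downloads.
    target.unlink(missing_ok=True)
    return []
```

A call such as `fetch_direct_url(url, "downloads", timeout=30)` returns a one-element list on success. A smaller `stream_buffer_size` lowers peak memory use at the cost of more read calls.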