Builder
Modules that provide the builder utilities for GLLM Multimodal.
build_caption_output_formatter(formatting_strategy)
Build a caption output formatter based on the provided formatting strategy.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| formatting_strategy | CaptionOutputFormattingStrategy | The formatting strategy to use. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| BaseCaptionOutputFormatter | BaseCaptionOutputFormatter | The caption output formatter. |
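One way to picture the strategy-to-formatter dispatch is an enum-keyed lookup table. The sketch below is hypothetical throughout: the enum members JSON and PLAIN_TEXT, the two formatter subclasses, their format() method, and raising ValueError for an unknown strategy are assumptions for illustration, not the library's API.

```python
import json
from abc import ABC, abstractmethod
from enum import Enum


class CaptionOutputFormattingStrategy(str, Enum):
    # Hypothetical members; the real enum may define different ones.
    JSON = "json"
    PLAIN_TEXT = "plain_text"


class BaseCaptionOutputFormatter(ABC):
    # Hypothetical interface: render a caption mapping as a string.
    @abstractmethod
    def format(self, caption: dict[str, str]) -> str:
        """Render the caption fields as a string."""


class JsonCaptionOutputFormatter(BaseCaptionOutputFormatter):
    def format(self, caption: dict[str, str]) -> str:
        # Sorted keys make the output deterministic.
        return json.dumps(caption, sort_keys=True)


class PlainTextCaptionOutputFormatter(BaseCaptionOutputFormatter):
    def format(self, caption: dict[str, str]) -> str:
        # One "key: value" line per field, in insertion order.
        return "\n".join(f"{key}: {value}" for key, value in caption.items())


# Each strategy maps to the formatter class that implements it.
_FORMATTERS: dict[CaptionOutputFormattingStrategy, type[BaseCaptionOutputFormatter]] = {
    CaptionOutputFormattingStrategy.JSON: JsonCaptionOutputFormatter,
    CaptionOutputFormattingStrategy.PLAIN_TEXT: PlainTextCaptionOutputFormatter,
}


def build_caption_output_formatter(
    formatting_strategy: CaptionOutputFormattingStrategy,
) -> BaseCaptionOutputFormatter:
    """Sketch: instantiate the formatter registered for the given strategy."""
    try:
        formatter_cls = _FORMATTERS[formatting_strategy]
    except KeyError:
        # Assumed error behavior; the documented API lists no Raises section.
        raise ValueError(f"Unsupported formatting strategy: {formatting_strategy!r}") from None
    return formatter_cls()


formatter = build_caption_output_formatter(CaptionOutputFormattingStrategy.JSON)
print(formatter.format({"title": "Sales", "summary": "Q1 up"}))
# prints {"summary": "Q1 up", "title": "Sales"}
```

A mapping keeps the factory closed to edits: supporting a new strategy means adding one enum member and one table entry.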
build_modality_converter(source_modality, target_modality, task_type=ModalityConverterTask.AUTO, approach_type=None, preset=None, strategy=None, **kwargs)
Build and initialize a modality converter instance for a given configuration.
The factory looks up the converter class based on the combination of:
- source_modality: input modality (e.g., Modality.IMAGE, Modality.AUDIO)
- target_modality: output modality (e.g., Modality.TEXT)
- task_type: conversion task (e.g., CAPTIONING, TRANSCRIPT, MERMAID, or AUTO)
- approach_type: the converter's algorithmic approach; required for non-AUTO tasks, must be None for AUTO
All supported combinations must be registered in MODALITY_CONVERTER_REGISTRY.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| source_modality | Modality | The source modality. | required |
| target_modality | Modality | The output modality. | required |
| task_type | ModalityConverterTask | The conversion task. Defaults to ModalityConverterTask.AUTO. | AUTO |
| approach_type | ModalityConverterApproach \| None | The approach for the conversion. Required for non-AUTO tasks; must be None for task_type=AUTO. | None |
| preset | str \| None | Preset identifier for the converter's .from_preset() method. If None, uses the default preset for the class. | None |
| strategy | ModalityConverterBuildStrategy \| None | The build strategy to use. If None, the strategy is determined automatically based on the provided parameters. | None |
| **kwargs | Any | Additional keyword arguments passed to the converter, including: 1. lmrp_config (dict[str, Any]): Configuration to build an LMRP instance. Should follow the same structure as | {} |

Returns:

| Name | Type | Description |
|---|---|---|
| BaseModalityConverter | BaseModalityConverter | An instance of the matching converter class. |

Raises:

| Type | Description |
|---|---|
| ValueError | If the configuration is invalid or not registered, including: (source_modality, target_modality, task_type, approach) not registered; approach_type missing for a non-AUTO task_type; approach_type provided when task_type is AUTO; any dimension unsupported for the given combination. |
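The lookup and validation rules above can be sketched as a registry keyed by (source_modality, target_modality, task_type, approach_type) tuples. This is a minimal sketch under stated assumptions, not the library's implementation: the enum members, the converter classes, the DEFAULT_PRESET attribute, and storing AUTO entries with an approach of None are all hypothetical, and the strategy parameter is omitted.

```python
from enum import Enum
from typing import Any, Optional


# Hypothetical stand-ins for the library's enums; only a few members are modeled.
class Modality(str, Enum):
    IMAGE = "image"
    AUDIO = "audio"
    TEXT = "text"


class ModalityConverterTask(str, Enum):
    AUTO = "auto"
    CAPTIONING = "captioning"
    TRANSCRIPT = "transcript"
    MERMAID = "mermaid"


class ModalityConverterApproach(str, Enum):
    LM_BASED = "lm_based"


class BaseModalityConverter:
    DEFAULT_PRESET = "default"  # hypothetical default preset name

    def __init__(self, preset: str, **kwargs: Any) -> None:
        self.preset = preset
        self.options = kwargs

    @classmethod
    def from_preset(cls, preset: Optional[str] = None, **kwargs: Any) -> "BaseModalityConverter":
        # Mirrors the documented rule: None means "use the class's default preset".
        return cls(preset if preset is not None else cls.DEFAULT_PRESET, **kwargs)


# Hypothetical concrete converters.
class ImageCaptioningConverter(BaseModalityConverter):
    pass


class ImageMermaidConverter(BaseModalityConverter):
    pass


class AutoImageToTextConverter(BaseModalityConverter):
    pass


# Registry keyed by (source, target, task, approach); AUTO entries use approach None.
MODALITY_CONVERTER_REGISTRY: dict[tuple, type[BaseModalityConverter]] = {
    (Modality.IMAGE, Modality.TEXT, ModalityConverterTask.CAPTIONING,
     ModalityConverterApproach.LM_BASED): ImageCaptioningConverter,
    (Modality.IMAGE, Modality.TEXT, ModalityConverterTask.MERMAID,
     ModalityConverterApproach.LM_BASED): ImageMermaidConverter,
    (Modality.IMAGE, Modality.TEXT, ModalityConverterTask.AUTO, None): AutoImageToTextConverter,
}


def build_modality_converter(
    source_modality: Modality,
    target_modality: Modality,
    task_type: ModalityConverterTask = ModalityConverterTask.AUTO,
    approach_type: Optional[ModalityConverterApproach] = None,
    preset: Optional[str] = None,
    **kwargs: Any,
) -> BaseModalityConverter:
    """Sketch of the documented validation rules followed by a registry lookup."""
    is_auto = task_type == ModalityConverterTask.AUTO
    if is_auto and approach_type is not None:
        raise ValueError("approach_type must be None when task_type is AUTO")
    if not is_auto and approach_type is None:
        raise ValueError(f"approach_type is required for task_type {task_type!r}")
    key = (source_modality, target_modality, task_type, approach_type)
    converter_cls = MODALITY_CONVERTER_REGISTRY.get(key)
    if converter_cls is None:
        raise ValueError(f"No converter registered for {key}")
    return converter_cls.from_preset(preset, **kwargs)
```

Running the two approach_type checks before the lookup means a misplaced or missing approach produces a specific error message instead of a generic "not registered" one.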
Modules
caption_output_formatter_builder
Builder for caption output formatters.
This module provides a factory function to create caption output formatters based on the specified formatting strategy.
modality_converter_builder
Defines a convenience function to build a modality converter.
modality_transformer_builder
Defines a convenience function to build a modality transformer.
build_modality_transformer(source_modality=Modality.IMAGE, target_modality=Modality.TEXT, transformer_type=ModalityTransformerType.STANDARD, router_config=None, converter_config=None, **kwargs)
Build and initialize a modality transformer instance for a given configuration.
The factory looks up the transformer class based on the combination of:
- source_modality: input modality (e.g., Modality.IMAGE, Modality.AUDIO)
- target_modality: output modality (e.g., Modality.TEXT)
- transformer_type: transformer type (e.g., ModalityTransformerType.STANDARD)
The factory then delegates construction to each class's from_config() classmethod.
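A comparable sketch for the transformer factory looks up a class by (source_modality, target_modality, transformer_type) and hands construction to its from_config() classmethod. Everything here is hypothetical: MODALITY_TRANSFORMER_REGISTRY, StandardImageToTextTransformer, and how from_config() treats None configs are assumptions, and the real RouterConfig and ConverterConfig types are replaced with Any.

```python
from enum import Enum
from typing import Any, Optional


# Hypothetical stand-ins; only the members needed here are modeled.
class Modality(str, Enum):
    IMAGE = "image"
    AUDIO = "audio"
    TEXT = "text"


class ModalityTransformerType(str, Enum):
    STANDARD = "standard"


class BaseModalityTransformer:
    def __init__(self, router_config: Any, converter_config: dict[str, Any], **kwargs: Any) -> None:
        self.router_config = router_config
        self.converter_config = converter_config
        self.options = kwargs

    @classmethod
    def from_config(
        cls,
        router_config: Optional[Any] = None,
        converter_config: Optional[dict[str, Any]] = None,
        **kwargs: Any,
    ) -> "BaseModalityTransformer":
        # Assumed behavior: None means "use defaults", modeled here as None for
        # the router and an empty mapping for the converters.
        return cls(router_config, converter_config or {}, **kwargs)


class StandardImageToTextTransformer(BaseModalityTransformer):
    pass


# Hypothetical registry keyed by (source, target, transformer_type).
MODALITY_TRANSFORMER_REGISTRY: dict[tuple, type[BaseModalityTransformer]] = {
    (Modality.IMAGE, Modality.TEXT, ModalityTransformerType.STANDARD): StandardImageToTextTransformer,
}


def build_modality_transformer(
    source_modality: Modality = Modality.IMAGE,
    target_modality: Modality = Modality.TEXT,
    transformer_type: ModalityTransformerType = ModalityTransformerType.STANDARD,
    router_config: Optional[Any] = None,
    converter_config: Optional[dict[str, Any]] = None,
    **kwargs: Any,
) -> BaseModalityTransformer:
    """Sketch: look up the transformer class, then delegate to from_config()."""
    key = (source_modality, target_modality, transformer_type)
    transformer_cls = MODALITY_TRANSFORMER_REGISTRY.get(key)
    if transformer_cls is None:
        raise ValueError(f"No transformer registered for {key}")
    return transformer_cls.from_config(
        router_config=router_config, converter_config=converter_config, **kwargs
    )
```

Keeping construction in from_config() leaves the factory responsible only for choosing a class, so each transformer decides how to turn its configs into routers and converters.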
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| source_modality | Modality | The source modality. Defaults to Modality.IMAGE. | IMAGE |
| target_modality | Modality | The output modality. Defaults to Modality.TEXT. | TEXT |
| transformer_type | ModalityTransformerType | The transformer type. Defaults to ModalityTransformerType.STANDARD. | STANDARD |
| router_config | RouterConfig \| None | Additional router configuration. Defaults to None. | None |
| converter_config | dict[str, ConverterConfig] \| None | Additional converter configuration. Defaults to None. | None |
| **kwargs | | Additional keyword arguments passed to the transformer. | {} |

Returns:

| Name | Type | Description |
|---|---|---|
| BaseModalityTransformer | BaseModalityTransformer | An instance of the matching transformer class. |

Raises:

| Type | Description |
|---|---|
| ValueError | If the configuration is invalid or not registered, including when the (source_modality, target_modality, transformer_type) combination is not registered. |
Examples:

1. Default router and default converters.

       >>> build_modality_transformer(
       ...     source_modality=Modality.IMAGE,
       ...     target_modality=Modality.TEXT,
       ...     transformer_type=ModalityTransformerType.STANDARD,
       ... )

2. Custom router and default converters. Note: route_mapping values must match the keys used in converter_config (e.g., "captioning", "mermaid" when using default converters).

       >>> build_modality_transformer(
       ...     source_modality=Modality.IMAGE,
       ...     target_modality=Modality.TEXT,
       ...     transformer_type=ModalityTransformerType.STANDARD,
       ...     router_config=RouterConfig(
       ...         modality=Modality.IMAGE,
       ...         preset="multimodal",
       ...         model_id="openai/text-embedding-3-small",
       ...         es_url="http://localhost:9200",
       ...         es_index_name="gllm_multimodal",
       ...         route_mapping={
       ...             "chart": "captioning",
       ...             "diagram": "mermaid",
       ...         },
       ...     ),
       ... )

3. Default router and custom converters with preset.

       # Additional imports needed for examples 3-5:
       # from gllm_multimodal.constants import ModalityConverterTask, ModalityConverterApproach
       >>> build_modality_transformer(
       ...     source_modality=Modality.IMAGE,
       ...     target_modality=Modality.TEXT,
       ...     transformer_type=ModalityTransformerType.STANDARD,
       ...     converter_config={
       ...         "captioning": ConverterConfig(
       ...             source_modality=Modality.IMAGE,
       ...             target_modality=Modality.TEXT,
       ...             task_type=ModalityConverterTask.CAPTIONING,
       ...             approach_type=ModalityConverterApproach.LM_BASED,
       ...             preset="default",
       ...         ),
       ...         "mermaid": ConverterConfig(
       ...             source_modality=Modality.IMAGE,
       ...             target_modality=Modality.TEXT,
       ...             task_type=ModalityConverterTask.MERMAID,
       ...             approach_type=ModalityConverterApproach.LM_BASED,
       ...             preset="default",
       ...         ),
       ...     },
       ... )

4. Default router and custom converters with custom LMRP.

       >>> build_modality_transformer(
       ...     source_modality=Modality.IMAGE,
       ...     target_modality=Modality.TEXT,
       ...     transformer_type=ModalityTransformerType.STANDARD,
       ...     converter_config={
       ...         "captioning": ConverterConfig(
       ...             source_modality=Modality.IMAGE,
       ...             target_modality=Modality.TEXT,
       ...             task_type=ModalityConverterTask.CAPTIONING,
       ...             approach_type=ModalityConverterApproach.LM_BASED,
       ...             lmrp_config={
       ...                 "model_id": "google/gemini-3-flash-preview",
       ...                 "system_template": DEFAULT_CAPTION_SYSTEM_PROMPT,
       ...                 "user_template": DEFAULT_CAPTION_USER_PROMPT,
       ...                 "output_parser_type": "json",
       ...             },
       ...         ),
       ...         "mermaid": ConverterConfig(
       ...             source_modality=Modality.IMAGE,
       ...             target_modality=Modality.TEXT,
       ...             task_type=ModalityConverterTask.MERMAID,
       ...             approach_type=ModalityConverterApproach.LM_BASED,
       ...             lmrp_config={
       ...                 "model_id": "openai/gpt-5.2-latest",
       ...                 "system_template": DEFAULT_MERMAID_SYSTEM_PROMPT,
       ...                 "user_template": DEFAULT_MERMAID_USER_PROMPT,
       ...             },
       ...         ),
       ...     },
       ... )

5. Custom router and custom converters.

       >>> build_modality_transformer(
       ...     source_modality=Modality.IMAGE,
       ...     target_modality=Modality.TEXT,
       ...     transformer_type=ModalityTransformerType.STANDARD,
       ...     router_config=RouterConfig(
       ...         modality=Modality.IMAGE,
       ...         preset="multimodal",
       ...         model_id="openai/text-embedding-3-small",
       ...         route_mapping={
       ...             "chart": "captioning",
       ...             "diagram": "mermaid",
       ...         },
       ...     ),
       ...     converter_config={
       ...         "captioning": ConverterConfig(
       ...             source_modality=Modality.IMAGE,
       ...             target_modality=Modality.TEXT,
       ...             task_type=ModalityConverterTask.CAPTIONING,
       ...             approach_type=ModalityConverterApproach.LM_BASED,
       ...             lmrp_config={
       ...                 "model_id": "google/gemini-3-flash-preview",
       ...                 "system_template": DEFAULT_CAPTION_SYSTEM_PROMPT,
       ...                 "user_template": DEFAULT_CAPTION_USER_PROMPT,
       ...                 "output_parser_type": "json",
       ...             },
       ...         ),
       ...         "mermaid": ConverterConfig(
       ...             source_modality=Modality.IMAGE,
       ...             target_modality=Modality.TEXT,
       ...             task_type=ModalityConverterTask.MERMAID,
       ...             approach_type=ModalityConverterApproach.LM_BASED,
       ...             lmrp_config={
       ...                 "model_id": "openai/gpt-5.2-latest",
       ...                 "system_template": DEFAULT_MERMAID_SYSTEM_PROMPT,
       ...                 "user_template": DEFAULT_MERMAID_USER_PROMPT,
       ...             },
       ...         ),
       ...     },
       ... )
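The second example notes that route_mapping values must match the keys used in converter_config. A small pre-flight check can catch a mismatch before the transformer is built. find_unroutable_targets is a hypothetical helper, not part of GLLM Multimodal, and whether build_modality_transformer performs a similar check itself is not documented here; the default key set is taken from the note's mention of "captioning" and "mermaid".

```python
from collections.abc import Iterable, Mapping

# Keys used by the default converters, per the note on the second example.
DEFAULT_CONVERTER_KEYS = frozenset({"captioning", "mermaid"})


def find_unroutable_targets(
    route_mapping: Mapping[str, str],
    converter_keys: Iterable[str] = DEFAULT_CONVERTER_KEYS,
) -> dict[str, str]:
    """Hypothetical helper: return each route whose target has no converter key."""
    available = set(converter_keys)
    return {
        route: target
        for route, target in route_mapping.items()
        if target not in available
    }


# Both targets exist among the default converter keys.
print(find_unroutable_targets({"chart": "captioning", "diagram": "mermaid"}))
# prints {}

# A misspelled target is reported.
print(find_unroutable_targets({"chart": "caption", "diagram": "mermaid"}))
# prints {'chart': 'caption'}
```

With custom converters, pass converter_config.keys() as the second argument so the check uses the keys you actually configured.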