Builder

Builder utilities for the GLLM Multimodal modules.

build_caption_output_formatter(formatting_strategy)

Build a caption output formatter based on the provided formatting strategy.

Parameters:

  • formatting_strategy (CaptionOutputFormattingStrategy, required): The formatting strategy to use.

Returns:

  • BaseCaptionOutputFormatter: The caption output formatter.
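
A strategy-keyed factory of this kind can be sketched as follows. This is a minimal illustration, not the library's implementation: FormattingStrategy, BaseFormatter, PlainFormatter, JsonFormatter, and build_formatter are hypothetical stand-ins, and because the members of CaptionOutputFormattingStrategy are not listed here, PLAIN and JSON are assumptions.

```python
import json
from abc import ABC, abstractmethod
from enum import Enum


class FormattingStrategy(Enum):
    """Hypothetical stand-in for CaptionOutputFormattingStrategy."""

    PLAIN = "plain"
    JSON = "json"


class BaseFormatter(ABC):
    """Hypothetical stand-in for BaseCaptionOutputFormatter."""

    @abstractmethod
    def format(self, caption: dict) -> str:
        """Render a caption dictionary as a string."""


class PlainFormatter(BaseFormatter):
    def format(self, caption: dict) -> str:
        # One "key: value" pair per line.
        return "\n".join(f"{key}: {value}" for key, value in caption.items())


class JsonFormatter(BaseFormatter):
    def format(self, caption: dict) -> str:
        # Sorted keys keep the output deterministic.
        return json.dumps(caption, sort_keys=True)


# Assumed mapping from strategy to formatter class.
_FORMATTERS = {
    FormattingStrategy.PLAIN: PlainFormatter,
    FormattingStrategy.JSON: JsonFormatter,
}


def build_formatter(strategy) -> BaseFormatter:
    """Return a formatter instance for the given strategy."""
    formatter_cls = _FORMATTERS.get(strategy)
    if formatter_cls is None:
        raise ValueError(f"Unsupported formatting strategy: {strategy!r}")
    return formatter_cls()
```

Keeping the mapping in one dictionary means a new strategy only needs a new enum member and one registry entry.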

build_modality_converter(source_modality, target_modality, task_type=ModalityConverterTask.AUTO, approach_type=None, preset=None, strategy=None, **kwargs)

Build and initialize a modality converter instance for a given configuration.

The factory looks up the converter class based on the combination of:
  • source_modality: input modality (e.g., Modality.IMAGE, Modality.AUDIO)
  • target_modality: output modality (e.g., Modality.TEXT)
  • task_type: conversion task (e.g., CAPTIONING, TRANSCRIPT, MERMAID, or AUTO)
  • approach_type: the converter's algorithmic approach; required for non-AUTO tasks, must be None for AUTO

All supported combinations must be registered in MODALITY_CONVERTER_REGISTRY.
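
The lookup and the AUTO rule described above can be sketched as a dictionary keyed by the four values. This is an illustration under stated assumptions, not the contents of MODALITY_CONVERTER_REGISTRY: the enums below are simplified stand-ins for the library's types, and the converter classes and the registered combinations are hypothetical.

```python
from enum import Enum


class Modality(Enum):
    """Simplified stand-in for the library's Modality enum."""

    IMAGE = "image"
    AUDIO = "audio"
    TEXT = "text"


class Task(Enum):
    """Simplified stand-in for ModalityConverterTask."""

    AUTO = "auto"
    CAPTIONING = "captioning"
    TRANSCRIPT = "transcript"


class Approach(Enum):
    """Simplified stand-in for ModalityConverterApproach."""

    LM_BASED = "lm_based"


# Hypothetical converter classes, used only as registry values.
class ImageCaptioner: ...
class AudioTranscriber: ...
class AutoImageToText: ...


# Assumed registry layout: (source, target, task, approach) -> class.
REGISTRY = {
    (Modality.IMAGE, Modality.TEXT, Task.CAPTIONING, Approach.LM_BASED): ImageCaptioner,
    (Modality.AUDIO, Modality.TEXT, Task.TRANSCRIPT, Approach.LM_BASED): AudioTranscriber,
    (Modality.IMAGE, Modality.TEXT, Task.AUTO, None): AutoImageToText,
}


def lookup_converter(source, target, task=Task.AUTO, approach=None):
    """Validate the AUTO rule, then resolve the converter class."""
    if task is Task.AUTO and approach is not None:
        raise ValueError("approach must be None when task is AUTO")
    if task is not Task.AUTO and approach is None:
        raise ValueError("approach is required for non-AUTO tasks")
    key = (source, target, task, approach)
    if key not in REGISTRY:
        raise ValueError(f"Combination not registered: {key}")
    return REGISTRY[key]
```

Checking the AUTO rule before the dictionary lookup lets the error message name the actual mistake instead of reporting a generic missing key.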

Parameters:

  • source_modality (Modality, required): The source modality.
  • target_modality (Modality, required): The output modality.
  • task_type (ModalityConverterTask, default: ModalityConverterTask.AUTO): The conversion task.
  • approach_type (ModalityConverterApproach | None, default: None): The approach for the conversion. Required for non-AUTO tasks; must be None for task_type=AUTO.
  • preset (str | None, default: None): Preset identifier for the converter's .from_preset() method. If None, uses the default preset for the class.
  • strategy (ModalityConverterBuildStrategy | None, default: None): The build strategy to use. If None, the strategy is determined automatically based on the provided parameters.
  • **kwargs (Any, default: {}): Additional keyword arguments passed to the converter, including:
      1. lmrp_config (dict[str, Any]): Configuration used to build an LM request processor (LMRP) instance. Should follow the same structure as the parameters of build_lm_request_processor.
      2. Any other parameters supported by the converter's initialization method.

Returns:

  • BaseModalityConverter: An instance of the matching converter class.

Raises:

  • ValueError: If the configuration is invalid or not registered, including:
      - (source_modality, target_modality, task_type, approach_type) not registered
      - approach_type missing for non-AUTO task_type
      - approach_type provided when task_type is AUTO
      - Any individual value (modality, task, or approach) unsupported for the given combination
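
The preset fallback described for the preset parameter can be sketched with a from_preset() classmethod. The class name, preset names, and model identifiers below are hypothetical, and accepting keyword overrides on top of a preset is an assumption about how extra keyword arguments might be combined with it.

```python
class SketchConverter:
    """Hypothetical converter with named presets."""

    DEFAULT_PRESET = "default"
    # Assumed preset table: preset name -> constructor arguments.
    PRESETS = {
        "default": {"model_id": "example/small-model"},
        "detailed": {"model_id": "example/large-model"},
    }

    def __init__(self, model_id: str, **options):
        self.model_id = model_id
        self.options = options

    @classmethod
    def from_preset(cls, preset=None, **overrides):
        """Build from a named preset; None selects the class default."""
        name = cls.DEFAULT_PRESET if preset is None else preset
        if name not in cls.PRESETS:
            raise ValueError(f"Unknown preset: {name!r}")
        # Overrides win over preset values (an assumption of this sketch).
        return cls(**{**cls.PRESETS[name], **overrides})
```

For example, SketchConverter.from_preset() resolves to the "default" entry, while SketchConverter.from_preset("detailed", temperature=0) uses the "detailed" entry and stores temperature as an extra option.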

Modules

caption_output_formatter_builder

Builder for caption output formatters.

This module provides a factory function to create caption output formatters based on the specified formatting strategy.

build_caption_output_formatter(formatting_strategy)

Build a caption output formatter based on the provided formatting strategy.

modality_converter_builder

Defines a convenience function to build a modality converter.

build_modality_converter(source_modality, target_modality, task_type=ModalityConverterTask.AUTO, approach_type=None, preset=None, strategy=None, **kwargs)

Build and initialize a modality converter instance for a given configuration.

modality_transformer_builder

Defines a convenience function to build a modality transformer.

build_modality_transformer(source_modality=Modality.IMAGE, target_modality=Modality.TEXT, transformer_type=ModalityTransformerType.STANDARD, router_config=None, converter_config=None, **kwargs)

Build and initialize a modality transformer instance for a given configuration.

The factory looks up the transformer class based on the combination of:
  • source_modality: input modality (e.g., Modality.IMAGE, Modality.AUDIO)
  • target_modality: output modality (e.g., Modality.TEXT)
  • transformer_type: transformer type (e.g., ModalityTransformerType.STANDARD)

The factory then delegates construction to each class's from_config() classmethod.
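
The two steps above, a registry lookup followed by delegation to from_config(), can be sketched as follows. This is a simplified illustration: the enums are stand-ins for the library's types, and StandardImageToText, TRANSFORMER_REGISTRY, build_transformer, and the default values used inside from_config() are hypothetical.

```python
from enum import Enum


class Modality(Enum):
    """Simplified stand-in for the library's Modality enum."""

    IMAGE = "image"
    TEXT = "text"


class TransformerType(Enum):
    """Simplified stand-in for ModalityTransformerType."""

    STANDARD = "standard"


class StandardImageToText:
    """Hypothetical transformer class."""

    def __init__(self, router_config, converter_config, **options):
        self.router_config = router_config
        self.converter_config = converter_config
        self.options = options

    @classmethod
    def from_config(cls, router_config=None, converter_config=None, **kwargs):
        # Fall back to assumed defaults when a config is not supplied.
        router = router_config if router_config is not None else {"preset": "default"}
        converters = converter_config if converter_config is not None else {}
        return cls(router, converters, **kwargs)


# Assumed registry layout: (source, target, transformer type) -> class.
TRANSFORMER_REGISTRY = {
    (Modality.IMAGE, Modality.TEXT, TransformerType.STANDARD): StandardImageToText,
}


def build_transformer(
    source=Modality.IMAGE,
    target=Modality.TEXT,
    transformer_type=TransformerType.STANDARD,
    router_config=None,
    converter_config=None,
    **kwargs,
):
    """Resolve the class from the registry, then delegate to from_config()."""
    key = (source, target, transformer_type)
    transformer_cls = TRANSFORMER_REGISTRY.get(key)
    if transformer_cls is None:
        raise ValueError(f"Combination not registered: {key}")
    return transformer_cls.from_config(
        router_config=router_config,
        converter_config=converter_config,
        **kwargs,
    )
```

Delegating to from_config() keeps the factory free of class-specific defaults, so each transformer class decides how to fill in a missing router or converter configuration.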

Parameters:

  • source_modality (Modality, default: Modality.IMAGE): The source modality.
  • target_modality (Modality, default: Modality.TEXT): The output modality.
  • transformer_type (ModalityTransformerType, default: ModalityTransformerType.STANDARD): The transformer type.
  • router_config (RouterConfig | None, default: None): Additional router configuration.
  • converter_config (dict[str, ConverterConfig] | None, default: None): Additional converter configuration.
  • **kwargs (default: {}): Additional keyword arguments passed to the transformer.

Returns:

  • BaseModalityTransformer: An instance of the matching transformer class.

Raises:

  • ValueError: If the configuration is invalid or not registered, including:
      - (source_modality, target_modality, transformer_type) not registered

Examples:

  1. Default router and default converters.

     >>> build_modality_transformer(
     ...     source_modality=Modality.IMAGE,
     ...     target_modality=Modality.TEXT,
     ...     transformer_type=ModalityTransformerType.STANDARD,
     ... )

  2. Custom router and default converters. Note: route_mapping values must match the keys used in converter_config (e.g., "captioning", "mermaid" when using default converters).

     >>> build_modality_transformer(
     ...     source_modality=Modality.IMAGE,
     ...     target_modality=Modality.TEXT,
     ...     transformer_type=ModalityTransformerType.STANDARD,
     ...     router_config=RouterConfig(
     ...         modality=Modality.IMAGE,
     ...         preset="multimodal",
     ...         model_id="openai/text-embedding-3-small",
     ...         es_url="http://localhost:9200",
     ...         es_index_name="gllm_multimodal",
     ...         route_mapping={
     ...             "chart": "captioning",
     ...             "diagram": "mermaid",
     ...         },
     ...     )
     ... )

  3. Default router and custom converters with preset.

     # Additional imports needed for examples 3-5:
     # from gllm_multimodal.constants import ModalityConverterTask, ModalityConverterApproach
     >>> build_modality_transformer(
     ...     source_modality=Modality.IMAGE,
     ...     target_modality=Modality.TEXT,
     ...     transformer_type=ModalityTransformerType.STANDARD,
     ...     converter_config={
     ...         "captioning": ConverterConfig(
     ...             source_modality=Modality.IMAGE,
     ...             target_modality=Modality.TEXT,
     ...             task_type=ModalityConverterTask.CAPTIONING,
     ...             approach_type=ModalityConverterApproach.LM_BASED,
     ...             preset="default",
     ...         ),
     ...         "mermaid": ConverterConfig(
     ...             source_modality=Modality.IMAGE,
     ...             target_modality=Modality.TEXT,
     ...             task_type=ModalityConverterTask.MERMAID,
     ...             approach_type=ModalityConverterApproach.LM_BASED,
     ...             preset="default",
     ...         ),
     ...     }
     ... )

  4. Default router and custom converters with custom LMRP.

     >>> build_modality_transformer(
     ...     source_modality=Modality.IMAGE,
     ...     target_modality=Modality.TEXT,
     ...     transformer_type=ModalityTransformerType.STANDARD,
     ...     converter_config={
     ...         "captioning": ConverterConfig(
     ...             source_modality=Modality.IMAGE,
     ...             target_modality=Modality.TEXT,
     ...             task_type=ModalityConverterTask.CAPTIONING,
     ...             approach_type=ModalityConverterApproach.LM_BASED,
     ...             lmrp_config={
     ...                 "model_id": "google/gemini-3-flash-preview",
     ...                 "system_template": DEFAULT_CAPTION_SYSTEM_PROMPT,
     ...                 "user_template": DEFAULT_CAPTION_USER_PROMPT,
     ...                 "output_parser_type": "json",
     ...             },
     ...         ),
     ...         "mermaid": ConverterConfig(
     ...             source_modality=Modality.IMAGE,
     ...             target_modality=Modality.TEXT,
     ...             task_type=ModalityConverterTask.MERMAID,
     ...             approach_type=ModalityConverterApproach.LM_BASED,
     ...             lmrp_config={
     ...                 "model_id": "openai/gpt-5.2-latest",
     ...                 "system_template": DEFAULT_MERMAID_SYSTEM_PROMPT,
     ...                 "user_template": DEFAULT_MERMAID_USER_PROMPT,
     ...             },
     ...         ),
     ...     }
     ... )

  5. Custom router and custom converters.

     >>> build_modality_transformer(
     ...     source_modality=Modality.IMAGE,
     ...     target_modality=Modality.TEXT,
     ...     transformer_type=ModalityTransformerType.STANDARD,
     ...     router_config=RouterConfig(
     ...         modality=Modality.IMAGE,
     ...         preset="multimodal",
     ...         model_id="openai/text-embedding-3-small",
     ...         route_mapping={
     ...             "chart": "captioning",
     ...             "diagram": "mermaid",
     ...         },
     ...     ),
     ...     converter_config={
     ...         "captioning": ConverterConfig(
     ...             source_modality=Modality.IMAGE,
     ...             target_modality=Modality.TEXT,
     ...             task_type=ModalityConverterTask.CAPTIONING,
     ...             approach_type=ModalityConverterApproach.LM_BASED,
     ...             lmrp_config={
     ...                 "model_id": "google/gemini-3-flash-preview",
     ...                 "system_template": DEFAULT_CAPTION_SYSTEM_PROMPT,
     ...                 "user_template": DEFAULT_CAPTION_USER_PROMPT,
     ...                 "output_parser_type": "json",
     ...             },
     ...         ),
     ...         "mermaid": ConverterConfig(
     ...             source_modality=Modality.IMAGE,
     ...             target_modality=Modality.TEXT,
     ...             task_type=ModalityConverterTask.MERMAID,
     ...             approach_type=ModalityConverterApproach.LM_BASED,
     ...             lmrp_config={
     ...                 "model_id": "openai/gpt-5.2-latest",
     ...                 "system_template": DEFAULT_MERMAID_SYSTEM_PROMPT,
     ...                 "user_template": DEFAULT_MERMAID_USER_PROMPT,
     ...             },
     ...         ),
     ...     }
     ... )
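
Because route_mapping values must match the keys of converter_config, a small pre-flight check can catch a mismatch before the transformer is built. The helper below is a hypothetical sketch that works on plain dictionaries and key collections; whether the library performs an equivalent check itself is not stated here.

```python
def find_unmatched_routes(route_mapping, converter_keys):
    """Return route targets that have no matching converter key, sorted."""
    known = set(converter_keys)
    return sorted({target for target in route_mapping.values() if target not in known})


# Keys as used with the default converters in the examples above.
converter_keys = ["captioning", "mermaid"]

ok = find_unmatched_routes({"chart": "captioning", "diagram": "mermaid"}, converter_keys)
bad = find_unmatched_routes({"chart": "captioning", "diagram": "flowchart"}, converter_keys)
```

Here ok is an empty list, while bad contains "flowchart", the route target that no converter key covers.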