Multimodal LM Invoker
Modules for the multimodal language model invokers used in Gen AI applications.
AnthropicMultimodalLMInvoker(model_name, api_key, model_kwargs=None, default_hyperparameters=None)
Bases: BaseMultimodalLMInvoker[str | bytes, str]
An invoker to interact with multimodal language models hosted through Anthropic API endpoints.
The AnthropicMultimodalLMInvoker class is designed to interact with multimodal language models hosted through Anthropic API endpoints. It provides a framework for invoking multimodal language models with the provided prompt and hyperparameters. It supports both standard and streaming invocation. Streaming mode is enabled if an event emitter is provided.
Attributes:

| Name | Type | Description |
|---|---|---|
| client | Anthropic | The Anthropic client instance. |
| model_name | str | The name of the Anthropic model. |
| default_hyperparameters | dict[str, Any] | Default hyperparameters for invoking the multimodal language model. |
Notes

The AnthropicMultimodalLMInvoker currently supports the following content types:

1. Text, which can be passed as plain strings.
2. Image, which can be passed as:
    1. Base64 encoded image bytes.
    2. URL pointing to an image.
    3. Local image file path.
Initializes a new instance of the AnthropicMultimodalLMInvoker class.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| model_name | str | The name of the Anthropic model. | required |
| api_key | str | The API key for authenticating with Anthropic. | required |
| model_kwargs | dict[str, Any] \| None | Additional model parameters. Defaults to None. | None |
| default_hyperparameters | dict[str, Any] \| None | Default hyperparameters for invoking the model. Defaults to None. | None |
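Below is a minimal usage sketch. The import path, the async invoke(prompt, hyperparameters=None) method, and the prompt format (a list of str | bytes contents, inferred from the BaseMultimodalLMInvoker[str | bytes, str] base) are assumptions rather than a confirmed API, and the model name is a placeholder.

```python
# A minimal sketch, assuming an async invoke() method on the invoker and a
# prompt given as a list of str | bytes contents. The import path is hypothetical.
import asyncio

from gllm_inference.multimodal_lm_invoker import AnthropicMultimodalLMInvoker  # assumed import path

async def main() -> None:
    invoker = AnthropicMultimodalLMInvoker(
        model_name="claude-3-5-sonnet-latest",  # placeholder model name
        api_key="sk-ant-...",
        default_hyperparameters={"max_tokens": 1024},
    )
    # Per the Notes above, a prompt may mix plain text with image contents
    # (base64 encoded bytes, an image URL, or a local image file path).
    prompt = [
        "Describe this image in one sentence.",
        "https://example.com/cat.png",  # placeholder image URL
    ]
    result = await invoker.invoke(prompt)  # assumed to return str
    print(result)

asyncio.run(main())
```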
AzureOpenAIMultimodalLMInvoker(api_key, azure_endpoint, azure_deployment, api_version, model_kwargs=None, default_hyperparameters=None)
Bases: BaseMultimodalLMInvoker[str | bytes, str]
An invoker to interact with multimodal language models hosted through Azure OpenAI API endpoints.
The AzureOpenAIMultimodalLMInvoker class is designed to interact with multimodal language models hosted through Azure OpenAI API endpoints. It provides a framework for invoking multimodal language models with the provided prompt and hyperparameters. It supports both standard and streaming invocation. Streaming mode is enabled if an event emitter is provided.
Attributes:

| Name | Type | Description |
|---|---|---|
| client | AsyncAzureOpenAI | The AsyncAzureOpenAI client instance. |
| default_hyperparameters | dict[str, Any] | Default hyperparameters for invoking the multimodal language model. |
Notes

The AzureOpenAIMultimodalLMInvoker currently supports the following content types:

1. Text, which can be passed as plain strings.
2. Image, which can be passed as:
    1. Base64 encoded image bytes.
    2. URL pointing to an image.
    3. Local image file path.
Initializes a new instance of the AzureOpenAIMultimodalLMInvoker class.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| api_key | str | The API key for authenticating with Azure OpenAI. | required |
| azure_endpoint | str | The endpoint of the Azure OpenAI service. | required |
| azure_deployment | str | The deployment of the Azure OpenAI service. | required |
| api_version | str | The API version of the Azure OpenAI service. | required |
| model_kwargs | dict[str, Any] \| None | Additional model parameters. Defaults to None. | None |
| default_hyperparameters | dict[str, Any] \| None | Default hyperparameters for invoking the model. Defaults to None. | None |
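Construction differs from the other invokers in that it targets a specific Azure deployment rather than a model name. The sketch below uses placeholder endpoint, deployment, and API-version values, and the import path is an assumption.

```python
# A construction sketch; all credential and deployment values are placeholders
# and the import path is hypothetical. Invocation then follows the same
# pattern as the Anthropic example above.
from gllm_inference.multimodal_lm_invoker import AzureOpenAIMultimodalLMInvoker  # assumed import path

invoker = AzureOpenAIMultimodalLMInvoker(
    api_key="...",
    azure_endpoint="https://my-resource.openai.azure.com",  # placeholder endpoint
    azure_deployment="my-gpt-4o-deployment",                # placeholder deployment name
    api_version="2024-02-01",                               # placeholder API version
    default_hyperparameters={"temperature": 0.0},
)
```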
GoogleGenerativeAIMultimodalLMInvoker(model_name, api_key, model_kwargs=None, default_hyperparameters=None, response_schema=None)
Bases: BaseMultimodalLMInvoker[str | bytes, str]
An invoker to interact with multimodal language models hosted through Google's Generative AI API endpoints.
This class is deprecated, as it has been replaced by the GoogleLMInvoker class.
Attributes:

| Name | Type | Description |
|---|---|---|
| model | GoogleLMInvoker | The Google Gemini model instance. |
| default_hyperparameters | dict[str, Any] | Default hyperparameters for invoking the multimodal language model. |
Initializes a new instance of the GoogleGenerativeAIMultimodalLMInvoker class.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| model_name | str | The name of the Google Gemini model. | required |
| api_key | str | The API key for authenticating with Google Gemini. | required |
| model_kwargs | dict[str, Any] \| None | Additional model parameters. Defaults to None. | None |
| default_hyperparameters | dict[str, Any] \| None | Default hyperparameters for invoking the model. Defaults to None. | None |
| response_schema | BaseModel \| None | The response schema for the model. Defaults to None. | None |
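Because this class is deprecated, new code should prefer GoogleLMInvoker. For completeness, a construction sketch of the deprecated class follows, using only the parameters documented above; the import path and model name are assumptions.

```python
# Deprecated: prefer GoogleLMInvoker for new code (see the note above). This
# sketch only exercises the documented constructor; the import path and model
# name are hypothetical.
from gllm_inference.multimodal_lm_invoker import GoogleGenerativeAIMultimodalLMInvoker  # assumed import path

invoker = GoogleGenerativeAIMultimodalLMInvoker(
    model_name="gemini-1.5-flash",  # placeholder model name
    api_key="...",
)
```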
GoogleVertexAIMultimodalLMInvoker(model_name, credentials_path, project_id=None, location='us-central1', model_kwargs=None, default_hyperparameters=None, response_schema=None)
Bases: BaseMultimodalLMInvoker[str | bytes, str]
An invoker to interact with multimodal language models hosted through Google's Vertex AI API endpoints.
The GoogleVertexAIMultimodalLMInvoker class is designed to interact with multimodal language models hosted through Google's Vertex AI API endpoints. It provides a framework for invoking multimodal language models with the provided prompt and hyperparameters. It supports both standard and streaming invocation. Streaming mode is enabled if an event emitter is provided.
Attributes:

| Name | Type | Description |
|---|---|---|
| client | GenerativeModel | The Google Vertex AI client instance. |
| extra_kwargs | dict[str, Any] | Additional keyword arguments for the GenerativeModel client. |
| default_hyperparameters | dict[str, Any] | Default hyperparameters for invoking the multimodal language model. |
Notes

The GoogleVertexAIMultimodalLMInvoker currently supports the following content types:

1. Text, which can be passed as plain strings.
2. Audio, which can be passed as:
    1. Base64 encoded audio bytes.
    2. URL pointing to an audio file.
    3. Local audio file path.
3. Image, which can be passed as:
    1. Base64 encoded image bytes.
    2. URL pointing to an image.
    3. Local image file path.
4. Video, which can be passed as:
    1. Base64 encoded video bytes.
    2. URL pointing to a video.
    3. Local video file path.
5. Document, which can be passed as:
    1. Base64 encoded document bytes.
    2. URL pointing to a document.
    3. Local document file path.

The GoogleVertexAIMultimodalLMInvoker also supports structured outputs through the response_schema argument (a sketch appears after the parameters table below). For more information, please refer to the following page: https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/control-generated-output.
Initializes a new instance of the GoogleVertexAIMultimodalLMInvoker class.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| model_name | str | The name of the multimodal language model to be used. | required |
| credentials_path | str | The path to the Google Cloud service account credentials JSON file. | required |
| project_id | str \| None | The Google Cloud project ID. Defaults to None, in which case the project ID will be loaded from the credentials file. | None |
| location | str | The location of the Google Cloud project. Defaults to "us-central1". | 'us-central1' |
| model_kwargs | dict[str, Any] \| None | Additional model parameters. Defaults to None. | None |
| default_hyperparameters | dict[str, Any] \| None | Default hyperparameters for invoking the model. Defaults to None. | None |
| response_schema | dict[str, Any] \| None | The response schema for the model. Defaults to None. | None |
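The sketch below combines a multimodal prompt with a response_schema for structured output. The import path, the async invoke() method, and the exact schema dialect are assumptions; consult the Vertex AI page linked in the Notes for the authoritative schema syntax.

```python
# A sketch of structured output via response_schema. The import path and
# invoke() signature are assumptions, and the schema follows the OpenAPI-style
# format described in the linked Vertex AI documentation.
import asyncio

from gllm_inference.multimodal_lm_invoker import GoogleVertexAIMultimodalLMInvoker  # assumed import path

recipe_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "ingredients": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "ingredients"],
}

async def main() -> None:
    invoker = GoogleVertexAIMultimodalLMInvoker(
        model_name="gemini-1.5-pro",                       # placeholder model name
        credentials_path="/path/to/service-account.json",  # placeholder credentials path
        response_schema=recipe_schema,
    )
    # Per the Notes, text can be mixed with audio, image, video, or document
    # contents; here the prompt is text plus a local image file path.
    prompt = ["Extract the recipe from this photo.", "/path/to/recipe.jpg"]
    result = await invoker.invoke(prompt)  # assumed to return str
    print(result)

asyncio.run(main())
```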
OpenAIMultimodalLMInvoker(model_name, api_key, model_kwargs=None, default_hyperparameters=None)
Bases: BaseMultimodalLMInvoker[str | bytes, str]
An invoker to interact with multimodal language models hosted through OpenAI API endpoints.
The OpenAIMultimodalLMInvoker class is designed to interact with multimodal language models hosted through OpenAI API endpoints. It provides a framework for invoking multimodal language models with the provided prompt and hyperparameters. It supports both standard and streaming invocation. Streaming mode is enabled if an event emitter is provided.
Attributes:

| Name | Type | Description |
|---|---|---|
| client | AsyncOpenAI | The AsyncOpenAI client instance. |
| model_name | str | The name of the OpenAI model. |
| default_hyperparameters | dict[str, Any] | Default hyperparameters for invoking the multimodal language model. |
Notes

The OpenAIMultimodalLMInvoker currently supports the following content types:

1. Text, which can be passed as plain strings.
2. Image, which can be passed as:
    1. Base64 encoded image bytes.
    2. URL pointing to an image.
    3. Local image file path.
Initializes a new instance of the OpenAIMultimodalLMInvoker class.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| model_name | str | The name of the OpenAI model. | required |
| api_key | str | The API key for authenticating with OpenAI. | required |
| model_kwargs | dict[str, Any] \| None | Additional model parameters. Defaults to None. | None |
| default_hyperparameters | dict[str, Any] \| None | Default hyperparameters for invoking the model. Defaults to None. | None |
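To show the bytes-based image content listed in the Notes, the sketch below base64-encodes a local file before passing it in the prompt. The import path, model name, and the async invoke() method are assumptions, as in the earlier examples.

```python
# A sketch passing base64 encoded image bytes alongside text. The import path
# and the async invoke() method are assumptions based on the documented base class.
import asyncio
import base64

from gllm_inference.multimodal_lm_invoker import OpenAIMultimodalLMInvoker  # assumed import path

async def main() -> None:
    invoker = OpenAIMultimodalLMInvoker(
        model_name="gpt-4o",  # placeholder model name
        api_key="sk-...",
    )
    # The Notes above list base64 encoded image bytes as a supported content,
    # alongside image URLs and local file paths.
    with open("photo.png", "rb") as f:
        image_b64 = base64.b64encode(f.read())
    prompt = ["What is in this photo?", image_b64]
    result = await invoker.invoke(prompt)  # assumed to return str
    print(result)

asyncio.run(main())
```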