
Multimodal LM Invoker

Modules concerning the multimodal language model invokers used in Gen AI applications.

AnthropicMultimodalLMInvoker(model_name, api_key, model_kwargs=None, default_hyperparameters=None)

Bases: BaseMultimodalLMInvoker[str | bytes, str]

An invoker to interact with multimodal language models hosted through Anthropic API endpoints.

The AnthropicMultimodalLMInvoker class provides a framework for invoking multimodal language models hosted through Anthropic API endpoints with a given prompt and hyperparameters. It supports both standard and streaming invocation; streaming mode is enabled when an event emitter is provided.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `client` | `Anthropic` | The Anthropic client instance. |
| `model_name` | `str` | The name of the Anthropic model. |
| `default_hyperparameters` | `dict[str, Any]` | Default hyperparameters for invoking the multimodal language model. |

Notes

The AnthropicMultimodalLMInvoker currently supports the following contents:

1. Text, which can be passed as plain strings.
2. Image, which can be passed as:
    1. Base64 encoded image bytes.
    2. URL pointing to an image.
    3. Local image file path.

Initializes a new instance of the AnthropicMultimodalLMInvoker class.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model_name` | `str` | The name of the Anthropic model. | *required* |
| `api_key` | `str` | The API key for authenticating with Anthropic. | *required* |
| `model_kwargs` | `dict[str, Any] \| None` | Additional model parameters. Defaults to None. | `None` |
| `default_hyperparameters` | `dict[str, Any] \| None` | Default hyperparameters for invoking the model. Defaults to None. | `None` |
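A minimal construction sketch for the parameters above. The model name and hyperparameter values are illustrative assumptions, not defaults of this library, and the actual instantiation is commented out since it requires the package to be installed:

```python
import os

# Keyword arguments matching the documented constructor signature.
# The model name and hyperparameters are illustrative assumptions.
invoker_kwargs = dict(
    model_name="claude-3-5-sonnet-latest",
    api_key=os.environ.get("ANTHROPIC_API_KEY", "sk-placeholder"),
    model_kwargs=None,
    default_hyperparameters={"max_tokens": 1024, "temperature": 0.0},
)

# invoker = AnthropicMultimodalLMInvoker(**invoker_kwargs)  # requires the library
```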

AzureOpenAIMultimodalLMInvoker(api_key, azure_endpoint, azure_deployment, api_version, model_kwargs=None, default_hyperparameters=None)

Bases: BaseMultimodalLMInvoker[str | bytes, str]

An invoker to interact with multimodal language models hosted through Azure OpenAI API endpoints.

The AzureOpenAIMultimodalLMInvoker class provides a framework for invoking multimodal language models hosted through Azure OpenAI API endpoints with a given prompt and hyperparameters. It supports both standard and streaming invocation; streaming mode is enabled when an event emitter is provided.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `client` | `AsyncAzureOpenAI` | The AsyncAzureOpenAI client instance. |
| `default_hyperparameters` | `dict[str, Any]` | Default hyperparameters for invoking the multimodal language model. |

Notes

The AzureOpenAIMultimodalLMInvoker currently supports the following contents:

1. Text, which can be passed as plain strings.
2. Image, which can be passed as:
    1. Base64 encoded image bytes.
    2. URL pointing to an image.
    3. Local image file path.

Initializes a new instance of the AzureOpenAIMultimodalLMInvoker class.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `api_key` | `str` | The API key for authenticating with Azure OpenAI. | *required* |
| `azure_endpoint` | `str` | The endpoint of the Azure OpenAI service. | *required* |
| `azure_deployment` | `str` | The deployment of the Azure OpenAI service. | *required* |
| `api_version` | `str` | The API version of the Azure OpenAI service. | *required* |
| `model_kwargs` | `dict[str, Any] \| None` | Additional model parameters. Defaults to None. | `None` |
| `default_hyperparameters` | `dict[str, Any] \| None` | Default hyperparameters for invoking the model. Defaults to None. | `None` |
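The three Azure-specific parameters (`azure_endpoint`, `azure_deployment`, `api_version`) combine into the request URL used by Azure OpenAI's REST API. The sketch below shows that shape; the resource and deployment names are placeholders:

```python
# Placeholder values; substitute your own Azure OpenAI resource details.
azure_endpoint = "https://my-resource.openai.azure.com"
azure_deployment = "gpt-4o-deployment"
api_version = "2024-06-01"

# Azure OpenAI REST path: requests are routed per deployment, and the
# API version is passed as a query parameter.
request_url = (
    f"{azure_endpoint}/openai/deployments/{azure_deployment}"
    f"/chat/completions?api-version={api_version}"
)
```

Unlike the plain OpenAI invoker, no `model_name` is passed here: the deployment itself determines which model is used.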

GoogleGenerativeAIMultimodalLMInvoker(model_name, api_key, model_kwargs=None, default_hyperparameters=None, response_schema=None)

Bases: BaseMultimodalLMInvoker[str | bytes, str]

An invoker to interact with multimodal language models hosted through Google's Generative AI API endpoints.

This class is deprecated; it has been replaced by the GoogleLMInvoker class.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `model` | `GoogleLMInvoker` | The Google Gemini model instance. |
| `default_hyperparameters` | `dict[str, Any]` | Default hyperparameters for invoking the multimodal language model. |

Initializes a new instance of the GoogleGenerativeAIMultimodalLMInvoker class.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model_name` | `str` | The name of the Google Gemini model. | *required* |
| `api_key` | `str` | The API key for authenticating with Google Gemini. | *required* |
| `model_kwargs` | `dict[str, Any] \| None` | Additional model parameters. Defaults to None. | `None` |
| `default_hyperparameters` | `dict[str, Any] \| None` | Default hyperparameters for invoking the model. Defaults to None. | `None` |
| `response_schema` | `BaseModel \| None` | The response schema for the model. Defaults to None. | `None` |

GoogleVertexAIMultimodalLMInvoker(model_name, credentials_path, project_id=None, location='us-central1', model_kwargs=None, default_hyperparameters=None, response_schema=None)

Bases: BaseMultimodalLMInvoker[str | bytes, str]

An invoker to interact with multimodal language models hosted through Google's Vertex AI API endpoints.

The GoogleVertexAIMultimodalLMInvoker class provides a framework for invoking multimodal language models hosted through Google's Vertex AI API endpoints with a given prompt and hyperparameters. It supports both standard and streaming invocation; streaming mode is enabled when an event emitter is provided.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `client` | `GenerativeModel` | The Google Vertex AI client instance. |
| `extra_kwargs` | `dict[str, Any]` | Additional keyword arguments for the generate_content_async method. |
| `default_hyperparameters` | `dict[str, Any]` | Default hyperparameters for invoking the multimodal language model. |

Notes

The GoogleVertexAIMultimodalLMInvoker currently supports the following contents:

1. Text, which can be passed as plain strings.
2. Audio, which can be passed as:
    1. Base64 encoded audio bytes.
    2. URL pointing to an audio file.
    3. Local audio file path.
3. Image, which can be passed as:
    1. Base64 encoded image bytes.
    2. URL pointing to an image.
    3. Local image file path.
4. Video, which can be passed as:
    1. Base64 encoded video bytes.
    2. URL pointing to a video.
    3. Local video file path.
5. Document, which can be passed as:
    1. Base64 encoded document bytes.
    2. URL pointing to a document.
    3. Local document file path.

The GoogleVertexAIMultimodalLMInvoker also supports structured outputs through the response_schema argument. For more information, please refer to the following page: https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/control-generated-output.
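Per the linked Google documentation, Vertex AI controlled generation takes an OpenAPI-style schema dict, which matches this class's `response_schema: dict[str, Any] | None` parameter. The schema below is an invented example (the field names are not from this library), followed by a stdlib-only sanity check of a structured output against it:

```python
import json

# Illustrative response_schema: an OpenAPI-style schema dict, as used by
# Vertex AI controlled generation. Field names are invented for the example.
response_schema = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "confidence": {"type": "number"},
    },
    "required": ["summary"],
}

# With a schema supplied, the model's output is JSON that can be parsed
# and checked against the schema's required fields.
raw_output = '{"summary": "A short recap.", "confidence": 0.9}'
parsed = json.loads(raw_output)
missing = [key for key in response_schema["required"] if key not in parsed]
```

Note that the deprecated GoogleGenerativeAIMultimodalLMInvoker takes a Pydantic `BaseModel` for its `response_schema` instead, so schemas are not interchangeable between the two classes.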

Initializes a new instance of the GoogleVertexAIMultimodalLMInvoker class.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model_name` | `str` | The name of the multimodal language model to be used. | *required* |
| `credentials_path` | `str` | The path to the Google Cloud service account credentials JSON file. | *required* |
| `project_id` | `str \| None` | The Google Cloud project ID. Defaults to None, in which case the project ID is loaded from the credentials file. | `None` |
| `location` | `str` | The location of the Google Cloud project. Defaults to "us-central1". | `'us-central1'` |
| `model_kwargs` | `dict[str, Any] \| None` | Additional model parameters. Defaults to None. | `None` |
| `default_hyperparameters` | `dict[str, Any] \| None` | Default hyperparameters for invoking the model. Defaults to None. | `None` |
| `response_schema` | `dict[str, Any] \| None` | The response schema for the model. Defaults to None. | `None` |

OpenAIMultimodalLMInvoker(model_name, api_key, model_kwargs=None, default_hyperparameters=None)

Bases: BaseMultimodalLMInvoker[str | bytes, str]

An invoker to interact with multimodal language models hosted through OpenAI API endpoints.

The OpenAIMultimodalLMInvoker class provides a framework for invoking multimodal language models hosted through OpenAI API endpoints with a given prompt and hyperparameters. It supports both standard and streaming invocation; streaming mode is enabled when an event emitter is provided.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `client` | `AsyncOpenAI` | The AsyncOpenAI client instance. |
| `model_name` | `str` | The name of the OpenAI model. |
| `default_hyperparameters` | `dict[str, Any]` | Default hyperparameters for invoking the multimodal language model. |

Notes

The OpenAIMultimodalLMInvoker currently supports the following contents:

1. Text, which can be passed as plain strings.
2. Image, which can be passed as:
    1. Base64 encoded image bytes.
    2. URL pointing to an image.
    3. Local image file path.

Initializes a new instance of the OpenAIMultimodalLMInvoker class.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model_name` | `str` | The name of the OpenAI model. | *required* |
| `api_key` | `str` | The API key for authenticating with OpenAI. | *required* |
| `model_kwargs` | `dict[str, Any] \| None` | Additional model parameters. Defaults to None. | `None` |
| `default_hyperparameters` | `dict[str, Any] \| None` | Default hyperparameters for invoking the model. Defaults to None. | `None` |
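For context, the OpenAI Chat Completions API represents a multimodal prompt as a list of content parts: text parts plus image parts carried as base64 data URLs. The sketch below builds that structure with the stdlib only; the image bytes are a placeholder, and exactly how this invoker assembles the parts internally is not specified here:

```python
import base64

# Placeholder image bytes, base64-encoded for a data URL.
image_b64 = base64.b64encode(b"\x89PNG\r\n").decode("ascii")

# OpenAI chat "content parts": a text part plus an image_url part whose
# URL is a base64 data URL.
content = [
    {"type": "text", "text": "Describe this image."},
    {
        "type": "image_url",
        "image_url": {"url": f"data:image/png;base64,{image_b64}"},
    },
]
```

Image URLs can also be passed directly in the `image_url` field, which corresponds to the "URL pointing to an image" input form listed in the Notes above.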