Multimodal LM Invoker
Modules for the multimodal language model invokers used in Gen AI applications.
AnthropicMultimodalLMInvoker(model_name, api_key, model_kwargs=None, default_hyperparameters=None)
Bases: BaseMultimodalLMInvoker[str | bytes, str]
An invoker to interact with multimodal language models hosted through Anthropic API endpoints.
The AnthropicMultimodalLMInvoker class is designed to interact with multimodal language models hosted through Anthropic API endpoints. It provides a framework for invoking multimodal language models with the provided prompt and hyperparameters. It supports both standard and streaming invocation. Streaming mode is enabled if an event emitter is provided.
Attributes:

| Name | Type | Description |
|---|---|---|
| client | Anthropic | The Anthropic client instance. |
| model_name | str | The name of the Anthropic model. |
| default_hyperparameters | dict[str, Any] | Default hyperparameters for invoking the multimodal language model. |
Notes

The AnthropicMultimodalLMInvoker currently supports the following content types:

1. Text, which can be passed as plain strings.
2. Image, which can be passed as:
    1. Base64 encoded image bytes.
    2. URL pointing to an image.
    3. Local image file path.
Initializes a new instance of the AnthropicMultimodalLMInvoker class.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| model_name | str | The name of the Anthropic model. | required |
| api_key | str | The API key for authenticating with Anthropic. | required |
| model_kwargs | dict[str, Any] \| None | Additional model parameters. Defaults to None. | None |
| default_hyperparameters | dict[str, Any] \| None | Default hyperparameters for invoking the model. Defaults to None. | None |
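Below is a minimal usage sketch. The import path, the async invoke(prompt, hyperparameters=None) method, and the prompt format (a list of str | bytes contents, inferred from the BaseMultimodalLMInvoker[str | bytes, str] base) are assumptions rather than a confirmed API, and the model name is a placeholder.

```python
# A minimal sketch, assuming an async invoke() method on the invoker and a
# prompt given as a list of str | bytes contents. The import path is hypothetical.
import asyncio

from gllm_inference.multimodal_lm_invoker import AnthropicMultimodalLMInvoker  # assumed import path

async def main() -> None:
    invoker = AnthropicMultimodalLMInvoker(
        model_name="claude-3-5-sonnet-latest",  # placeholder model name
        api_key="sk-ant-...",
        default_hyperparameters={"max_tokens": 1024},
    )
    # Per the Notes above, a prompt may mix plain text with image contents
    # (base64 encoded bytes, an image URL, or a local image file path).
    prompt = [
        "Describe this image in one sentence.",
        "https://example.com/cat.png",  # placeholder image URL
    ]
    result = await invoker.invoke(prompt)  # assumed to return str
    print(result)

asyncio.run(main())
```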
AzureOpenAIMultimodalLMInvoker(api_key, azure_endpoint, azure_deployment, api_version, model_kwargs=None, default_hyperparameters=None)
Bases: BaseMultimodalLMInvoker[str | bytes, str]
An invoker to interact with multimodal language models hosted through Azure OpenAI API endpoints.
The AzureOpenAIMultimodalLMInvoker class is designed to interact with multimodal language models hosted through Azure OpenAI API endpoints. It provides a framework for invoking multimodal language models with the provided prompt and hyperparameters. It supports both standard and streaming invocation. Streaming mode is enabled if an event emitter is provided.
Attributes:

| Name | Type | Description |
|---|---|---|
| client | AsyncAzureOpenAI | The AsyncAzureOpenAI client instance. |
| default_hyperparameters | dict[str, Any] | Default hyperparameters for invoking the multimodal language model. |
Notes

The AzureOpenAIMultimodalLMInvoker currently supports the following content types:

1. Text, which can be passed as plain strings.
2. Image, which can be passed as:
    1. Base64 encoded image bytes.
    2. URL pointing to an image.
    3. Local image file path.
Initializes a new instance of the AzureOpenAIMultimodalLMInvoker class.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| api_key | str | The API key for authenticating with Azure OpenAI. | required |
| azure_endpoint | str | The endpoint of the Azure OpenAI service. | required |
| azure_deployment | str | The deployment of the Azure OpenAI service. | required |
| api_version | str | The API version of the Azure OpenAI service. | required |
| model_kwargs | dict[str, Any] \| None | Additional model parameters. Defaults to None. | None |
| default_hyperparameters | dict[str, Any] \| None | Default hyperparameters for invoking the model. Defaults to None. | None |
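Construction differs from the other invokers in that it targets a specific Azure deployment rather than a model name. The sketch below uses placeholder endpoint, deployment, and API-version values, and the import path is an assumption.

```python
# A construction sketch; all credential and deployment values are placeholders
# and the import path is hypothetical. Invocation then follows the same
# pattern as the Anthropic example above.
from gllm_inference.multimodal_lm_invoker import AzureOpenAIMultimodalLMInvoker  # assumed import path

invoker = AzureOpenAIMultimodalLMInvoker(
    api_key="...",
    azure_endpoint="https://my-resource.openai.azure.com",  # placeholder endpoint
    azure_deployment="my-gpt-4o-deployment",                # placeholder deployment name
    api_version="2024-02-01",                               # placeholder API version
    default_hyperparameters={"temperature": 0.0},
)
```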
GoogleGenerativeAIMultimodalLMInvoker(model_name, api_key, model_kwargs=None, default_hyperparameters=None, response_schema=None)
Bases: BaseMultimodalLMInvoker[str | bytes, str]
An invoker to interact with multimodal language models hosted through Google's Generative AI API endpoints.
This class is deprecated, as it has been replaced by the GoogleLMInvoker class.
Attributes:

| Name | Type | Description |
|---|---|---|
| model | GoogleLMInvoker | The Google Gemini model instance. |
| default_hyperparameters | dict[str, Any] | Default hyperparameters for invoking the multimodal language model. |
Initializes a new instance of the GoogleGenerativeAIMultimodalLMInvoker class.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| model_name | str | The name of the Google Gemini model. | required |
| api_key | str | The API key for authenticating with Google Gemini. | required |
| model_kwargs | dict[str, Any] \| None | Additional model parameters. Defaults to None. | None |
| default_hyperparameters | dict[str, Any] \| None | Default hyperparameters for invoking the model. Defaults to None. | None |
| response_schema | BaseModel \| None | The response schema for the model. Defaults to None. | None |
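Because this class is deprecated, new code should prefer GoogleLMInvoker. For completeness, a construction sketch of the deprecated class follows, using only the parameters documented above; the import path and model name are assumptions.

```python
# Deprecated: prefer GoogleLMInvoker for new code (see the note above). This
# sketch only exercises the documented constructor; the import path and model
# name are hypothetical.
from gllm_inference.multimodal_lm_invoker import GoogleGenerativeAIMultimodalLMInvoker  # assumed import path

invoker = GoogleGenerativeAIMultimodalLMInvoker(
    model_name="gemini-1.5-flash",  # placeholder model name
    api_key="...",
)
```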
GoogleVertexAIMultimodalLMInvoker(model_name, credentials_path, project_id=None, location='us-central1', model_kwargs=None, default_hyperparameters=None, response_schema=None)
Bases: BaseMultimodalLMInvoker[str | bytes, str]
An invoker to interact with multimodal language models hosted through Google's Vertex AI API endpoints.
The GoogleVertexAIMultimodalLMInvoker class is designed to interact with multimodal language models hosted through Google's Vertex AI API endpoints. It provides a framework for invoking multimodal language models with the provided prompt and hyperparameters. It supports both standard and streaming invocation. Streaming mode is enabled if an event emitter is provided.
Attributes:

| Name | Type | Description |
|---|---|---|
| client | GenerativeModel | The Google Vertex AI client instance. |
| extra_kwargs | dict[str, Any] | Additional keyword arguments for the GenerativeModel client. |
| default_hyperparameters | dict[str, Any] | Default hyperparameters for invoking the multimodal language model. |
Notes

The GoogleVertexAIMultimodalLMInvoker currently supports the following content types:

1. Text, which can be passed as plain strings.
2. Audio, which can be passed as:
    1. Base64 encoded audio bytes.
    2. URL pointing to an audio file.
    3. Local audio file path.
3. Image, which can be passed as:
    1. Base64 encoded image bytes.
    2. URL pointing to an image.
    3. Local image file path.
4. Video, which can be passed as:
    1. Base64 encoded video bytes.
    2. URL pointing to a video.
    3. Local video file path.
5. Document, which can be passed as:
    1. Base64 encoded document bytes.
    2. URL pointing to a document.
    3. Local document file path.

The GoogleVertexAIMultimodalLMInvoker also supports structured outputs through the response_schema argument (a sketch appears after the parameters table below). For more information, please refer to the following page: https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/control-generated-output.
Initializes a new instance of the GoogleVertexAIMultimodalLMInvoker class.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| model_name | str | The name of the multimodal language model to be used. | required |
| credentials_path | str | The path to the Google Cloud service account credentials JSON file. | required |
| project_id | str \| None | The Google Cloud project ID. Defaults to None, in which case the project ID will be loaded from the credentials file. | None |
| location | str | The location of the Google Cloud project. Defaults to "us-central1". | 'us-central1' |
| model_kwargs | dict[str, Any] \| None | Additional model parameters. Defaults to None. | None |
| default_hyperparameters | dict[str, Any] \| None | Default hyperparameters for invoking the model. Defaults to None. | None |
| response_schema | dict[str, Any] \| None | The response schema for the model. Defaults to None. | None |
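The sketch below combines a multimodal prompt with a response_schema for structured output. The import path, the async invoke() method, and the exact schema dialect are assumptions; consult the Vertex AI page linked in the Notes for the authoritative schema syntax.

```python
# A sketch of structured output via response_schema. The import path and
# invoke() signature are assumptions, and the schema follows the OpenAPI-style
# format described in the linked Vertex AI documentation.
import asyncio

from gllm_inference.multimodal_lm_invoker import GoogleVertexAIMultimodalLMInvoker  # assumed import path

recipe_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "ingredients": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "ingredients"],
}

async def main() -> None:
    invoker = GoogleVertexAIMultimodalLMInvoker(
        model_name="gemini-1.5-pro",                       # placeholder model name
        credentials_path="/path/to/service-account.json",  # placeholder credentials path
        response_schema=recipe_schema,
    )
    # Per the Notes, text can be mixed with audio, image, video, or document
    # contents; here the prompt is text plus a local image file path.
    prompt = ["Extract the recipe from this photo.", "/path/to/recipe.jpg"]
    result = await invoker.invoke(prompt)  # assumed to return str
    print(result)

asyncio.run(main())
```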
OpenAIMultimodalLMInvoker(model_name, api_key, model_kwargs=None, default_hyperparameters=None)
Bases: BaseMultimodalLMInvoker[str | bytes, str]
An invoker to interact with multimodal language models hosted through OpenAI API endpoints.
The OpenAIMultimodalLMInvoker class is designed to interact with multimodal language models hosted through OpenAI API endpoints. It provides a framework for invoking multimodal language models with the provided prompt and hyperparameters. It supports both standard and streaming invocation. Streaming mode is enabled if an event emitter is provided.
Attributes:

| Name | Type | Description |
|---|---|---|
| client | AsyncOpenAI | The AsyncOpenAI client instance. |
| model_name | str | The name of the OpenAI model. |
| default_hyperparameters | dict[str, Any] | Default hyperparameters for invoking the multimodal language model. |
Notes

The OpenAIMultimodalLMInvoker currently supports the following content types:

1. Text, which can be passed as plain strings.
2. Image, which can be passed as:
    1. Base64 encoded image bytes.
    2. URL pointing to an image.
    3. Local image file path.
Initializes a new instance of the OpenAIMultimodalLMInvoker class.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| model_name | str | The name of the OpenAI model. | required |
| api_key | str | The API key for authenticating with OpenAI. | required |
| model_kwargs | dict[str, Any] \| None | Additional model parameters. Defaults to None. | None |
| default_hyperparameters | dict[str, Any] \| None | Default hyperparameters for invoking the model. Defaults to None. | None |
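To show the bytes-based image content listed in the Notes, the sketch below base64-encodes a local file before passing it in the prompt. The import path, model name, and the async invoke() method are assumptions, as in the earlier examples.

```python
# A sketch passing base64 encoded image bytes alongside text. The import path
# and the async invoke() method are assumptions based on the documented base class.
import asyncio
import base64

from gllm_inference.multimodal_lm_invoker import OpenAIMultimodalLMInvoker  # assumed import path

async def main() -> None:
    invoker = OpenAIMultimodalLMInvoker(
        model_name="gpt-4o",  # placeholder model name
        api_key="sk-...",
    )
    # The Notes above list base64 encoded image bytes as a supported content,
    # alongside image URLs and local file paths.
    with open("photo.png", "rb") as f:
        image_b64 = base64.b64encode(f.read())
    prompt = ["What is in this photo?", image_b64]
    result = await invoker.invoke(prompt)  # assumed to return str
    print(result)

asyncio.run(main())
```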