Em Invoker
Modules concerning the embedding model invokers used in Gen AI applications.
AzureOpenAIEMInvoker(azure_endpoint, azure_deployment, api_key=None, api_version=DEFAULT_AZURE_OPENAI_API_VERSION, model_kwargs=None, default_hyperparameters=None, retry_config=None)
Bases: OpenAIEMInvoker
An embedding model invoker to interact with Azure OpenAI embedding models.
Attributes:

Name | Type | Description |
---|---|---|
`model_id` | `str` | The model ID of the embedding model. |
`model_provider` | `str` | The provider of the embedding model. |
`model_name` | `str` | The name of the Azure OpenAI embedding model deployment. |
`client` | `AsyncAzureOpenAI` | The client for the Azure OpenAI API. |
`default_hyperparameters` | `dict[str, Any]` | Default hyperparameters for invoking the embedding model. |
`retry_config` | `RetryConfig` | The retry configuration for the embedding model. |
Input types

The AzureOpenAIEMInvoker only supports text inputs.

Output format

The AzureOpenAIEMInvoker can embed either:

1. A single content.
    - A single content is a single text.
    - The output will be a Vector, representing the embedding of the content.
```python
# Example: Embedding a text content.
text = "This is a text"
result = await em_invoker.invoke(text)
```

The above example will return a Vector with a size of (embedding_size,).
2. A list of contents.
    - A list of contents is a list of texts.
    - The output will be a list[Vector], where each element is a Vector representing the embedding of each single content.

```python
# Example: Embedding a list of contents.
text1 = "This is a text"
text2 = "This is another text"
text3 = "This is yet another text"
result = await em_invoker.invoke([text1, text2, text3])
```

The above example will return a list[Vector] with a size of (3, embedding_size).
Retry and timeout

The AzureOpenAIEMInvoker supports retry and timeout configuration. By default, the max retries is set to 0 and the timeout is set to 30.0 seconds. They can be customized by providing a custom RetryConfig object to the retry_config parameter.

Retry config examples:

```python
retry_config = RetryConfig(max_retries=0, timeout=0.0)   # No retry, no timeout
retry_config = RetryConfig(max_retries=0, timeout=10.0)  # No retry, 10.0 seconds timeout
retry_config = RetryConfig(max_retries=5, timeout=0.0)   # 5 max retries, no timeout
retry_config = RetryConfig(max_retries=5, timeout=10.0)  # 5 max retries, 10.0 seconds timeout
```

Usage example:

```python
em_invoker = AzureOpenAIEMInvoker(..., retry_config=retry_config)
```
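The interaction between the two settings can be sketched in plain Python. This is an illustration only, not the library's implementation (RetryConfig handles this internally); `invoke_with_retry` and `flaky` are hypothetical names, and we assume `max_retries=0` means a single attempt and `timeout=0.0` means no time limit:

```python
import asyncio

async def invoke_with_retry(attempt, max_retries: int, timeout: float):
    """Hypothetical sketch: run `attempt` up to 1 + max_retries times,
    bounding each attempt by `timeout` seconds (0.0 = no limit)."""
    last_error = None
    for _ in range(max_retries + 1):  # initial attempt + retries
        try:
            if timeout > 0.0:
                return await asyncio.wait_for(attempt(), timeout)
            return await attempt()  # timeout=0.0: wait indefinitely
        except Exception as exc:
            last_error = exc
    raise last_error

# A stand-in for an embedding call that fails twice, then succeeds.
calls = {"n": 0}

async def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient error")
    return "ok"

result = asyncio.run(invoke_with_retry(flaky, max_retries=5, timeout=10.0))
print(result)  # → ok, after two retried failures
```

With `max_retries=0`, the same sketch would raise on the first failure instead of retrying.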
Initializes a new instance of the AzureOpenAIEMInvoker class.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`azure_endpoint` | `str` | The endpoint of the Azure OpenAI service. | *required* |
`azure_deployment` | `str` | The deployment name of the Azure OpenAI service. | *required* |
`api_key` | `str \| None` | The API key for authenticating with Azure OpenAI. Defaults to None, in which case the […] | `None` |
`api_version` | `str` | The API version of the Azure OpenAI service. Defaults to `DEFAULT_AZURE_OPENAI_API_VERSION`. | `DEFAULT_AZURE_OPENAI_API_VERSION` |
`model_kwargs` | `dict[str, Any] \| None` | Additional model parameters. Defaults to None. | `None` |
`default_hyperparameters` | `dict[str, Any] \| None` | Default hyperparameters for invoking the model. Defaults to None. | `None` |
`retry_config` | `RetryConfig \| None` | The retry configuration for the embedding model. Defaults to None, in which case a default config with no retry and 30.0 seconds timeout is used. | `None` |
GoogleEMInvoker(model_name, api_key=None, credentials_path=None, project_id=None, location='us-central1', model_kwargs=None, default_hyperparameters=None, retry_config=None)
Bases: BaseEMInvoker
An embedding model invoker to interact with Google embedding models.
Attributes:

Name | Type | Description |
---|---|---|
`model_id` | `str` | The model ID of the embedding model. |
`model_provider` | `str` | The provider of the embedding model. |
`model_name` | `str` | The name of the embedding model. |
`client_params` | `dict[str, Any]` | The Google client instance init parameters. |
`default_hyperparameters` | `dict[str, Any]` | Default hyperparameters for invoking the embedding model. |
`retry_config` | `RetryConfig \| None` | The retry configuration for the embedding model. |
Initialization

The GoogleEMInvoker can use either Google Gen AI or Google Vertex AI.

Google Gen AI is recommended for quick prototyping and development. It requires a Gemini API key for authentication.

Usage example:

```python
em_invoker = GoogleEMInvoker(
    model_name="text-embedding-004",
    api_key="your_api_key"
)
```

Google Vertex AI is recommended for building production-ready applications. It requires a service account JSON file for authentication.

Usage example:

```python
em_invoker = GoogleEMInvoker(
    model_name="text-embedding-004",
    credentials_path="path/to/service_account.json"
)
```

If neither api_key nor credentials_path is provided, Google Gen AI will be used by default, and the GOOGLE_API_KEY environment variable will be used for authentication.
Input types

The GoogleEMInvoker only supports text inputs.

Output format

The GoogleEMInvoker can embed either:

1. A single content.
    - A single content is a single text.
    - The output will be a Vector, representing the embedding of the content.
```python
# Example: Embedding a text content.
text = "This is a text"
result = await em_invoker.invoke(text)
```

The above example will return a Vector with a size of (embedding_size,).
2. A list of contents.
    - A list of contents is a list of texts.
    - The output will be a list[Vector], where each element is a Vector representing the embedding of each single content.

```python
# Example: Embedding a list of contents.
text1 = "This is a text"
text2 = "This is another text"
text3 = "This is yet another text"
result = await em_invoker.invoke([text1, text2, text3])
```

The above example will return a list[Vector] with a size of (3, embedding_size).
Retry and timeout

The GoogleEMInvoker supports retry and timeout configuration. By default, the max retries is set to 0 and the timeout is set to 30.0 seconds. They can be customized by providing a custom RetryConfig object to the retry_config parameter.

Retry config examples:

```python
retry_config = RetryConfig(max_retries=0, timeout=0.0)   # No retry, no timeout
retry_config = RetryConfig(max_retries=0, timeout=10.0)  # No retry, 10.0 seconds timeout
retry_config = RetryConfig(max_retries=5, timeout=0.0)   # 5 max retries, no timeout
retry_config = RetryConfig(max_retries=5, timeout=10.0)  # 5 max retries, 10.0 seconds timeout
```

Usage example:

```python
em_invoker = GoogleEMInvoker(..., retry_config=retry_config)
```
Initializes a new instance of the GoogleEMInvoker class.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`model_name` | `str` | The name of the model to use. | *required* |
`api_key` | `str \| None` | Required for Google Gen AI authentication. Cannot be used together with `credentials_path`. | `None` |
`credentials_path` | `str \| None` | Required for Google Vertex AI authentication. Path to the service account credentials JSON file. Cannot be used together with `api_key`. | `None` |
`project_id` | `str \| None` | The Google Cloud project ID for Vertex AI. Only used when authenticating with `credentials_path`. | `None` |
`location` | `str` | The location of the Google Cloud project for Vertex AI. Only used when authenticating with `credentials_path`. | `'us-central1'` |
`model_kwargs` | `dict[str, Any] \| None` | Additional keyword arguments for the Google client. Defaults to None. | `None` |
`default_hyperparameters` | `dict[str, Any] \| None` | Default hyperparameters for invoking the model. Defaults to None. | `None` |
`retry_config` | `RetryConfig \| None` | The retry configuration for the embedding model. Defaults to None, in which case a default config with no retry and 30.0 seconds timeout is used. | `None` |
Note

If neither api_key nor credentials_path is provided, Google Gen AI will be used by default, and the GOOGLE_API_KEY environment variable will be used for authentication.
GoogleGenerativeAIEMInvoker(model_name, api_key, task_type=None, model_kwargs=None, retry_config=None)
Bases: GoogleEMInvoker
An embedding model invoker to interact with Google Generative AI embedding models.
This class has been deprecated as Google Generative AI is now supported through GoogleEMInvoker.
This class is maintained for backward compatibility and will be removed in version 0.5.0.
Attributes:

Name | Type | Description |
---|---|---|
`model_id` | `str` | The model ID of the embedding model. |
`model_provider` | `str` | The provider of the embedding model. |
`model_name` | `str` | The name of the embedding model. |
`client_params` | `dict[str, Any]` | The Google client instance init parameters. |
`default_hyperparameters` | `dict[str, Any]` | Default hyperparameters for invoking the embedding model. |
`retry_config` | `RetryConfig \| None` | The retry configuration for the embedding model. |
Initializes a new instance of the GoogleGenerativeAIEMInvoker class.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`model_name` | `str` | The name of the Google Generative AI model to be used. | *required* |
`api_key` | `str` | The API key for accessing the Google Generative AI model. | *required* |
`task_type` | `str \| None` | The type of task to be performed by the embedding model. Defaults to None. | `None` |
`model_kwargs` | `Any` | Additional keyword arguments to initiate the embedding model. Defaults to None. | `None` |
`retry_config` | `RetryConfig \| None` | The retry configuration for the embedding model. Defaults to None, in which case a default config with no retry and 30.0 seconds timeout is used. | `None` |
GoogleVertexAIEMInvoker(model_name, credentials_path, project_id=None, location='us-central1', model_kwargs=None, retry_config=None)
Bases: GoogleEMInvoker
An embedding model invoker to interact with Google Vertex AI embedding models.
This class has been deprecated as Google Vertex AI is now supported through GoogleEMInvoker.
This class is maintained for backward compatibility and will be removed in version 0.5.0.
Attributes:

Name | Type | Description |
---|---|---|
`model_id` | `str` | The model ID of the embedding model. |
`model_provider` | `str` | The provider of the embedding model. |
`model_name` | `str` | The name of the embedding model. |
`client_params` | `dict[str, Any]` | The Google client instance init parameters. |
`default_hyperparameters` | `dict[str, Any]` | Default hyperparameters for invoking the embedding model. |
`retry_config` | `RetryConfig \| None` | The retry configuration for the embedding model. |
Initializes a new instance of the GoogleVertexAIEMInvoker class.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`model_name` | `str` | The name of the multimodal embedding model to be used. | *required* |
`credentials_path` | `str` | The path to the Google Cloud service account credentials JSON file. | *required* |
`project_id` | `str \| None` | The Google Cloud project ID. Defaults to None, in which case the project ID will be loaded from the credentials file. | `None` |
`location` | `str` | The location of the Google Cloud project. Defaults to "us-central1". | `'us-central1'` |
`model_kwargs` | `Any` | Additional keyword arguments to initiate the Google Vertex AI model. Defaults to None. | `None` |
`retry_config` | `RetryConfig \| None` | The retry configuration for the embedding model. Defaults to None, in which case a default config with no retry and 30.0 seconds timeout is used. | `None` |
LangChainEMInvoker(model=None, model_class_path=None, model_name=None, model_kwargs=None, default_hyperparameters=None, retry_config=None, em=None)
Bases: BaseEMInvoker
An embedding model invoker to interact with LangChain's Embeddings.
Attributes:

Name | Type | Description |
---|---|---|
`model_id` | `str` | The model ID of the embedding model. |
`model_provider` | `str` | The provider of the embedding model. |
`model_name` | `str` | The name of the embedding model. |
`em` | `Embeddings` | The instance to interact with an embedding model defined using LangChain's Embeddings. |
`retry_config` | `RetryConfig` | The retry configuration for the embedding model. |
Initializes a new instance of the LangChainEMInvoker class.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`model` | `Embeddings \| None` | The LangChain Embeddings instance. If provided, will take precedence over the […] | `None` |
`model_class_path` | `str \| None` | The LangChain Embeddings class path. Must be formatted as "[…]". | `None` |
`model_name` | `str \| None` | The model name. Only used if […] | `None` |
`model_kwargs` | `dict[str, Any] \| None` | The additional keyword arguments. Only used if […] | `None` |
`default_hyperparameters` | `dict[str, Any] \| None` | Default hyperparameters for invoking the model. Defaults to None. | `None` |
`retry_config` | `RetryConfig \| None` | The retry configuration for the embedding model. Defaults to None, in which case a default config with no retry and 30.0 seconds timeout is used. | `None` |
`em` | `Embeddings \| None` | Deprecated parameter to pass the LangChain Embeddings instance. Equivalent to the `model` parameter. | `None` |
to_langchain()

Converts the current embedding model invoker to an instance of LangChain's Embeddings object.

Returns:

Name | Type | Description |
---|---|---|
Embeddings | `Embeddings` | An instance of LangChain's Embeddings. |
OpenAICompatibleEMInvoker(model_name, base_url, api_key=None, model_kwargs=None, default_hyperparameters=None, retry_config=None)
Bases: OpenAIEMInvoker
An embedding model invoker to interact with endpoints compatible with OpenAI's embedding API contract.
Attributes:

Name | Type | Description |
---|---|---|
`model_id` | `str` | The model ID of the embedding model. |
`model_provider` | `str` | The provider of the embedding model. |
`model_name` | `str` | The name of the embedding model. |
`client` | `AsyncOpenAI` | The OpenAI client instance. |
`default_hyperparameters` | `dict[str, Any]` | Default hyperparameters for invoking the embedding model. |
`retry_config` | `RetryConfig` | The retry configuration for the embedding model. |
When to use

The OpenAICompatibleEMInvoker is designed to interact with endpoints that are compatible with OpenAI's embedding API contract. This includes, but is not limited to:

1. Text Embeddings Inference (https://github.com/huggingface/text-embeddings-inference)
2. vLLM (https://vllm.ai/)

When using this invoker, please note that the supported features and capabilities may vary between endpoints and embedding models. Using features that are not supported by the endpoint will result in an error.
Input types

The OpenAICompatibleEMInvoker only supports text inputs.

Output format

The OpenAICompatibleEMInvoker can embed either:

1. A single content.
    - A single content is a single text.
    - The output will be a Vector, representing the embedding of the content.
```python
# Example: Embedding a text content.
text = "This is a text"
result = await em_invoker.invoke(text)
```

The above example will return a Vector with a size of (embedding_size,).
2. A list of contents.
    - A list of contents is a list of texts.
    - The output will be a list[Vector], where each element is a Vector representing the embedding of each single content.

```python
# Example: Embedding a list of contents.
text1 = "This is a text"
text2 = "This is another text"
text3 = "This is yet another text"
result = await em_invoker.invoke([text1, text2, text3])
```

The above example will return a list[Vector] with a size of (3, embedding_size).
Retry and timeout

The OpenAICompatibleEMInvoker supports retry and timeout configuration. By default, the max retries is set to 0 and the timeout is set to 30.0 seconds. They can be customized by providing a custom RetryConfig object to the retry_config parameter.

Retry config examples:

```python
retry_config = RetryConfig(max_retries=0, timeout=0.0)   # No retry, no timeout
retry_config = RetryConfig(max_retries=0, timeout=10.0)  # No retry, 10.0 seconds timeout
retry_config = RetryConfig(max_retries=5, timeout=0.0)   # 5 max retries, no timeout
retry_config = RetryConfig(max_retries=5, timeout=10.0)  # 5 max retries, 10.0 seconds timeout
```

Usage example:

```python
em_invoker = OpenAICompatibleEMInvoker(..., retry_config=retry_config)
```
Initializes a new instance of the OpenAICompatibleEMInvoker class.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`model_name` | `str` | The name of the embedding model hosted on the OpenAI compatible endpoint. | *required* |
`base_url` | `str` | The base URL for the OpenAI compatible endpoint. | *required* |
`api_key` | `str \| None` | The API key for authenticating with the OpenAI compatible endpoint. Defaults to None, in which case the […] | `None` |
`model_kwargs` | `dict[str, Any] \| None` | Additional model parameters. Defaults to None. | `None` |
`default_hyperparameters` | `dict[str, Any] \| None` | Default hyperparameters for invoking the model. Defaults to None. | `None` |
`retry_config` | `RetryConfig \| None` | The retry configuration for the embedding model. Defaults to None, in which case a default config with no retry and 30.0 seconds timeout is used. | `None` |
OpenAIEMInvoker(model_name, api_key=None, model_kwargs=None, default_hyperparameters=None, retry_config=None)
Bases: BaseEMInvoker
An embedding model invoker to interact with OpenAI embedding models.
Attributes:

Name | Type | Description |
---|---|---|
`model_id` | `str` | The model ID of the embedding model. |
`model_provider` | `str` | The provider of the embedding model. |
`model_name` | `str` | The name of the embedding model. |
`client` | `AsyncOpenAI` | The client for the OpenAI API. |
`default_hyperparameters` | `dict[str, Any]` | Default hyperparameters for invoking the embedding model. |
`retry_config` | `RetryConfig` | The retry configuration for the embedding model. |
Input types

The OpenAIEMInvoker only supports text inputs.

Output format

The OpenAIEMInvoker can embed either:

1. A single content.
    - A single content is a single text.
    - The output will be a Vector, representing the embedding of the content.
```python
# Example: Embedding a text content.
text = "This is a text"
result = await em_invoker.invoke(text)
```

The above example will return a Vector with a size of (embedding_size,).
2. A list of contents.
    - A list of contents is a list of texts.
    - The output will be a list[Vector], where each element is a Vector representing the embedding of each single content.

```python
# Example: Embedding a list of contents.
text1 = "This is a text"
text2 = "This is another text"
text3 = "This is yet another text"
result = await em_invoker.invoke([text1, text2, text3])
```

The above example will return a list[Vector] with a size of (3, embedding_size).
Retry and timeout

The OpenAIEMInvoker supports retry and timeout configuration. By default, the max retries is set to 0 and the timeout is set to 30.0 seconds. They can be customized by providing a custom RetryConfig object to the retry_config parameter.

Retry config examples:

```python
retry_config = RetryConfig(max_retries=0, timeout=0.0)   # No retry, no timeout
retry_config = RetryConfig(max_retries=0, timeout=10.0)  # No retry, 10.0 seconds timeout
retry_config = RetryConfig(max_retries=5, timeout=0.0)   # 5 max retries, no timeout
retry_config = RetryConfig(max_retries=5, timeout=10.0)  # 5 max retries, 10.0 seconds timeout
```

Usage example:

```python
em_invoker = OpenAIEMInvoker(..., retry_config=retry_config)
```
Initializes a new instance of the OpenAIEMInvoker class.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`model_name` | `str` | The name of the OpenAI embedding model to be used. | *required* |
`api_key` | `str \| None` | The API key for the OpenAI API. Defaults to None, in which case the […] | `None` |
`model_kwargs` | `dict[str, Any] \| None` | Additional keyword arguments for the OpenAI client. Defaults to None. | `None` |
`default_hyperparameters` | `dict[str, Any] \| None` | Default hyperparameters for invoking the model. Defaults to None. | `None` |
`retry_config` | `RetryConfig \| None` | The retry configuration for the embedding model. Defaults to None, in which case a default config with no retry and 30.0 seconds timeout is used. | `None` |
TEIEMInvoker(url, username='', password='', api_key=None, query_prefix='', document_prefix='', retry_config=None)
Bases: BaseEMInvoker
An embedding model invoker to interact with embedding models hosted in Text Embeddings Inference (TEI).
The TEIEMInvoker class is responsible for invoking an embedding model in Text Embeddings Inference (TEI). It uses the embedding model to transform a text or a list of input texts into their vector representations.
Attributes:

Name | Type | Description |
---|---|---|
`model_id` | `str` | The model ID of the embedding model. |
`model_provider` | `str` | The provider of the embedding model. |
`model_name` | `str` | The name of the embedding model. |
`client` | `AsyncInferenceClient` | The client instance to interact with the TEI service. |
`query_prefix` | `str` | The additional prefix to be added when embedding a query. |
`document_prefix` | `str` | The additional prefix to be added when embedding documents. |
`retry_config` | `RetryConfig` | The retry configuration for the embedding model. |
Initializes a new instance of the TEIEMInvoker class.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`url` | `str` | The URL of the TEI service. | *required* |
`username` | `str` | The username for Basic Authentication. Defaults to an empty string. | `''` |
`password` | `str` | The password for Basic Authentication. Defaults to an empty string. | `''` |
`api_key` | `str \| None` | The API key for the TEI service. Defaults to None. | `None` |
`query_prefix` | `str` | The additional prefix to be added when embedding a query. Defaults to an empty string. | `''` |
`document_prefix` | `str` | The additional prefix to be added when embedding documents. Defaults to an empty string. | `''` |
`retry_config` | `RetryConfig \| None` | The retry configuration for the embedding model. Defaults to None, in which case a default config with no retry and 30.0 seconds timeout is used. | `None` |
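The effect of query_prefix and document_prefix can be illustrated with plain string handling. This is an illustration of the concept only (the assumption being that TEIEMInvoker prepends the configured prefix before sending text to the TEI service); the prefix values and the `with_prefix` helper are hypothetical, chosen because some embedding models expect role prefixes of this shape:

```python
# Hypothetical prefix values; some models are trained with such role markers.
query_prefix = "query: "
document_prefix = "passage: "

def with_prefix(text: str, prefix: str) -> str:
    """Illustration: the text that would actually be embedded."""
    return prefix + text

print(with_prefix("what is a vector database?", query_prefix))
# → query: what is a vector database?
print(with_prefix("A vector database stores embeddings.", document_prefix))
# → passage: A vector database stores embeddings.
```

If the hosted model was not trained with such prefixes, leaving both as empty strings (the default) is the right choice.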
to_langchain()

Converts the current embedding model invoker to an instance of LangChain's TEIEmbeddings object.

Returns:

Name | Type | Description |
---|---|---|
TEIEmbeddings | `TEIEmbeddings` | An instance of LangChain's TEIEmbeddings. |
TwelveLabsEMInvoker(model_name, api_key=None, model_kwargs=None, default_hyperparameters=None, retry_config=None)
Bases: BaseEMInvoker
An embedding model invoker to interact with TwelveLabs embedding models.
Attributes:

Name | Type | Description |
---|---|---|
`model_id` | `str` | The model ID of the embedding model. |
`model_provider` | `str` | The provider of the embedding model. |
`model_name` | `str` | The name of the embedding model. |
`client` | `Client` | The client for the TwelveLabs API. |
`default_hyperparameters` | `dict[str, Any]` | Default hyperparameters for invoking the embedding model. |
`retry_config` | `RetryConfig` | The retry configuration for the embedding model. |
Input types

The TwelveLabsEMInvoker supports the following input types:

1. Text.
2. Audio: ".mp3", ".wav", and ".flac".
3. Image: ".png", ".jpeg", and ".jpg".

Non-text inputs must be passed as an Attachment object.
Output format

The TwelveLabsEMInvoker can embed either:

1. A single content.
    - A single content is either a text, an audio, or an image.
    - The output will be a Vector, representing the embedding of the content.
```python
# Example 1: Embedding a text content.
text = "What animal is in this image?"
result = await em_invoker.invoke(text)
```

```python
# Example 2: Embedding an audio content.
audio = Attachment.from_path("path/to/local/audio.mp3")
result = await em_invoker.invoke(audio)
```

```python
# Example 3: Embedding an image content.
image = Attachment.from_path("path/to/local/image.png")
result = await em_invoker.invoke(image)
```

The above examples will return a Vector with a size of (embedding_size,).
2. A list of contents.
    - A list of contents is a list that consists of any of the above single contents.
    - The output will be a list[Vector], where each element is a Vector representing the embedding of each single content.

```python
# Example: Embedding a list of contents.
text = "What animal is in this image?"
audio = Attachment.from_path("path/to/local/audio.mp3")
image = Attachment.from_path("path/to/local/image.png")
result = await em_invoker.invoke([text, audio, image])
```

The above example will return a list[Vector] with a size of (3, embedding_size).
Retry and timeout

The TwelveLabsEMInvoker supports retry and timeout configuration. By default, the max retries is set to 0 and the timeout is set to 30.0 seconds. They can be customized by providing a custom RetryConfig object to the retry_config parameter.

Retry config examples:

```python
retry_config = RetryConfig(max_retries=0, timeout=0.0)   # No retry, no timeout
retry_config = RetryConfig(max_retries=0, timeout=10.0)  # No retry, 10.0 seconds timeout
retry_config = RetryConfig(max_retries=5, timeout=0.0)   # 5 max retries, no timeout
retry_config = RetryConfig(max_retries=5, timeout=10.0)  # 5 max retries, 10.0 seconds timeout
```

Usage example:

```python
em_invoker = TwelveLabsEMInvoker(..., retry_config=retry_config)
```
Initializes a new instance of the TwelveLabsEMInvoker class.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`model_name` | `str` | The name of the TwelveLabs embedding model to be used. | *required* |
`api_key` | `str \| None` | The API key for the TwelveLabs API. Defaults to None, in which case the […] | `None` |
`model_kwargs` | `dict[str, Any] \| None` | Additional keyword arguments for the TwelveLabs client. Defaults to None. | `None` |
`default_hyperparameters` | `dict[str, Any] \| None` | Default hyperparameters for invoking the model. Defaults to None. | `None` |
`retry_config` | `RetryConfig \| None` | The retry configuration for the embedding model. Defaults to None, in which case a default config with no retry and 30.0 seconds timeout is used. | `None` |
to_langchain()

Converts the current embedding model invoker to an instance of LangChain's Embeddings object. However, the TwelveLabsEMInvoker is not supported by LangChain, so calling this method raises an error.

Raises:

Type | Description |
---|---|
`NotImplementedError` | The TwelveLabsEMInvoker is not supported by LangChain. |
VoyageEMInvoker(model_name, api_key=None, model_kwargs=None, default_hyperparameters=None, retry_config=None)
Bases: BaseEMInvoker
An embedding model invoker to interact with Voyage embedding models.
Attributes:

Name | Type | Description |
---|---|---|
`model_id` | `str` | The model ID of the embedding model. |
`model_provider` | `str` | The provider of the embedding model. |
`model_name` | `str` | The name of the embedding model. |
`client` | `Client` | The client for the Voyage API. |
`default_hyperparameters` | `dict[str, Any]` | Default hyperparameters for invoking the embedding model. |
`retry_config` | `RetryConfig` | The retry configuration for the embedding model. |
Input types

The VoyageEMInvoker supports the following input types:

1. Text.
2. Image: ".png", ".jpeg", and ".jpg".
3. A tuple containing text and image.

Non-text inputs must be passed as an Attachment object.
Output format

The VoyageEMInvoker can embed either:

1. A single content.
    - A single content is either a text, an image, or a tuple containing a text and an image.
    - The output will be a Vector, representing the embedding of the content.
```python
# Example 1: Embedding a text content.
text = "What animal is in this image?"
result = await em_invoker.invoke(text)
```

```python
# Example 2: Embedding an image content.
image = Attachment.from_path("path/to/local/image.png")
result = await em_invoker.invoke(image)
```

```python
# Example 3: Embedding a tuple containing a text and an image.
text = "What animal is in this image?"
image = Attachment.from_path("path/to/local/image.png")
result = await em_invoker.invoke((text, image))
```

The above examples will return a Vector with a size of (embedding_size,).
2. A list of contents.
    - A list of contents is a list that consists of any of the above single contents.
    - The output will be a list[Vector], where each element is a Vector representing the embedding of each single content.

```python
# Example: Embedding a list of contents.
text = "What animal is in this image?"
image = Attachment.from_path("path/to/local/image.png")
mix_content = (text, image)
result = await em_invoker.invoke([text, image, mix_content])
```

The above example will return a list[Vector] with a size of (3, embedding_size).
Retry and timeout

The VoyageEMInvoker supports retry and timeout configuration. By default, the max retries is set to 0 and the timeout is set to 30.0 seconds. They can be customized by providing a custom RetryConfig object to the retry_config parameter.

Retry config examples:

```python
retry_config = RetryConfig(max_retries=0, timeout=0.0)   # No retry, no timeout
retry_config = RetryConfig(max_retries=0, timeout=10.0)  # No retry, 10.0 seconds timeout
retry_config = RetryConfig(max_retries=5, timeout=0.0)   # 5 max retries, no timeout
retry_config = RetryConfig(max_retries=5, timeout=10.0)  # 5 max retries, 10.0 seconds timeout
```

Usage example:

```python
em_invoker = VoyageEMInvoker(..., retry_config=retry_config)
```
Initializes a new instance of the VoyageEMInvoker class.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`model_name` | `str` | The name of the Voyage embedding model to be used. | *required* |
`api_key` | `str \| None` | The API key for the Voyage API. Defaults to None, in which case the […] | `None` |
`model_kwargs` | `dict[str, Any] \| None` | Additional keyword arguments for the Voyage client. Defaults to None. | `None` |
`default_hyperparameters` | `dict[str, Any] \| None` | Default hyperparameters for invoking the model. Defaults to None. | `None` |
`retry_config` | `RetryConfig \| None` | The retry configuration for the embedding model. Defaults to None, in which case a default config with no retry and 30.0 seconds timeout is used. | `None` |