Vector
ChromaDB implementation of vector search and CRUD capability.
This module provides a ChromaDB implementation of the VectorCapability protocol using the langchain-chroma integration.
ChromaVectorCapability(collection_name, em_invoker, client, num_candidates=DEFAULT_NUM_CANDIDATES)
ChromaDB implementation of VectorCapability protocol.
This class provides document CRUD operations and vector search using ChromaDB.
Attributes:

| Name | Type | Description |
|---|---|---|
| collection_name | str | The name of the ChromaDB collection. |
| collection | Collection | The ChromaDB collection instance. |
| vector_store | Chroma | The langchain Chroma vector store instance. |
| num_candidates | int | The maximum number of candidates to consider during search. |
Initialize the ChromaDB vector capability.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| collection_name | str | The name of the ChromaDB collection. | required |
| em_invoker | BaseEMInvoker | The embedding model used to perform vectorization. | required |
| client | ClientAPI | The ChromaDB client instance. | required |
| num_candidates | int | Maximum number of candidates to consider during search. Defaults to 50. | DEFAULT_NUM_CANDIDATES |
em_invoker
property
Returns the EM Invoker instance.
Returns:

| Type | Description |
|---|---|
| BaseEMInvoker | The EM Invoker instance. |
clear()
async
Clear all records from the datastore.
create(data, **kwargs)
async
create_from_vector(chunk_vectors, **kwargs)
async
Add pre-computed embeddings directly.
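Examples:
```python
# Run inside an async function; `datastore` and `Chunk` are assumed to be in scope.
# The 3-dimensional vectors are illustrative; real vectors must match the collection's embedding dimension.
await datastore.vector.create_from_vector(chunk_vectors=[
    (Chunk(content="text1", metadata={"source": "source1"}, id="id1"), [0.1, 0.2, 0.3]),
    (Chunk(content="text2", metadata={"source": "source2"}, id="id2"), [0.4, 0.5, 0.6]),
])
```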
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| chunk_vectors | list[tuple[Chunk, Vector]] | List of tuples containing chunks and their corresponding vectors. | required |
| **kwargs | Any | Datastore-specific parameters. | {} |
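ChromaDB's native `Collection.add` accepts parallel lists of `ids`, `embeddings`, `documents`, and `metadatas`. As a minimal sketch, assuming the pre-computed pairs end up in that shape (the real method may route them through the langchain `Chroma` wrapper instead), the helper below unpacks `(Chunk, Vector)` tuples into those lists and checks that every vector has the same dimension. `Chunk` here is a hypothetical stand-in dataclass and `to_chroma_add_kwargs` is an illustrative name, not part of the library.

```python
from dataclasses import dataclass, field
from typing import Any

Vector = list[float]


@dataclass
class Chunk:
    """Hypothetical stand-in for the library's Chunk type."""
    content: str
    metadata: dict[str, Any] = field(default_factory=dict)
    id: str = ""


def to_chroma_add_kwargs(chunk_vectors: list[tuple[Chunk, Vector]]) -> dict[str, list]:
    """Split (chunk, vector) pairs into the index-aligned lists Collection.add expects."""
    if not chunk_vectors:
        return {"ids": [], "embeddings": [], "documents": [], "metadatas": []}
    dims = {len(vector) for _, vector in chunk_vectors}
    if len(dims) != 1:
        raise ValueError(f"All vectors must share one dimension, got {sorted(dims)}")
    return {
        "ids": [chunk.id for chunk, _ in chunk_vectors],
        "embeddings": [list(vector) for _, vector in chunk_vectors],
        "documents": [chunk.content for chunk, _ in chunk_vectors],
        "metadatas": [dict(chunk.metadata) for chunk, _ in chunk_vectors],
    }


kwargs = to_chroma_add_kwargs([
    (Chunk(content="text1", metadata={"source": "source1"}, id="id1"), [0.1, 0.2, 0.3]),
    (Chunk(content="text2", metadata={"source": "source2"}, id="id2"), [0.4, 0.5, 0.6]),
])
# With a real chromadb collection this would become: collection.add(**kwargs)
```

Splitting the pairs up front keeps IDs, text, metadata, and vectors index-aligned, which is what a parallel-list API relies on.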
delete(filters=None, **kwargs)
async
Delete records from the datastore.
Examples:
```python
from gllm_datastore.core.filters import filter as F

# Direct FilterClause usage
await datastore.vector.delete(filters=F.eq("metadata.category", "tech"))

# Multiple filters
filters = F.and_(F.eq("metadata.source", "wikipedia"), F.eq("metadata.category", "tech"))
await datastore.vector.delete(filters=filters)
```
This deletes every chunk in the vector store that matches the filters.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| filters | FilterClause \| QueryFilter \| None | Filters to select records to delete. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None, in which case no operation is performed (no-op). | None |
| **kwargs | Any | Datastore-specific parameters. | {} |
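Presumably the filters are eventually expressed in ChromaDB's native `where` syntax, which uses operator dictionaries such as `{"category": {"$eq": "tech"}}` and `{"$and": [...]}`. The sketch below is a hypothetical translation of simple equality and AND filters into that shape, together with the documented no-op when `filters` is None. The tuple-based filter representation, the stripping of the `metadata.` prefix, and the function names are assumptions for illustration; this is not the library's `F` API or its actual converter.

```python
from typing import Any, Optional


def to_chroma_where(f: tuple) -> dict[str, Any]:
    """Translate ("eq", key, value) / ("and", [subfilters]) into a Chroma where dict."""
    op = f[0]
    if op == "eq":
        _, key, value = f
        # Assumption: "metadata.<field>" maps to the Chroma metadata field "<field>".
        return {key.removeprefix("metadata."): {"$eq": value}}
    if op == "and":
        clauses = [to_chroma_where(sub) for sub in f[1]]
        # Chroma rejects $and with fewer than two clauses, so unwrap a single one.
        return clauses[0] if len(clauses) == 1 else {"$and": clauses}
    raise ValueError(f"Unsupported operator: {op}")


def delete_where(filters: Optional[tuple]) -> Optional[dict[str, Any]]:
    """Return the where clause to delete with, or None for the documented no-op."""
    if filters is None:
        return None  # No filters: nothing is deleted.
    return to_chroma_where(filters)


where = delete_where(("and", [("eq", "metadata.source", "wikipedia"), ("eq", "metadata.category", "tech")]))
# With a real chromadb collection a non-None result could be used as: collection.delete(where=where)
```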
ensure_index()
async
Ensure ChromaDB collection exists, creating it if necessary.
This method is idempotent: if the collection already exists, it returns the existing collection. The collection is created automatically during initialization, but this method can be called explicitly to ensure it exists.
Raises:

| Type | Description |
|---|---|
| RuntimeError | If collection creation fails. |
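ChromaDB clients expose `get_or_create_collection(name=...)`, which returns the existing collection when one with that name already exists. The sketch below mimics that behaviour with a hypothetical in-memory client so the idempotency described above can be checked without a running ChromaDB. `InMemoryClient` and `ensure_collection` are illustrative names, and wrapping failures in `RuntimeError` mirrors the documented Raises entry rather than the library's actual code.

```python
class InMemoryClient:
    """Hypothetical stand-in for chromadb's ClientAPI (only get_or_create_collection)."""

    def __init__(self) -> None:
        self._collections: dict[str, dict] = {}
        self.create_calls = 0  # Counts how many collections were actually created.

    def get_or_create_collection(self, name: str) -> dict:
        if name not in self._collections:
            self.create_calls += 1
            self._collections[name] = {"name": name, "records": {}}
        return self._collections[name]


def ensure_collection(client: InMemoryClient, name: str) -> dict:
    """Return the named collection, creating it on first use; safe to call repeatedly."""
    try:
        return client.get_or_create_collection(name=name)
    except Exception as exc:  # Surface any backend failure as RuntimeError.
        raise RuntimeError(f"Failed to ensure collection {name!r}") from exc


client = InMemoryClient()
first = ensure_collection(client, "docs")
second = ensure_collection(client, "docs")  # Second call reuses the existing collection.
```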
retrieve(query, filters=None, options=None)
async
Semantic search using text query converted to vector.
Examples:
```python
from gllm_datastore.core.filters import filter as F

# Direct FilterClause usage
await datastore.vector.retrieve(
    query="What is the capital of France?",
    filters=F.eq("metadata.category", "tech"),
)

# Multiple filters
filters = F.and_(F.eq("metadata.source", "wikipedia"), F.eq("metadata.category", "tech"))
await datastore.vector.retrieve(query="What is the capital of France?", filters=filters)
```
This retrieves the 10 chunks most similar to the query among those that match the filters, sorted by score in descending order.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Text query to embed and search for. | required |
| filters | FilterClause \| QueryFilter \| None | Filters to apply to the search. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None. | None |
| options | QueryOptions \| None | Options to apply to the search. Defaults to None. | None |

Returns:

| Type | Description |
|---|---|
| list[Chunk] | List of chunks ordered by relevance score. |
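retrieve first embeds the text query with the configured em_invoker and then runs a vector similarity search. The sketch below shows that two-step flow in plain Python under stated assumptions: a toy letter-count embedder stands in for the embedding model, and brute-force cosine similarity stands in for ChromaDB's approximate nearest-neighbour index. `toy_embed`, `retrieve`, and `top_k` are hypothetical; in the real method the result limit comes from `options`, not a `top_k` argument.

```python
import math


def toy_embed(text: str) -> list[float]:
    """Hypothetical embedder: counts of letters a-z, standing in for em_invoker."""
    vector = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vector[ord(ch) - ord("a")] += 1.0
    return vector


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def retrieve(query: str, store: dict[str, str], top_k: int = 10) -> list[tuple[str, float]]:
    """Embed the query, score every stored text, and return (id, score) pairs, best first."""
    query_vector = toy_embed(query)
    scored = [(doc_id, cosine(query_vector, toy_embed(text))) for doc_id, text in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # Descending by score.
    return scored[:top_k]


store = {"a": "Paris is the capital of France", "b": "zzz", "c": "capital of France"}
results = retrieve("capital of France", store, top_k=2)
```

A production index trades this exhaustive scan for approximate search, but the contract is the same: embed once, rank by similarity, return the best matches first.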
retrieve_by_vector(vector, filters=None, options=None)
async
Direct vector similarity search.
Examples:
```python
from gllm_datastore.core.filters import filter as F

# Direct FilterClause usage
await datastore.vector.retrieve_by_vector(
    vector=[0.1, 0.2, 0.3],
    filters=F.eq("metadata.category", "tech"),
)

# Multiple filters
filters = F.and_(F.eq("metadata.source", "wikipedia"), F.eq("metadata.category", "tech"))
await datastore.vector.retrieve_by_vector(vector=[0.1, 0.2, 0.3], filters=filters)
```
This retrieves the 10 chunks most similar to the query vector among those that match the filters, sorted by score in descending order.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| vector | Vector | Query embedding vector. | required |
| filters | FilterClause \| QueryFilter \| None | Filters to apply to the search. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None. | None |
| options | QueryOptions \| None | Options to apply to the search. Defaults to None. | None |

Returns:

| Type | Description |
|---|---|
| list[Chunk] | List of chunks ordered by similarity score. |
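When filters are supplied, only records whose metadata match are ranked against the query vector. The sketch below illustrates that filter-then-rank idea with equality filters on `metadata.*` keys and squared Euclidean distance, the measure behind ChromaDB's default `l2` space. Lower distance means a closer match, so sorting by ascending distance gives the same order as sorting by a descending similarity score. The record layout and function names are hypothetical, and ChromaDB itself applies `where` filters inside the query rather than in Python.

```python
from typing import Any, Optional


def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance; ranks results in the same order as plain L2."""
    if len(a) != len(b):
        raise ValueError("Vector dimensions differ")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def matches(metadata: dict[str, Any], equals: dict[str, Any]) -> bool:
    """True if each 'metadata.<field>' filter equals the record's metadata value."""
    return all(metadata.get(key.removeprefix("metadata.")) == value for key, value in equals.items())


def retrieve_by_vector(
    vector: list[float],
    records: list[dict[str, Any]],
    equals: Optional[dict[str, Any]] = None,
    top_k: int = 10,
) -> list[str]:
    """Keep records that pass the filters, then return the IDs of the closest ones."""
    candidates = [r for r in records if matches(r["metadata"], equals or {})]
    # Ascending distance is equivalent to descending similarity score.
    candidates.sort(key=lambda r: squared_l2(vector, r["embedding"]))
    return [r["id"] for r in candidates[:top_k]]


records = [
    {"id": "id1", "embedding": [0.1, 0.2, 0.3], "metadata": {"category": "tech"}},
    {"id": "id2", "embedding": [0.4, 0.5, 0.6], "metadata": {"category": "tech"}},
    {"id": "id3", "embedding": [0.1, 0.2, 0.3], "metadata": {"category": "news"}},
]
hits = retrieve_by_vector([0.1, 0.2, 0.3], records, equals={"metadata.category": "tech"})
```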
update(update_values, filters=None)
async
Update existing records in the datastore.
Examples:
```python
from gllm_datastore.core.filters import filter as F

# Direct FilterClause usage
await datastore.vector.update(
    update_values={"metadata": {"status": "published"}},
    filters=F.eq("metadata.category", "tech"),
)

# Multiple filters
filters = F.and_(F.eq("metadata.source", "wikipedia"), F.eq("metadata.category", "tech"))
await datastore.vector.update(
    update_values={"metadata": {"status": "published"}},
    filters=filters,
)
```
This sets the status metadata field to "published" on every chunk that matches the filters.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| update_values | dict[str, Any] | Values to update. | required |
| filters | FilterClause \| QueryFilter \| None | Filters to select records to update. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None, in which case no operation is performed (no-op). | None |
Note
ChromaDB does not support updating records by filter directly; its native update operation addresses records by ID. This method therefore requires filters to identify the records to update and then updates every matching record.
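Because ChromaDB's native `Collection.update` addresses records by ID, a filter-based update can be done in two steps: resolve the filters to matching IDs (ChromaDB offers `Collection.get(where=...)` for this), then update those IDs. The sketch below shows that pattern over a hypothetical in-memory record store, including the documented no-op when `filters` is None. Merging the new metadata into the existing metadata, rather than replacing it, is an assumption made here for illustration, as are the function name and record layout.

```python
from typing import Any, Optional


def update_by_filter(
    records: dict[str, dict[str, Any]],
    update_values: dict[str, Any],
    filters: Optional[dict[str, Any]] = None,
) -> list[str]:
    """Update records whose metadata equal every filter value; return the updated IDs."""
    if filters is None:
        return []  # Documented behaviour: no filters means no operation.
    # Step 1: resolve the filters to record IDs (analogous to collection.get(where=...)).
    matching_ids = [
        record_id
        for record_id, record in records.items()
        if all(record["metadata"].get(k.removeprefix("metadata.")) == v for k, v in filters.items())
    ]
    # Step 2: apply the update per ID (analogous to collection.update(ids=...)).
    new_metadata = update_values.get("metadata", {})
    for record_id in matching_ids:
        # Assumed merge semantics: existing keys are kept unless overwritten.
        records[record_id]["metadata"] = {**records[record_id]["metadata"], **new_metadata}
    return matching_ids


records = {
    "id1": {"metadata": {"category": "tech", "status": "draft"}},
    "id2": {"metadata": {"category": "news", "status": "draft"}},
}
updated = update_by_filter(records, {"metadata": {"status": "published"}}, {"metadata.category": "tech"})
```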