Graph
Package containing Graph Indexer modules.
Modules:

| Name | Description |
|---|---|
| LlamaIndexGraphRAGIndexer | A class for indexing elements using LlamaIndex. |
| LightRAGGraphRAGIndexer | A class for indexing elements using LightRAG. |
LightRAGGraphRAGIndexer(graph_store)
Bases: BaseGraphRAGIndexer
Indexer for LightRAG-based graph RAG.
How to run LightRAG with PostgreSQL using Docker:

```bash
docker run -p 5455:5432 -d --name postgres-LightRag shangor/postgres-for-rag:v1.0 \
  sh -c "service postgresql start && sleep infinity"
```
Example:

```python
from gllm_inference.em_invoker import OpenAIEMInvoker
from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_docproc.indexer.graph.light_rag_graph_rag_indexer import LightRAGGraphRAGIndexer
from gllm_datastore.graph_data_store.light_rag_postgres_data_store import LightRAGPostgresDataStore

# Create the LightRAGPostgresDataStore instance
graph_store = LightRAGPostgresDataStore(
    lm_invoker=OpenAILMInvoker(model_name="gpt-4o-mini"),
    em_invoker=OpenAIEMInvoker(model_name="text-embedding-3-small"),
    postgres_db_host="localhost",
    postgres_db_port=5455,
    postgres_db_user="rag",
    postgres_db_password="rag",
    postgres_db_name="rag",
    postgres_db_workspace="default",
)

# Create the indexer
indexer = LightRAGGraphRAGIndexer(graph_store=graph_store)

# Create elements to index
elements = [
    {
        "text": "This is a sample document about AI.",
        "structure": "uncategorized",
        "metadata": {
            "source": "sample.txt",
            "source_type": "TEXT",
            "loaded_datetime": "2025-07-10T12:00:00",
            "chunk_id": "chunk_001",
            "file_id": "file_001",
        },
    }
]

# Index the elements
indexer.index_file_chunks(elements, file_id="file_001")
```
Attributes:

| Name | Type | Description |
|---|---|---|
| _graph_store | BaseLightRAGDataStore | The LightRAG data store used for indexing and querying. |
Initialize the LightRAGGraphRAGIndexer.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| graph_store | BaseLightRAGDataStore | The LightRAG instance to use for indexing. | required |
delete_chunk(chunk_id, file_id, **kwargs)
Delete a single chunk by chunk ID and file ID.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| chunk_id | str | The ID of the chunk to delete. | required |
| file_id | str | The ID of the file the chunk belongs to. | required |
| **kwargs | Any | Additional keyword arguments for customization. | {} |

Returns:

| Type | Description |
|---|---|
| dict[str, Any] | Response with success status and error message. |
delete_file_chunks(file_id, **kwargs)
Delete all chunks for a specific file.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| file_id | str | The ID of the file whose chunks should be deleted. | required |
| **kwargs | Any | Additional keyword arguments for customization. | {} |

Returns:

| Type | Description |
|---|---|
| dict[str, Any] | Response with success status and error message. |
get_chunk(chunk_id, file_id, **kwargs)
Get a single chunk by chunk ID and file ID.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| chunk_id | str | The ID of the chunk to retrieve. | required |
| file_id | str | The ID of the file the chunk belongs to. | required |
| **kwargs | Any | Additional keyword arguments for customization. | {} |

Returns:

| Type | Description |
|---|---|
| dict[str, Any] \| None | The chunk data, or None if not found. |

Raises:

| Type | Description |
|---|---|
| NotImplementedError | This method is not yet implemented. |
get_file_chunks(file_id, page=0, size=20, **kwargs)
Get chunks for a specific file with pagination support.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| file_id | str | The ID of the file to get chunks from. | required |
| page | int | The page number (0-indexed). Defaults to 0. | 0 |
| size | int | The number of chunks per page. Defaults to 20. | 20 |
| **kwargs | Any | Additional keyword arguments for customization. | {} |

Returns:

| Type | Description |
|---|---|
| dict[str, Any] | Response with chunks list, total count, and pagination info. |

Raises:

| Type | Description |
|---|---|
| NotImplementedError | This method is not yet implemented. |
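Once implemented, the page/size contract above reduces to simple slicing arithmetic. The sketch below is illustrative only; `paginate` is a hypothetical helper, not part of the library.

```python
from typing import Any


def paginate(chunks: list[dict[str, Any]], page: int = 0, size: int = 20) -> dict[str, Any]:
    """Slice a chunk list into the response shape described above (0-indexed pages)."""
    start = page * size
    return {
        "chunks": chunks[start : start + size],
        "total": len(chunks),
        "page": page,
        "size": size,
    }


chunks = [{"chunk_id": f"chunk_{i:03d}"} for i in range(45)]
result = paginate(chunks, page=2, size=20)  # chunks 40-44: the last 5 of 45
```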
index_chunk(element, **kwargs)
Index a single chunk.
This method only indexes the chunk. It does NOT update the metadata of neighboring chunks (previous_chunk/next_chunk). The caller is responsible for maintaining chunk relationships by updating adjacent chunks' metadata separately.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| element | dict[str, Any] | The chunk to be indexed. | required |
| **kwargs | Any | Additional keyword arguments for customization. | {} |

Returns:

| Type | Description |
|---|---|
| dict[str, Any] | Response with success status, error message, and chunk_id. |

Raises:

| Type | Description |
|---|---|
| NotImplementedError | This method is not yet implemented. |
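Because index_chunk leaves neighboring chunks untouched, the caller maintains the previous_chunk/next_chunk links itself. `link_neighbors` below is a hypothetical caller-side helper sketching that bookkeeping, not part of the indexer API.

```python
from typing import Any


def link_neighbors(chunks: list[dict[str, Any]]) -> None:
    """Set previous_chunk/next_chunk metadata across an ordered chunk list."""
    for i, chunk in enumerate(chunks):
        meta = chunk["metadata"]
        meta["previous_chunk"] = chunks[i - 1]["metadata"]["chunk_id"] if i > 0 else None
        meta["next_chunk"] = chunks[i + 1]["metadata"]["chunk_id"] if i + 1 < len(chunks) else None


chunks = [{"metadata": {"chunk_id": c}} for c in ("c1", "c2", "c3")]
link_neighbors(chunks)  # c2 now points back to c1 and forward to c3
```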
index_chunks(elements, **kwargs)
Index multiple chunks.
This method enables indexing multiple chunks in a single operation without requiring file replacement semantics (i.e., it inserts or overwrites the provided chunks directly without first deleting existing chunks). The chunks provided can belong to multiple different files.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| elements | list[dict[str, Any]] | The chunks to be indexed. Each dict should follow the Element structure with 'text' and 'metadata' keys. Metadata must include 'file_id' and 'chunk_id'. | required |
| **kwargs | Any | Additional keyword arguments for customization. | {} |

Returns:

| Type | Description |
|---|---|
| dict[str, Any] | The response from the indexing process. Includes: 1. success (bool): True if indexing succeeded, False otherwise. 2. error_message (str): Error message if indexing failed, empty string otherwise. 3. total (int): The total number of chunks indexed. |
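A minimal sketch of the Element structure index_chunks expects, including the multi-file case. The validation helper is illustrative, not part of the library.

```python
from typing import Any


def validate_elements(elements: list[dict[str, Any]]) -> None:
    """Raise if an element is missing the keys index_chunks requires."""
    for element in elements:
        if "text" not in element or "metadata" not in element:
            raise ValueError("each element needs 'text' and 'metadata' keys")
        if not {"file_id", "chunk_id"} <= element["metadata"].keys():
            raise ValueError("metadata must include 'file_id' and 'chunk_id'")


# A single call may carry chunks from multiple files.
elements = [
    {"text": "Chapter one.", "metadata": {"file_id": "file_001", "chunk_id": "c1"}},
    {"text": "Chapter two.", "metadata": {"file_id": "file_002", "chunk_id": "c2"}},
]
validate_elements(elements)  # passes: both elements are well-formed
```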
index_file_chunks(elements, file_id, **kwargs)
Index chunks for a specific file.
This method extracts text and chunk IDs from the provided elements, inserts them into the LightRAG system, and creates a graph structure connecting files to chunks.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| elements | list[dict[str, Any]] | The chunks to be indexed. | required |
| file_id | str | The ID of the file these chunks belong to. | required |
| **kwargs | Any | Additional keyword arguments for customization. | {} |

Returns:

| Type | Description |
|---|---|
| dict[str, Any] | Response with success status, error message, and total count. |
resolve_entities()
Resolve entities from the graph.
Currently, this method is a no-op: entity resolution is handled implicitly by the LightRAG instance.
update_chunk(element, **kwargs)
Update a chunk by chunk ID.
This method updates both the text content and metadata of a chunk. When text content is updated, the chunk should be re-processed through data generators and re-indexed with updated vector embeddings.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| element | dict[str, Any] | The updated chunk data. | required |
| **kwargs | Any | Additional keyword arguments for customization. | {} |

Returns:

| Type | Description |
|---|---|
| dict[str, Any] | Response with success status, error message, and chunk_id. |

Raises:

| Type | Description |
|---|---|
| NotImplementedError | This method is not yet implemented. |
update_chunk_metadata(chunk_id, file_id, metadata, **kwargs)
Update metadata for a specific chunk.
This method patches new metadata into the existing chunk metadata. Existing metadata fields will be overwritten, and new fields will be added. Identity metadata fields file_id and chunk_id should be preserved and not overwritten.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| chunk_id | str | The ID of the chunk to update. | required |
| file_id | str | The ID of the file the chunk belongs to. | required |
| metadata | dict[str, Any] | The metadata fields to update. | required |
| **kwargs | Any | Additional keyword arguments for customization. | {} |

Returns:

| Type | Description |
|---|---|
| dict[str, Any] | Response with success status and error message. |

Raises:

| Type | Description |
|---|---|
| NotImplementedError | This method is not yet implemented. |
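The patch semantics described above (overwrite existing fields, add new ones, preserve the identity fields) can be sketched in plain Python. `merge_metadata` is a hypothetical helper for illustration, not a library function.

```python
from typing import Any

IDENTITY_FIELDS = {"file_id", "chunk_id"}


def merge_metadata(existing: dict[str, Any], updates: dict[str, Any]) -> dict[str, Any]:
    """Patch updates into existing metadata without touching identity fields."""
    merged = dict(existing)
    for key, value in updates.items():
        if key not in IDENTITY_FIELDS:
            merged[key] = value
    return merged


existing = {"file_id": "file_001", "chunk_id": "chunk_001", "source": "old.txt"}
updates = {"source": "new.txt", "reviewed": True, "chunk_id": "other"}
merged = merge_metadata(existing, updates)  # chunk_id stays "chunk_001"
```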
LlamaIndexGraphRAGIndexer(graph_store, llama_index_llm=None, allowed_entity_types=None, allowed_relation_types=None, kg_validation_schema=None, strict_mode=False, kg_extractors=None, embed_model=None, vector_store=None, max_triplets_per_chunk=10, num_workers=4, **kwargs)
Bases: BaseGraphRAGIndexer
Indexer for graph RAG using LlamaIndex.
Attributes:

| Name | Type | Description |
|---|---|---|
| _index | PropertyGraphIndex | Property graph index. |
| _graph_store | LlamaIndexGraphRAGDataStore | Storage for property graph. |
| _strict_mode | bool | Whether strict schema validation is enabled. |
Initialize the LlamaIndexGraphRAGIndexer.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| graph_store | LlamaIndexGraphRAGDataStore | Storage for property graph. | required |
| llama_index_llm | BaseLLM \| None | Language model for LlamaIndex. Deprecated: instantiate the LLM via LlamaIndexGraphRAGDataStore (e.g., LlamaIndexGraphRAGDataStore(lm_invoker=...)) instead. Defaults to None. | None |
| allowed_entity_types | list[str] \| None | List of allowed entity types. When strict_mode=True, only these types are extracted; when strict_mode=False, they serve as hints. Defaults to None. | None |
| allowed_relation_types | list[str] \| None | List of allowed relationship types. Behavior depends on strict_mode. Defaults to None. | None |
| kg_validation_schema | dict[str, list[str]] \| None | Validation schema for strict mode, mapping entity types to their allowed outgoing relationship types. Format: {"ENTITY_TYPE": ["ALLOWED_REL1", "ALLOWED_REL2"], ...}. Example: {"PERSON": ["WORKS_AT", "FOUNDED"], "ORGANIZATION": ["LOCATED_IN"]}. Defaults to None. | None |
| strict_mode | bool | If True, uses SchemaLLMPathExtractor with strict validation. If False (default), uses DynamicLLMPathExtractor with optional guidance. Defaults to False. | False |
| kg_extractors | list[TransformComponent] \| None | Custom list of extractors. If provided, overrides the automatic extractor selection based on strict_mode. Defaults to None. | None |
| embed_model | BaseEmbedding \| None | Embedding model for vector representations. Deprecated: instantiate the embedding model via LlamaIndexGraphRAGDataStore (e.g., LlamaIndexGraphRAGDataStore(em_invoker=...)) instead. Defaults to None. | None |
| vector_store | BasePydanticVectorStore \| None | Storage for vector data. Defaults to None. | None |
| max_triplets_per_chunk | int | Maximum triplets to extract per chunk. Defaults to 10. | 10 |
| num_workers | int | Number of parallel workers. Defaults to 4. | 4 |
| **kwargs | Any | Additional keyword arguments. | {} |
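The kg_validation_schema constraint in strict mode amounts to a membership check: a triplet is kept only if its relation is listed under the subject's entity type. A minimal sketch, where `is_triplet_allowed` is a hypothetical helper for illustration (the actual filtering is done by SchemaLLMPathExtractor):

```python
kg_validation_schema = {
    "PERSON": ["WORKS_AT", "FOUNDED"],
    "ORGANIZATION": ["LOCATED_IN"],
}


def is_triplet_allowed(subject_type: str, relation: str, schema: dict[str, list[str]]) -> bool:
    """Return True if the schema permits this outgoing relation for the entity type."""
    return relation in schema.get(subject_type, [])


allowed = is_triplet_allowed("PERSON", "WORKS_AT", kg_validation_schema)    # True
rejected = is_triplet_allowed("PERSON", "LOCATED_IN", kg_validation_schema) # False
```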
delete_chunk(chunk_id, file_id, **kwargs)
Delete a single chunk by chunk ID and file ID.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| chunk_id | str | The ID of the chunk to delete. | required |
| file_id | str | The ID of the file the chunk belongs to. | required |
| **kwargs | Any | Additional keyword arguments for customization. | {} |

Returns:

| Type | Description |
|---|---|
| dict[str, Any] | Response with success status and error message. |

Raises:

| Type | Description |
|---|---|
| NotImplementedError | This method is not yet implemented. |
delete_file_chunks(file_id, **kwargs)
Delete all chunks for a specific file.
This method deletes all chunks from the knowledge graph based on the provided file_id.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| file_id | str | The ID of the file whose chunks should be deleted. | required |
| **kwargs | Any | Additional keyword arguments for customization. | {} |

Returns:

| Type | Description |
|---|---|
| dict[str, Any] | Response with success status and error message. |
get_chunk(chunk_id, file_id, **kwargs)
Get a single chunk by chunk ID and file ID.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| chunk_id | str | The ID of the chunk to retrieve. | required |
| file_id | str | The ID of the file the chunk belongs to. | required |
| **kwargs | Any | Additional keyword arguments for customization. | {} |

Returns:

| Type | Description |
|---|---|
| dict[str, Any] \| None | The chunk data, or None if not found. |

Raises:

| Type | Description |
|---|---|
| NotImplementedError | This method is not yet implemented. |
get_file_chunks(file_id, page=0, size=20, **kwargs)
Get chunks for a specific file with pagination support.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| file_id | str | The ID of the file to get chunks from. | required |
| page | int | The page number (0-indexed). Defaults to 0. | 0 |
| size | int | The number of chunks per page. Defaults to 20. | 20 |
| **kwargs | Any | Additional keyword arguments for customization. | {} |

Returns:

| Type | Description |
|---|---|
| dict[str, Any] | Response with chunks list, total count, and pagination info. |

Raises:

| Type | Description |
|---|---|
| NotImplementedError | This method is not yet implemented. |
index_chunk(element, **kwargs)
Index a single chunk.
This method only indexes the chunk. It does NOT update the metadata of neighboring chunks (previous_chunk/next_chunk). The caller is responsible for maintaining chunk relationships by updating adjacent chunks' metadata separately.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| element | dict[str, Any] | The chunk to be indexed. | required |
| **kwargs | Any | Additional keyword arguments for customization. | {} |

Returns:

| Type | Description |
|---|---|
| dict[str, Any] | Response with success status, error message, and chunk_id. |

Raises:

| Type | Description |
|---|---|
| NotImplementedError | This method is not yet implemented. |
index_chunks(elements, **kwargs)
Index multiple chunks.
This method enables indexing multiple chunks in a single operation without requiring file replacement semantics (i.e., it inserts or overwrites the provided chunks directly without first deleting existing chunks). The chunks provided can belong to multiple different files.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| elements | list[dict[str, Any]] | The chunks to be indexed. Each dict should follow the Element structure with 'text' and 'metadata' keys. Metadata must include 'file_id' and 'chunk_id'. | required |
| **kwargs | Any | Additional keyword arguments for customization. | {} |

Returns:

| Type | Description |
|---|---|
| dict[str, Any] | The response from the indexing process. Includes: 1. success (bool): True if indexing succeeded, False otherwise. 2. error_message (str): Error message if indexing failed, empty string otherwise. 3. total (int): The total number of chunks indexed. |
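A caller-side sketch of handling the response contract described above; the response value is hard-coded for illustration and `check_response` is a hypothetical helper, not part of the library.

```python
from typing import Any


def check_response(response: dict[str, Any]) -> int:
    """Return the indexed-chunk count, or raise if indexing failed."""
    if not response["success"]:
        raise RuntimeError(f"indexing failed: {response['error_message']}")
    return response["total"]


response = {"success": True, "error_message": "", "total": 3}
total = check_response(response)  # 3
```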
index_file_chunks(elements, file_id, **kwargs)
Index chunks for a specific file.
This method indexes chunks for a file.
Notes:
- Currently, only Neo4jPropertyGraphStore is supported for indexing the metadata from the TextNode.
- The 'chunk_id' metadata field specifies the chunk ID for each element.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| elements | list[dict[str, Any]] | The chunks to be indexed. | required |
| file_id | str | The ID of the file these chunks belong to. | required |
| **kwargs | Any | Additional keyword arguments for customization. | {} |

Returns:

| Type | Description |
|---|---|
| dict[str, Any] | Response with success status, error message, and total count. |
resolve_entities()
Resolve entities in the graph.
Currently, this method does nothing.
update_chunk(element, **kwargs)
Update a chunk by chunk ID.
This method updates both the text content and metadata of a chunk. When text content is updated, the chunk should be re-processed through data generators and re-indexed with updated vector embeddings.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| element | dict[str, Any] | The updated chunk data. | required |
| **kwargs | Any | Additional keyword arguments for customization. | {} |

Returns:

| Type | Description |
|---|---|
| dict[str, Any] | Response with success status, error message, and chunk_id. |

Raises:

| Type | Description |
|---|---|
| NotImplementedError | This method is not yet implemented. |
update_chunk_metadata(chunk_id, file_id, metadata, **kwargs)
Update metadata for a specific chunk.
This method patches new metadata into the existing chunk metadata. Existing metadata fields will be overwritten, and new fields will be added. Identity metadata fields file_id and chunk_id should be preserved and not overwritten.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| chunk_id | str | The ID of the chunk to update. | required |
| file_id | str | The ID of the file the chunk belongs to. | required |
| metadata | dict[str, Any] | The metadata fields to update. | required |
| **kwargs | Any | Additional keyword arguments for customization. | {} |

Returns:

| Type | Description |
|---|---|
| dict[str, Any] | Response with success status and error message. |

Raises:

| Type | Description |
|---|---|
| NotImplementedError | This method is not yet implemented. |