Graph

Package containing Graph Indexer modules.

Modules:

Name Description
LlamaIndexGraphRAGIndexer

A class for indexing elements using LlamaIndex.

LightRAGGraphRAGIndexer

A class for indexing elements using LightRAG.

LightRAGGraphRAGIndexer(graph_store)

Bases: BaseGraphRAGIndexer

Indexer for LightRAG-based graph RAG.

How to run LightRAG with PostgreSQL using Docker:

docker run \
    -p 5455:5432 \
    -d \
    --name postgres-LightRag \
    shangor/postgres-for-rag:v1.0 \
    sh -c "service postgresql start && sleep infinity"
Example
from gllm_inference.em_invoker import OpenAIEMInvoker
from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_docproc.indexer.graph.light_rag_graph_rag_indexer import LightRAGGraphRAGIndexer
from gllm_datastore.graph_data_store.light_rag_postgres_data_store import LightRAGPostgresDataStore

# Create the LightRAGPostgresDataStore instance
graph_store = LightRAGPostgresDataStore(
    lm_invoker=OpenAILMInvoker(model_name="gpt-4o-mini"),
    em_invoker=OpenAIEMInvoker(model_name="text-embedding-3-small"),
    postgres_db_host="localhost",
    postgres_db_port=5455,
    postgres_db_user="rag",
    postgres_db_password="rag",
    postgres_db_name="rag",
    postgres_db_workspace="default",
)


# Create the indexer
indexer = LightRAGGraphRAGIndexer(graph_store=graph_store)

# Create elements to index
elements = [
    {
        "text": "This is a sample document about AI.",
        "structure": "uncategorized",
        "metadata": {
            "source": "sample.txt",
            "source_type": "TEXT",
            "loaded_datetime": "2025-07-10T12:00:00",
            "chunk_id": "chunk_001",
            "file_id": "file_001"
        }
    }
]

# Index the elements
indexer.index_file_chunks(elements, file_id="file_001")

Attributes:

Name Type Description
_graph_store BaseLightRAGDataStore

The LightRAG data store used for indexing and querying.

Initialize the LightRAGGraphRAGIndexer.

Parameters:

Name Type Description Default
graph_store BaseLightRAGDataStore

The LightRAG instance to use for indexing.

required

delete_chunk(chunk_id, file_id, **kwargs)

Delete a single chunk by chunk ID and file ID.

Parameters:

Name Type Description Default
chunk_id str

The ID of the chunk to delete.

required
file_id str

The ID of the file the chunk belongs to.

required
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any]

dict[str, Any]: Response with success status and error message.

delete_file_chunks(file_id, **kwargs)

Delete all chunks for a specific file.

Parameters:

Name Type Description Default
file_id str

The ID of the file whose chunks should be deleted.

required
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any]

dict[str, Any]: Response with success status and error message.

get_chunk(chunk_id, file_id, **kwargs)

Get a single chunk by chunk ID and file ID.

Parameters:

Name Type Description Default
chunk_id str

The ID of the chunk to retrieve.

required
file_id str

The ID of the file the chunk belongs to.

required
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any] | None

dict[str, Any] | None: The chunk data, or None if not found.

Raises:

Type Description
NotImplementedError

This method is not yet implemented.

get_file_chunks(file_id, page=0, size=20, **kwargs)

Get chunks for a specific file with pagination support.

Parameters:

Name Type Description Default
file_id str

The ID of the file to get chunks from.

required
page int

The page number (0-indexed). Defaults to 0.

0
size int

The number of chunks per page. Defaults to 20.

20
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any]

dict[str, Any]: Response with chunks list, total count, and pagination info.

Raises:

Type Description
NotImplementedError

This method is not yet implemented.
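Although this method is not yet implemented, the documented page/size contract is the usual 0-indexed offset arithmetic. A minimal sketch of that contract (the helper name page_slice is illustrative, not part of the API):

```python
def page_slice(total: int, page: int = 0, size: int = 20) -> tuple[int, int]:
    """Compute the [start, end) index range for a 0-indexed page of `size` chunks."""
    start = page * size
    end = min(start + size, total)
    return start, max(start, end)  # an out-of-range page yields an empty slice

# A file with 45 chunks at the default size of 20:
# page 0 -> chunks 0..19, page 1 -> chunks 20..39, page 2 -> chunks 40..44
```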

index_chunk(element, **kwargs)

Index a single chunk.

This method only indexes the chunk. It does NOT update the metadata of neighboring chunks (previous_chunk/next_chunk). The caller is responsible for maintaining chunk relationships by updating adjacent chunks' metadata separately.

Parameters:

Name Type Description Default
element dict[str, Any]

The chunk to be indexed.

required
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any]

dict[str, Any]: Response with success status, error message, and chunk_id.

Raises:

Type Description
NotImplementedError

This method is not yet implemented.

index_chunks(elements, **kwargs)

Index multiple chunks.

This method enables indexing multiple chunks in a single operation without requiring file replacement semantics (i.e., it inserts or overwrites the provided chunks directly without first deleting existing chunks). The chunks provided can belong to multiple different files.

Parameters:

Name Type Description Default
elements list[dict[str, Any]]

The chunks to be indexed. Each dict should follow the Element structure with 'text' and 'metadata' keys. Metadata must include 'file_id' and 'chunk_id'.

required
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any]

dict[str, Any]: The response from the indexing process. Should include:
1. success (bool): True if indexing succeeded, False otherwise.
2. error_message (str): Error message if indexing failed, empty string otherwise.
3. total (int): The total number of chunks indexed.
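A sketch of a valid input for index_chunks, assuming an indexer has been constructed as in the example above. Note that, unlike index_file_chunks, the chunks may span multiple files; the file and chunk IDs below are illustrative:

```python
# Chunks spanning two different files; each element carries 'text' and
# 'metadata', and the metadata must include 'file_id' and 'chunk_id'.
elements = [
    {
        "text": "First chunk of document A.",
        "metadata": {"file_id": "file_a", "chunk_id": "a_001"},
    },
    {
        "text": "Second chunk of document A.",
        "metadata": {"file_id": "file_a", "chunk_id": "a_002"},
    },
    {
        "text": "Only chunk of document B.",
        "metadata": {"file_id": "file_b", "chunk_id": "b_001"},
    },
]

# Sanity-check the required metadata keys before handing off to the indexer:
assert all({"file_id", "chunk_id"} <= element["metadata"].keys() for element in elements)

# response = indexer.index_chunks(elements)  # inserts/overwrites without deleting first
```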

index_file_chunks(elements, file_id, **kwargs)

Index chunks for a specific file.

This method extracts text and chunk IDs from the provided elements, inserts them into the LightRAG system, and creates a graph structure connecting files to chunks.

Parameters:

Name Type Description Default
elements list[dict[str, Any]]

The chunks to be indexed.

required
file_id str

The ID of the file these chunks belong to.

required
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any]

dict[str, Any]: Response with success status, error message, and total count.
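The documented response shape can be handled as below; the dict here is a hypothetical stand-in for a real call's return value:

```python
# In real use this would come from:
# response = indexer.index_file_chunks(elements, file_id="file_001")
response = {"success": True, "error_message": "", "total": 1}

if response["success"]:
    message = f"Indexed {response['total']} chunk(s)."
else:
    # A failed run carries the reason in 'error_message'.
    raise RuntimeError(f"Indexing failed: {response['error_message']}")
```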

resolve_entities()

Resolve entities from the graph.

Currently, this method does nothing: entity resolution is handled implicitly by the LightRAG instance.

update_chunk(element, **kwargs)

Update a chunk by chunk ID.

This method updates both the text content and metadata of a chunk. When text content is updated, the chunk should be re-processed through data generators and re-indexed with updated vector embeddings.

Parameters:

Name Type Description Default
element dict[str, Any]

The updated chunk data.

required
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any]

dict[str, Any]: Response with success status, error message, and chunk_id.

Raises:

Type Description
NotImplementedError

This method is not yet implemented.

update_chunk_metadata(chunk_id, file_id, metadata, **kwargs)

Update metadata for a specific chunk.

This method patches new metadata into the existing chunk metadata. Existing metadata fields will be overwritten, and new fields will be added. Identity metadata fields file_id and chunk_id should be preserved and not overwritten.

Parameters:

Name Type Description Default
chunk_id str

The ID of the chunk to update.

required
file_id str

The ID of the file the chunk belongs to.

required
metadata dict[str, Any]

The metadata fields to update.

required
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any]

dict[str, Any]: Response with success status and error message.

Raises:

Type Description
NotImplementedError

This method is not yet implemented.
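The patch semantics described above (overwrite existing fields, add new ones, preserve the identity fields) can be sketched as follows; patch_metadata is an illustrative helper, not part of the API:

```python
from typing import Any

IDENTITY_FIELDS = {"file_id", "chunk_id"}  # identity fields must never be overwritten

def patch_metadata(existing: dict[str, Any], updates: dict[str, Any]) -> dict[str, Any]:
    """Merge `updates` into `existing`, dropping any attempt to change identity fields."""
    safe_updates = {k: v for k, v in updates.items() if k not in IDENTITY_FIELDS}
    return {**existing, **safe_updates}

old = {"file_id": "file_001", "chunk_id": "chunk_001", "source": "sample.txt"}
new = patch_metadata(old, {"source": "renamed.txt", "file_id": "other", "reviewed": True})
# 'source' is overwritten, 'reviewed' is added, 'file_id' stays untouched
```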

LlamaIndexGraphRAGIndexer(graph_store, llama_index_llm=None, allowed_entity_types=None, allowed_relation_types=None, kg_validation_schema=None, strict_mode=False, kg_extractors=None, embed_model=None, vector_store=None, max_triplets_per_chunk=10, num_workers=4, **kwargs)

Bases: BaseGraphRAGIndexer

Indexer for graph RAG using LlamaIndex.

Attributes:

Name Type Description
_index PropertyGraphIndex

Property graph index.

_graph_store LlamaIndexGraphRAGDataStore

Storage for property graph.

_strict_mode bool

Whether strict schema validation is enabled.

Initialize the LlamaIndexGraphRAGIndexer.

Parameters:

Name Type Description Default
graph_store LlamaIndexGraphRAGDataStore

Storage for property graph.

required
llama_index_llm BaseLLM | None

Language model for LlamaIndex. Defaults to None. Deprecated: Use graph_store.llm instead. Instantiate the LLM via LlamaIndexGraphRAGDataStore (e.g., LlamaIndexGraphRAGDataStore(lm_invoker=...)).

None
allowed_entity_types list[str] | None

List of allowed entity types. When strict_mode=True, only these types are extracted. When strict_mode=False, serves as hints. Defaults to None.

None
allowed_relation_types list[str] | None

List of allowed relationship types. Behavior depends on strict_mode. Defaults to None.

None
kg_validation_schema dict[str, list[str]] | None

Validation schema for strict mode. Maps entity types to their allowed outgoing relationship types.
Format: {"ENTITY_TYPE": ["ALLOWED_REL1", "ALLOWED_REL2"], ...}
Example: {"PERSON": ["WORKS_AT", "FOUNDED"], "ORGANIZATION": ["LOCATED_IN"]}
Defaults to None.

None
strict_mode bool

If True, uses SchemaLLMPathExtractor with strict validation. If False (default), uses DynamicLLMPathExtractor with optional guidance. Defaults to False.

False
kg_extractors list[TransformComponent] | None

Custom list of extractors. If provided, overrides automatic extractor selection based on strict_mode. Defaults to None.

None
embed_model BaseEmbedding | None

Embedding model for vector representations. Defaults to None. Deprecated: Use graph_store.embed_model instead. Instantiate the embedding model via LlamaIndexGraphRAGDataStore (e.g., LlamaIndexGraphRAGDataStore(em_invoker=...)).

None
vector_store BasePydanticVectorStore | None

Storage for vector data. Defaults to None.

None
max_triplets_per_chunk int

Maximum triplets to extract per chunk. Defaults to 10.

10
num_workers int

Number of parallel workers. Defaults to 4.

4
**kwargs Any

Additional keyword arguments.

{}
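Putting the strict-mode parameters together, a hedged configuration sketch. The entity and relation names mirror the example in the kg_validation_schema description; graph_store construction is omitted, so the indexer call is shown only in comments:

```python
# Illustrative strict-mode configuration; all type names are examples.
allowed_entity_types = ["PERSON", "ORGANIZATION"]
allowed_relation_types = ["WORKS_AT", "FOUNDED", "LOCATED_IN"]
kg_validation_schema = {
    "PERSON": ["WORKS_AT", "FOUNDED"],
    "ORGANIZATION": ["LOCATED_IN"],
}

# For a consistent schema, every entity type it maps should be an allowed
# entity type, and every relationship it lists an allowed relation type:
assert set(kg_validation_schema) <= set(allowed_entity_types)
assert all(
    set(rels) <= set(allowed_relation_types) for rels in kg_validation_schema.values()
)

# indexer = LlamaIndexGraphRAGIndexer(
#     graph_store=graph_store,
#     allowed_entity_types=allowed_entity_types,
#     allowed_relation_types=allowed_relation_types,
#     kg_validation_schema=kg_validation_schema,
#     strict_mode=True,  # uses SchemaLLMPathExtractor with strict validation
# )
```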

delete_chunk(chunk_id, file_id, **kwargs)

Delete a single chunk by chunk ID and file ID.

Parameters:

Name Type Description Default
chunk_id str

The ID of the chunk to delete.

required
file_id str

The ID of the file the chunk belongs to.

required
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any]

dict[str, Any]: Response with success status and error message.

Raises:

Type Description
NotImplementedError

This method is not yet implemented.

delete_file_chunks(file_id, **kwargs)

Delete all chunks for a specific file.

This method deletes all chunks from the knowledge graph based on the provided file_id.

Parameters:

Name Type Description Default
file_id str

The ID of the file whose chunks should be deleted.

required
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any]

dict[str, Any]: Response with success status and error message.

get_chunk(chunk_id, file_id, **kwargs)

Get a single chunk by chunk ID and file ID.

Parameters:

Name Type Description Default
chunk_id str

The ID of the chunk to retrieve.

required
file_id str

The ID of the file the chunk belongs to.

required
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any] | None

dict[str, Any] | None: The chunk data, or None if not found.

Raises:

Type Description
NotImplementedError

This method is not yet implemented.

get_file_chunks(file_id, page=0, size=20, **kwargs)

Get chunks for a specific file with pagination support.

Parameters:

Name Type Description Default
file_id str

The ID of the file to get chunks from.

required
page int

The page number (0-indexed). Defaults to 0.

0
size int

The number of chunks per page. Defaults to 20.

20
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any]

dict[str, Any]: Response with chunks list, total count, and pagination info.

Raises:

Type Description
NotImplementedError

This method is not yet implemented.

index_chunk(element, **kwargs)

Index a single chunk.

This method only indexes the chunk. It does NOT update the metadata of neighboring chunks (previous_chunk/next_chunk). The caller is responsible for maintaining chunk relationships by updating adjacent chunks' metadata separately.

Parameters:

Name Type Description Default
element dict[str, Any]

The chunk to be indexed.

required
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any]

dict[str, Any]: Response with success status, error message, and chunk_id.

Raises:

Type Description
NotImplementedError

This method is not yet implemented.

index_chunks(elements, **kwargs)

Index multiple chunks.

This method enables indexing multiple chunks in a single operation without requiring file replacement semantics (i.e., it inserts or overwrites the provided chunks directly without first deleting existing chunks). The chunks provided can belong to multiple different files.

Parameters:

Name Type Description Default
elements list[dict[str, Any]]

The chunks to be indexed. Each dict should follow the Element structure with 'text' and 'metadata' keys. Metadata must include 'file_id' and 'chunk_id'.

required
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any]

dict[str, Any]: The response from the indexing process. Should include:
1. success (bool): True if indexing succeeded, False otherwise.
2. error_message (str): Error message if indexing failed, empty string otherwise.
3. total (int): The total number of chunks indexed.

index_file_chunks(elements, file_id, **kwargs)

Index chunks for a specific file.

This method indexes chunks for a file.

Notes:
- Currently, only Neo4jPropertyGraphStore is supported for indexing the metadata from the TextNode.
- The 'chunk_id' metadata field is used to specify the chunk ID for each element.

Parameters:

Name Type Description Default
elements list[dict[str, Any]]

The chunks to be indexed.

required
file_id str

The ID of the file these chunks belong to.

required
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any]

dict[str, Any]: Response with success status, error message, and total count.

resolve_entities()

Resolve entities in the graph.

Currently, this method does nothing.

update_chunk(element, **kwargs)

Update a chunk by chunk ID.

This method updates both the text content and metadata of a chunk. When text content is updated, the chunk should be re-processed through data generators and re-indexed with updated vector embeddings.

Parameters:

Name Type Description Default
element dict[str, Any]

The updated chunk data.

required
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any]

dict[str, Any]: Response with success status, error message, and chunk_id.

Raises:

Type Description
NotImplementedError

This method is not yet implemented.

update_chunk_metadata(chunk_id, file_id, metadata, **kwargs)

Update metadata for a specific chunk.

This method patches new metadata into the existing chunk metadata. Existing metadata fields will be overwritten, and new fields will be added. Identity metadata fields file_id and chunk_id should be preserved and not overwritten.

Parameters:

Name Type Description Default
chunk_id str

The ID of the chunk to update.

required
file_id str

The ID of the file the chunk belongs to.

required
metadata dict[str, Any]

The metadata fields to update.

required
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any]

dict[str, Any]: Response with success status and error message.

Raises:

Type Description
NotImplementedError

This method is not yet implemented.