Graph

Package containing Graph Indexer modules.

Modules:

Name Description
LlamaIndexGraphRAGIndexer

A class for indexing elements using LlamaIndex.

LightRAGGraphRAGIndexer

A class for indexing elements using LightRAG.

LightRAGGraphRAGIndexer(graph_store)

Bases: BaseGraphRAGIndexer

Indexer for LightRAG-based graph RAG.

How to run LightRAG with PostgreSQL using Docker:

docker run \
    -p 5455:5432 \
    -d \
    --name postgres-LightRag \
    shangor/postgres-for-rag:v1.0 \
    sh -c "service postgresql start && sleep infinity"
Example
from gllm_inference.em_invoker import OpenAIEMInvoker
from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_docproc.indexer.graph.light_rag_graph_rag_indexer import LightRAGGraphRAGIndexer
from gllm_datastore.graph_data_store.light_rag_postgres_data_store import LightRAGPostgresDataStore

# Create the LightRAGPostgresDataStore instance
graph_store = LightRAGPostgresDataStore(
    lm_invoker=OpenAILMInvoker(model_name="gpt-4o-mini"),
    em_invoker=OpenAIEMInvoker(model_name="text-embedding-3-small"),
    postgres_db_host="localhost",
    postgres_db_port=5455,
    postgres_db_user="rag",
    postgres_db_password="rag",
    postgres_db_name="rag",
    postgres_db_workspace="default",
)


# Create the indexer
indexer = LightRAGGraphRAGIndexer(graph_store=graph_store)

# Create elements to index
elements = [
    {
        "text": "This is a sample document about AI.",
        "structure": "uncategorized",
        "metadata": {
            "source": "sample.txt",
            "source_type": "TEXT",
            "loaded_datetime": "2025-07-10T12:00:00",
            "chunk_id": "chunk_001",
            "file_id": "file_001"
        }
    }
]

# Index the elements
indexer.index_file_chunks(elements, file_id="file_001")

Attributes:

Name Type Description
_graph_store BaseLightRAGDataStore

The LightRAG data store used for indexing and querying.

Initialize the LightRAGGraphRAGIndexer.

Parameters:

Name Type Description Default
graph_store BaseLightRAGDataStore

The LightRAG instance to use for indexing.

required

delete_chunk(chunk_id, file_id, **kwargs)

Delete a single chunk by chunk ID and file ID.

Parameters:

Name Type Description Default
chunk_id str

The ID of the chunk to delete.

required
file_id str

The ID of the file the chunk belongs to.

required
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any]

dict[str, Any]: Response with success status and error message.

delete_file_chunks(file_id, **kwargs)

Delete all chunks for a specific file.

Parameters:

Name Type Description Default
file_id str

The ID of the file whose chunks should be deleted.

required
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any]

dict[str, Any]: Response with success status and error message.

get_chunk(chunk_id, file_id, **kwargs)

Get a single chunk by chunk ID and file ID.

Parameters:

Name Type Description Default
chunk_id str

The ID of the chunk to retrieve.

required
file_id str

The ID of the file the chunk belongs to.

required
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any] | None

dict[str, Any] | None: The chunk data, or None if not found.

Raises:

Type Description
NotImplementedError

This method is not yet implemented.

get_file_chunks(file_id, page=0, size=20, **kwargs)

Get chunks for a specific file with pagination support.

Parameters:

Name Type Description Default
file_id str

The ID of the file to get chunks from.

required
page int

The page number (0-indexed). Defaults to 0.

0
size int

The number of chunks per page. Defaults to 20.

20
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any]

dict[str, Any]: Response with chunks list, total count, and pagination info.

Raises:

Type Description
NotImplementedError

This method is not yet implemented.
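Although this method is not yet implemented, the documented page/size contract is the usual 0-indexed offset arithmetic. A minimal sketch of that contract (the helper name page_slice is illustrative, not part of the API):

```python
def page_slice(total: int, page: int = 0, size: int = 20) -> tuple[int, int]:
    """Compute the [start, end) index range for a 0-indexed page of `size` chunks."""
    start = page * size
    end = min(start + size, total)
    return start, max(start, end)  # an out-of-range page yields an empty slice

# A file with 45 chunks at the default size of 20:
# page 0 -> chunks 0..19, page 1 -> chunks 20..39, page 2 -> chunks 40..44
```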

index_chunk(element, **kwargs)

Index a single chunk.

This method only indexes the chunk. It does NOT update the metadata of neighboring chunks (previous_chunk/next_chunk). The caller is responsible for maintaining chunk relationships by updating adjacent chunks' metadata separately.

Parameters:

Name Type Description Default
element dict[str, Any]

The chunk to be indexed.

required
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any]

dict[str, Any]: Response with success status, error message, and chunk_id.

Raises:

Type Description
NotImplementedError

This method is not yet implemented.

index_chunks(elements, **kwargs)

Index multiple chunks.

This method enables indexing multiple chunks in a single operation without requiring file replacement semantics (i.e., it inserts or overwrites the provided chunks directly without first deleting existing chunks). The chunks provided can belong to multiple different files.

Parameters:

Name Type Description Default
elements list[dict[str, Any]]

The chunks to be indexed. Each dict should follow the Element structure with 'text' and 'metadata' keys. Metadata must include 'file_id' and 'chunk_id'.

required
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any]

dict[str, Any]: The response from the indexing process. Should include:
1. success (bool): True if indexing succeeded, False otherwise.
2. error_message (str): Error message if indexing failed, empty string otherwise.
3. total (int): The total number of chunks indexed.
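A sketch of a valid input for index_chunks, assuming an indexer has been constructed as in the example above. Note that, unlike index_file_chunks, the chunks may span multiple files; the file and chunk IDs below are illustrative:

```python
# Chunks spanning two different files; each element carries 'text' and
# 'metadata', and the metadata must include 'file_id' and 'chunk_id'.
elements = [
    {
        "text": "First chunk of document A.",
        "metadata": {"file_id": "file_a", "chunk_id": "a_001"},
    },
    {
        "text": "Second chunk of document A.",
        "metadata": {"file_id": "file_a", "chunk_id": "a_002"},
    },
    {
        "text": "Only chunk of document B.",
        "metadata": {"file_id": "file_b", "chunk_id": "b_001"},
    },
]

# Sanity-check the required metadata keys before handing off to the indexer:
assert all({"file_id", "chunk_id"} <= element["metadata"].keys() for element in elements)

# response = indexer.index_chunks(elements)  # inserts/overwrites without deleting first
```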

index_file_chunks(elements, file_id, **kwargs)

Index chunks for a specific file.

This method extracts text and chunk IDs from the provided elements, inserts them into the LightRAG system, and creates a graph structure connecting files to chunks.

Parameters:

Name Type Description Default
elements list[dict[str, Any]]

The chunks to be indexed.

required
file_id str

The ID of the file these chunks belong to.

required
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any]

dict[str, Any]: Response with success status, error message, and total count.
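The documented response shape can be handled as below; the dict here is a hypothetical stand-in for a real call's return value:

```python
# In real use this would come from:
# response = indexer.index_file_chunks(elements, file_id="file_001")
response = {"success": True, "error_message": "", "total": 1}

if response["success"]:
    message = f"Indexed {response['total']} chunk(s)."
else:
    # A failed run carries the reason in 'error_message'.
    raise RuntimeError(f"Indexing failed: {response['error_message']}")
```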

resolve_entities()

Resolve entities from the graph.

Currently, this method does nothing: entity resolution is handled implicitly by the LightRAG instance.

update_chunk(element, **kwargs)

Update a chunk by chunk ID.

This method updates both the text content and metadata of a chunk. When text content is updated, the chunk should be re-processed through data generators and re-indexed with updated vector embeddings.

Parameters:

Name Type Description Default
element dict[str, Any]

The updated chunk data.

required
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any]

dict[str, Any]: Response with success status, error message, and chunk_id.

Raises:

Type Description
NotImplementedError

This method is not yet implemented.

update_chunk_metadata(chunk_id, file_id, metadata, **kwargs)

Update metadata for a specific chunk.

This method patches new metadata into the existing chunk metadata. Existing metadata fields will be overwritten, and new fields will be added. Identity metadata fields file_id and chunk_id should be preserved and not overwritten.

Parameters:

Name Type Description Default
chunk_id str

The ID of the chunk to update.

required
file_id str

The ID of the file the chunk belongs to.

required
metadata dict[str, Any]

The metadata fields to update.

required
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any]

dict[str, Any]: Response with success status and error message.

Raises:

Type Description
NotImplementedError

This method is not yet implemented.
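The patch semantics described above (overwrite existing fields, add new ones, preserve the identity fields) can be sketched as follows; patch_metadata is an illustrative helper, not part of the API:

```python
from typing import Any

IDENTITY_FIELDS = {"file_id", "chunk_id"}  # identity fields must never be overwritten

def patch_metadata(existing: dict[str, Any], updates: dict[str, Any]) -> dict[str, Any]:
    """Merge `updates` into `existing`, dropping any attempt to change identity fields."""
    safe_updates = {k: v for k, v in updates.items() if k not in IDENTITY_FIELDS}
    return {**existing, **safe_updates}

old = {"file_id": "file_001", "chunk_id": "chunk_001", "source": "sample.txt"}
new = patch_metadata(old, {"source": "renamed.txt", "file_id": "other", "reviewed": True})
# 'source' is overwritten, 'reviewed' is added, 'file_id' stays untouched
```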

LlamaIndexGraphRAGIndexer(graph_store, llama_index_llm=None, allowed_entity_types=None, allowed_relation_types=None, kg_validation_schema=None, strict_mode=False, kg_extractors=None, embed_model=None, vector_store=None, max_triplets_per_chunk=10, num_workers=4, **kwargs)

Bases: BaseGraphRAGIndexer

Indexer for graph RAG using LlamaIndex.

Attributes:

Name Type Description
_index PropertyGraphIndex

Property graph index.

_graph_store LlamaIndexGraphRAGDataStore

Storage for property graph.

_strict_mode bool

Whether strict schema validation is enabled.

Initialize the LlamaIndexGraphRAGIndexer.

Parameters:

Name Type Description Default
graph_store LlamaIndexGraphRAGDataStore

Storage for property graph.

required
llama_index_llm BaseLLM | None

Language model for LlamaIndex. Defaults to None. Deprecated: Use graph_store.llm instead. Instantiate the LLM via LlamaIndexGraphRAGDataStore (e.g., LlamaIndexGraphRAGDataStore(lm_invoker=...)).

None
allowed_entity_types list[str] | None

List of allowed entity types. When strict_mode=True, only these types are extracted. When strict_mode=False, serves as hints. Defaults to None.

None
allowed_relation_types list[str] | None

List of allowed relationship types. Behavior depends on strict_mode. Defaults to None.

None
kg_validation_schema dict[str, list[str]] | None

Validation schema for strict mode. Maps entity types to their allowed outgoing relationship types.
Format: {"ENTITY_TYPE": ["ALLOWED_REL1", "ALLOWED_REL2"], ...}
Example: {"PERSON": ["WORKS_AT", "FOUNDED"], "ORGANIZATION": ["LOCATED_IN"]}
Defaults to None.

None
strict_mode bool

If True, uses SchemaLLMPathExtractor with strict validation. If False (default), uses DynamicLLMPathExtractor with optional guidance. Defaults to False.

False
kg_extractors list[TransformComponent] | None

Custom list of extractors. If provided, overrides automatic extractor selection based on strict_mode. Defaults to None.

None
embed_model BaseEmbedding | None

Embedding model for vector representations. Defaults to None. Deprecated: Use graph_store.embed_model instead. Instantiate the embedding model via LlamaIndexGraphRAGDataStore (e.g., LlamaIndexGraphRAGDataStore(em_invoker=...)).

None
vector_store BasePydanticVectorStore | None

Storage for vector data. Defaults to None.

None
max_triplets_per_chunk int

Maximum triplets to extract per chunk. Defaults to 10.

10
num_workers int

Number of parallel workers. Defaults to 4.

4
**kwargs Any

Additional keyword arguments.

{}
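Putting the strict-mode parameters together, a hedged configuration sketch. The entity and relation names mirror the example in the kg_validation_schema description; graph_store construction is omitted, so the indexer call is shown only in comments:

```python
# Illustrative strict-mode configuration; all type names are examples.
allowed_entity_types = ["PERSON", "ORGANIZATION"]
allowed_relation_types = ["WORKS_AT", "FOUNDED", "LOCATED_IN"]
kg_validation_schema = {
    "PERSON": ["WORKS_AT", "FOUNDED"],
    "ORGANIZATION": ["LOCATED_IN"],
}

# For a consistent schema, every entity type it maps should be an allowed
# entity type, and every relationship it lists an allowed relation type:
assert set(kg_validation_schema) <= set(allowed_entity_types)
assert all(
    set(rels) <= set(allowed_relation_types) for rels in kg_validation_schema.values()
)

# indexer = LlamaIndexGraphRAGIndexer(
#     graph_store=graph_store,
#     allowed_entity_types=allowed_entity_types,
#     allowed_relation_types=allowed_relation_types,
#     kg_validation_schema=kg_validation_schema,
#     strict_mode=True,  # uses SchemaLLMPathExtractor with strict validation
# )
```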

delete_chunk(chunk_id, file_id, **kwargs)

Delete a single chunk by chunk ID and file ID.

Parameters:

Name Type Description Default
chunk_id str

The ID of the chunk to delete.

required
file_id str

The ID of the file the chunk belongs to.

required
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any]

dict[str, Any]: Response with success status and error message.

Raises:

Type Description
NotImplementedError

This method is not yet implemented.

delete_file_chunks(file_id, **kwargs)

Delete all chunks for a specific file.

This method deletes all chunks from the knowledge graph based on the provided file_id.

Parameters:

Name Type Description Default
file_id str

The ID of the file whose chunks should be deleted.

required
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any]

dict[str, Any]: Response with success status and error message.

get_chunk(chunk_id, file_id, **kwargs)

Get a single chunk by chunk ID and file ID.

Parameters:

Name Type Description Default
chunk_id str

The ID of the chunk to retrieve.

required
file_id str

The ID of the file the chunk belongs to.

required
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any] | None

dict[str, Any] | None: The chunk data, or None if not found.

Raises:

Type Description
NotImplementedError

This method is not yet implemented.

get_file_chunks(file_id, page=0, size=20, **kwargs)

Get chunks for a specific file with pagination support.

Parameters:

Name Type Description Default
file_id str

The ID of the file to get chunks from.

required
page int

The page number (0-indexed). Defaults to 0.

0
size int

The number of chunks per page. Defaults to 20.

20
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any]

dict[str, Any]: Response with chunks list, total count, and pagination info.

Raises:

Type Description
NotImplementedError

This method is not yet implemented.

index_chunk(element, **kwargs)

Index a single chunk.

This method only indexes the chunk. It does NOT update the metadata of neighboring chunks (previous_chunk/next_chunk). The caller is responsible for maintaining chunk relationships by updating adjacent chunks' metadata separately.

Parameters:

Name Type Description Default
element dict[str, Any]

The chunk to be indexed.

required
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any]

dict[str, Any]: Response with success status, error message, and chunk_id.

Raises:

Type Description
NotImplementedError

This method is not yet implemented.

index_chunks(elements, **kwargs)

Index multiple chunks.

This method enables indexing multiple chunks in a single operation without requiring file replacement semantics (i.e., it inserts or overwrites the provided chunks directly without first deleting existing chunks). The chunks provided can belong to multiple different files.

Parameters:

Name Type Description Default
elements list[dict[str, Any]]

The chunks to be indexed. Each dict should follow the Element structure with 'text' and 'metadata' keys. Metadata must include 'file_id' and 'chunk_id'.

required
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any]

dict[str, Any]: The response from the indexing process. Should include:
1. success (bool): True if indexing succeeded, False otherwise.
2. error_message (str): Error message if indexing failed, empty string otherwise.
3. total (int): The total number of chunks indexed.

index_file_chunks(elements, file_id, **kwargs)

Index chunks for a specific file.

This method indexes chunks for a file.

Notes:
- Currently, only Neo4jPropertyGraphStore is supported for indexing the metadata from the TextNode.
- The 'chunk_id' metadata field is used to specify the chunk ID for each element.

Parameters:

Name Type Description Default
elements list[dict[str, Any]]

The chunks to be indexed.

required
file_id str

The ID of the file these chunks belong to.

required
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any]

dict[str, Any]: Response with success status, error message, and total count.

resolve_entities()

Resolve entities in the graph.

Currently, this method does nothing.

update_chunk(element, **kwargs)

Update a chunk by chunk ID.

This method updates both the text content and metadata of a chunk. When text content is updated, the chunk should be re-processed through data generators and re-indexed with updated vector embeddings.

Parameters:

Name Type Description Default
element dict[str, Any]

The updated chunk data.

required
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any]

dict[str, Any]: Response with success status, error message, and chunk_id.

Raises:

Type Description
NotImplementedError

This method is not yet implemented.

update_chunk_metadata(chunk_id, file_id, metadata, **kwargs)

Update metadata for a specific chunk.

This method patches new metadata into the existing chunk metadata. Existing metadata fields will be overwritten, and new fields will be added. Identity metadata fields file_id and chunk_id should be preserved and not overwritten.

Parameters:

Name Type Description Default
chunk_id str

The ID of the chunk to update.

required
file_id str

The ID of the file the chunk belongs to.

required
metadata dict[str, Any]

The metadata fields to update.

required
**kwargs Any

Additional keyword arguments for customization.

{}

Returns:

Type Description
dict[str, Any]

dict[str, Any]: Response with success status and error message.

Raises:

Type Description
NotImplementedError

This method is not yet implemented.