LM-based graph transformer
Module for transforming text documents into graph-based knowledge representations using LLMs.
This module is ported from LangChain's implementation of LLMGraphTransformer [1].
References
[1] https://github.com/langchain-ai/langchain-experimental/blob/5cdbf02e3771da35d1438bb7597be4575610684e/libs/experimental/langchain_experimental/graph_transformers/llm.py
LMBasedGraphTransformer(lm_invoker, allowed_nodes=None, allowed_relationships=None, prompt_builder=None, strict_mode=True, use_structured_output=True)
Transform documents into graph-based documents using a language model.
This class orchestrates the extraction of knowledge graphs from text documents using a language model. It handles prompt construction, model invocation, and parsing of results into a standardized graph format.
The transformer can be configured with constraints on node and relationship types, and supports both structured and unstructured output formats from the language model.
Attributes:
Name | Type | Description |
---|---|---|
lm_invoker | | The language model invoker used to generate graph data. |
allowed_nodes | | List of allowed node types to constrain the output. |
allowed_relationships | | List of allowed relationship types to constrain the output. |
strict_mode | | Whether to strictly enforce node and relationship type constraints. |
use_structured_output | | Whether to use the LM's structured output capabilities. |
prompt_builder | | The prompt builder used to generate prompts for the LM. |
Example
from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_misc.graph_transformer import LMBasedGraphTransformer
from gllm_core.schema import Chunk
# Create an LM invoker
invoker = OpenAILMInvoker(model_name="gpt-4o-mini", api_key="sk-proj-123")
# Create a graph transformer with constraints
transformer = LMBasedGraphTransformer(
    lm_invoker=invoker,
    allowed_nodes=["Person", "Organization", "Event"],
    allowed_relationships=["WORKS_AT", "PARTICIPATES_IN"],
)
# Extract a graph from text (the await must run inside a coroutine)
chunks = [Chunk(content="Elon Musk is the CEO of SpaceX and Tesla.")]
graph_docs = await transformer.convert_to_graph_documents(chunks)
Initialize the LMBasedGraphTransformer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
lm_invoker | BaseLMInvoker | The language model invoker to use for generating graph data. | required |
allowed_nodes | list[str] \| None | Optional list of allowed node types. If provided, the transformer will constrain output to only use these node types. Defaults to None. | None |
allowed_relationships | list[str] \| list[tuple[str, str, str]] \| None | Optional list of allowed relationship types. Can be either a list of strings or a list of tuples in the format (source_type, relationship_type, target_type). Defaults to None. | None |
prompt_builder | PromptBuilder \| None | Optional custom prompt builder. Defaults to a prompt builder created using get_default_prompt(). | None |
strict_mode | bool | Determines whether the transformer should apply filtering to strictly adhere to allowed_nodes and allowed_relationships. Defaults to True. | True |
use_structured_output | bool | Indicates whether the transformer should use the language model's native structured output functionality. Defaults to True. | True |
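The two accepted shapes of allowed_relationships can be illustrated with a small self-contained sketch. The helper is_relationship_allowed below is hypothetical and not part of the library; it only shows one plausible way such a constraint could be checked, assuming a case-sensitive exact match.

```python
from typing import Sequence, Union

# Hypothetical type alias and helper for illustration; not part of gllm_misc.
AllowedRelationships = Union[Sequence[str], Sequence[tuple[str, str, str]], None]


def is_relationship_allowed(
    source_type: str,
    relationship_type: str,
    target_type: str,
    allowed: AllowedRelationships,
) -> bool:
    """Return True if the relationship passes the configured constraint."""
    if not allowed:
        # No constraint configured: everything is accepted.
        return True
    if isinstance(allowed[0], tuple):
        # Tuple form: the full (source, relationship, target) triple must match.
        return (source_type, relationship_type, target_type) in set(allowed)
    # String form: only the relationship type is checked.
    return relationship_type in set(allowed)


by_type = ["WORKS_AT", "PARTICIPATES_IN"]
by_triple = [("Person", "WORKS_AT", "Organization")]
```

With the string form, a WORKS_AT relationship is accepted between any two node types; with the tuple form in this sketch, it is accepted only from Person to Organization.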
convert_to_graph_documents(documents)
async
Asynchronously convert a sequence of text chunks into graph documents.
This method processes multiple chunks concurrently by creating one asyncio task per chunk and gathering the results. Each chunk is processed independently using the process_response method.
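The fan-out described above can be sketched with asyncio alone. This is a minimal illustration, not the library's code: process_one is a hypothetical stand-in for process_response, and plain strings and dicts stand in for Chunk and GraphDocument.

```python
import asyncio


async def process_one(text: str) -> dict:
    """Hypothetical stand-in for process_response; a real call would invoke the LM."""
    await asyncio.sleep(0)  # yield control, as an awaited LM call would
    return {"source": text, "nodes": [], "relationships": []}


async def convert_all(texts: list[str]) -> list[dict]:
    # One task per input; gather returns the results in input order.
    tasks = [asyncio.create_task(process_one(t)) for t in texts]
    return await asyncio.gather(*tasks)


results = asyncio.run(
    convert_all(["Alice works at Acme Corp.", "Bob is the CEO of TechStart."])
)
```

Because asyncio.gather preserves the order of its arguments, each result lines up with the input at the same index even when the tasks finish out of order.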
Parameters:
Name | Type | Description | Default |
---|---|---|---|
documents | list[Chunk] | A list of Chunk objects containing the text content to transform into graphs. | required |
Returns:
Type | Description |
---|---|
list[GraphDocument] | A list of GraphDocument objects, each containing nodes and relationships extracted from the corresponding input chunk. |
Example
chunks = [
    Chunk(content="Alice works at Acme Corp."),
    Chunk(content="Bob is the CEO of TechStart."),
]
graph_docs = await transformer.convert_to_graph_documents(chunks)
# Returns a list of two GraphDocument objects
process_response(chunk)
async
Process a single text chunk and transform it into a graph document.
This method handles the core transformation logic, including:
1. Formatting the prompt with the chunk content
2. Invoking the language model
3. Parsing the response into nodes and relationships
4. Applying filtering based on allowed node and relationship types if strict_mode is enabled
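The four steps above can be sketched end to end with a stubbed model. Everything here is an assumption made for illustration: fake_lm, process_text, the prompt wording, and the JSON shape are hypothetical, and dropping relationships whose endpoints were filtered out is one plausible strict-mode behavior rather than the documented one.

```python
import asyncio
import json


async def fake_lm(prompt: str) -> str:
    """Hypothetical stand-in for the LM invoker; returns a canned JSON reply."""
    return json.dumps({
        "nodes": [
            {"id": "Alice", "type": "Person"},
            {"id": "Acme Corp", "type": "Organization"},
            {"id": "Paris", "type": "Location"},
        ],
        "relationships": [
            {"source": "Alice", "target": "Acme Corp", "type": "WORKS_AT"},
            {"source": "Alice", "target": "Paris", "type": "LIVES_IN"},
        ],
    })


async def process_text(text, allowed_nodes, allowed_relationships, strict_mode=True):
    prompt = f"Extract a knowledge graph from:\n{text}"  # 1. format the prompt
    raw = await fake_lm(prompt)                           # 2. invoke the model
    data = json.loads(raw)                                # 3. parse the response
    nodes, rels = data["nodes"], data["relationships"]
    if strict_mode:                                       # 4. filter by allowed types
        if allowed_nodes:
            nodes = [n for n in nodes if n["type"] in allowed_nodes]
            kept = {n["id"] for n in nodes}
            # Assumed behavior: drop relationships that lost an endpoint.
            rels = [r for r in rels if r["source"] in kept and r["target"] in kept]
        if allowed_relationships:
            rels = [r for r in rels if r["type"] in allowed_relationships]
    return {"nodes": nodes, "relationships": rels}


graph = asyncio.run(process_text(
    "Alice works at Acme Corp. and lives in Paris.",
    allowed_nodes=["Person", "Organization"],
    allowed_relationships=["WORKS_AT"],
))
```

In this sketch the Location node and the LIVES_IN relationship are removed in strict mode, while passing strict_mode=False returns the model output unfiltered.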
Parameters:
Name | Type | Description | Default |
---|---|---|---|
chunk | Chunk | A Chunk object containing the text content to transform into a graph. | required |
Returns:
Name | Type | Description |
---|---|---|
GraphDocument | GraphDocument | A GraphDocument containing the extracted nodes and relationships. |
Raises:
Type | Description |
---|---|
JSONDecodeError | If the unstructured output cannot be parsed as valid JSON. |
ValueError | If the structured output does not conform to the expected schema. |
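When handling both exceptions, the order of the except clauses matters, because json.JSONDecodeError is a subclass of ValueError in the Python standard library. The sketch below uses a hypothetical parse_graph_json helper and an assumed schema (an object with nodes and relationships keys) to show the distinction; it does not reproduce the library's parsing logic.

```python
import json


def parse_graph_json(raw: str) -> dict:
    """Hypothetical parser for unstructured LM output (schema is an assumption)."""
    data = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict) or "nodes" not in data or "relationships" not in data:
        raise ValueError("expected an object with 'nodes' and 'relationships'")
    return data


def classify(raw: str) -> str:
    """Report which failure mode, if any, a raw model reply triggers."""
    try:
        parse_graph_json(raw)
    except json.JSONDecodeError:
        # Must come first: JSONDecodeError is a subclass of ValueError.
        return "invalid JSON"
    except ValueError:
        return "schema mismatch"
    return "ok"
```

A caller that wraps process_response in the same try/except order could, for example, retry on invalid JSON while logging and skipping chunks that fail the schema check.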