Graph Transformer

Graph transformer module for extracting structured graph data from text.

`GraphDocument`

Bases: BaseModel

Represents a graph document consisting of nodes and relationships.

Attributes:

Name	Type	Description
`nodes`	`list[Node]`	A list of nodes in the graph.
`relationships`	`list[Relationship]`	A list of relationships in the graph.
`source`	`Chunk \| None`	The document from which the graph information is derived.

`LMBasedGraphTransformer(lm_invoker=None, model_id=None, credentials=None, config=None, allowed_nodes=None, allowed_relationships=None, prompt_builder=None, strict_mode=True, use_structured_output=False)`

Transform documents into graph-based documents using a language model.

This class orchestrates the extraction of knowledge graphs from text documents using a language model. It handles prompt construction, model invocation, and parsing of results into a standardized graph format.

The transformer can be configured with constraints on node and relationship types, and supports both structured and unstructured output formats from the language model.

Attributes:

Name	Type	Description
`lm_invoker`	`BaseLMInvoker`	The language model invoker used to generate graph data.
`allowed_nodes`	`list[str] \| None`	List of allowed node types to constrain the output.
`allowed_relationships`	`list[str] \| list[tuple[str, str, str]] \| None`	List of allowed relationship types to constrain the output.
`strict_mode`	`bool`	Whether to strictly enforce node and relationship type constraints.
`use_structured_output`	`bool`	Whether to use the LM's structured output capabilities.
`prompt_builder`	`PromptBuilder`	The prompt builder used to generate prompts for the LM.

Example

Using an existing LM invoker:

from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_misc.graph_transformer import LMBasedGraphTransformer
from gllm_core.schema import Chunk

# Create an LM invoker
invoker = OpenAILMInvoker(model_name="gpt-4o-mini", api_key="sk-proj-123")

# Create a graph transformer with constraints
transformer = LMBasedGraphTransformer(
    lm_invoker=invoker,
    allowed_nodes=["Person", "Organization", "Event"],
    allowed_relationships=["WORKS_AT", "PARTICIPATES_IN"]
)

# Extract graph from text (await must run inside a coroutine, e.g. one started with asyncio.run)
chunks = [Chunk(content="Elon Musk is the CEO of SpaceX and Tesla.")]
graph_docs = await transformer.convert_to_graph_documents(chunks)

Building LM invoker from model ID:

from gllm_misc.graph_transformer import LMBasedGraphTransformer
from gllm_core.schema import Chunk

# Create a graph transformer by specifying model ID
transformer = LMBasedGraphTransformer(
    model_id="openai/gpt-4o-mini",
    allowed_nodes=["Person", "Organization", "Event"],
    allowed_relationships=["WORKS_AT", "PARTICIPATES_IN"]
)

# Extract graph from text
chunks = [Chunk(content="Elon Musk is the CEO of SpaceX and Tesla.")]
graph_docs = await transformer.convert_to_graph_documents(chunks)

Initialize the LMBasedGraphTransformer.

Parameters:

Name	Type	Description	Default
`lm_invoker`	`BaseLMInvoker \| None`	The language model invoker to use for generating graph data. Either lm_invoker or model_id must be provided. Defaults to None.	`None`
`model_id`	`str \| ModelId \| None`	The model ID to use for building the LM invoker. Required if lm_invoker is not provided. Defaults to None.	`None`
`credentials`	`str \| dict[str, Any] \| None`	Credentials for the LM model. Used when building the LM invoker. Defaults to None.	`None`
`config`	`dict[str, Any] \| None`	Configuration for the LM model. Used when building the LM invoker. Defaults to None.	`None`
`allowed_nodes`	`list[str] \| None`	Optional list of allowed node types. If provided, the transformer will constrain output to only use these node types. Defaults to None.	`None`
`allowed_relationships`	`list[str] \| list[tuple[str, str, str]] \| None`	Optional list of allowed relationship types. Can be either a list of strings or a list of tuples in the format (source_type, relationship_type, target_type). Defaults to None.	`None`
`prompt_builder`	`PromptBuilder \| None`	Optional custom prompt builder. Defaults to a prompt builder created using get_default_prompt().	`None`
`strict_mode`	`bool`	Determines whether the transformer should apply filtering to strictly adhere to allowed_nodes and allowed_relationships. Defaults to True.	`True`
`use_structured_output`	`bool`	Indicates whether the transformer should use the language model's native structured output functionality. Defaults to False.	`False`
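
The two accepted forms of `allowed_relationships` constrain an edge to different degrees. The sketch below is an illustration only: `is_relationship_allowed` is a hypothetical helper, not part of the library. It assumes that a string entry matches on relationship type alone, while a tuple entry must match source type, relationship type, and target type together.

```python
from typing import Optional, Union

# Either plain relationship names or (source_type, relationship_type, target_type) tuples.
AllowedRelationships = Union[list[str], list[tuple[str, str, str]]]


def is_relationship_allowed(
    source_type: str,
    rel_type: str,
    target_type: str,
    allowed: Optional[AllowedRelationships],
) -> bool:
    """Hypothetical check; the library's actual matching rules may differ."""
    if allowed is None:
        # No constraint configured: every relationship passes.
        return True
    for entry in allowed:
        if isinstance(entry, str):
            # String form: only the relationship type is constrained.
            if entry == rel_type:
                return True
        elif tuple(entry) == (source_type, rel_type, target_type):
            # Tuple form: endpoint types are constrained as well.
            return True
    return False


by_name = ["WORKS_AT", "PARTICIPATES_IN"]
by_triple = [("Person", "WORKS_AT", "Organization")]

loose = is_relationship_allowed("Organization", "WORKS_AT", "Person", by_name)
tight = is_relationship_allowed("Organization", "WORKS_AT", "Person", by_triple)
```

Under these assumptions the string form accepts the reversed edge (`loose` is `True`) and the tuple form rejects it (`tight` is `False`), which is the practical reason to prefer tuples when direction matters.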

`convert_to_graph_documents(documents)` `async`

Asynchronously convert a sequence of text chunks into graph documents.

This method processes the chunks concurrently by creating one asyncio task per chunk and gathering the results. Each chunk is processed independently using the process_response method.

Parameters:

Name	Type	Description	Default
`documents`	`list[Chunk]`	A list of Chunk objects containing the text content to transform into graphs.	required

Returns:

Type	Description
`list[GraphDocument]`	A list of GraphDocument objects, each containing nodes and relationships extracted from the corresponding input chunk.

Example

chunks = [
    Chunk(content="Alice works at Acme Corp."),
    Chunk(content="Bob is the CEO of TechStart.")
]
graph_docs = await transformer.convert_to_graph_documents(chunks)
# Returns a list of two GraphDocument objects
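
The task-per-chunk pattern described for this method can be sketched with the standard library alone. In this sketch `process_one` and `convert_all` are hypothetical stand-ins for the per-chunk and batch methods, and the language model call is replaced by a short sleep. It also shows one way to drive the coroutine from a plain script with `asyncio.run`.

```python
import asyncio


async def process_one(text: str) -> dict:
    """Stand-in for a per-chunk LM call; it only simulates latency."""
    await asyncio.sleep(0.01)
    return {"source": text, "nodes": [], "relationships": []}


async def convert_all(texts: list[str]) -> list[dict]:
    # One task per chunk, so the simulated calls overlap instead of running back to back.
    tasks = [asyncio.create_task(process_one(text)) for text in texts]
    # gather returns results in the order the tasks were passed in,
    # so each result lines up with its input chunk.
    return await asyncio.gather(*tasks)


results = asyncio.run(
    convert_all(["Alice works at Acme Corp.", "Bob is the CEO of TechStart."])
)
```

Because `asyncio.gather` preserves argument order, `results[0]` corresponds to the first input text regardless of which task finishes first.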

`process_response(chunk)` `async`

Process a single text chunk and transform it into a graph document.

This method handles the core transformation logic:

1. Formatting the prompt with the chunk content.
2. Invoking the language model.
3. Parsing the response into nodes and relationships.
4. Applying filtering based on allowed node and relationship types if strict_mode is enabled.

Parameters:

Name	Type	Description	Default
`chunk`	`Chunk`	A Chunk object containing the text content to transform into a graph.	required

Returns:

Name	Type	Description
`GraphDocument`	`GraphDocument`	A GraphDocument containing the extracted nodes and relationships.
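
The strict_mode filtering step can be pictured as follows. This is a minimal sketch under stated assumptions, not the library's implementation: `Node` and `Relationship` are dataclass stand-ins that mirror the documented fields, `filter_strict` is a hypothetical name, only the string form of `allowed_relationships` is handled, and relationships whose endpoints were filtered out are assumed to be dropped too.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Stand-in mirroring the documented Node fields."""
    id: str
    type: str = "Node"
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """Stand-in mirroring the documented Relationship fields."""
    source: Node
    target: Node
    type: str = "Relationship"
    properties: dict = field(default_factory=dict)


def filter_strict(
    nodes: list[Node],
    relationships: list[Relationship],
    allowed_nodes: Optional[list[str]],
    allowed_relationships: Optional[list[str]],
) -> tuple[list[Node], list[Relationship]]:
    """Hypothetical filter: keep only allowed types and drop dangling edges."""
    if allowed_nodes is not None:
        nodes = [n for n in nodes if n.type in allowed_nodes]
    kept_ids = {n.id for n in nodes}
    kept_relationships = [
        r
        for r in relationships
        if (allowed_relationships is None or r.type in allowed_relationships)
        and r.source.id in kept_ids
        and r.target.id in kept_ids
    ]
    return nodes, kept_relationships


alice = Node(id="Alice", type="Person")
acme = Node(id="Acme Corp", type="Organization")
paris = Node(id="Paris", type="City")
raw_relationships = [
    Relationship(source=alice, target=acme, type="WORKS_AT"),
    Relationship(source=alice, target=paris, type="LIVES_IN"),
]

kept_nodes, kept_relationships = filter_strict(
    [alice, acme, paris],
    raw_relationships,
    allowed_nodes=["Person", "Organization"],
    allowed_relationships=["WORKS_AT"],
)
```

Here the `City` node and the `LIVES_IN` relationship are removed, leaving two nodes and one relationship.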

`LMBasedMindMapTransformer(lm_invoker=None, model_id=None, credentials=None, config=None, allowed_nodes=None, allowed_relationships=None, root_node_type=None, prompt_builder=None, strict_mode=True, use_structured_output=True)`

Bases: LMBasedGraphTransformer

Transform documents into mindmap-based knowledge representations using a language model.

This class extends LMBasedGraphTransformer to specifically handle the extraction of hierarchical mind map structures from text documents. It uses a specialized prompt and default node/relationship types designed for mind mapping, organizing content into central themes, main ideas, and sub-ideas.

The transformer maintains the hierarchical structure of ideas while converting text into a graph representation that can be visualized as a mind map.

In addition to the parent class's behavior, it merges root nodes and prunes disconnected graph documents so that the mind map forms a connected graph stemming from a single root node.

Attributes:

Name	Type	Description
`lm_invoker`	`BaseLMInvoker`	The language model invoker used to generate graph data.
`allowed_nodes`	`list[str]`	List of allowed node types to constrain the output. Defaults to MINDMAP_DEFAULT_ALLOWED_NODES (CentralTheme, MainIdea, SubIdea).
`allowed_relationships`	`list[str] \| list[tuple[str, str, str]]`	List of allowed relationship types to constrain the output. Defaults to MINDMAP_DEFAULT_ALLOWED_RELATIONSHIPS which define the hierarchical structure of a mind map.
`root_node_type`	`str`	The type of the root node. Defaults to MINDMAP_DEFAULT_ROOT_NODE_TYPE (CentralTheme).
`strict_mode`	`bool`	Whether to strictly enforce node and relationship type constraints. Defaults to True.
`use_structured_output`	`bool`	Whether to use the LM's structured output capabilities. Defaults to True.
`prompt_builder`	`PromptBuilder`	The prompt builder used to generate prompts for the LM. If None, creates a default prompt builder with the mind map extraction prompt.

Example

Using an existing LM invoker:

from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_misc.graph_transformer import LMBasedMindMapTransformer
from gllm_core.schema import Chunk

# Create an LM invoker
invoker = OpenAILMInvoker(model_name="gpt-4o-mini", api_key="sk-proj-123")

# Create a mind map transformer
transformer = LMBasedMindMapTransformer(lm_invoker=invoker)

# Extract mind map from text (await must run inside a coroutine, e.g. one started with asyncio.run)
chunks = [Chunk(content="Artificial Intelligence is transforming industries...")]
mind_map_docs = await transformer.convert_to_graph_documents(chunks)

Building LM invoker from model ID:

from gllm_misc.graph_transformer import LMBasedMindMapTransformer
from gllm_core.schema import Chunk

# Create a mind map transformer by specifying model ID
transformer = LMBasedMindMapTransformer(
    model_id="openai/gpt-4o-mini",
)

# Extract mind map from text
chunks = [Chunk(content="Artificial Intelligence is transforming industries...")]
mind_map_docs = await transformer.convert_to_graph_documents(chunks)

Initialize the LMBasedMindMapTransformer with the specified configuration.

This constructor sets up the mind map transformer with appropriate defaults for mind map extraction if not explicitly provided.

Parameters:

Name	Type	Description	Default
`lm_invoker`	`BaseLMInvoker \| None`	The language model invoker to use for generating graph data. Either lm_invoker or model_id must be provided. Defaults to None.	`None`
`model_id`	`str \| ModelId \| None`	The model ID to use for building the LM invoker. Required if lm_invoker is not provided. Defaults to None.	`None`
`credentials`	`str \| dict[str, Any] \| None`	Credentials for the LM model. Used when building the LM invoker. Defaults to None.	`None`
`config`	`dict[str, Any] \| None`	Configuration for the LM model. Used when building the LM invoker. Defaults to None.	`None`
`allowed_nodes`	`list[str] \| None`	Optional list of allowed node types. If None, MINDMAP_DEFAULT_ALLOWED_NODES (CentralTheme, MainIdea, SubIdea) is used.	`None`
`allowed_relationships`	`list[str] \| list[tuple[str, str, str]] \| None`	Optional list of allowed relationship types. If None, MINDMAP_DEFAULT_ALLOWED_RELATIONSHIPS is used, which defines the hierarchical structure of a mind map.	`None`
`root_node_type`	`str \| None`	Optional root node type. If None, MINDMAP_DEFAULT_ROOT_NODE_TYPE is used.	`None`
`prompt_builder`	`PromptBuilder \| None`	Optional prompt builder. If None, a default prompt builder is created with the mind map extraction prompt.	`None`
`strict_mode`	`bool`	Whether to strictly enforce node and relationship type constraints. Defaults to True.	`True`
`use_structured_output`	`bool`	Whether to use the LM's structured output capabilities. Defaults to True.	`True`

Raises:

Type	Description
`ValueError`	If root_node_type is not in allowed_nodes, or if both or neither of lm_invoker and model_id are provided.
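
The two documented error conditions can be sketched as a standalone check. `validate_mind_map_args` is a hypothetical function written for illustration. The two constants are local stand-ins that reuse the documented default values, and the error messages are invented.

```python
from typing import Any, Optional

# Local stand-ins carrying the documented default values.
MINDMAP_DEFAULT_ALLOWED_NODES = ["CentralTheme", "MainIdea", "SubIdea"]
MINDMAP_DEFAULT_ROOT_NODE_TYPE = "CentralTheme"


def validate_mind_map_args(
    lm_invoker: Optional[Any] = None,
    model_id: Optional[str] = None,
    allowed_nodes: Optional[list[str]] = None,
    root_node_type: Optional[str] = None,
) -> tuple[list[str], str]:
    """Hypothetical validation mirroring the documented ValueError cases."""
    # Exactly one of lm_invoker / model_id must be set.
    if (lm_invoker is None) == (model_id is None):
        raise ValueError("Provide exactly one of lm_invoker or model_id.")
    nodes = allowed_nodes if allowed_nodes is not None else MINDMAP_DEFAULT_ALLOWED_NODES
    root = root_node_type if root_node_type is not None else MINDMAP_DEFAULT_ROOT_NODE_TYPE
    # The root type has to be one of the allowed node types.
    if root not in nodes:
        raise ValueError(f"root_node_type {root!r} is not in allowed_nodes.")
    return nodes, root


resolved_nodes, resolved_root = validate_mind_map_args(model_id="openai/gpt-4o-mini")
```

A practical consequence: when overriding `allowed_nodes`, either include the default root type in the list or pass a matching `root_node_type` as well.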

`convert_to_graph_documents(documents)` `async`

Asynchronously convert a sequence of text chunks into graph documents.

This method overrides the parent class to ensure that each graph document has its root nodes merged after processing.

Parameters:

Name	Type	Description	Default
`documents`	`list[Chunk]`	A list of Chunk objects containing the text content to transform into graphs.	required

Returns:

Type	Description
`list[GraphDocument]`	A list of GraphDocument objects, each containing nodes and relationships extracted from the corresponding input chunk, with merged root nodes.
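
The source does not spell out how root nodes are merged or how pruning works, so the following is only one plausible sketch. It assumes that every node of the root type collapses into the first such node, that relationships are rewired to it, and that anything not connected to the root (ignoring edge direction) is dropped. `Node`, `Relationship`, and `merge_roots_and_prune` are stand-ins defined here, not library code.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    id: str
    type: str = "Node"


@dataclass(frozen=True)
class Relationship:
    source: Node
    target: Node
    type: str = "Relationship"


def merge_roots_and_prune(nodes, relationships, root_type="CentralTheme"):
    """Hypothetical merge-and-prune pass over a single graph document."""
    roots = [n for n in nodes if n.type == root_type]
    if not roots:
        return list(nodes), list(relationships)
    root = roots[0]
    root_ids = {n.id for n in roots}

    def canonical(node):
        # Every root-typed node is represented by the first root.
        return root if node.id in root_ids else node

    merged = []
    for rel in relationships:
        source, target = canonical(rel.source), canonical(rel.target)
        if source.id != target.id:  # skip self-loops created by the merge
            merged.append(Relationship(source, target, rel.type))

    # Breadth-first search from the root, treating edges as undirected.
    adjacency = {}
    for rel in merged:
        adjacency.setdefault(rel.source.id, set()).add(rel.target.id)
        adjacency.setdefault(rel.target.id, set()).add(rel.source.id)
    reachable = {root.id}
    queue = deque([root.id])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency.get(current, ()):
            if neighbor not in reachable:
                reachable.add(neighbor)
                queue.append(neighbor)

    kept_nodes = [n for n in nodes if n.id in reachable]
    kept_rels = [r for r in merged if r.source.id in reachable and r.target.id in reachable]
    return kept_nodes, kept_rels


r1 = Node("AI", "CentralTheme")
r2 = Node("Artificial Intelligence", "CentralTheme")
m1 = Node("Healthcare", "MainIdea")
m2 = Node("Finance", "MainIdea")
s1 = Node("Diagnostics", "SubIdea")
orphan = Node("Weather", "SubIdea")

mm_nodes, mm_rels = merge_roots_and_prune(
    [r1, r2, m1, m2, s1, orphan],
    [
        Relationship(r1, m1, "HAS_MAIN_IDEA"),
        Relationship(r2, m2, "HAS_MAIN_IDEA"),
        Relationship(m1, s1, "HAS_SUB_IDEA"),
    ],
)
```

In this example the second `CentralTheme` node is folded into `AI`, its `Finance` child is reattached to `AI`, and the unconnected `Weather` node is removed. The relationship type names are invented for the example.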

`Node`

Bases: BaseModel

Represents a node in a graph with associated properties.

Attributes:

Name	Type	Description
`id`	`str`	A unique identifier for the node.
`type`	`str`	The type or label of the node. Defaults to "Node".
`properties`	`dict`	Additional properties and metadata associated with the node.

`Relationship`

Bases: BaseModel

Represents a directed relationship between two nodes in a graph.

Attributes:

Name	Type	Description
`source`	`Node`	The source node of the relationship.
`target`	`Node`	The target node of the relationship.
`type`	`str`	The type of the relationship. Defaults to "Relationship".
`properties`	`dict`	Additional properties associated with the relationship.
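
To show how the three models fit together, here is a small sketch that assembles a graph document by hand and flattens it into triples. The documented classes derive from BaseModel. The dataclasses below are simplified stand-ins with the same field names, and `Chunk` and `to_triples` are likewise hypothetical helpers defined only for this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Chunk:
    """Minimal stand-in for the source chunk; only the text is modeled."""
    content: str


@dataclass
class Node:
    id: str
    type: str = "Node"
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    source: Node
    target: Node
    type: str = "Relationship"
    properties: dict = field(default_factory=dict)


@dataclass
class GraphDocument:
    nodes: list[Node]
    relationships: list[Relationship]
    source: Optional[Chunk] = None


def to_triples(doc: GraphDocument) -> list[tuple[str, str, str]]:
    """Flatten directed relationships into (source id, type, target id) triples."""
    return [(r.source.id, r.type, r.target.id) for r in doc.relationships]


musk = Node(id="Elon Musk", type="Person")
spacex = Node(id="SpaceX", type="Organization", properties={"industry": "aerospace"})
doc = GraphDocument(
    nodes=[musk, spacex],
    relationships=[Relationship(source=musk, target=spacex, type="WORKS_AT")],
    source=Chunk(content="Elon Musk is the CEO of SpaceX and Tesla."),
)
triples = to_triples(doc)
```

Each relationship holds full `Node` objects rather than ids, so a consumer can read endpoint types and properties without a separate lookup.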
