Utils

Utility modules for use in the GLLM Core package.

BinaryHandlingStrategy

Bases: StrEnum

Enum for binary data handling options.

Attributes:

Name Type Description
SKIP str

Skip binary data.

BASE64 str

Encode binary data to base64.

HEX str

Encode binary data to hex.

SHOW_SIZE str

Show the size of binary data.
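A minimal sketch of how the four options might be applied to a bytes payload. The `BinaryHandling` enum is a local stand-in so the snippet runs on its own; its member values, the `render_binary` helper, and the size message format are assumptions for illustration, not the package's implementation.

```python
import base64
from enum import Enum


class BinaryHandling(str, Enum):
    """Local stand-in mirroring the four documented options (values assumed)."""

    SKIP = "skip"
    BASE64 = "base64"
    HEX = "hex"
    SHOW_SIZE = "show_size"


def render_binary(data: bytes, strategy: BinaryHandling):
    """Hypothetical helper: return a printable form of `data`, or None to skip it."""
    if strategy is BinaryHandling.SKIP:
        return None
    if strategy is BinaryHandling.BASE64:
        return base64.b64encode(data).decode("ascii")
    if strategy is BinaryHandling.HEX:
        return data.hex()
    # SHOW_SIZE: report only the payload length (message format is assumed).
    return f"<{len(data)} bytes>"


payload = b"\x00\xff"
rendered = {s.name: render_binary(payload, s) for s in BinaryHandling}
```

Here `rendered` maps each option name to the form the payload would take under that option.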

ChunkMetadataMerger(merger_func_map=None, default_merger_func=None, retained_keys=None)

A helper class to merge metadata from multiple chunks.

Attributes:

Name Type Description
merger_func_map dict[str, Callable[[list[Any]], Any]]

A mapping of metadata keys to merger functions.

default_merger_func Callable[[list[Any]], Any]

The default merger function for metadata keys that are not present in the merger_func_map.

retained_keys set[str] | None

The keys that should be retained in the merged metadata. If None, all keys present in every chunk's metadata (the intersection) are retained.

Initializes a new instance of the ChunkMetadataMerger class.

Parameters:

Name Type Description Default
merger_func_map dict[str, Callable[[list[Any]], Any]] | None

A mapping of metadata keys to merger functions. Defaults to None, in which case a default merger map is used. The default merger map:
1. Picks the first value of the PREV_CHUNK_ID key.
2. Picks the last value of the NEXT_CHUNK_ID key.

None
default_merger_func Callable[[list[Any]], Any] | None

The default merger for metadata keys that are not present in the merger_func_map. Defaults to None, in which case a default merger that picks the first value is used.

None
retained_keys set[str] | None

The keys that should be retained in the merged metadata. Defaults to None, in which case all keys present in every chunk's metadata (the intersection) are retained.

None

merge(metadatas)

Merges metadata from multiple chunks.

Parameters:

Name Type Description Default
metadatas list[dict[str, Any]]

The metadata to merge.

required

Returns:

Type Description
dict[str, Any]

dict[str, Any]: The merged metadata.
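The merging behaviour described above can be sketched as a standalone function. This is an illustration under assumptions, not the class's code: `merge_metadata` is a hypothetical name, the key strings `previous_chunk_id` and `next_chunk_id` stand in for the PREV_CHUNK_ID and NEXT_CHUNK_ID constants, and restricting `retained_keys` to the common keys is a guess at the intended semantics.

```python
from typing import Any, Callable, Optional

Merger = Callable[[list[Any]], Any]


def merge_metadata(
    metadatas: list[dict[str, Any]],
    merger_func_map: dict[str, Merger],
    default_merger_func: Merger,
    retained_keys: Optional[set[str]] = None,
) -> dict[str, Any]:
    """Merge per-chunk metadata key by key (hypothetical sketch)."""
    if not metadatas:
        return {}
    # Only keys present in every metadata dict are candidates.
    common = set(metadatas[0]).intersection(*metadatas[1:])
    keys = common if retained_keys is None else common & retained_keys
    merged = {}
    for key in sorted(keys):
        # Use the key-specific merger when one is registered, else the default.
        merger = merger_func_map.get(key, default_merger_func)
        merged[key] = merger([metadata[key] for metadata in metadatas])
    return merged


chunks = [
    {"source": "a.pdf", "previous_chunk_id": "c0", "next_chunk_id": "c2", "page": 1},
    {"source": "a.pdf", "previous_chunk_id": "c1", "next_chunk_id": "c3"},
]
merged = merge_metadata(
    chunks,
    merger_func_map={
        "previous_chunk_id": lambda values: values[0],
        "next_chunk_id": lambda values: values[-1],
    },
    default_merger_func=lambda values: values[0],
)
```

In this sketch `page` is dropped because it appears in only one of the two chunks.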

LoggerManager

A singleton class to manage logging configuration.

This class ensures that the root logger is initialized only once and is used across the application.

There are two logging modes:
1. TEXT (default): Uses the ColoredFormatter to add colors based on log level.
2. JSON: Uses the JsonFormatter to format the log record as a JSON object.

Switch between the two by setting the environment variable LOG_FORMAT to json or text.

Get and use the logger:

manager = LoggerManager()

logger = manager.get_logger()

logger.info("This is an info message")

Set logging configuration:

manager = LoggerManager()

manager.set_level(logging.DEBUG)
manager.set_log_format(custom_log_format)
manager.set_date_format(custom_date_format)

Add a custom handler:

manager = LoggerManager()

handler = logging.FileHandler("app.log")
manager.add_handler(handler)

Log stack traces and exceptions:

try:
    1 / 0
except Exception:
    logger.error("Exception occurred", exc_info=True)

When logging an error, pass the error code through the extra parameter:

logger.error("I am dead!", extra={"error_code": "ERR_CONN_REFUSED"})

Output format in text mode:

[16/04/2025 15:08:18.323 Test INFO] Message

Output format in JSON mode:

{"timestamp": "2025-08-11T19:40:30+07:00", "name": "Test", "level": "INFO", "message": "Message"}
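To illustrate how a JSON-mode record like the one above could be produced, here is a minimal formatter sketch built on the standard `logging` module. `SketchJsonFormatter` is a hypothetical stand-in, not the package's JsonFormatter, and including `error_code` in the output is an assumption. The standard library copies each key of `extra` onto the log record as an attribute, which the sketch imitates by setting the attribute directly.

```python
import json
import logging
from datetime import datetime


class SketchJsonFormatter(logging.Formatter):
    """Hypothetical formatter that renders a record as one JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        timestamp = datetime.fromtimestamp(record.created).astimezone()
        payload = {
            "timestamp": timestamp.isoformat(timespec="seconds"),
            "name": record.name,
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Values passed through `extra` become attributes on the record.
        error_code = getattr(record, "error_code", None)
        if error_code is not None:
            payload["error_code"] = error_code
        return json.dumps(payload)


# Build a record by hand so the sketch does not touch global logging state.
record = logging.LogRecord("Test", logging.ERROR, "example.py", 0, "I am dead!", None, None)
record.error_code = "ERR_CONN_REFUSED"  # what extra={"error_code": ...} would set
line = SketchJsonFormatter().format(record)
```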

__new__()

Initialize the singleton instance.

Returns:

Name Type Description
LoggerManager LoggerManager

The singleton instance.
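The singleton behaviour can be sketched with a `__new__` override that caches the first instance. `SingletonManager` and the `sketch_root` logger name are hypothetical; the real class may initialize more state and use a different root name.

```python
import logging


class SingletonManager:
    """Hypothetical sketch of a __new__-based singleton around one root logger."""

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            # First call: create the instance and set up shared state once.
            cls._instance = super().__new__(cls)
            cls._instance._root = logging.getLogger("sketch_root")
        return cls._instance

    def get_logger(self, name=None):
        # Without a name, hand back the shared root; otherwise a child of it.
        return self._root if name is None else self._root.getChild(name)


first = SingletonManager()
second = SingletonManager()
child = first.get_logger("db")
```

Both constructor calls return the same object, so configuration applied through one reference is visible through the other.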

add_handler(handler)

Add a custom handler to the root logger.

Parameters:

Name Type Description Default
handler Handler

The handler to add to the root logger.

required

get_logger(name=None)

Get a logger instance.

This method returns a logger instance that is a child of the root logger. If name is not provided, the root logger will be returned instead.

Parameters:

Name Type Description Default
name str | None

The name of the child logger. If None, the root logger will be returned. Defaults to None.

None

Returns:

Type Description
Logger

logging.Logger: Configured logger instance.

set_date_format(date_format)

Set date format for all loggers in the hierarchy.

Parameters:

Name Type Description Default
date_format str

The date format to set.

required

set_level(level)

Set logging level for all loggers in the hierarchy.

Parameters:

Name Type Description Default
level int

The logging level to set (e.g., logging.INFO, logging.DEBUG).

required

set_log_format(log_format)

Set logging format for all loggers in the hierarchy.

Parameters:

Name Type Description Default
log_format str

The log format to set.

required

MergerMethod

A collection of merger methods.

concatenate(delimiter='-') staticmethod

Creates a function that concatenates a list of values with a delimiter.

Parameters:

Name Type Description Default
delimiter str

The delimiter to use when concatenating the values. Defaults to "-".

'-'

Returns:

Type Description
Callable[[list[Any]], str]

Callable[[list[Any]], str]: A function that concatenates a list of values with the delimiter.
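A minimal sketch of the factory pattern described here: the outer function captures the delimiter and returns the merger. Converting each value with `str()` is an assumption about how non-string values are handled.

```python
from typing import Any, Callable


def concatenate(delimiter: str = "-") -> Callable[[list[Any]], str]:
    """Sketch: build a merger that joins values with `delimiter`."""

    def _merge(values: list[Any]) -> str:
        # str() conversion is assumed so mixed-type values can be joined.
        return delimiter.join(str(value) for value in values)

    return _merge


join_ids = concatenate("-")
result = join_ids(["doc", 12, "v2"])
```

Because the factory returns a plain callable, the result can be placed directly in a merger map, for example under a key whose values should be joined rather than picked.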

merge_overlapping_strings(delimiter='\n') staticmethod

Creates a function that merges a list of strings, handling common prefixes and overlaps.

The created function will:
- Identify and remove any common prefix shared by the strings.
- Process each pair of adjacent strings to remove the text that overlaps between them.
- Join the cleaned strings together, with the common prefix restored at the beginning.

Parameters:

Name Type Description Default
delimiter str

The delimiter to use when merging the values. Defaults to "\n".

'\n'

Returns:

Type Description
Callable[[list[str]], str]

Callable[[list[str]], str]: A function that merges a list of strings, handling common prefixes and overlaps.
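The three steps (strip the common prefix, trim the overlap between adjacent strings, rejoin with the prefix in front) can be sketched as follows. This is one plausible reading of the description, not the package's code: the overlap is taken as the longest suffix of one string that is also a prefix of the next, and empty pieces are dropped before joining.

```python
import os
from typing import Callable


def merge_overlapping_strings(delimiter: str = "\n") -> Callable[[list[str]], str]:
    """Sketch of a merger that removes shared prefixes and adjacent overlaps."""

    def _merge(strings: list[str]) -> str:
        if not strings:
            return ""
        # Step 1: character-wise common prefix shared by all strings.
        prefix = os.path.commonprefix(strings)
        bodies = [s[len(prefix):] for s in strings]
        merged = [bodies[0]]
        # Step 2: for each adjacent pair, drop the longest suffix/prefix overlap.
        for prev, cur in zip(bodies, bodies[1:]):
            overlap = 0
            for size in range(min(len(prev), len(cur)), 0, -1):
                if prev.endswith(cur[:size]):
                    overlap = size
                    break
            merged.append(cur[overlap:])
        # Step 3: rejoin, putting the common prefix back at the start.
        return prefix + delimiter.join(part for part in merged if part)

    return _merge


merge = merge_overlapping_strings(delimiter="")
result = merge(["Doc A: the quick brown", "Doc A: brown fox jumps", "Doc A: jumps over"])
```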

pick_first(values) staticmethod

Picks the first value from a list of values.

Parameters:

Name Type Description Default
values list[Any]

The values to pick from.

required

Returns:

Name Type Description
Any Any

The first value from the list.

pick_last(values) staticmethod

Picks the last value from a list of values.

Parameters:

Name Type Description Default
values list[Any]

The values to pick from.

required

Returns:

Name Type Description
Any Any

The last value from the list.

RunAnalyzer(cls)

Bases: NodeVisitor

AST NodeVisitor that analyzes a class to build a RunProfile.

The run analyzer visits the AST nodes of a class to analyze the _run method and build a RunProfile. It will look for the usage of the **kwargs parameter in method calls and subscript expressions. The traversal result is stored as a RunProfile object.

Attributes:

Name Type Description
cls type

The class to analyze.

profile RunProfile

The profile of the run being analyzed.

Initialize the RunAnalyzer with a class.

Parameters:

Name Type Description Default
cls type

The class to analyze.

required

visit_Call(node)

Visit a Call node in the AST.

This node represents a function call in the source code. Here, we look for method calls that forward the full **kwargs.

Parameters:

Name Type Description Default
node Call

The Call node to visit.

required

visit_Subscript(node)

Visit a Subscript node in the AST.

The Subscript node represents a subscripted value in the source code. Example: kwargs["key"]

Parameters:

Name Type Description Default
node Subscript

The Subscript node to visit.

required
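The two visitor methods can be illustrated with a small standalone `ast.NodeVisitor`. This sketch only collects names and does not build a RunProfile; `KwargsUsageVisitor`, its attributes, and the sample `Demo` class are hypothetical. It relies on `ast.unparse` and the Python 3.9+ AST shape for subscripts.

```python
import ast

SOURCE = '''
class Demo:
    def _run(self, **kwargs):
        top_k = kwargs["top_k"]
        return self.helper(query=kwargs["query"], **kwargs)
'''


class KwargsUsageVisitor(ast.NodeVisitor):
    """Hypothetical visitor recording how `kwargs` is used."""

    def __init__(self):
        self.subscript_keys = []    # keys read as kwargs["key"]
        self.forwarding_calls = []  # callees that receive **kwargs

    def visit_Call(self, node):
        for keyword in node.keywords:
            # A keyword with arg=None is a `**something` expansion.
            if (
                keyword.arg is None
                and isinstance(keyword.value, ast.Name)
                and keyword.value.id == "kwargs"
            ):
                self.forwarding_calls.append(ast.unparse(node.func))
        self.generic_visit(node)  # keep walking into the call's arguments

    def visit_Subscript(self, node):
        is_kwargs = isinstance(node.value, ast.Name) and node.value.id == "kwargs"
        key = node.slice
        if is_kwargs and isinstance(key, ast.Constant) and isinstance(key.value, str):
            self.subscript_keys.append(key.value)
        self.generic_visit(node)


visitor = KwargsUsageVisitor()
visitor.visit(ast.parse(SOURCE))
```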

format_chunk_message(chunk, rank=None, include_score=True, include_metadata=True)

Formats a log message that displays a single chunk.

Parameters:

Name Type Description Default
chunk Chunk

The chunk to be formatted.

required
rank int

The optional rank of the formatted chunk. Defaults to None.

None
include_score bool

Whether to include the score in the formatted message. Defaults to True.

True
include_metadata bool

Whether to include the metadata in the formatted message. Defaults to True.

True

Returns:

Name Type Description
str str

A formatted log message that displays information about the logged chunk.

get_placeholder_keys(template)

Extracts keys from a template string based on a regex pattern.

This function searches the template for placeholders enclosed in single curly braces {} and ignores any placeholders within double curly braces {{}}. It returns a list of the keys found.

Parameters:

Name Type Description Default
template str

The template string containing placeholders.

required

Returns:

Type Description
list[str]

list[str]: A list of keys extracted from the template.
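A sketch of the single-brace extraction using lookarounds to skip double-braced placeholders. The regex and the `extract_placeholder_keys` name are assumptions; the actual pattern used by the function is not shown in this reference and may accept different key characters.

```python
import re

# A single-brace placeholder: `{key}` not preceded by `{` and not followed by `}`.
PLACEHOLDER_PATTERN = re.compile(r"(?<!\{)\{(\w+)\}(?!\})")


def extract_placeholder_keys(template: str) -> list[str]:
    """Hypothetical sketch: return keys of single-brace placeholders in order."""
    return PLACEHOLDER_PATTERN.findall(template)


keys = extract_placeholder_keys("Hello {name}, keep {{literal}} and explain {topic}.")
```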

load_gsheets(client_email, private_key, sheet_id, worksheet_id)

Loads data from a Google Sheets worksheet.

This function retrieves data from a Google Sheets worksheet using service account credentials. It authorizes the client, selects the specified worksheet, and reads the worksheet data. The first row of the worksheet will be treated as the column names.

Parameters:

Name Type Description Default
client_email str

The client email associated with the service account.

required
private_key str

The private key used for authentication.

required
sheet_id str

The ID of the Google Sheet.

required
worksheet_id str

The ID of the worksheet within the Google Sheet.

required

Returns:

Type Description
list[dict[str, str]]

list[dict[str, str]]: A list of dictionaries containing the Google Sheets content.
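The header-row convention can be sketched without any Google API calls. The snippet below only shows how raw rows could turn into the documented return shape; `rows_to_records` is a hypothetical helper, and authorization and worksheet selection are deliberately left out.

```python
def rows_to_records(rows: list[list[str]]) -> list[dict[str, str]]:
    """Hypothetical sketch: treat the first row as column names."""
    if not rows:
        return []
    header, *data_rows = rows
    # Pair each cell with its column name from the header row.
    return [dict(zip(header, row)) for row in data_rows]


sheet_rows = [
    ["question", "answer"],
    ["What is 2 + 2?", "4"],
    ["Capital of France?", "Paris"],
]
records = rows_to_records(sheet_rows)
```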

setup_logger(name=None, level=logging.INFO, log_format=TEXT_LOG_FORMAT, date_format=DEFAULT_DATE_FORMAT)

Set up and configure a logger instance.

This function is a wrapper around the logging.getLogger function.

Parameters:

Name Type Description Default
name str | None

The name of the logger. Defaults to None.

None
level int

The logging level. Defaults to logging.INFO.

INFO
log_format str

Custom log format. Defaults to TEXT_LOG_FORMAT.

TEXT_LOG_FORMAT
date_format str

Custom date format. Defaults to DEFAULT_DATE_FORMAT.

DEFAULT_DATE_FORMAT

Returns:

Type Description
Logger

logging.Logger: Configured logger instance.
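A rough sketch of what a wrapper around `logging.getLogger` with these parameters could look like. The format strings are placeholders invented for the example, not the package's TEXT_LOG_FORMAT or DEFAULT_DATE_FORMAT, and the handler setup is an assumption.

```python
import logging

# Placeholder formats for illustration only.
SKETCH_LOG_FORMAT = "[%(asctime)s %(name)s %(levelname)s] %(message)s"
SKETCH_DATE_FORMAT = "%d/%m/%Y %H:%M:%S"


def setup_logger_sketch(
    name=None,
    level=logging.INFO,
    log_format=SKETCH_LOG_FORMAT,
    date_format=SKETCH_DATE_FORMAT,
):
    """Hypothetical wrapper: fetch a logger, set its level, attach one handler."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:
        # Attach a stream handler only once so repeated calls do not duplicate output.
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(log_format, datefmt=date_format))
        logger.addHandler(handler)
    return logger


logger = setup_logger_sketch("sketch.app", level=logging.DEBUG)
logger.debug("Logger configured")
```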