lightbench.metrics

class lightbench.metrics.llm_judge.LLMJudge(judge_model_name='gpt-4o-mini')

Bases: object

get_score(prompt, response, max_attempts=2)
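From the signature alone, get_score plausibly calls the judge model and retries up to max_attempts times on transient failures; that retry pattern can be sketched in plain Python (a stand-in, not the lightbench implementation — get_score_with_retries, judge_fn, and flaky_judge are hypothetical names):

```python
# Hypothetical stand-in for a get_score-style retry loop: call the
# judge up to max_attempts times, return the first successful score,
# or None if every attempt fails.
def get_score_with_retries(judge_fn, prompt, response, max_attempts=2):
    for _ in range(max_attempts):
        try:
            return judge_fn(prompt, response)
        except Exception:
            continue
    return None

# A toy judge that fails on its first call, then succeeds.
calls = {"n": 0}
def flaky_judge(prompt, response):
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("transient judge error")
    return 4.0

score = get_score_with_retries(flaky_judge, "Q?", "A.", max_attempts=2)
# score == 4.0 after one retry
```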

Module for measuring time to first token (TTFT), GPU memory usage, and GPU power usage.

This module provides the following classes:
  • GenerationMetrics: Monitors TTFT, GPU VRAM, and power usage.

  • VRAM_TORCH: Measures GPU VRAM usage using PyTorch CUDA utilities.

  • PowerUsage: Measures and tracks GPU power usage via NVML.

class lightbench.metrics.metrics.GenerationMetrics(tokenizer, sample_every: int = 1, device: str = 'cuda', use_nvml: bool = False, DEBUG: bool = False)

Bases: BaseStreamer

Collects several generation-time metrics in a single place.

It measures:
  • TTFT (Time-To-First-Token)

  • Average VRAM usage (either via NVML or PyTorch utilities)

  • Average GPU power consumption (via NVML)

Sampling happens during token streaming; set sample_every to control how often measurements are taken (default: sample_every = 1).

Parameters:
  • tokenizer (transformers.PreTrainedTokenizerBase) – Tokenizer used by your model; required to build a TextIteratorStreamer.

  • sample_every (int, default = 1) – Frequency (in tokens) at which VRAM and power are sampled. Must be > 0.

  • device (str, default = "cuda") – Device string; NVIDIA CUDA (``cuda``) and Apple Metal (``mps``) are supported.

  • use_nvml (bool, default = False) – If True, NVML is used to measure VRAM. If NVML is unavailable or use_nvml=False, the PyTorch memory utilities are used instead.

  • DEBUG (bool, default = False) – Emit verbose messages when something goes wrong.

property avg_power: float
property avg_vram: float
end()

Called by ``.generate()`` to signal the end of generation.

put(value)

Called by ``.generate()`` to push new tokens.

reset()
set_start_time()
property ttft: float | None
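The streamer protocol above (set_start_time() before generation, put() once per token, end() when done) combined with periodic sampling can be sketched in plain Python. This is a minimal stand-in, not the lightbench implementation; MiniMetrics and its sample_fn hook are hypothetical names:

```python
import time

class MiniMetrics:
    """Minimal stand-in for the streamer pattern: set_start_time()
    before generation, put() once per token, end() when done."""

    def __init__(self, sample_fn, sample_every=1):
        self.sample_fn = sample_fn        # probe called every `sample_every` tokens
        self.sample_every = sample_every
        self.samples = []
        self._start = None
        self._ttft = None
        self._n_tokens = 0

    def set_start_time(self):
        self._start = time.perf_counter()

    def put(self, value):
        self._n_tokens += 1
        if self._ttft is None and self._start is not None:
            # First token seen: record time-to-first-token.
            self._ttft = time.perf_counter() - self._start
        if self._n_tokens % self.sample_every == 0:
            self.samples.append(self.sample_fn())

    def end(self):
        pass

    @property
    def ttft(self):
        return self._ttft

m = MiniMetrics(sample_fn=lambda: 42.0, sample_every=2)
m.set_start_time()
for tok in range(5):   # pretend .generate() streams 5 tokens
    m.put(tok)
m.end()
# m.ttft holds the first-token latency; 5 tokens at sample_every=2
# yields 2 samples (after tokens 2 and 4)
```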
class lightbench.metrics.metrics.PowerUsage(gpu_index=0, DEBUG: bool = False)

Bases: object

Class to measure and track GPU power usage using NVML.

DEBUG

Flag to enable debug output.

Type:

bool

handle

NVML handle for the specified GPU.

power_samples

List to store power usage measurements in watts.

Type:

list

get_average()

Calculate and return the average GPU power usage from the recorded samples.

Returns:

Average power usage in watts. Returns 0.0 if no samples exist.

Return type:

float

kill()

Shutdown NVML to clean up resources.

measure_power()

Measure the current GPU power usage and record the sample.

Returns:

Current GPU power usage in watts. Returns 0.0 if measurement is unsupported or fails.

Return type:

float
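The sample-and-average behaviour documented above, including the 0.0 fallbacks for a failed measurement and for an empty sample list, can be sketched without NVML. This is a pure-Python stand-in; PowerTracker and its read_watts probe are hypothetical names:

```python
class PowerTracker:
    """Stand-in for the PowerUsage sampling pattern: record watt
    readings from a probe, then average them on demand."""

    def __init__(self, read_watts):
        self.read_watts = read_watts   # probe returning current watts
        self.power_samples = []

    def measure_power(self):
        try:
            watts = self.read_watts()
        except Exception:
            return 0.0                 # unsupported / failed measurement
        self.power_samples.append(watts)
        return watts

    def get_average(self):
        if not self.power_samples:
            return 0.0                 # documented empty-list fallback
        return sum(self.power_samples) / len(self.power_samples)

readings = iter([100.0, 120.0, 110.0])
tracker = PowerTracker(read_watts=lambda: next(readings))
for _ in range(3):
    tracker.measure_power()
avg = tracker.get_average()            # (100 + 120 + 110) / 3 = 110.0
```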

class lightbench.metrics.metrics.VRAM_NVML

Bases: object

Class to monitor GPU VRAM usage using NVIDIA’s NVML.

Deprecated since version 0.1.0: The VRAM_NVML class is deprecated. Use VRAM_TORCH instead.

device_handle

NVML handle for the first GPU device.

_max_memory

Tracks the maximum memory used (in bytes).

measure_vram()

Measure and return the peak VRAM usage in gigabytes (GB).

Returns:

Maximum VRAM usage (in GB) observed so far.

Return type:

float

reset()

Reset the maximum memory usage by reading the current used memory.

class lightbench.metrics.metrics.VRAM_TORCH(device: str, DEBUG: bool = False)

Bases: object

Class to measure GPU VRAM usage using PyTorch’s utilities.

DEBUG

Flag to enable debug output.

Type:

bool

device

The device to monitor (‘cuda’ or ‘mps’).

Type:

torch.device

device: str = 'cuda'
measure_vram() → float

Measure the memory usage in gigabytes.

Returns:

Memory usage (in GB), either peak or current depending on backend.

Return type:

float

reset()

Reset memory usage statistics based on the device type.
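The reset-then-peak pattern shared by the VRAM classes (reset() re-baselines at the current reading, measure_vram() reports the peak observed since, converted from bytes to gigabytes) can be sketched without a GPU. This is a pure-Python stand-in; PeakTracker and its read_used_bytes probe are hypothetical names:

```python
class PeakTracker:
    """Stand-in for the VRAM peak-tracking pattern: reset() re-baselines
    at the current reading; measure_vram() returns the peak seen since,
    converted from bytes to gigabytes."""

    def __init__(self, read_used_bytes):
        self.read_used_bytes = read_used_bytes  # probe: current used bytes
        self._max = read_used_bytes()

    def reset(self):
        self._max = self.read_used_bytes()

    def measure_vram(self):
        self._max = max(self._max, self.read_used_bytes())
        return self._max / 1024**3              # bytes -> GB

usage = {"bytes": 2 * 1024**3}
t = PeakTracker(lambda: usage["bytes"])
usage["bytes"] = 3 * 1024**3                    # allocation grows
peak_gb = t.measure_vram()                      # 3.0
usage["bytes"] = 1 * 1024**3                    # memory freed
peak_gb = t.measure_vram()                      # still 3.0: peak, not current
```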