lightbench.metrics#
- class lightbench.metrics.llm_judge.LLMJudge(judge_model_name='gpt-4o-mini')#
Bases: object
- get_score(prompt, response, max_attempts=2)#
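The `max_attempts` parameter suggests `get_score` retries when the judge model's reply cannot be parsed into a score. A minimal pure-Python sketch of that retry pattern, with a stub `judge_fn` standing in for the real model call (the helper name and exception type here are assumptions, not the library's actual internals):

```python
def get_score_with_retries(judge_fn, prompt, response, max_attempts=2):
    """Call a judge function, retrying up to max_attempts times.

    Mirrors the retry pattern implied by LLMJudge.get_score(max_attempts=2);
    judge_fn is a stand-in for the real judge-model call.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return judge_fn(prompt, response)
        except ValueError as exc:  # e.g. the judge returned unparseable output
            last_error = exc
    raise RuntimeError(f"judge failed after {max_attempts} attempts") from last_error

# Usage with a deterministic stub judge:
score = get_score_with_retries(lambda p, r: 4.0, "Explain TTFT.", "TTFT is ...")
```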
Module for measuring time to first token (TTFT), GPU memory usage, and GPU power usage.
- This module provides the following classes:
GenerationMetrics: Monitors TTFT, GPU VRAM, and power usage.
VRAM_TORCH: Measures GPU VRAM usage using PyTorch CUDA utilities.
PowerUsage: Measures and tracks GPU power usage via NVML.
- class lightbench.metrics.metrics.GenerationMetrics(tokenizer, sample_every: int = 1, device: str = 'cuda', DEBUG: bool = False)#
Bases: BaseStreamer
Collects several generation-time metrics in a single place.
- It measures:
TTFT (Time-To-First-Token)
Average VRAM usage (either via NVML or PyTorch utilities)
Average GPU power consumption (via NVML)
Sampling happens during token streaming. Set sample_every to decide how often the measurements are taken; defaults to sample_every = 5.
- Parameters:
tokenizer (transformers.PreTrainedTokenizerBase) – Tokenizer used by your model; required to build a TextIteratorStreamer.
sample_every (int, default = 5) – Frequency (in tokens) at which VRAM & power are sampled. Must be > 0.
device (str, default = "cuda") – Device string; NVIDIA (torch) and Apple Metal (mps) are supported.
use_nvml (bool, default = False) – If True, we try to use NVML for VRAM. If NVML is unavailable or use_nvml=False, we fall back to the PyTorch memory utilities.
DEBUG (bool, default = False) – Emit verbose messages when something goes wrong.
- property avg_power: float#
- property avg_vram: float#
- end()#
Called by .generate() to signal the end of generation.
- put(value)#
Called by .generate() to push new tokens.
- reset()#
- set_start_time()#
- property ttft: float | None#
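The streamer interface above boils down to timing the first `put()` call and probing VRAM/power every `sample_every` tokens. A pure-Python sketch of that bookkeeping, with a counter standing in for the real NVML/torch probes (class and attribute names here are illustrative, not the actual implementation):

```python
import time

class TTFTSketch:
    """Minimal sketch of the TTFT and sampling bookkeeping performed by a
    streamer like GenerationMetrics. sample_every controls how often the
    (simulated) VRAM/power probes run."""

    def __init__(self, sample_every=5):
        assert sample_every > 0, "sample_every must be > 0"
        self.sample_every = sample_every
        self._start = None
        self.ttft = None       # seconds until the first token arrives
        self.samples = 0       # how many times the probes fired
        self._tokens = 0

    def set_start_time(self):
        self._start = time.perf_counter()

    def put(self, value):
        if self.ttft is None and self._start is not None:
            self.ttft = time.perf_counter() - self._start
        self._tokens += 1
        if self._tokens % self.sample_every == 0:
            self.samples += 1  # the real class would probe VRAM/power here

m = TTFTSketch(sample_every=5)
m.set_start_time()
for tok in range(20):  # simulate 20 streamed tokens
    m.put(tok)
```
In the real class, `set_start_time()` is called just before `.generate()` and the streamer is passed via the `streamer=` argument, so `put()` fires once per decoded token.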
- class lightbench.metrics.metrics.PowerUsage(gpu_index=0, DEBUG: bool = False)#
Bases: object
Class to measure and track GPU power usage using NVML.
- DEBUG#
Flag to enable debug output.
- Type:
bool
- handle#
NVML handle for the specified GPU.
- power_samples#
List to store power usage measurements in watts.
- Type:
list
- get_average()#
Calculate and return the average GPU power usage from the recorded samples.
- Returns:
Average power usage in watts. Returns 0.0 if no samples exist.
- Return type:
float
- kill()#
Shutdown NVML to clean up resources.
- measure_power()#
Measure the current GPU power usage and record the sample.
- Returns:
Current GPU power usage in watts. Returns 0 if measurement is unsupported or fails.
- Return type:
float
- class lightbench.metrics.metrics.VRAM_NVML#
Bases: object
Class to monitor GPU VRAM usage using NVIDIA’s NVML.
Deprecated since version 0.1.0: The VRAM_NVML class is deprecated. Use VRAM_TORCH instead.
- device_handle#
NVML handle for the first GPU device.
- _max_memory#
Tracks the maximum memory used (in bytes).
- measure_vram()#
Measure and return the peak VRAM usage in gigabytes (GB).
- Returns:
Maximum VRAM usage (in GB) observed so far.
- Return type:
float
- reset()#
Reset the maximum memory usage by reading the current used memory.
- class lightbench.metrics.metrics.VRAM_TORCH(device: str, DEBUG: bool = False)#
Bases: object
Class to measure GPU VRAM usage using PyTorch’s utilities.
- DEBUG#
Flag to enable debug output.
- Type:
bool
- device#
The device to monitor (‘cuda’ or ‘mps’).
- Type:
torch.device
- measure_vram() float#
Measure the memory usage in gigabytes.
- Returns:
Memory usage (in GB), either peak or current depending on backend.
- Return type:
float
- reset()#
Reset memory usage statistics based on the device type.
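The "peak or current depending on backend" note reflects that CUDA exposes a peak-memory counter while MPS only exposes current allocation. A sketch of that backend dispatch, with the two probe functions passed in as stand-ins for the torch calls (in the real class these would likely be torch.cuda.max_memory_allocated and an MPS equivalent, but that is an assumption):

```python
def measure_vram_gb(device, cuda_peak_bytes_fn, mps_current_bytes_fn):
    """Sketch of VRAM_TORCH.measure_vram's backend dispatch.

    device: 'cuda' or 'mps'.
    cuda_peak_bytes_fn / mps_current_bytes_fn: callables returning a raw
    byte count, standing in for the actual torch memory probes.
    Returns the reading converted to gigabytes.
    """
    if device == "cuda":
        raw = cuda_peak_bytes_fn()      # CUDA: peak bytes since last reset
    elif device == "mps":
        raw = mps_current_bytes_fn()    # MPS: current allocation only
    else:
        raise ValueError(f"unsupported device: {device}")
    return raw / (1024 ** 3)
```
`reset()` would correspondingly clear the peak counter on CUDA, which is why readings are "either peak or current depending on backend".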