lightbench.metrics¶
- class lightbench.metrics.llm_judge.LLMJudge(judge_model_name='gpt-4o-mini')¶
Bases:
object
Class that scores model responses using an external judge LLM (gpt-4o-mini by default).
- get_score(prompt, response, max_attempts=2)¶
Ask the judge model to score response given prompt, retrying up to max_attempts times if a score cannot be obtained.
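A minimal usage sketch. It assumes the judge is backed by a hosted model and that credentials (e.g. an OpenAI API key) are already configured in the environment; the prompt and response strings are illustrative, and the scale of the returned score is not documented here.

```python
from lightbench.metrics.llm_judge import LLMJudge

# Assumes judge-model credentials are already set in the environment.
judge = LLMJudge(judge_model_name="gpt-4o-mini")

score = judge.get_score(
    prompt="Explain what time-to-first-token (TTFT) measures.",
    response="TTFT is the delay between starting generation and the first emitted token.",
    max_attempts=2,  # retry the judge call if scoring fails (assumption about semantics)
)
print(score)
```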
Module for measuring time to first token (TTFT), GPU memory usage, and GPU power usage.
- This module provides the following classes:
TTFT: Measures the time to first token during text generation.
VRAM_NVML: Monitors GPU VRAM usage using NVIDIA’s NVML library.
VRAM_TORCH: Measures GPU VRAM usage using PyTorch CUDA utilities.
PowerUsage: Measures and tracks GPU power usage via NVML.
- Dependencies:
time: For performance timing.
torch: For GPU memory measurements.
pynvml: For interfacing with NVIDIA Management Library.
transformers: For text streaming with transformer models.
- class lightbench.metrics.metrics.PowerUsage(gpu_index=0, DEBUG: bool = False)¶
Bases:
object
Class to measure and track GPU power usage using NVML.
- DEBUG¶
Flag to enable debug output.
- Type:
bool
- handle¶
NVML handle for the specified GPU.
- power_samples¶
List to store power usage measurements in watts.
- Type:
list
- get_average()¶
Calculate and return the average GPU power usage from the recorded samples.
- Returns:
Average power usage in watts. Returns 0.0 if no samples exist.
- Return type:
float
- kill()¶
Shut down NVML to clean up resources.
- measure_power()¶
Measure the current GPU power usage and record the sample.
- Returns:
Current GPU power usage in watts. Returns 0 if measurement is unsupported or fails.
- Return type:
float
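A short sampling sketch using only the documented methods. NVML must be available and GPU 0 is assumed to be the device under test; the one-second sampling interval and the surrounding workload are placeholders.

```python
import time

from lightbench.metrics.metrics import PowerUsage

power = PowerUsage(gpu_index=0)          # opens an NVML handle for GPU 0
try:
    for _ in range(5):                   # sample while the workload of interest runs
        watts = power.measure_power()    # current draw in watts (0 if unsupported)
        time.sleep(1.0)                  # placeholder sampling interval
    print(f"average draw: {power.get_average():.1f} W")
finally:
    power.kill()                         # shut down NVML
```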
- class lightbench.metrics.metrics.TTFT(tokenizer)¶
Bases:
object
Class for measuring Time To First Token (TTFT) during text generation.
- tokenizer¶
A tokenizer instance used for encoding text.
- streamer¶
A TextIteratorStreamer instance that streams text output.
- ttft¶
Float representing the time (in seconds) from start to first token.
- measure_ttft(start_time)¶
Measure the time to first token (TTFT) for a text generation stream.
This function iterates over the text streamer and sets the TTFT value based on the elapsed time since the provided start_time.
- Parameters:
start_time – The starting time (from time.perf_counter()) when generation began.
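A usage sketch, assuming a Hugging Face causal LM whose generate() call accepts the class's TextIteratorStreamer. The model name, generation arguments, and background thread are illustrative and not part of the class itself.

```python
import time
from threading import Thread

from transformers import AutoModelForCausalLM, AutoTokenizer

from lightbench.metrics.metrics import TTFT

model_name = "gpt2"                              # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

meter = TTFT(tokenizer)
inputs = tokenizer("Hello, world", return_tensors="pt")

start_time = time.perf_counter()
thread = Thread(
    target=model.generate,
    kwargs=dict(**inputs, streamer=meter.streamer, max_new_tokens=32),
)
thread.start()
meter.measure_ttft(start_time)                   # iterates the stream; records first-token delay
thread.join()
print(f"TTFT: {meter.ttft:.3f} s")
```

Running generate() in a background thread is required because the streamer is consumed on the calling thread while tokens are produced elsewhere.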
- class lightbench.metrics.metrics.VRAM_NVML¶
Bases:
object
Class to monitor GPU VRAM usage using NVIDIA’s NVML.
- device_handle¶
NVML handle for the first GPU device.
- _max_memory¶
Tracks the maximum memory used (in bytes).
- measure_vram()¶
Measure and return the peak VRAM usage in gigabytes (GB).
- Returns:
Maximum VRAM usage (in GB) observed so far.
- Return type:
float
- reset()¶
Reset the maximum memory usage by reading the current used memory.
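A sketch of the reset/measure cycle; the workload between the two calls is a placeholder, and the note about periodic polling is an assumption about how the running maximum is tracked.

```python
from lightbench.metrics.metrics import VRAM_NVML

vram = VRAM_NVML()                 # binds to the first GPU via NVML
vram.reset()                       # baseline: memory currently in use

# ... run the GPU workload to be profiled, calling measure_vram()
# periodically if intermediate peaks should be captured (assumption) ...

peak_gb = vram.measure_vram()      # maximum usage observed so far, in GB
print(f"peak VRAM: {peak_gb:.2f} GB")
```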
- class lightbench.metrics.metrics.VRAM_TORCH(device: str, DEBUG: bool = False)¶
Bases:
object
Class to measure GPU VRAM usage using PyTorch’s utilities.
- DEBUG¶
Flag to enable debug output.
- Type:
bool
- device¶
The device to monitor (‘cuda’ or ‘mps’).
- Type:
torch.device
- device: str = 'cuda'¶
- measure_vram()¶
Measure the memory usage in gigabytes.
- Returns:
Memory usage (in GB), either peak or current depending on backend.
- Return type:
float
- reset()¶
Reset memory usage statistics based on the device type.
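A sketch using PyTorch's memory statistics. It assumes a CUDA or Apple-silicon (MPS) device is available; the matrix multiply is only a stand-in workload.

```python
import torch

from lightbench.metrics.metrics import VRAM_TORCH

# Assumes at least one of CUDA or MPS is present.
device = "cuda" if torch.cuda.is_available() else "mps"

vram = VRAM_TORCH(device=device)
vram.reset()                                   # clear the backend's memory statistics

x = torch.randn(2048, 2048, device=device)     # stand-in workload
y = x @ x

print(f"memory used: {vram.measure_vram():.2f} GB")
```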