API Reference#
Data Handling#
- class llmSHAP.data_handler.DataHandler(data, permanent_keys=None, mask_token='')[source]#
Bases: object
- Parameters:
data (DataMapping | str)
permanent_keys (Set[str] | Set[Index] | None)
mask_token (str)
- get_data(indexes, *, mask=True, exclude_permanent_keys=False)[source]#
Return a dict view according to the supplied options.
- Parameters:
indexes (int | Iterable[int])
mask (bool)
exclude_permanent_keys (bool)
- Return type:
Dict[Any, Any]
- get_keys(*, exclude_permanent_keys=False)[source]#
Return the list of indexes, optionally excluding permanent ones.
- Parameters:
exclude_permanent_keys (bool)
- Return type:
list[int]
- image_list(indexes, *, exclude_permanent_keys=False)[source]#
Return a list of the available images at the given indexes.
- Parameters:
indexes (int | Iterable[int])
exclude_permanent_keys (bool)
- Return type:
list[Image]
- remove(indexes, *, mask=True)[source]#
Return a copy where the chosen indexes are either masked (mask=True) or removed (mask=False). self.data is unchanged.
- Parameters:
indexes (int | Iterable[int])
mask (bool)
- Return type:
Dict[Any, Any]
- remove_hard(indexes)[source]#
Delete the selected indexes in-place. Returns the live mapping.
- Parameters:
indexes (int | Iterable[int])
- Return type:
Dict[Any, Any]
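The difference between masking and removal above can be illustrated with a small standalone sketch. This is not llmSHAP source code; it assumes integer indexes address keys by position, which matches the `int | Iterable[int]` signatures but is otherwise an assumption:

```python
# Illustrative sketch (not llmSHAP source): the semantics of
# DataHandler.remove — mask=True keeps every key but replaces the
# selected values with the mask token, mask=False drops the keys.
# The original mapping is left unchanged in both cases.
def remove(data, indexes, *, mask=True, mask_token=""):
    indexes = {indexes} if isinstance(indexes, int) else set(indexes)
    if mask:
        # Keep every key, blanking out the selected ones.
        return {k: (mask_token if i in indexes else v)
                for i, (k, v) in enumerate(data.items())}
    # Drop the selected keys entirely.
    return {k: v for i, (k, v) in enumerate(data.items()) if i not in indexes}

data = {"question": "What is SHAP?", "context": "SHAP is ..."}
masked = remove(data, 1, mask=True)    # {"question": ..., "context": ""}
dropped = remove(data, 1, mask=False)  # {"question": ...}
```

`remove_hard`, by contrast, mutates the handler's own mapping in place.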
Prompt Codecs#
- class llmSHAP.prompt_codec.BasicPromptCodec(system='')[source]#
Bases: PromptCodec
- Parameters:
system (str)
- build_prompt(data_handler, indexes)[source]#
(Encode) Build the prompt to send to the model.
- Parameters:
data_handler (DataHandler)
indexes (int | Iterable[int])
- Return type:
Any
- get_images(data_handler, indexes)[source]#
Retrieve the available images at the given indexes. Defaults to an empty list.
- Parameters:
data_handler (DataHandler)
indexes (int | Iterable[int])
- Return type:
list[Any]
- get_tools(data_handler, indexes)[source]#
Retrieve the available tools at the given indexes. Defaults to an empty list.
- Parameters:
data_handler (DataHandler)
indexes (int | Iterable[int])
- Return type:
list[Any]
- class llmSHAP.prompt_codec.PromptCodec[source]#
Bases: ABC
- abstractmethod build_prompt(data_handler, indexes)[source]#
(Encode) Build the prompt to send to the model.
- Parameters:
data_handler (DataHandler)
indexes (int | Iterable[int])
- Return type:
Any
- get_images(data_handler, indexes)[source]#
Retrieve the available images at the given indexes. Defaults to an empty list.
- Parameters:
data_handler (DataHandler)
indexes (int | Iterable[int])
- Return type:
list[Any]
- get_tools(data_handler, indexes)[source]#
Retrieve the available tools at the given indexes. Defaults to an empty list.
- Parameters:
data_handler (DataHandler)
indexes (int | Iterable[int])
- Return type:
list[Any]
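The codec contract above can be sketched as a standalone mock. The class and method names follow the reference; the body of `build_prompt` is a hypothetical encoding, not the library's actual implementation:

```python
from abc import ABC, abstractmethod

# Standalone sketch of the PromptCodec contract (a mock mirroring the
# API above, not an import from llmSHAP).
class PromptCodec(ABC):
    @abstractmethod
    def build_prompt(self, data_handler, indexes):
        """(Encode) Build the prompt to send to the model."""

    def get_images(self, data_handler, indexes):
        return []  # default: no images

    def get_tools(self, data_handler, indexes):
        return []  # default: no tools

class BasicPromptCodec(PromptCodec):
    def __init__(self, system=""):
        self.system = system

    def build_prompt(self, data_handler, indexes):
        # Hypothetical encoding: join the selected values into one string,
        # prefixed by the system message when one is set.
        parts = data_handler.get_data(indexes).values()
        prefix = self.system + "\n" if self.system else ""
        return prefix + "\n".join(map(str, parts))
```

A concrete codec only has to override `build_prompt`; the image and tool hooks fall back to empty lists.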
LLM Interfaces#
- class llmSHAP.llm.openai.OpenAIInterface(*, model_name, temperature=None, max_tokens=512, reasoning=None, max_retries=5, timeout=600.0, backoff_base=1.0, backoff_max=30.0)[source]#
Bases: LLMInterface
OpenAI Responses API interface with llmSHAP-managed retry behavior.
Retries are handled entirely by llmSHAP. The underlying OpenAI client is constructed with max_retries=1; the retry budget and backoff are instead controlled by max_retries, backoff_base, and backoff_max on this interface. Requests use an explicit default timeout of 600.0 seconds (10 minutes) rather than inheriting the OpenAI SDK's default timeout.
- Parameters:
model_name (str) – OpenAI model identifier to use for generation.
temperature (float | None) – Sampling temperature. Set to None (default) to omit the parameter for models that do not support temperature.
max_tokens (int) – Maximum number of output tokens to request.
reasoning (str | None) – Optional reasoning effort for reasoning-capable models.
max_retries (int) – Number of llmSHAP-managed retries after the initial request.
timeout (float) – Request timeout in seconds passed to the underlying OpenAI client.
backoff_base (float) – Base delay in seconds for exponential backoff.
backoff_max (float) – Maximum backoff delay in seconds.
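A minimal sketch of the retry schedule these parameters imply: exponential backoff from `backoff_base`, capped at `backoff_max`. Whether llmSHAP adds jitter or uses exactly this doubling schedule is not specified above, so treat this as an illustration of the parameter roles only:

```python
# Sketch of an exponential backoff schedule capped at backoff_max
# (assumed doubling schedule; jitter omitted).
def backoff_delays(max_retries, backoff_base=1.0, backoff_max=30.0):
    return [min(backoff_base * (2 ** attempt), backoff_max)
            for attempt in range(max_retries)]

# With the defaults (max_retries=5, backoff_base=1.0, backoff_max=30.0):
backoff_delays(5)  # -> [1.0, 2.0, 4.0, 8.0, 16.0]
```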
Generations#
Value Functions#
- class llmSHAP.value_functions.EmbeddingCosineSimilarity(model_name=None, api_url_endpoint=None)[source]#
Bases: ValueFunction
Embedding-based cosine similarity between two generations.
This value function supports two backends:
- Local sentence-transformers model (default).
- OpenAI-compatible embeddings API when api_url_endpoint is provided.
- Parameters:
model_name (str | None) – Embedding model identifier. Defaults to sentence-transformers/all-MiniLM-L6-v2 in local mode. In API mode, if this argument is omitted or left at the local default, it is mapped automatically to text-embedding-3-small.
api_url_endpoint (str | None) – Optional base URL for an OpenAI-compatible embeddings API endpoint, for example https://api.openai.com/v1 or a self-hosted proxy. When set, the local sentence-transformers model is not initialized. Requires OPENAI_API_KEY when provided.
Notes
Returns 0.0 if either compared output is empty/whitespace.
Uses an internal LRU cache to avoid recomputing repeated pairs.
Local mode loads the sentence-transformers model lazily and shares it across instances.
- DEFAULT_API_EMBEDDING_MODEL: ClassVar[str] = 'text-embedding-3-small'#
- DEFAULT_LOCAL_EMBEDDING_MODEL: ClassVar[str] = 'sentence-transformers/all-MiniLM-L6-v2'#
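The score this value function produces is ordinary cosine similarity over the two embedding vectors. A self-contained sketch of that final step (the embeddings themselves would come from sentence-transformers or the configured API, which are omitted here):

```python
import math

# Cosine similarity between two embedding vectors, with the 0.0
# fallback for degenerate (zero-norm) inputs.
def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

cosine_similarity([1.0, 0.0], [1.0, 0.0])  # -> 1.0 (identical direction)
cosine_similarity([1.0, 0.0], [0.0, 1.0])  # -> 0.0 (orthogonal)
```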
- class llmSHAP.value_functions.TFIDFCosineSimilarity[source]#
Bases: ValueFunction
Minimal TF-IDF cosine similarity between two generation outputs.
Notes
Computes TF-IDF on the compared pair only (2-document corpus).
Returns 0.0 if either text is empty/whitespace.
Tokenization uses the regex (?u)\b\w\w+\b: it keeps word tokens of two or more characters and splits on punctuation.
Example
"hello, world!" -> ["hello", "world"]
"state-of-the-art" -> ["state", "of", "the", "art"]
"a b c test" -> ["test"]
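The notes above can be sketched end to end. The tokenizer regex matches the one documented; the idf weighting uses sklearn-style smoothing, which is an assumption — the reference does not pin down the exact weighting:

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")  # word tokens of 2+ characters

def tokenize(text):
    return TOKEN_RE.findall(text.lower())

def tfidf_cosine(a, b):
    # Returns 0.0 for empty/whitespace inputs, matching the notes above.
    if not a.strip() or not b.strip():
        return 0.0
    docs = [Counter(tokenize(a)), Counter(tokenize(b))]  # 2-document corpus
    vocab = sorted(set(docs[0]) | set(docs[1]))
    # Smoothed idf (sklearn-style, an assumption): log((1+N)/(1+df)) + 1.
    idf = {t: math.log(3 / (1 + sum(t in d for d in docs))) + 1 for t in vocab}
    vecs = [[d[t] * idf[t] for t in vocab] for d in docs]
    dot = sum(x * y for x, y in zip(*vecs))
    norms = [math.sqrt(sum(x * x for x in v)) for v in vecs]
    if 0.0 in norms:
        return 0.0
    return dot / (norms[0] * norms[1])
```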
Attribution Methods#
- class llmSHAP.attribution_methods.attribution_function.AttributionFunction(model, data_handler, prompt_codec, use_cache=False, verbose=True, logging=False, log_filename='log', value_function=None)[source]#
Bases: object
- Parameters:
model (LLMInterface)
data_handler (DataHandler)
prompt_codec (PromptCodec)
use_cache (bool)
verbose (bool)
logging (bool)
log_filename (str)
value_function (ValueFunction | None)
- class llmSHAP.attribution_methods.coalition_sampler.CounterfactualSampler[source]#
Bases: CoalitionSampler
- class llmSHAP.attribution_methods.coalition_sampler.FullEnumerationSampler(num_players)[source]#
Bases: CoalitionSampler
- Parameters:
num_players (int)
- class llmSHAP.attribution_methods.coalition_sampler.RandomSampler(sampling_ratio, seed=None)[source]#
Bases: CoalitionSampler
- Parameters:
sampling_ratio (float)
seed (int | None)
- class llmSHAP.attribution_methods.coalition_sampler.SlidingWindowSampler(ordered_keys, w_size, stride=1)[source]#
Bases: CoalitionSampler
- Parameters:
ordered_keys (List[Index])
w_size (int)
stride (int)
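The constructor parameters suggest coalitions drawn as contiguous windows over the ordered keys. The reference only lists the parameters, so the exact semantics below (windows of `w_size` keys advanced by `stride`) are an assumption:

```python
# Sketch of the coalitions a sliding-window sampler could yield:
# contiguous windows of w_size keys, advanced by stride.
def sliding_windows(ordered_keys, w_size, stride=1):
    return [ordered_keys[i:i + w_size]
            for i in range(0, len(ordered_keys) - w_size + 1, stride)]

sliding_windows(["a", "b", "c", "d"], w_size=2)
# -> [['a', 'b'], ['b', 'c'], ['c', 'd']]
```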
- class llmSHAP.attribution_methods.shapley_attribution.ShapleyAttribution(model, data_handler, prompt_codec, sampler=None, use_cache=False, verbose=True, logging=False, num_threads=1, value_function=None)[source]#
Bases: AttributionFunction
- Parameters:
model (LLMInterface)
data_handler (DataHandler)
prompt_codec (PromptCodec)
sampler (CoalitionSampler | None)
use_cache (bool)
verbose (bool)
logging (bool)
num_threads (int)
value_function (ValueFunction | None)
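For reference, the quantity ShapleyAttribution estimates is the standard Shapley value, computable exactly by full coalition enumeration. The sketch below uses a toy characteristic function in place of the model-plus-value-function pipeline above:

```python
from itertools import combinations
from math import factorial

# Exact Shapley values via full enumeration (the textbook formula,
# not llmSHAP internals). v maps a set of players to a real value.
def shapley_values(players, v):
    n = len(players)
    values = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            for coalition in combinations(others, k):
                # Weight |S|! (n - |S| - 1)! / n! for coalition S.
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (v(set(coalition) | {p}) - v(set(coalition)))
        values[p] = total
    return values

# Toy game: coalition value = number of players in it.
vals = shapley_values(["a", "b", "c"], lambda S: len(S))
# Each player's marginal contribution is always 1, so each gets 1.0.
```

In llmSHAP the characteristic function is supplied by the model, prompt codec, and value function, and the sampler controls which coalitions are actually evaluated instead of enumerating all 2^n of them.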