API Reference#

Data Handling#

class llmSHAP.data_handler.DataHandler(data, permanent_keys=None, mask_token='')[source]#

Bases: object

Parameters:
  • data (DataMapping | str)

  • permanent_keys (Set[str] | Set[Index] | None)

  • mask_token (str)

get_data(indexes, *, mask=True, exclude_permanent_keys=False)[source]#

Return a dict view according to the supplied options.

Parameters:
  • indexes (int | Iterable[int])

  • mask (bool)

  • exclude_permanent_keys (bool)

Return type:

Dict[Any, Any]

get_feature_enumeration()[source]#
Return type:

Dict[int, str]

get_keys(*, exclude_permanent_keys=False)[source]#

Return the list of indexes, optionally excluding permanent ones.

Parameters:

exclude_permanent_keys (bool)

Return type:

list[int]

image_list(indexes, *, exclude_permanent_keys=False)[source]#

Return a list of the available images at the given indexes.

Parameters:
  • indexes (int | Iterable[int])

  • exclude_permanent_keys (bool)

Return type:

list[Image]

remove(indexes, *, mask=True)[source]#

Return a copy where the chosen indexes are either masked (mask=True) or removed (mask=False). self.data is unchanged.

Parameters:
  • indexes (int | Iterable[int])

  • mask (bool)

Return type:

Dict[Any, Any]

remove_hard(indexes)[source]#

Delete the selected indexes in-place. Returns the live mapping.

Parameters:

indexes (int | Iterable[int])

Return type:

Dict[Any, Any]

to_string(indexes=None, *, mask=True, exclude_permanent_keys=False)[source]#

Join the chosen indexes into one space-separated string.

Parameters:
  • indexes (int | Iterable[int] | None)

  • mask (bool)

  • exclude_permanent_keys (bool)

Return type:

str

tool_list(indexes, *, exclude_permanent_keys=False)[source]#

Return a list of the available tools at the given indexes.

Parameters:
  • indexes (int | Iterable[int])

  • exclude_permanent_keys (bool)

Return type:

list[Any]
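
The masking and removal semantics described above can be illustrated with a small stand-in class (this is not the actual llmSHAP implementation; the behavior is inferred from the method descriptions above):

```python
# Minimal stand-in mirroring the DataHandler behavior described above.
# The real llmSHAP.data_handler.DataHandler has more options; this only
# illustrates masking vs. removal and string joining.
class ToyDataHandler:
    def __init__(self, data, permanent_keys=None, mask_token=""):
        self.data = dict(data)
        self.permanent_keys = set(permanent_keys or ())
        self.mask_token = mask_token

    def remove(self, indexes, *, mask=True):
        # Return a copy; self.data is unchanged.
        idx = {indexes} if isinstance(indexes, int) else set(indexes)
        if mask:
            return {k: (self.mask_token if k in idx else v)
                    for k, v in self.data.items()}
        return {k: v for k, v in self.data.items() if k not in idx}

    def to_string(self, indexes=None):
        keys = self.data.keys() if indexes is None else indexes
        return " ".join(str(self.data[k]) for k in keys)

dh = ToyDataHandler({0: "The", 1: "quick", 2: "fox"}, mask_token="[MASK]")
print(dh.remove(1))               # {0: 'The', 1: '[MASK]', 2: 'fox'}
print(dh.remove(1, mask=False))   # {0: 'The', 2: 'fox'}
print(dh.to_string())             # The quick fox
```

Note that both calls leave the underlying mapping untouched; only remove_hard mutates it in place.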

Prompt Codecs#

class llmSHAP.prompt_codec.BasicPromptCodec(system='')[source]#

Bases: PromptCodec

Parameters:

system (str)

build_prompt(data_handler, indexes)[source]#

(Encode) Build prompt to send to the model.

Parameters:
  • data_handler (DataHandler)

  • indexes (int | Iterable[int])

Return type:

Any

get_images(data_handler, indexes)[source]#

Retrieve the available images at the given indexes. Defaults to an empty list.

Parameters:
  • data_handler (DataHandler)

  • indexes (int | Iterable[int])

Return type:

list[Any]

get_tools(data_handler, indexes)[source]#

Retrieve the available tools at the given indexes. Defaults to an empty list.

Parameters:
  • data_handler (DataHandler)

  • indexes (int | Iterable[int])

Return type:

list[Any]

parse_generation(model_output)[source]#

(Decode) Parse model generation into a structured result.

Parameters:

model_output (str)

Return type:

Generation

class llmSHAP.prompt_codec.PromptCodec[source]#

Bases: ABC

abstractmethod build_prompt(data_handler, indexes)[source]#

(Encode) Build prompt to send to the model.

Parameters:
  • data_handler (DataHandler)

  • indexes (int | Iterable[int])

Return type:

Any

get_images(data_handler, indexes)[source]#

Retrieve the available images at the given indexes. Defaults to an empty list.

Parameters:
  • data_handler (DataHandler)

  • indexes (int | Iterable[int])

Return type:

list[Any]

get_tools(data_handler, indexes)[source]#

Retrieve the available tools at the given indexes. Defaults to an empty list.

Parameters:
  • data_handler (DataHandler)

  • indexes (int | Iterable[int])

Return type:

list[Any]

abstractmethod parse_generation(model_output)[source]#

(Decode) Parse model generation into a structured result.

Parameters:

model_output (str)

Return type:

Generation
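
The encode/decode contract above can be sketched with a framework-free codec (hypothetical names; a real subclass would derive from PromptCodec and take a DataHandler, and parse_generation would return a Generation):

```python
# Illustrative encode/decode pair following the PromptCodec contract above:
# build_prompt encodes selected features into a prompt, parse_generation
# decodes the raw model output. This is a sketch, not the llmSHAP classes.
class ToyCodec:
    def __init__(self, system=""):
        self.system = system

    def build_prompt(self, features, indexes):
        # Encode: join the selected feature values into one prompt string.
        body = " ".join(features[i] for i in indexes)
        return f"{self.system}\n{body}" if self.system else body

    def parse_generation(self, model_output):
        # Decode: the real method wraps the output in a Generation;
        # here we simply normalize whitespace.
        return model_output.strip()

codec = ToyCodec(system="You are a classifier.")
prompt = codec.build_prompt({0: "review:", 1: "great", 2: "movie"}, [0, 1, 2])
print(prompt)  # You are a classifier.\nreview: great movie
```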

LLM Interfaces#

class llmSHAP.llm.llm_interface.LLMInterface[source]#

Bases: ABC

abstractmethod generate(prompt, tools=None, images=None)[source]#
Parameters:
  • prompt (Any)

  • tools (list[Any] | None)

  • images (list[Any] | None)

Return type:

str

class llmSHAP.llm.openai.OpenAIInterface(*, model_name, temperature=None, max_tokens=512, reasoning=None, max_retries=5, timeout=600.0, backoff_base=1.0, backoff_max=30.0)[source]#

Bases: LLMInterface

OpenAI Responses API interface with llmSHAP-managed retry behavior.

Retries are handled entirely by llmSHAP: the underlying OpenAI client is constructed with max_retries=1, while the retry budget and backoff are controlled by max_retries, backoff_base, and backoff_max on this interface.

Requests use an explicit default timeout of 600.0 seconds (10 minutes) rather than inheriting the OpenAI SDK’s default timeout.

Parameters:
  • model_name (str) – OpenAI model identifier to use for generation.

  • temperature (float | None) – Sampling temperature. Set to None (default) to omit the parameter for models that do not support temperature.

  • max_tokens (int) – Maximum number of output tokens to request.

  • reasoning (str | None) – Optional reasoning effort for reasoning-capable models.

  • max_retries (int) – Number of llmSHAP-managed retries after the initial request.

  • timeout (float) – Request timeout in seconds passed to the underlying OpenAI client.

  • backoff_base (float) – Base delay in seconds for exponential backoff.

  • backoff_max (float) – Maximum backoff delay in seconds.

generate(prompt, tools=None, images=None)[source]#
Parameters:
  • prompt (Any)

  • tools (list[Any] | None)

  • images (list[Any] | None)

Return type:

str
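
A typical capped exponential-backoff schedule consistent with the backoff_base and backoff_max parameters above looks like the following (the exact formula OpenAIInterface uses internally is an assumption):

```python
# Capped exponential backoff: delay doubles per attempt, starting at
# `base` seconds and never exceeding `cap` seconds. These defaults match
# backoff_base=1.0 and backoff_max=30.0 above.
def backoff_delay(attempt, base=1.0, cap=30.0):
    return min(cap, base * (2 ** attempt))

print([backoff_delay(a) for a in range(7)])
# [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

In practice a small random jitter is often added to each delay to avoid synchronized retries.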

Generations#

class llmSHAP.generation.Generation(output: str)[source]#

Bases: object

Parameters:

output (str)

output: str#

Value Functions#

class llmSHAP.value_functions.EmbeddingCosineSimilarity(model_name=None, api_url_endpoint=None)[source]#

Bases: ValueFunction

Embedding-based cosine similarity between two generations.

This value function supports two backends:

  1. Local sentence-transformers model (default).

  2. OpenAI-compatible embeddings API when api_url_endpoint is provided.

Parameters:
  • model_name (str | None) – Embedding model identifier. Defaults to sentence-transformers/all-MiniLM-L6-v2 in local mode. In API mode, if this argument is omitted or left at the local default, it is mapped automatically to text-embedding-3-small.

  • api_url_endpoint (str | None) – Optional base URL for an OpenAI-compatible embeddings API endpoint, for example https://api.openai.com/v1 or a self-hosted proxy. When set, local sentence-transformers are not initialized. Requires OPENAI_API_KEY when provided.

Notes

  • Returns 0.0 if either compared output is empty/whitespace.

  • Uses an internal LRU cache to avoid recomputing repeated pairs.

  • Local mode loads the sentence-transformers model lazily and shares it across instances.

DEFAULT_API_EMBEDDING_MODEL: ClassVar[str] = 'text-embedding-3-small'#

DEFAULT_LOCAL_EMBEDDING_MODEL: ClassVar[str] = 'sentence-transformers/all-MiniLM-L6-v2'#
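
The quantity this value function returns, cosine similarity between two embedding vectors, can be computed as follows (the vectors here are made up for illustration; real inputs come from the embedding backend):

```python
import math

# Cosine similarity between two embedding vectors: the dot product
# divided by the product of the vector norms.
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0  # mirrors the 0.0-for-empty convention noted above
    return dot / (na * nb)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```
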
class llmSHAP.value_functions.TFIDFCosineSimilarity[source]#

Bases: ValueFunction

Minimal TF-IDF cosine similarity between two generation outputs.

Notes

  • Computes TF-IDF on the compared pair only (2-document corpus).

  • Returns 0.0 if either text is empty/whitespace.

  • Tokenization uses the regex (?u)\b\w\w+\b: includes 2+ character word tokens and splits on punctuation.

Example

"hello, world!" -> ["hello", "world"]
"state-of-the-art" -> ["state", "of", "the", "art"]
"a b c test" -> ["test"]
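
The tokenizer examples above can be reproduced directly with Python's re module:

```python
import re

# The token pattern described above: keeps word tokens of 2+ characters
# and splits on punctuation. Single-character tokens are dropped.
TOKEN_PATTERN = r"(?u)\b\w\w+\b"

for text in ["hello, world!", "state-of-the-art", "a b c test"]:
    print(re.findall(TOKEN_PATTERN, text))
# ['hello', 'world']
# ['state', 'of', 'the', 'art']
# ['test']
```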

class llmSHAP.value_functions.ValueFunction[source]#

Bases: ABC

Attribution Methods#

class llmSHAP.attribution_methods.attribution_function.AttributionFunction(model, data_handler, prompt_codec, use_cache=False, verbose=True, logging=False, log_filename='log', value_function=None)[source]#

Bases: object

Parameters:
  • model (LLMInterface)

  • data_handler (DataHandler)

  • prompt_codec (PromptCodec)

  • use_cache (bool)

  • verbose (bool)

  • logging (bool)

  • log_filename (str)

  • value_function (ValueFunction | None)

class llmSHAP.attribution_methods.coalition_sampler.CoalitionSampler[source]#

Bases: ABC

class llmSHAP.attribution_methods.coalition_sampler.CounterfactualSampler[source]#

Bases: CoalitionSampler

class llmSHAP.attribution_methods.coalition_sampler.FullEnumerationSampler(num_players)[source]#

Bases: CoalitionSampler

Parameters:

num_players (int)

class llmSHAP.attribution_methods.coalition_sampler.RandomSampler(sampling_ratio, seed=None)[source]#

Bases: CoalitionSampler

Parameters:
  • sampling_ratio (float)

  • seed (int | None)

class llmSHAP.attribution_methods.coalition_sampler.SlidingWindowSampler(ordered_keys, w_size, stride=1)[source]#

Bases: CoalitionSampler

Parameters:
  • ordered_keys (List[Index])

  • w_size (int)

  • stride (int)
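
A sliding-window coalition schedule consistent with the signature above (ordered_keys, w_size, stride) can be sketched as follows; the exact coalitions the real sampler emits are an assumption:

```python
# Sketch of sliding-window coalition sampling: take contiguous windows of
# w_size keys over the ordered key list, advancing by `stride` each step.
def sliding_windows(ordered_keys, w_size, stride=1):
    return [ordered_keys[i:i + w_size]
            for i in range(0, len(ordered_keys) - w_size + 1, stride)]

print(sliding_windows([0, 1, 2, 3], w_size=2))            # [[0, 1], [1, 2], [2, 3]]
print(sliding_windows([0, 1, 2, 3], w_size=2, stride=2))  # [[0, 1], [2, 3]]
```

Compared with full enumeration, this restricts attribution to locally contiguous coalitions, which is useful when feature order carries meaning (e.g. sentences in a document).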

class llmSHAP.attribution_methods.shapley_attribution.ShapleyAttribution(model, data_handler, prompt_codec, sampler=None, use_cache=False, verbose=True, logging=False, num_threads=1, value_function=None)[source]#

Bases: AttributionFunction

Parameters:
  • model (LLMInterface)

  • data_handler (DataHandler)

  • prompt_codec (PromptCodec)

  • sampler (CoalitionSampler | None)

  • use_cache (bool)

  • verbose (bool)

  • logging (bool)

  • num_threads (int)

  • value_function (ValueFunction | None)

attribution()[source]#
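
The exact computation a FullEnumerationSampler-backed ShapleyAttribution performs can be sketched over a toy value function (this is the textbook Shapley formula, not the llmSHAP API; in llmSHAP the value function compares model generations rather than summing worths):

```python
from itertools import combinations
from math import factorial

# Exact Shapley values by full coalition enumeration: each player's value
# is the weighted average of its marginal contribution over all coalitions
# of the remaining players.
def shapley_values(players, value):
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        rest = [q for q in players if q != p]
        for k in range(n):
            for coalition in combinations(rest, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                s = frozenset(coalition)
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi

# Additive toy game: each player's Shapley value equals its own worth.
worth = {0: 1.0, 1: 2.0, 2: 3.0}
phi = shapley_values([0, 1, 2], lambda s: sum(worth[i] for i in s))
print({p: round(v, 6) for p, v in phi.items()})  # {0: 1.0, 1: 2.0, 2: 3.0}
```

Full enumeration costs 2^n model calls, which is why the sampler classes above (RandomSampler, SlidingWindowSampler) exist as cheaper alternatives.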