SHAP and LIME: ML Model Explainability

An XGBoost model achieves AUC 0.91 on validation. In production, unexpected predictions appear: high scores for clearly irrelevant objects. The booster's built-in feature importance shows the global top-10 features but doesn't explain a specific prediction. Why did this particular object get a score of 0.87?

SHAP and LIME answer different versions of this question. It's important to understand when to apply each method and where they break down.

SHAP: Theory and Practice

SHAP (SHapley Additive exPlanations, Lundberg & Lee, 2017) is based on Shapley values from cooperative game theory. The idea: the contribution of each feature to a prediction is its average marginal contribution across all possible feature coalitions.

Key property: additivity. The sum of SHAP values for all features + base value (average model prediction) = the specific prediction. This is a mathematically exact decomposition, not an approximation.
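The additivity property can be checked directly with a brute-force Shapley computation on a toy model (a hypothetical 3-feature linear model; pure Python, no shap dependency):

```python
import itertools
from math import factorial

# Hypothetical toy model: linear in 3 features
w = [2.0, -1.0, 0.5]
baseline = [0.0, 0.0, 0.0]   # reference ("background") input
x = [1.0, 3.0, -2.0]         # the prediction we want to explain
n = len(x)

def f(z):
    return sum(wi * zi for wi, zi in zip(w, z))

def masked(S):
    # input with features in coalition S taken from x, the rest from baseline
    return [x[j] if j in S else baseline[j] for j in range(n)]

def shapley(i):
    # average marginal contribution of feature i over all coalitions
    total = 0.0
    others = [j for j in range(n) if j != i]
    for r in range(n):
        for S in itertools.combinations(others, r):
            weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            total += weight * (f(masked(set(S) | {i})) - f(masked(set(S))))
    return total

phi = [shapley(i) for i in range(n)]
base_value = f(baseline)

# Additivity: base value + sum of SHAP values equals the prediction exactly
assert abs(base_value + sum(phi) - f(x)) < 1e-9
print(phi)  # for a linear model this is w_i * (x_i - baseline_i)
```

Real explainers estimate these values without enumerating all 2^n coalitions; the exactness of the decomposition is what separates SHAP from purely heuristic importances.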

TreeSHAP — Why Architectural Specialization Matters

For tree-based models (XGBoost, LightGBM, CatBoost, sklearn RandomForest), there's TreeSHAP — an algorithm with polynomial complexity O(TLD²), where T is the number of trees, L is the maximum number of leaves, D is depth. This is orders of magnitude faster than naive KernelSHAP.

import shap
import xgboost as xgb

model = xgb.XGBClassifier()
model.fit(X_train, y_train)

explainer = shap.TreeExplainer(model)
explanation = explainer(X_test)  # Explanation object: .values, .base_values

# Waterfall plot for a specific prediction
shap.plots.waterfall(explanation[0])

# Beeswarm (summary) plot: global importance across the test set
shap.plots.beeswarm(explanation)

In practice, TreeSHAP on a LightGBM model with 500 trees processes 10,000 examples in 2-3 seconds on CPU. Perfectly acceptable for batch inference.

DeepSHAP and GradientSHAP for Neural Networks

For neural networks, TreeSHAP is inapplicable. We use:

  • DeepSHAP (DeepLIFT + SHAP): propagates contributions through the network using DeepLIFT's backpropagation rules with SHAP-style baselines. Fast, but requires access to the network internals.
  • GradientSHAP: averages gradients along paths from randomly sampled baselines to the input (an expected-gradients formulation). Faster than KernelSHAP, but requires differentiability.
  • KernelSHAP: model-agnostic, works for any black-box model. Slow: exactly explaining one object requires 2^n-2 queries to the model, so in practice sampling is used (nsamples=100-1000).
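The mechanics of KernelSHAP can be sketched in a few lines: enumerate (or sample) coalition masks, query the model on masked inputs, and solve a weighted linear regression with the Shapley kernel weights. A minimal sketch on a hypothetical 3-feature linear model (the model, inputs, and weight values are illustrative assumptions):

```python
import itertools
import math
import numpy as np

# Hypothetical black-box model: linear in 3 features
w = np.array([2.0, -1.0, 0.5])
x = np.array([1.0, 3.0, -2.0])
baseline = np.zeros(3)
f = lambda z: float(w @ z)

M = 3
Z_rows, y, wts = [], [], []
for z in itertools.product([0, 1], repeat=M):   # all 2^M coalition masks
    z = np.array(z)
    # masked input: feature from x if in the coalition, else from baseline
    y.append(f(np.where(z == 1, x, baseline)))
    Z_rows.append(z)
    s = int(z.sum())
    if s == 0 or s == M:
        wts.append(1e6)  # near-infinite weight enforces the additivity constraints
    else:
        wts.append((M - 1) / (math.comb(M, s) * s * (M - s)))  # Shapley kernel

Z = np.column_stack([np.ones(len(Z_rows)), np.array(Z_rows)])  # intercept = base value
W = np.sqrt(np.array(wts))
phi, *_ = np.linalg.lstsq(Z * W[:, None], np.array(y) * W, rcond=None)

print(phi)  # [base value, phi_1, phi_2, phi_3]
```

With more features, exact enumeration is replaced by sampling coalition masks, which is precisely what nsamples controls.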

For BERT and transformers — a separate story. SHAP on transformers via partition explainer works, but latency for explaining one text can reach 30-60 seconds with 512 tokens. For production, a trade-off is usually needed: explanations are generated asynchronously on request.

LIME: Local Approximation

LIME (Local Interpretable Model-agnostic Explanations, Ribeiro et al., 2016) works differently: a random cloud of perturbations is generated around the object, the black-box model's prediction is obtained for each perturbation, and then a simple interpretable model (linear regression or a decision tree) is trained on this cloud, weighted by proximity to the original object.
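That procedure fits in a few lines of numpy (the black-box function, kernel width, and perturbation scale below are illustrative assumptions, not LIME's defaults):

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed: the procedure is stochastic

# Hypothetical black-box model (nonlinear, 2 features)
def black_box(X):
    return X[:, 0] ** 2 + np.sin(X[:, 1])

x0 = np.array([1.0, 0.5])        # instance to explain
num_samples = 5000
kernel_width = 0.75

# 1. Perturbation cloud around x0
X_pert = x0 + rng.normal(scale=0.5, size=(num_samples, 2))
y_pert = black_box(X_pert)

# 2. Proximity weights: closer perturbations matter more
d2 = ((X_pert - x0) ** 2).sum(axis=1)
weights = np.exp(-d2 / kernel_width ** 2)

# 3. Weighted linear surrogate (the "interpretable model")
A = np.column_stack([np.ones(num_samples), X_pert - x0])
W = np.sqrt(weights)
coef, *_ = np.linalg.lstsq(A * W[:, None], y_pert * W, rcond=None)

print(coef[1:])  # local feature effects, roughly the local gradient at x0
```

The surrogate's coefficients are the explanation: how each feature moves the prediction in the neighborhood of x0, not globally.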

When LIME is better than SHAP:

  • Model isn't supported by TreeSHAP and is too slow for KernelSHAP
  • Need explanations in terms of "super-pixels" for images (LIME for CV)
  • Need text explainability with word highlighting (LIME for NLP)

Stability problem: LIME is a stochastic algorithm. With different random_state values, explanations for the same object can differ significantly. In production, we fix the seed and use a large number of perturbations (num_samples=5000+).

from lime.lime_tabular import LimeTabularExplainer

explainer = LimeTabularExplainer(
    X_train.values,
    feature_names=feature_names,
    class_names=['negative', 'positive'],
    mode='classification',
    random_state=42  # fix the seed: LIME explanations are stochastic
)

explanation = explainer.explain_instance(
    X_test.values[0],
    model.predict_proba,
    num_features=10,   # top-10 features in the explanation
    num_samples=5000   # larger perturbation cloud for stability
)
explanation.show_in_notebook()

Method Comparison

Characteristic         | TreeSHAP      | KernelSHAP    | LIME
-----------------------|---------------|---------------|--------------
Applicability          | Trees only    | Any model     | Any model
Mathematical accuracy  | Exact         | Exact         | Approximation
Stability              | Deterministic | Deterministic | Stochastic
Speed (10k objects)    | Seconds       | Hours         | Minutes
Text/image support     | No            | Not natively  | Yes

Integration into Production ML Pipeline

Explanations are needed not only for auditing — they're part of the operational pipeline.

Real case: client is an insurance company, premium calculation model (LightGBM, 120 features). Requirement: agent must be able to explain to customer over the phone why the premium is high.

Solution: TreeSHAP in the inference API. For each prediction, the top-3 features with the highest SHAP values are returned, plus an automatically generated text template: "Your premium is above average for the following reasons: vehicle age (impact +12%), registration region (impact +8%), payment history (impact +6%)".
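The top-3 extraction step can be sketched as follows (the feature names and SHAP values are hypothetical placeholders for the real model output):

```python
import numpy as np

# Hypothetical TreeSHAP output for one prediction
feature_names = ["vehicle_age", "region", "payment_history", "mileage"]
shap_values = np.array([0.12, 0.08, 0.06, -0.03])  # contribution to the score

# Top-3 features pushing the premium up
top = np.argsort(shap_values)[::-1][:3]
reasons = ", ".join(
    f"{feature_names[i]} (impact {shap_values[i]:+.0%})" for i in top
)
message = f"Your premium is above average for the following reasons: {reasons}."
print(message)
```

In the real pipeline the SHAP vector comes from the TreeExplainer call at inference time; only the formatting step differs per client.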

Latency overhead: ~35 ms for TreeSHAP on top of an average inference time of 18 ms, which is acceptable.

Monitoring: SHAP values are logged to ClickHouse. Once a week we aggregate — drift in SHAP value distribution signals feature drift earlier than AUC drop.
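A minimal sketch of that weekly aggregation, assuming SHAP values for two windows have already been pulled from the log store (the data here is synthetic):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for logged SHAP values: rows = predictions, cols = features
shap_last_week = rng.normal(0.0, 1.0, size=(10_000, 3))
shap_this_week = shap_last_week.copy()
shap_this_week[:, 2] *= 2.0   # feature 2 drifted: its contributions doubled

# Per-feature mean |SHAP| is a cheap drift statistic
m_old = np.abs(shap_last_week).mean(axis=0)
m_new = np.abs(shap_this_week).mean(axis=0)
ratio = m_new / m_old

drifted = np.flatnonzero(np.abs(ratio - 1.0) > 0.3)
print(drifted)  # indices of features whose mean |SHAP| shifted by more than 30%
```

In practice a more robust statistic (PSI or a KS test over the full SHAP distribution) can replace the mean-|SHAP| ratio; the point is that the check runs on attributions, not raw features.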

Limitations to Know About

SHAP ≠ causality. High SHAP value for a feature means correlation with the prediction, not causation. "Feature X impacts prediction" ≠ "changing X will change the result in reality".

Multicollinearity breaks interpretation. If two features are strongly correlated (r > 0.8), SHAP splits the credit between them in a way that depends on how the model happened to distribute its weights, so the split is effectively arbitrary. Correlation analysis is needed when interpreting results.
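A minimal illustration: two linear models with hypothetical weights that make identical predictions on a perfectly duplicated feature, yet attribute the credit completely differently:

```python
# Hypothetical: feature x2 is an exact copy of x1 (r = 1.0)
x1, x2 = 3.0, 3.0

model_a = lambda a, b: 1.0 * a + 0.0 * b   # training put all weight on x1
model_b = lambda a, b: 0.5 * a + 0.5 * b   # training split the weight evenly

# On duplicated inputs the two models are indistinguishable
assert model_a(x1, x2) == model_b(x1, x2) == 3.0

# For a linear model with a zero baseline, the SHAP value of feature i is w_i * x_i
phi_a = (1.0 * x1, 0.0 * x2)   # all credit to x1
phi_b = (0.5 * x1, 0.5 * x2)   # credit split evenly
print(phi_a, phi_b)
```

Which split you observe depends on how training distributed the weights, not on anything in the data itself.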

For LLMs — both methods give rough estimates. Attention weights are more informative for generation tasks, but also aren't a strict proxy for importance.

Timeline: implementing SHAP/LIME in an existing pipeline — 1-2 weeks. Building a monitoring pipeline with SHAP-based drift detection — 3-4 weeks.