Axiomatic Explanations for Visual Search,
Retrieval, and Similarity Learning

ICLR 2022


Mark Hamilton, Scott Lundberg, Lei Zhang, Stephanie Fu, William T. Freeman

Visual search, recommendation, and contrastive similarity learning power technologies that impact billions of users worldwide. Modern model architectures can be complex and difficult to interpret, and there are several competing techniques one can use to explain a search engine's behavior. We show that the theory of fair credit assignment provides a unique axiomatic solution that generalizes several existing recommendation- and metric-explainability techniques in the literature. Using this formalism, we show when existing approaches violate "fairness" and derive methods that sidestep these shortcomings and naturally handle counterfactual information. More specifically, we show that existing approaches implicitly approximate second-order Shapley-Taylor indices, and we extend CAM, GradCAM, LIME, SHAP, SBSM, and other methods to search engines. These extensions can extract pairwise correspondences between images from trained opaque-box models. We also introduce a fast kernel-based method for estimating Shapley-Taylor indices that requires orders of magnitude fewer function evaluations to converge. Finally, we show that these game-theoretic measures yield more consistent explanations for image similarity architectures.

Explaining search, contrastive learning, and recommendation systems

Search, recommendation, retrieval, and contrastive similarity learning help us organize information at scales that no human could match. The recent surge in million- and billion-parameter contrastive architectures for vision and language underscores the growing need to understand these classes of systems. This work identifies two distinct classes of explainability methods for these systems. "First order" approaches highlight the most important pixels that contribute to the similarity of objects, and "second order" explanations provide a full correspondence between the parts of the query and retrieved images. Our approaches allow one to understand the predictions and similarity judgements of contrastive architectures without access to the model weights (opaque-box).
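The first- vs. second-order distinction above can be made concrete with a small sketch. This is an illustration, not the paper's method: it uses random arrays in place of real CNN or ViT feature maps, and the function name is ours. A second-order explanation is a full patch-to-patch correspondence matrix; a first-order map can be read off from it by reducing over the retrieved image's locations.

```python
import numpy as np

def second_order_map(feats_q, feats_r):
    """Correspondence between spatial locations of a query and a retrieved
    feature map. Hedged sketch: real systems would use deep features; here
    we accept any (H, W, C) arrays and compare L2-normalized patch vectors.
    Returns an (H*W, H*W) matrix whose entry (i, j) scores how strongly
    query patch i corresponds to retrieved patch j."""
    q = feats_q.reshape(-1, feats_q.shape[-1])
    r = feats_r.reshape(-1, feats_r.shape[-1])
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    r = r / np.linalg.norm(r, axis=1, keepdims=True)
    return q @ r.T

rng = np.random.default_rng(0)
fq = rng.normal(size=(7, 7, 64))  # stand-in for query-image features
fr = rng.normal(size=(7, 7, 64))  # stand-in for retrieved-image features

corr = second_order_map(fq, fr)               # (49, 49) second-order map
first_order = corr.max(axis=1).reshape(7, 7)  # per-patch importance heatmap
```

Reducing the correspondence matrix (here with a max over retrieved locations) recovers a single-image heatmap, which is why first-order methods can be viewed as summaries of second-order ones.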

Generalizing Shapley Values with the Shapley-Taylor Indices

Shapley Values connect the fields of economics and machine learning and provide a principled framework for explaining classifiers and regressors. In economics, these values are the unique, fair way to pay employees based on their contributions. In machine learning, these values provide the only fair way to attribute a model's predictions to input features. We show that this connection between economics and ML extends to a generalization of the Shapley Values called the Shapley-Taylor Index. In economics, this allows us to calculate fair payments for groups of employees. In ML, this measures the interaction between features, enabling the extraction of dense image correspondences from search, contrastive, and recommendation systems without knowing the inner workings of the model.
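The "fair payment" intuition can be checked directly on a toy cooperative game. The sketch below (our own illustration, not code from the paper) computes exact Shapley values by enumerating coalitions, which is exponential in the number of players but fine for small games, and shows the efficiency axiom: the payments sum to the value of the full team.

```python
import numpy as np
from itertools import combinations
from math import factorial

def shapley_values(v, n):
    """Exact Shapley values for an n-player game with characteristic
    function v(coalition: frozenset) -> float. Enumerates all coalitions,
    so this is only practical for small n."""
    phi = np.zeros(n)
    for i in range(n):
        others = [p for p in range(n) if p != i]
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                # Shapley weight for coalitions of size |S|
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[i] += w * (v(frozenset(S) | {i}) - v(frozenset(S)))
    return phi

# Toy payroll game: each worker produces 10 units alone, and workers
# 0 and 1 earn a 5-unit bonus when they collaborate.
v = lambda S: 10 * len(S) + (5 if {0, 1} <= S else 0)
phi = shapley_values(v, n=3)  # fair pay: bonus split evenly between 0 and 1
```

By symmetry the bonus is split evenly between players 0 and 1, and player 2 receives only its individual contribution; the Shapley-Taylor Index generalizes this by assigning credit to pairs (and larger groups) of players rather than individuals.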

Fast Approximations of Shapley-Taylor Indices

Though Shapley Values and Shapley-Taylor indices are the unique measures we can use to understand models, they are difficult to compute for nonlinear models. We show that existing methods such as CAM, GradCAM, and LIME are implicitly approximating these values, and use this to design new generalizations of these methods for contrastive, search, and recommendation architectures. When generalizing LIME to handle higher-order interactions, we discovered a new estimator for Shapley-Taylor indices using kernel-weighted quadratic regression which converges with 10x fewer function evaluations.
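The flavor of a kernel-weighted quadratic regression estimator can be sketched as follows. This is an illustrative analogue in the spirit of KernelSHAP extended with pairwise terms, not the paper's exact estimator, and all names here are ours: sample binary coalition masks, weight them by a size-dependent Shapley kernel, and fit a weighted least-squares model whose quadratic coefficients estimate pairwise interactions.

```python
import numpy as np
from itertools import combinations
from math import comb

def kernel_interaction_estimate(f, n, n_samples=2048, seed=0):
    """Illustrative sketch: estimate linear effects and pairwise
    interactions of a set function f (mapping a binary mask of length n
    to a scalar) via kernel-weighted quadratic regression."""
    rng = np.random.default_rng(seed)

    def shapley_weight(s):
        # Shapley kernel weight for a coalition of size s (0 < s < n)
        return (n - 1) / (comb(n, s) * s * (n - s))

    masks, w = [], []
    for _ in range(n_samples):
        s = int(rng.integers(1, n))           # coalition size in 1..n-1
        z = np.zeros(n)
        z[rng.choice(n, size=s, replace=False)] = 1.0
        masks.append(z)
        w.append(shapley_weight(s))
    Z, w = np.array(masks), np.array(w)

    pairs = list(combinations(range(n), 2))
    # Design matrix: intercept, linear terms, and pairwise products
    X = np.column_stack([np.ones(len(Z)), Z]
                        + [Z[:, i] * Z[:, j] for i, j in pairs])
    y = np.array([f(z) for z in Z])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)
    return beta[1:1 + n], dict(zip(pairs, beta[1 + n:]))

# On a function with one true interaction, the quadratic fit recovers it:
f = lambda z: 2 * z[0] + 3 * z[1] + 4 * z[0] * z[1]
lin, inter = kernel_interaction_estimate(f, n=4)
```

Because the model class contains f exactly in this toy case, the regression recovers the linear coefficients and the (0, 1) interaction; for a real opaque-box model, f would mask image regions and return the similarity score.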



@article{hamilton2021axiomatic,
  title={Axiomatic Explanations for Visual Search, Retrieval, and Similarity Learning},
  author={Hamilton, Mark and Lundberg, Scott and Zhang, Lei and Fu, Stephanie and Freeman, William T},
  journal={arXiv preprint arXiv:2103.00370},
  year={2021}
}