Clarity: a Deep Ensemble for Visual Counterfactual Explanations

Room A008

DL
explainability
frugality
Author

Brieuc Conan-Guez

Published

March 4, 2025

Counterfactual visual explanations aim to identify changes in an image that will modify the prediction of a classifier. Unlike adversarial images, counterfactuals are required to be realistic. For this reason, generative models such as variational autoencoders (VAEs) have been used to constrain the search for counterfactuals to the data manifold.

However, such gradient-based approaches remain limited even on simple datasets such as MNIST. Conjecturing that these limitations stem from a plateau effect that makes the gradient noisy and less informative, we improve gradient estimation by training an ensemble of classifiers directly in the latent space of VAEs. Several experiments show that the resulting method, called Clarity, delivers high-quality counterfactual images, competitive with the state of the art.
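The core idea above — gradient-based counterfactual search driven by an ensemble of latent-space classifiers — can be illustrated with a toy sketch. Everything here is a hypothetical stand-in, not the paper's actual models: the ensemble is a handful of linear scorers on a 2-D latent space, and the search is plain gradient ascent on their averaged score. The intuition is that averaging over ensemble members smooths out noisy per-model gradients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for trained latent-space classifiers:
# an ensemble of 5 linear scorers w_i acting on a 2-D latent code z.
ensemble = [rng.normal(size=2) for _ in range(5)]

def ensemble_score(z):
    """Average the classifiers' scores; the gradient of this average is
    less noisy than that of any single member."""
    return float(np.mean([w @ z for w in ensemble]))

def ensemble_grad(z):
    # Analytic gradient of the averaged linear scores (constant here).
    return np.mean(ensemble, axis=0)

def counterfactual_search(z0, target=1.0, lr=0.1, steps=200):
    """Gradient ascent in latent space until the averaged score reaches
    `target`. Searching in latent space (rather than pixel space) is what
    keeps the decoded counterfactual on the data manifold."""
    z = z0.copy()
    for _ in range(steps):
        if ensemble_score(z) >= target:
            break
        z = z + lr * ensemble_grad(z)
    return z

z0 = np.array([0.0, 0.0])   # latent code of the image to explain
z_cf = counterfactual_search(z0)
```

In the real method the scorers would be neural classifiers and `z_cf` would be passed through the VAE decoder to produce the counterfactual image; the toy search above only shows the ensemble-averaged gradient ascent step.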

All the concepts involved (VAEs, model ensembles, epistemic uncertainty) will be explained during the presentation.
