Over the past decade or so, deep neural networks have achieved very promising results on a variety of tasks, including image recognition tasks. Despite their advantages, these networks are very complex and sophisticated, which makes interpreting what they learned and determining the processes behind their predictions difficult or sometimes impossible. This lack of interpretability makes deep neural networks somewhat untrustworthy and unreliable.
Concept whitening: A strategy to improve the interpretability of image recognition models