Model Interpretability with Integrated Gradients


< Table of Contents >


Motivation

Integrated Gradients (IG)

What is Integrated Gradients (IG)?

[Fig. 1–4: slides from the Integrated Gradients talk (IG_talk_slide_fig1–fig4)]
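
Put simply, IG (Sundararajan et al., 2017) attributes a model's prediction F(x) to each input feature by accumulating the gradient of F along the straight-line path from a baseline input x' (for example, an all-zeros embedding) to the actual input x:

$$\mathrm{IG}_i(x) = (x_i - x'_i) \times \int_0^1 \frac{\partial F\big(x' + \alpha\,(x - x')\big)}{\partial x_i}\, d\alpha$$

In practice the integral is approximated with a Riemann sum over a few dozen points along the path; a runnable sketch appears in the Gradient-based Attribution Methods section below.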

Various Explanation Methods

The paper Rethinking Attention-Model Explainability through Faithfulness Violation Test compares IG against a range of existing methods for explaining why a model produced the output it did.

Generic Attention-based Explanation Methods

  • Inherent Attention Explanation (RawAtt)
  • Attention Gradient (AttGrad), sketched below
  • Attention InputNorm (AttIN)
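
One common instantiation of the attention-gradient idea scales each attention weight by the gradient of the explained output with respect to that weight, so attention is weighted by how much the prediction actually depends on it. Below is a minimal PyTorch sketch of that form, assuming a single layer's attention captured in the autograd graph (e.g., via a forward hook); the exact formulation used in the paper may differ:

```python
import torch

def attgrad(attn, target_logit):
    """AttGrad-style relevance: attention weights times their gradients.

    attn: attention tensor of shape (heads, seq_len, seq_len) that is part
          of the autograd graph (e.g., captured with a forward hook).
    target_logit: scalar model output being explained.
    """
    # Gradient of the explained output w.r.t. every attention weight.
    grads, = torch.autograd.grad(target_logit, attn, retain_graph=True)
    # Elementwise product, averaged over heads.
    relevance = (attn * grads).mean(dim=0)   # (seq_len, seq_len)
    # Per-token relevance as seen from the first (e.g., [CLS]) query position.
    return relevance[0]
```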

Transformer-based Explanation Methods

  • Partial Layer-wise Relevance Propagation (PLRP)
  • Rollout, sketched below
  • Transformer Attention Attribution (TransAtt)
  • Generic Attention Attribution (GenAtt)
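
Rollout, by contrast, uses no gradients: it propagates head-averaged attention maps through the layers by matrix multiplication, blending in the identity matrix to account for residual connections (Abnar & Zuidema, 2020). A minimal sketch of that procedure, assuming per-layer attention maps ordered from the first to the last layer:

```python
import torch

def attention_rollout(attentions, residual_alpha=0.5):
    """Propagate attention through layers (Abnar & Zuidema, 2020).

    attentions: list of (heads, seq_len, seq_len) tensors, first layer first.
    Returns a (seq_len, seq_len) map of output-to-input token influence.
    """
    rollout = None
    for attn in attentions:
        a = attn.mean(dim=0)   # average over heads
        # Blend with the identity to model the residual stream, renormalize rows.
        a = residual_alpha * a + (1 - residual_alpha) * torch.eye(a.size(-1))
        a = a / a.sum(dim=-1, keepdim=True)
        rollout = a if rollout is None else a @ rollout
    return rollout
```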

Gradient-based Attribution Methods

  • Input Gradient (InputGrad), sketched below
  • Integrated Gradients (IG), sketched below
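
Both gradient-based methods are easy to state concretely: InputGrad multiplies the input elementwise by the gradient of the target output, while IG averages gradients along the straight path from a baseline to the input and scales by (x - baseline). A minimal sketch, assuming a `model` callable that maps one unbatched input tensor to a logit vector, a zero baseline, and 50 integration steps (libraries such as Captum ship tested implementations):

```python
import torch

def input_gradient(model, x, target):
    """InputGrad: elementwise gradient-times-input attribution."""
    x = x.detach().clone().requires_grad_(True)
    model(x)[target].backward()
    return (x * x.grad).detach()

def integrated_gradients(model, x, target, baseline=None, steps=50):
    """IG: Riemann-sum approximation of the path integral of gradients
    along the straight line from `baseline` to `x`, scaled by (x - baseline)."""
    x = x.detach()
    baseline = torch.zeros_like(x) if baseline is None else baseline
    total_grads = torch.zeros_like(x)
    for alpha in torch.linspace(0.0, 1.0, steps):
        # One point on the straight path between baseline and input.
        point = (baseline + alpha * (x - baseline)).requires_grad_(True)
        model(point)[target].backward()
        total_grads += point.grad
    return (x - baseline) * total_grads / steps
```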

References