Interpreting predictions (using SHAP) - あれもPython,これもPython

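LIME lets you understand predictions one record at a time, but sometimes you want a global view that aggregates those explanations across the whole dataset. SHAP is handy for that.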

Code

Setup

import shap
shap.initjs()

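# Pass a trained tree-based model such as a random forest
# (mdl, boston and boston_X are assumed to be defined in an earlier step)
ex = shap.TreeExplainer(mdl)
# Pass the data to explain; the plots below must use these same rows
shap_v = ex.shap_values(boston_X)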
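shap_values returns, for every record and every feature, that feature's Shapley value. To illustrate the definition only (not how TreeExplainer computes it: TreeExplainer uses a fast tree-specific algorithm and, by default, the training statistics stored in the trees rather than a single reference point), here is a brute-force sketch that enumerates every coalition of features, assuming a single baseline point stands in for "feature absent". The toy model and the helper names are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, where absent features take baseline values.

    Enumerates every coalition, so it is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        # Features in the coalition come from x, the rest from the baseline
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                s = set(subset)
                # Standard Shapley weight |S|! (n - |S| - 1)! / n!
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical toy model with an interaction between features 0 and 2
def toy_model(z):
    return 2.0 * z[0] + 1.0 * z[1] + 0.5 * z[0] * z[2]


x = [3.0, 1.0, 2.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print([round(p, 6) for p in phi])  # → [7.5, 1.0, 1.5]
```

The interaction term 0.5 * z0 * z2 only contributes when both features are present, so its 3.0 is split evenly between features 0 and 2, and the values add up to f(x) - f(baseline) = 10.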

Visualization

Visualizing individual records

First, as with LIME, let's look at a single record.

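shap.force_plot(
    ex.expected_value, # base value (the model's average output)
    shap_v[15,:],      # SHAP values of record 15
    boston_X[15,:],    # feature values of the target record (same rows as shap_v)
    feature_names = boston['feature_names']
)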

f:id:esu-ko:20200920104314p:plain
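The force plot starts from the base value (ex.expected_value) and adds each feature's SHAP value: features that push the output up are drawn in red, those that push it down in blue, and the end point is the model's output for that record. A minimal sketch of that bookkeeping, using a hypothetical helper force_breakdown and made-up numbers (not taken from the Boston model):

```python
def force_breakdown(base_value, shap_row, feature_names):
    """Split one record's SHAP values the way a force plot lays them out.

    Positive values push the output above the base value, negative ones push
    it below. Base value + sum of SHAP values is the model output.
    """
    contribs = list(zip(feature_names, shap_row))
    # Largest upward push first
    push_up = sorted((c for c in contribs if c[1] > 0), key=lambda c: -c[1])
    # Largest downward push first
    push_down = sorted((c for c in contribs if c[1] < 0), key=lambda c: c[1])
    output = base_value + sum(shap_row)
    return output, push_up, push_down


# Hypothetical numbers for illustration only
output, up, down = force_breakdown(
    base_value=22.5,
    shap_row=[1.5, -3.0, 0.5],
    feature_names=["RM", "LSTAT", "CRIM"],
)
print(output)  # → 21.5
print(up)      # → [('RM', 1.5), ('CRIM', 0.5)]
print(down)    # → [('LSTAT', -3.0)]
```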

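Next, let's visualize all records at once. In the interactive plot you can order the records by their original order or by the model's output value.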

shap.force_plot(
    base_value=ex.expected_value,
    shap_values=shap_v, 
    feature_names=boston['feature_names']
)

f:id:esu-ko:20200920104354p:plain f:id:esu-ko:20200920104406p:plain
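Ordering the stacked plot by output value amounts to sorting records by base value plus the row sum of their SHAP values. A small sketch under that assumption, with a hypothetical helper and a toy SHAP matrix:

```python
def order_by_output(base_value, shap_matrix):
    """Return record indices sorted by model output, plus the outputs themselves.

    Each record's output is reconstructed as base value + sum of its SHAP values.
    """
    outputs = [base_value + sum(row) for row in shap_matrix]
    order = sorted(range(len(outputs)), key=lambda i: outputs[i])
    return order, outputs


# Toy SHAP values: 3 records x 2 features (hypothetical)
shap_matrix = [[1.0, -0.5], [-2.0, 0.5], [0.5, 0.5]]
order, outputs = order_by_output(20.0, shap_matrix)
print(order)    # → [1, 0, 2]
print(outputs)  # → [20.5, 18.5, 21.0]
```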

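Examining the features themselves

These plots show, in two dimensions, how much impact each feature has overall and how high or low values of a feature push the prediction up or down.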

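# Feature impact: mean absolute SHAP value per feature
shap.summary_plot(
    shap_v,
    boston_X, # must be the same rows that shap_v was computed on
    plot_type="bar",
    feature_names = boston['feature_names']
)

# One dot per record: x = SHAP value, color = feature value (high/low)
shap.summary_plot(
    shap_v,
    boston_X,
    feature_names = boston['feature_names']
)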

f:id:esu-ko:20200920104436p:plain f:id:esu-ko:20200920104449p:plain
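The bar version of summary_plot ranks features by the mean absolute SHAP value across records, which is how the per-record explanations get aggregated into a global view. Roughly, that computation looks like this (hypothetical helper, toy SHAP matrix):

```python
def mean_abs_importance(shap_matrix, feature_names):
    """Mean |SHAP value| per feature, sorted from most to least important."""
    n_rows = len(shap_matrix)
    scores = {
        name: sum(abs(row[j]) for row in shap_matrix) / n_rows
        for j, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy SHAP values: 4 records x 3 features (hypothetical)
shap_matrix = [
    [2.0, -0.5, 0.0],
    [-1.0, 0.5, 0.25],
    [3.0, -0.5, -0.25],
    [2.0, 0.5, 0.5],
]
importance = mean_abs_importance(shap_matrix, ["LSTAT", "RM", "CRIM"])
print(importance)  # → [('LSTAT', 2.0), ('RM', 0.5), ('CRIM', 0.25)]
```

Taking the absolute value matters: LSTAT's SHAP values have mixed signs, but all of them move the prediction a lot, so it still ranks first.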

You can also see how a single variable's value relates to its SHAP value, and how that relationship varies with another variable.

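shap.dependence_plot(
    ind='ZN',                  # feature on the x-axis
    interaction_index = 'AGE', # feature used for the color
    shap_values=shap_v,
    # Alternative: pass a DataFrame so the names come from its columns
    #features=pd.DataFrame(boston_X, columns=boston['feature_names']),
    features = boston_X,       # same rows as shap_v
    feature_names = boston['feature_names']
)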

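f:id:esu-ko:20200920104502p:plain

It looks like the SHAP value for ZN falls until ZN reaches about 40 and rises after that, and the records in that rising group have AGE of roughly 40 or below.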
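Each point in dependence_plot is one record: the x-axis is the value of the ind feature, the y-axis is that feature's SHAP value, and the color is the value of the interaction_index feature. A sketch of how those points are paired up, using a hypothetical helper and toy rows (column order ZN, AGE assumed):

```python
def dependence_points(features, shap_matrix, feature_names, ind, interaction):
    """(x, y, color) triples as drawn by a dependence plot.

    x = value of `ind`, y = SHAP value of `ind`, color = value of `interaction`.
    """
    i = feature_names.index(ind)
    k = feature_names.index(interaction)
    return [(row[i], srow[i], row[k]) for row, srow in zip(features, shap_matrix)]


# Toy data: 2 records x 2 features (hypothetical)
features = [[0.0, 80.0], [40.0, 30.0]]
shap_matrix = [[0.1, -0.2], [0.3, 0.0]]
points = dependence_points(features, shap_matrix, ["ZN", "AGE"], "ZN", "AGE")
print(points)  # → [(0.0, 0.1, 80.0), (40.0, 0.3, 30.0)]
```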