Assessing the robustness of models is an essential step in developing machine-learning systems. To determine whether a model is sound, it often helps to know which input features its output hinges on, and how many. This talk introduces the fundamentals of “anchor” explanations, which aim to provide exactly that information.
Many data scientists are familiar with algorithms like Integrated Gradients, SHAP, or LIME that determine the importance of input features. But that’s not always the information we need to determine whether a model’s output is sound. Is there a specific feature value that will make or break the decision? Does the outcome solely depend on artifacts in an image? These questions require a different explanation method.
First introduced in 2018, “anchors” are a model-agnostic method to uncover which parts of the input a machine-learning model's output hinges on. They are computed with a search procedure that can be applied to different modalities such as images, text, and tabular data.
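To make the idea concrete, here is a minimal sketch of the quantity an anchor is built around, its *precision*: how often the model's prediction stays unchanged when the anchor tokens are held fixed and the rest of the input is perturbed. The function names, the `"UNK"` mask token, and the 50% masking rate are illustrative assumptions for this sketch, not the exact sampling scheme from the paper.

```python
import numpy as np

def precision(predict, tokens, anchor_idx, n_samples=100, mask="UNK", rng=None):
    """Estimate how often the model's prediction stays unchanged when the
    anchor tokens are held fixed and every other token is randomly masked."""
    rng = rng or np.random.default_rng(0)
    original = predict(" ".join(tokens))
    hits = 0
    for _ in range(n_samples):
        # Keep anchor tokens; mask each remaining token with probability 0.5.
        perturbed = [
            tok if i in anchor_idx or rng.random() < 0.5 else mask
            for i, tok in enumerate(tokens)
        ]
        hits += predict(" ".join(perturbed)) == original
    return hits / n_samples
```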
In this talk, to truly grok the concept of anchor explanations, we will implement a basic anchor algorithm from scratch. Starting with nothing but a text document and a machine-learning model, we will build sampling, encoding, and search components, and finally compute an anchor.
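As a preview of where that from-scratch implementation is headed, the search component could, under the same assumptions as the sketch above (and reusing its `precision` helper), be as simple as a greedy loop: keep adding whichever token most improves the estimated precision until a threshold is reached. The published anchors algorithm uses a more sample-efficient bandit-guided beam search; this loop only illustrates the idea.

```python
def find_anchor(predict, text, threshold=0.95):
    """Greedily grow a set of token positions until holding them fixed keeps
    the prediction stable on at least `threshold` of the perturbed samples."""
    tokens = text.split()
    anchor = set()
    while precision(predict, tokens, anchor) < threshold:
        candidates = [i for i in range(len(tokens)) if i not in anchor]
        if not candidates:
            break
        # Add the token whose inclusion raises the estimated precision most.
        best = max(candidates, key=lambda i: precision(predict, tokens, anchor | {i}))
        anchor.add(best)
    return [tokens[i] for i in sorted(anchor)]


# Toy model for illustration: the "prediction" is True iff "bad" appears.
predict = lambda s: "bad" in s
print(find_anchor(predict, "the plot was bad but the acting was great"))
# ['bad'] -- the single word this toy model's output hinges on
```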
No knowledge of machine learning is required to follow this talk. Aside from familiarity with the basics of `numpy` arrays, all you need is your curiosity.