Paper: The effectiveness of feature attribution methods & its correlation with automatic evaluation scores @ NeurIPS 2021
Authors: Giang Nguyen, Daeyoung Kim, Anh Nguyen
Link to paper: https://arxiv.org/abs/2105.14944
Link to project: http://anhnguyen.me/project/feature-a...
Link to slides: https://www.dropbox.com/s/ksps2dgvik3...
User experiment interface: NeurIPS 2021 paper user interface demo
Any inquiries are welcome to: (1) nguyengiangbkhn@gmail.com (2) anh.ng8@gmail.com

Abstract: Explaining the decisions of an Artificial Intelligence (AI) model is increasingly critical in many real-world, high-stakes applications. Hundreds of papers have either proposed new feature attribution methods or discussed and harnessed these tools in their work. However, despite humans being the target end users, most attribution methods were only evaluated on proxy automatic-evaluation metrics (Zhang et al. 2018; Zhou et al. 2016; Petsiuk et al. 2018). In this paper, we conduct the first user study to measure attribution map effectiveness in assisting humans in ImageNet classification and Stanford Dogs fine-grained classification, and when an image is natural or adversarial (i.e., contains adversarial perturbations). Overall, feature attribution is surprisingly not more effective than showing humans nearest training-set examples. On the harder task of fine-grained dog categorization, presenting attribution maps to humans does not help, but instead hurts the performance of human-AI teams compared to AI alone. Importantly, we found automatic attribution-map evaluation measures to correlate poorly with the actual human-AI team performance. Our findings encourage the community to rigorously test their methods on downstream human-in-the-loop applications and to rethink existing evaluation metrics.
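
To illustrate the abstract's central claim about automatic scores correlating poorly with human-AI team performance, here is a minimal sketch of how such a correlation could be checked. This is not the paper's code; the method names and all numbers are made-up placeholders, and the choice of Spearman's rank correlation is just one reasonable way to compare a per-method automatic score against measured human accuracy.

```python
# Illustrative sketch (not the authors' code): check whether an automatic
# attribution-map score predicts human-AI team accuracy across methods.
# All method names and values below are hypothetical placeholders.
from scipy.stats import spearmanr

# Hypothetical per-method automatic evaluation scores (e.g., a localization metric)
auto_scores = {"MethodA": 0.62, "MethodB": 0.55, "MethodC": 0.58, "NearestExamples": 0.40}
# Hypothetical human-AI team accuracy (%) measured in a user study
human_accuracy = {"MethodA": 74.1, "MethodB": 73.5, "MethodC": 72.8, "NearestExamples": 75.0}

methods = list(auto_scores)
rho, p_value = spearmanr(
    [auto_scores[m] for m in methods],
    [human_accuracy[m] for m in methods],
)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.2f})")
# A rho near zero would suggest the automatic metric says little about
# how much the explanations actually help human users.
```

In this setup, a low or negative rank correlation across methods is what the paper's finding would look like in practice: methods ranked highly by the proxy metric need not be the ones that most improve human decisions.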