Opening the "black box" of deep learning in automated screening of eye diseases

C. González-Gonzalo, B. Liefers, A. Vaidyanathan, H. van Zeeland, C. Klaver and C. Sánchez

Association for Research in Vision and Ophthalmology 2019.

Purpose: Systems based on deep learning (DL) have been shown to provide a scalable, high-performance solution for screening of eye diseases. However, DL is usually considered a "black box" due to its lack of interpretability. We propose a deep visualization framework to explain the decisions made by a DL system, iteratively unveiling the abnormalities responsible for referable predictions without needing lesion-level annotations. We apply the framework to automated screening of diabetic retinopathy (DR) in color fundus images (CFIs).

Methods: The proposed framework builds on a baseline deep convolutional neural network that classifies CFIs by DR stage. For each CFI classified as referable DR, the framework extracts initial visual evidence of the predicted stage by computing a saliency map, which indicates the regions in the image that would contribute the most to changes in the prediction if modified. This localizes abnormalities, which are then removed through selective inpainting. The image is classified again, with the expectation of reduced referability. We apply this procedure iteratively to increase attention to less discriminative areas and generate refined visual evidence. The Kaggle DR database, with CFIs graded for DR severity (stages 0 and 1: non-referable DR, stages 2 to 4: referable DR), is used for training and validation of the image-level classification task. For validation of the obtained visual evidence, we use the DiaretDB1 dataset, which contains CFIs with manually delineated areas for 4 types of lesions: hemorrhages, microaneurysms, hard exudates and soft exudates.
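The iterative procedure described above can be sketched as a short loop. This is only an illustrative outline under stated assumptions, not the implementation used in this work: `classify`, `saliency` and `inpaint` are hypothetical callables, the thresholding rule (`mask_ratio`) and the stopping criterion are assumptions, and the `toy_*` functions are trivial stand-ins that replace the neural network, the saliency computation and the inpainting method so that the sketch runs on its own.

```python
import numpy as np


def is_referable(stage):
    """Stages 0 and 1 are non-referable DR; stages 2 to 4 are referable DR."""
    return stage >= 2


def iterative_evidence(image, classify, saliency, inpaint,
                       max_iters=5, mask_ratio=0.9):
    """Repeat classify -> saliency -> inpaint while the image stays referable.

    All three callables, max_iters and mask_ratio are assumptions made for
    illustration. Returns the accumulated evidence map and the final image.
    """
    current = image.copy()
    evidence = np.zeros(image.shape[:2], dtype=float)
    for _ in range(max_iters):
        stage = classify(current)
        if not is_referable(stage):
            break  # nothing left that triggers a referable prediction
        sal = saliency(current, stage)
        if sal.max() <= 0:
            break  # no salient region to remove
        # Keep the strongest response seen so far at each pixel.
        evidence = np.maximum(evidence, sal)
        # Remove the most salient regions so that the next pass has to
        # rely on less discriminative areas.
        mask = sal >= mask_ratio * sal.max()
        current = inpaint(current, mask)
    return evidence, current


# Toy stand-ins (hypothetical): bright pixels play the role of lesions.
def toy_classify(img):
    return 2 if img.max() > 0.5 else 0


def toy_saliency(img, stage):
    # Highlights only the single most discriminative (brightest) region.
    return np.where(img == img.max(), img, 0.0)


def toy_inpaint(img, mask):
    out = img.copy()
    out[mask] = np.median(img[~mask])  # fill with background intensity
    return out


toy_image = np.zeros((8, 8))
toy_image[2, 2] = 1.0  # most discriminative "lesion"
toy_image[5, 5] = 0.8  # less discriminative "lesion"

initial_evidence = toy_saliency(toy_image, toy_classify(toy_image))
refined_evidence, final_image = iterative_evidence(
    toy_image, toy_classify, toy_saliency, toy_inpaint)
```

In this toy setting the initial evidence marks only the brightest spot, whereas the refined evidence also recovers the second one, and the final inpainted image is no longer classified as referable.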

Results: The baseline classifier obtained an area under the Receiver Operating Characteristic (ROC) curve of 0.93 and a quadratic weighted kappa of 0.77 on the Kaggle test set (53576 CFIs). Free-response ROC (FROC) curves (Figure 2) analyze the correspondence between highlighted areas and each type of lesion for the images classified as referable DR in the DiaretDB1 dataset (62 CFIs), comparing initial and refined visual evidence.
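For reference, the quadratic weighted kappa reported above measures agreement between predicted and reference DR stages, penalizing each disagreement by the squared distance between the two stages. The following is a minimal sketch of the standard definition, assuming integer stages 0 to 4; the function name and the `num_grades` parameter are illustrative and are not taken from the evaluation code of this work.

```python
import numpy as np


def quadratic_weighted_kappa(y_true, y_pred, num_grades=5):
    """Quadratic weighted kappa for integer grades in [0, num_grades)."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)

    # Observed agreement: confusion matrix of reference vs. predicted grades.
    observed = np.zeros((num_grades, num_grades), dtype=float)
    np.add.at(observed, (y_true, y_pred), 1.0)

    # Quadratic disagreement weights: 0 on the diagonal, 1 at maximum distance.
    grades = np.arange(num_grades)
    weights = (grades[:, None] - grades[None, :]) ** 2 / (num_grades - 1) ** 2

    # Expected agreement if both gradings were independent.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

    denominator = (weights * expected).sum()
    if denominator == 0:
        raise ValueError("kappa is undefined when only one grade is present")
    return 1.0 - (weights * observed).sum() / denominator


perfect = quadratic_weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 3, 4])
chance = quadratic_weighted_kappa([0, 0, 1, 1], [0, 1, 0, 1], num_grades=2)
opposite = quadratic_weighted_kappa([0, 4], [4, 0])
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement.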

Conclusions: The proposed framework provides visual evidence for the decisions made by a DL system, iteratively unveiling abnormalities in CFIs based on the prediction of a classifier trained only with image-level labels. This provides a "key" to open the "black box" of artificial intelligence in screening of eye diseases, aiming to increase experts' trust and facilitate its integration in screening settings.