A Swedish prospective study evaluated Lunit INSIGHT MMG (Lunit), an AI tool for assisting breast cancer screening, using mammograms from 54,991 women aged 40–74 years. The AI system was integrated into the ScreenTrustCAD trial, a real-world screening workflow at Capio Sankt Görans Hospital, where each mammogram was interpreted independently by two radiologists and the AI CAD. If any reader flagged an exam as suspicious, it was referred to a two-radiologist consensus discussion to determine recall for further diagnostic evaluation.
The study compared recall rates and positive predictive value (PPV) based on who flagged the exam. When AI alone flagged a case, only 4.6% were recalled, but the PPV was 22%. In contrast, one radiologist alone led to a 14.2% recall rate with a PPV of 3.4%. For cases flagged by AI plus one radiologist, 38.6% were recalled (PPV = 25.0%), versus 57.2% recall (PPV = 2.5%) for two radiologists. When all three flagged a case, 82.6% were recalled, with a PPV of 34.2%.
The study concludes that AI-flagged exams were more precise, leading to fewer recalls but a higher proportion of true cancers, highlighting the potential of AI to reduce unnecessary recalls in breast cancer screening.
Human-AI Interaction in the ScreenTrustCAD Trial: Recall Proportion and Positive Predictive Value Related to Screening Mammograms Flagged by AI CAD versus a Human Reader
Radiology, 2025
Abstract
Background
The ScreenTrustCAD trial was a prospective study that evaluated the cancer detection rates for combinations of artificial intelligence (AI) computer-aided detection (CAD) and two radiologists. The results raised concerns about the tendency of radiologists to agree with AI CAD too much (when AI CAD made an erroneous flagging) or too little (when AI CAD made a correct flagging).
Purpose
To evaluate differences in recall proportion and positive predictive value (PPV) related to which reader flagged the mammogram for consensus discussion: AI CAD and/or radiologists.
Materials and Methods
Participants were enrolled from April 2021 to June 2022, and each examination was interpreted by three independent readers: two radiologists and AI CAD, after which positive findings were forwarded to the consensus discussion. For each combination of readers flagging an examination, the proportion recalled was calculated, along with the PPV, defined as the number of pathologic evaluation–verified cancers divided by the number of positive examinations.
Results
The study included 54 991 women (median age, 55 years [IQR, 46–65 years]), among whom 5489 were flagged for consensus discussion and 1348 were recalled. For examinations flagged by one reader, the proportion recalled after flagging by one radiologist was larger (14.2% [263 of 1858]) compared with flagging by AI CAD (4.6% [86 of 1886]) (P < .001), whereas the PPV of breast cancer was lower (3.4% [nine of 263] vs 22% [19 of 86]) (P < .001). For examinations flagged by two readers, the proportion recalled after flagging by two radiologists was larger (57.2% [360 of 629]) compared with flagging by AI CAD and one radiologist (38.6% [244 of 632]) (P < .001), whereas the PPV was lower (2.5% [nine of 360] vs 25.0% [61 of 244]) (P < .001). For examinations flagged by all three readers, the proportion recalled was 82.6% (400 of 484) and the PPV was 34.2% (137 of 400).
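As a sanity check, the percentages above can be reproduced from the raw counts reported in the Results. Below is a minimal sketch; the counts are taken from the abstract, while the dictionary layout and helper name are ours for illustration (note the abstract reports the AI CAD–only PPV to two significant figures as 22%, which this computation gives as 22.1%):

```python
def proportion(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to one decimal."""
    return round(100 * numerator / denominator, 1)

# Counts from the abstract: (flagged, recalled, verified cancers) per reader combination
combos = {
    "AI CAD only":            (1886,  86,  19),
    "one radiologist only":   (1858, 263,   9),
    "AI CAD + 1 radiologist": ( 632, 244,  61),
    "two radiologists":       ( 629, 360,   9),
    "all three readers":      ( 484, 400, 137),
}

for name, (flagged, recalled, cancers) in combos.items():
    recall_pct = proportion(recalled, flagged)  # proportion of flagged exams recalled
    ppv_pct = proportion(cancers, recalled)     # PPV: cancers among recalled exams
    print(f"{name}: recall {recall_pct}%, PPV {ppv_pct}%")
```

Running this reproduces every figure in the Results (14.2% vs 4.6% recall for single readers, 57.2% vs 38.6% for pairs, 82.6% for all three, and the corresponding PPVs), which makes the inverse relationship between recall proportion and PPV easy to verify directly.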
Conclusion
A larger proportion of participants were recalled after initial flagging by radiologists than after flagging by AI CAD, but with a lower proportion of cancers among those recalled.