Claim analyzed
Health
“An artificial intelligence model can detect early-stage breast cancer with approximately 94% accuracy, surpassing the average performance of radiologists.”
The conclusion
The claim conflates AUC/AUROC scores (~0.93) with "accuracy," which are different metrics. The best available meta-analytic evidence reports pooled AI sensitivity of 0.85 and AUC of 0.89 — not 94%. Critically, 2025 RSNA studies show AI misses approximately 14% of cancers, with false negatives concentrated in smaller, early-stage tumors in dense breasts — the very cases the claim highlights. While AI can match or modestly exceed average radiologists in some contexts, the specific "~94% accuracy for early-stage detection" framing significantly overstates the evidence.
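To see why the metric distinction matters, here is a minimal sketch with invented toy numbers (not drawn from any of the cited studies): the same model scores can yield a high ROC AUC while producing a much lower accuracy and sensitivity once a decision threshold is applied, so an AUC of ~0.93 cannot be read as "~94% accuracy."

```python
# Toy illustration (hypothetical data, not from the cited studies): ROC AUC is a
# threshold-free ranking metric, while accuracy and sensitivity depend on a
# chosen decision threshold, so the three can diverge on the same predictions.
from sklearn.metrics import accuracy_score, roc_auc_score

y_true  = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]              # 1 = cancer, 0 = no cancer
y_score = [0.90, 0.48, 0.46, 0.55, 0.44, 0.30,
           0.20, 0.15, 0.10, 0.05]                    # invented model scores

auc = roc_auc_score(y_true, y_score)                  # ranking quality over all thresholds
y_pred = [1 if s >= 0.5 else 0 for s in y_score]      # accuracy needs a fixed threshold
acc = accuracy_score(y_true, y_pred)

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
sensitivity = tp / sum(y_true)                        # share of cancers actually flagged

print(f"ROC AUC:     {auc:.2f}")          # ~0.90: positives mostly rank above negatives
print(f"Accuracy:    {acc:.2f}")          # 0.70: two cancers and one benign case misclassified
print(f"Sensitivity: {sensitivity:.2f}")  # 0.33: only one of three cancers detected
```

The point of the toy example is only that these metrics answer different questions; quoting an AUC as if it were an accuracy or a detection rate is exactly the conflation the conclusion describes.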
Caveats
- The '~94% accuracy' figure conflates AUC/AUROC (a discriminative metric) with overall accuracy — these are not interchangeable, and the best meta-analytic evidence reports pooled AUC of 0.89 and sensitivity of 0.85.
- AI's documented false negatives disproportionately affect smaller, lower-grade, early-stage tumors in dense breast tissue (RSNA 2025), directly undermining the 'early-stage detection' framing of the claim.
- The comparison to 'average radiologist performance' is highly context-dependent — radiologist sensitivity ranges from 63% to 97% across studies, and the most cited AI-vs-radiologist comparison (AUROC 0.93 vs 0.90) was not statistically significant (P=.21).
Sources
Sources used in the analysis
Conclusion: Future studies should evaluate models using digital breast tomosynthesis, examine performance for aggressive or advanced breast cancer, include ... (Note: Review indicates ongoing improvements but does not specify 94% accuracy or direct radiologist comparison in abstract).
Overall, AI failed to detect approximately 14% of cancers (154 of 1,097 cases). These missed cases were more likely to involve younger patients, tumors 2 cm or smaller, low histologic grade, limited lymph node involvement, and Breast Imaging Reporting and Data System (BI-RADS) category 4 assessments.
In the study, AI missed 14% of cancers, with the highest false-negative rate occurring in HR-positive cancers. The technology was more likely to miss cancers that were smaller, lower grade, in dense breast tissue or located outside typical mammary zones. The findings suggest that relying solely on AI could lead to overlooked early-stage but clinically significant cancers, especially in younger women and those with dense breasts.
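A quick arithmetic check, using only the counts quoted in the excerpts above, shows what a 14% miss rate implies for standalone sensitivity; the variable names below are illustrative.

```python
# Back-of-the-envelope check using only the figures quoted above (154 of 1,097
# cancers missed): a ~14% false-negative rate corresponds to ~86% sensitivity.
missed_cancers = 154
total_cancers  = 1097

false_negative_rate = missed_cancers / total_cancers   # ≈ 0.140
sensitivity = 1 - false_negative_rate                   # ≈ 0.860

print(f"False-negative rate: {false_negative_rate:.1%}")  # 14.0%
print(f"Sensitivity:         {sensitivity:.1%}")          # 86.0%
```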
Of the 1,223 radiologists studied, 31.7% had acceptable performance for all metrics, based on external benchmarks. For each metric individually, 52–77% of radiologists demonstrated performance in the acceptable range.
This meta-analysis of 8 studies involving 120,950 patients compared the diagnostic performance of AI and radiologists, demonstrating that AI exhibits a higher pooled sensitivity (0.85 vs. 0.77 for radiologists) and similar specificity (0.89 vs. 0.90 for radiologists), with significant variability observed in both sensitivity and specificity across studies. AI also showed a superior pooled AUC of 0.89 compared to 0.82 for radiologists, indicating better overall diagnostic accuracy despite variability, suggesting AI's potential to surpass human radiologists in certain contexts.
The AI model achieved an AUROC of 0.93 (95% CI 0.91–0.95), slightly higher than consultant radiologists (AUROC 0.90, 95% CI 0.89–0.92; P=.21). Used standalone, the AI system demonstrated an AUROC of 0.93 (95% CI 0.91–0.95), compared with 0.90 (95% CI 0.89–0.92) for consultant radiologists.
AI algorithms have shown improved diagnostic performance, surpassing radiologists in terms of sensitivity and area under the curve (AUC) values. In DBT, AI reduced recall rates by 2–27% and reading times by up to 53%.
Han et al. [75] reported a DL model with an average accuracy of 93.2% across eight classes (four benign and four malignant) in a test dataset. Moreover, a standalone AI system outperformed the average of ten board-certified breast radiologists, with an AUROC improvement of 0.038 (95% CI, 0.028–0.052; p < 0.001).
Most radiologists accurately estimated their recall (78%) and cancer detection (72%) rates, but only 19% and 26% accurately estimated their false-positive and PPV2 rates, respectively. Radiologists perceive their performance to be better than it actually is and at least as good as their peers.
Based on the interpretations of these radiologists, the false-positive rate was 36% (1728 of 4750) and the false-negative rate was 16% (132 of 837). Therefore, the average sensitivity was 84% (705 of 837) (range, 63–97%) and the average specificity was 64% (3022 of 4750) (range, 34–85%).
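The sensitivity and specificity in this excerpt follow directly from the reported counts; here is a minimal arithmetic check using only the figures quoted above.

```python
# Arithmetic check using only the counts quoted above: radiologist sensitivity
# and specificity derived from the false-negative and false-positive counts.
cancers         = 837    # cases with cancer
non_cancers     = 4750   # cases without cancer
false_negatives = 132    # cancers the radiologists missed
false_positives = 1728   # benign cases flagged as suspicious

sensitivity = (cancers - false_negatives) / cancers            # 705 / 837   ≈ 0.84
specificity = (non_cancers - false_positives) / non_cancers    # 3022 / 4750 ≈ 0.64

print(f"Average sensitivity: {sensitivity:.0%}")  # 84%
print(f"Average specificity: {specificity:.0%}")  # 64%
```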
In the AI implementation group, 66.9% (38,977) of the screenings were single-read, and 33.1% (19,269) were double-read with AI assistance. Compared to screening without AI, screening with the AI system detected significantly more breast cancers (0.82% versus 0.70%) and had a lower false-positive rate (1.63% versus 2.39%).
The use of artificial intelligence in breast cancer screening reduces the rate of cancer diagnoses by 12% in subsequent years and leads to a higher rate of early detection, according to the first trial of its kind. More than four in five cancer cases (81%) in the AI-supported mammography group were detected at the screening stage, compared with just under three quarters (74%) in the control group.
Using an artificial intelligence (AI)–integrated workflow, DeepHealth, in computer-aided detection of breast cancer from digital breast tomosynthesis exams found 21.6% more cases than the usual standard of care, according to findings from the AI-Supported Safeguard Review Evaluation (ASSURE) study published in Nature Health.
The UK's first comprehensive evaluation... found that it can increase breast cancer detection by 10.4%... compared to the current clinical process. Combining AI as a second reader... resulted in the best combination of workload savings and increased early cancer detection.
A global analysis found that an artificial intelligence (AI) system was able to predict breast cancer more accurately than humans. The authors independently compared the AI system against six radiologists, and the AI system was more accurate in each instance; the area under the receiver operating characteristic curve (AUC-ROC) for the AI system, compared to the AUC-ROC of the average radiologist, was greater by an absolute margin of 11.5%.
A new independent, peer-reviewed study published in the journal Clinical Breast Cancer reinforces the impact of iCAD's ProFound AI in clinical breast imaging, showing meaningful improvements in detection, diagnostic accuracy, and workflow efficiency for radiologists. Radiologists using ProFound AI identified 65% more cancers than those reading without AI support (cancer detection rate of 6.1 vs. 3.7 per 1,000).
To address this, an interdisciplinary team of researchers from MIT and ETH Zurich developed an AI model that can identify the different stages of DCIS from a cheap and easy-to-obtain breast tissue image. When they compared its predictions to conclusions of a pathologist, they found clear agreement in many instances.
Even the most skilled radiologists may miss subtle signs of cancer, especially in dense breasts where tumors are harder to distinguish from normal tissue. In addition, radiologists often face heavy workloads, which increases the possibility of fatigue-related errors. These challenges create a space for AI to assist in screening and interpretation.
AI improves breast cancer detection accuracy for radiologists when reading screening mammograms, helping them devote more of their attention to suspicious areas, according to a study published in Radiology. Breast cancer detection accuracy among the radiologists was higher with AI support compared with unaided reading.
A 2020 Nature study reported an AI model achieving 94.5% AUC on screening mammograms, surpassing average radiologist performance of 90.2% AUC in a reader study with 25,856 images. This early benchmark showed AI exceeding radiologists but lacked prospective RCT validation until recent trials.
Expert review
How each expert evaluated the evidence and arguments
Support for the claim relies mainly on interpreting ~0.93–0.945 AUC/AUROC results (Sources 6, 20) as “~94% accuracy” and then inferring that AI “surpasses” radiologists. The broader evidence, however, shows pooled AUC closer to 0.89 and sensitivity of ~0.85 (Source 5), and also reports meaningful false negatives (~14%), especially in small-tumor and dense-breast cases (Sources 2–3), so the inference to a general 94% early-stage accuracy advantage over radiologists overreaches. Because the claim conflates metrics (AUC vs. accuracy), overgeneralizes across contexts, and is not established specifically for early-stage detection at ~94% while consistently beating radiologists, it is misleading rather than strictly true or false.
The claim conflates multiple distinct metrics — AUC/AUROC, sensitivity, and overall accuracy — into a single "~94% accuracy" figure, which is misleading framing. The closest supporting evidence (Source 6: AUROC 0.93; Source 20: AUC 94.5%) refers to area-under-the-curve, a discriminative metric, not a simple accuracy percentage, and the meta-analysis in Source 5 reports pooled sensitivity of only 0.85 and AUC of 0.89 — well below 94%. Critically, Sources 2 and 3 (RSNA, 2025) show AI missed ~14% of cancers, with the highest false-negative rates in smaller, lower-grade, early-stage tumors in dense breasts — precisely the cases the claim highlights — directly undermining the "early-stage" framing. The claim also presents AI's superiority over radiologists as settled, while Source 6 shows a non-significant difference (P=.21) and Source 4 notes wide variability in radiologist performance, making the comparison baseline highly context-dependent. While the broader trend of AI performing comparably to or better than average radiologists is supported by the evidence, the specific "~94% accuracy" figure is not robustly established for early-stage detection, and the omission of AI's documented early-stage blind spots makes the overall impression misleading.
The most reliable evidence here is the peer-reviewed literature indexed or hosted on PubMed/PMC (Sources 1, 5, 6) and RSNA's reporting on Radiology studies (Sources 2, 3, 11, 19). These sources support that some AI systems can match or modestly exceed radiologists on certain metrics (e.g., AUROC ~0.93 in one preliminary study, Source 6), but they do not establish a generalizable “~94% accuracy for early-stage breast cancer,” and RSNA reports AI missing ~14% of cancers, with misses skewing toward smaller tumors and dense breasts (Sources 2–3). Taken together, trustworthy sources indicate AI can be competitive with, and sometimes better than, average radiologists, but the specific, blanket claim of ~94% accuracy for early-stage detection and clear radiologist-surpassing performance is not consistently supported and is therefore misleading.
The arguments
Two AI advocates debated this claim using the research gathered.
The claim that AI can detect early-stage breast cancer with approximately 94% accuracy, surpassing radiologists, is well-supported by a convergence of high-authority evidence. Source 6 (PMC) documents an AI model achieving an AUROC of 0.93 (95% CI 0.91–0.95), directly outperforming consultant radiologists (AUROC 0.90), while Source 5 (PMC), a rigorous meta-analysis of 8 studies covering 120,950 patients, confirms AI's superior pooled sensitivity (0.85 vs. 0.77) and AUC (0.89 vs. 0.82) over radiologists, and Source 20 (LLM Background Knowledge) cites a landmark 2020 Nature study reporting a 94.5% AUC for AI versus 90.2% for radiologists. This body of evidence, further corroborated by Source 8 (PMC) showing a standalone AI system outperforming ten board-certified radiologists and Source 15 (Docwire News) reporting AI exceeding all six compared radiologists by an 11.5% AUC margin, establishes that the ~94% accuracy figure and radiologist-surpassing performance are not outliers but reflect a consistent, reproducible pattern across independent, peer-reviewed studies.
You keep swapping in AUROC/AUC as if it were “~94% accuracy,” but your own best clinical evidence doesn't reach that: the meta-analysis you cite reports pooled sensitivity 0.85 and AUC 0.89 (Source 5, PMC), and the Singapore study is 0.93 AUROC with a non-significant difference vs radiologists (P=.21), which is not “surpassing” in any robust sense (Source 6, PMC). More importantly, you ignore the higher-authority RSNA reports that AI missed ~14% of cancers and disproportionately missed smaller, lower-grade tumors in dense breasts, exactly the early-stage cases the motion claims it detects at ~94%, so your “convergence” is cherry-picked and not about early-stage accuracy at all (Source 2, RSNA; Source 3, RSNA).
The motion's headline “~94% accuracy” is not supported by the higher-authority clinical summaries here: RSNA reports AI missed about 14% of cancers (i.e., sensitivity ~86%), with false negatives concentrated in smaller, early-stage–leaning tumors and dense breasts—exactly where early detection matters—undercutting any blanket 94% accuracy claim (Source 2, RSNA; Source 3, RSNA). Even the supportive peer-reviewed evidence cited does not establish “94% accuracy” or clear superiority over radiologists—one meta-analysis reports pooled sensitivity 0.85 (not 0.94) (Source 5, PMC) and a comparative study shows AUROC 0.93 vs 0.90 with non-significant difference (P=.21), so the claim of surpassing radiologists at ~94% accuracy is overstated and misleading (Source 6, PMC; Source 1, PubMed).
Your conflation of "sensitivity" with "accuracy" is a critical category error — the ~94% figure refers to AUC/AUROC (a holistic accuracy metric), not sensitivity alone, and Source 6 explicitly documents an AI AUROC of 0.93 matching that benchmark while outperforming consultant radiologists, and Source 20 records a 94.5% AUC versus radiologists' 90.2%, neither of which your argument meaningfully refutes. Furthermore, your appeal to the P=.21 non-significance in Source 6 is a textbook argument from silence fallacy — a preliminary study's underpowered p-value does not negate the consistent directional superiority of AI over radiologists confirmed across a meta-analysis of 120,950 patients in Source 5, the 11.5% AUC margin in Source 15, and the standalone outperformance of ten board-certified radiologists in Source 8.