Claim analyzed
Health
“AI-generated deepfake X-ray images are sufficiently realistic to cause radiologists to make incorrect diagnoses.”
The conclusion
The evidence confirms that AI-generated deepfake X-rays can deceive radiologists — with only 41% spontaneously detecting fakes in a major 2026 study — but it does not demonstrate that this deception causes incorrect diagnoses. The same study found comparable diagnostic accuracy on real versus synthetic images (91.3% vs. 92.4%), undermining the claim's causal assertion. The claim conflates "hard to detect" with "causes misdiagnosis," an inferential leap the available research does not support.
Based on 15 sources: 10 supporting, 0 refuting, 5 neutral.
Caveats
- The claim conflates radiologists' inability to detect deepfake X-rays with a demonstrated increase in incorrect diagnoses — the key study does not measure real-world misdiagnosis rates caused by deepfakes.
- Diagnostic accuracy on AI-generated X-rays was comparable to authentic images (92.4% vs. 91.3%), which does not support the assertion that deepfakes cause more diagnostic errors.
- CT scan tampering evidence (e.g., 99.2% success injecting fake lung cancers) involves a different imaging modality and attack scenario than whole-image X-ray deepfakes, and should not be treated as direct proof of the X-ray-specific claim.
Sources
Sources used in the analysis
Neither radiologists nor multimodal LLMs could easily distinguish AI-generated deepfake X-ray images from authentic ones. Radiologists' detection accuracy was 75% for ChatGPT-generated images (range 58-92%) and 62-78% for RoentGen chest X-rays, even when they were aware that AI-generated images were present; accuracy showed no correlation with experience, except that musculoskeletal subspecialists performed better. Lead author: 'These deepfake X-rays are realistic enough to deceive radiologists.'
Although synthetic data address crucial shortages of real-world training data, their overuse might propagate biases, accelerate model degradation, and compromise generalisability across populations. A concerning consequence of the rapid adoption of synthetic data in medical AI is the emergence of synthetic trust—an unwarranted confidence in models trained on artificially generated datasets that fail to preserve clinical validity or demographic realities.
The generation of synthetic medical images poses a unique set of challenges compared to other domains due to the intricate nature of biological structures and the subtle nuances of imaging biomarkers. Without faithful representation of these biomarkers, synthetic images risk being of limited utility in clinical settings, hindering their adoption for tasks such as training AI models, augmenting datasets, and validating imaging algorithms.
As proof of principle, Mirsky et al [3] showed that they were able to tamper with CT scans and artificially inject or remove lung cancers on the images. When the radiologists were blinded to the attack, this hack had a 99.2% success rate for cancer injection and a 95.8% success rate for cancer removal. Even when the radiologists were warned about the attack, the success of cancer injection decreased to 70%, but the cancer removal success rate remained high at 90%. This illustrates the sophistication and realistic appearance of such artificial images.
A study in Radiology found realistic AI-generated X-rays not easily distinguished by radiologists (moderate performance) or LLMs. Diagnostic accuracy was 91.3% for authentic and 92.4% for AI-generated radiographs, showing AI images sufficiently realistic to maintain high diagnostic reliability. Only 41% of radiologists spontaneously identified AI images when blinded.
We quantify X-ray clinical realism by asking radiologists to distinguish between real and fake scans and find that generated scans are more likely to be classed as real than by chance, but there is still progress required to achieve true realism. We confirm these findings by evaluating the performance of classification models trained on synthetic scans when applied to real scans.
17 radiologists could differentiate real X-rays from ChatGPT-generated ones with only 75% accuracy even when alerted; only 41% noticed issues spontaneously while diagnosing blinded. The images were generated easily with simple prompts and fooled even the generating model (57-85% detection accuracy), underscoring their potential to disrupt medical care.
“Our study demonstrates that these deepfake X-rays are realistic enough to deceive radiologists, the most highly trained medical image specialists,” the study's lead author Dr. Mickael Tordjman said, “even when they were aware that AI-generated images were present.” Radiologists who were made aware of the fact that these datasets contained AI images fared better than those exposed to the images without any indication of the test's actual purpose, but still not great, showing a mean accuracy of 75%.
AI algorithms can analyse medical images with remarkable accuracy and speed, often surpassing human capabilities, by identifying subtle patterns and anomalies that may be missed by the human eye. This can improve diagnostic accuracy and reduce the risk of misdiagnosis and false negatives. However, the use of AI in healthcare raises ethical questions, and AI algorithms require high-quality, labelled data to train effectively, with biased datasets potentially leading to incorrect or discriminatory outcomes.
A multi-center international study reveals that neither experienced radiologists nor advanced multimodal large language models (LLMs) can reliably distinguish “deepfake” X-rays from authentic ones. Even when warned that synthetic images were present, radiologists only averaged 75% accuracy in identifying them. The findings expose a high-stakes vulnerability in healthcare, ranging from fraudulent litigation (fabricated injuries) to cybersecurity threats where hackers could inject synthetic images into digital medical records to cause clinical chaos.
Both radiologists and advanced AI models have difficulty distinguishing AI-generated X-ray images from real ones. When they did not know that deepfakes were included, only 41 percent recognized them spontaneously. After an explicit warning, the average accuracy rose to 75 percent, with significant variation among individual evaluators. The researchers warn that these so-called deepfakes pose serious risks to the reliability of medical imaging and the safety of healthcare processes. For instance, manipulated images could be used in fraud or legal claims, such as by presenting a nonexistent fracture as real.
A new study published on March 24 in Radiology, the journal of the Radiological Society of North America (RSNA), shows that both radiologists and multimodal large language models (LLMs) have difficulty telling real X-rays apart from artificial intelligence (AI)-generated "deepfake" images. Once they were informed that synthetic images were present, their average accuracy in distinguishing real from fake rose to 75%. Performance varied widely among individuals. Radiologists correctly identified between 58% and 92% of the ChatGPT-generated images.
X-ray images generated by Artificial Intelligence (AI) are now so good they can fool expert radiologists, according to a new study. The findings, published in the journal Radiology, come amid a huge boom in AI use in healthcare and raise concerns that these “deepfake” X-rays could be used to deceive insurance companies and employers, and even interfere with legal cases. The study asked 17 radiologists from six different countries to assess X-ray images of different parts of the body.
Synthetic radiographs, or AI-generated X-ray images, are becoming increasingly realistic, raising questions about their detectability in clinical practice. While they can be valuable for training or research, their high realism can occasionally blur the line between genuine and artificial images, potentially affecting diagnostic accuracy if unnoticed. When unaware of the study's true purpose, 41% of radiologists spontaneously suspected some images to be AI-generated. After being informed, overall detection accuracy was 75% for GPT-4o radiographs and 70% for those produced by RoentGen.
Prior to 2026, smaller studies (e.g., 2023-2024 GAN-based fakes) showed radiologists detecting ~80-90% of fakes, which had lower realism; the 2026 Radiology study marks an advance in diffusion/LLM realism, yet it explicitly notes comparable diagnostic accuracy (91.3% vs 92.4%), implying potential harm but not measuring actual incorrect diagnoses caused.
Expert review
How each expert evaluated the evidence and arguments
Expert 1 — The Logic Examiner
Sources 1/7/8/12/14 show radiologists often cannot reliably distinguish AI-generated X-rays from real ones (e.g., ~75% detection when warned; ~41% spontaneous suspicion when blinded), and Source 4 shows analogous realism-driven deception in CT tampering, but none of these logically entails that radiologists therefore make incorrect diagnoses from deepfake X-rays; the only directly diagnosis-linked metric provided (Source 5, echoed by Source 15) reports similar diagnostic accuracy on authentic vs AI-generated radiographs, which does not demonstrate deepfakes causing incorrect diagnoses. Because the pro side's inference largely equates “undetectable/realistic” with “causes incorrect diagnosis” (a non sequitur) and the evidence pool does not directly establish increased misdiagnosis attributable to deepfake X-rays, the claim as stated is not proven and is more likely false on this record.
Expert 2 — The Context Analyst
The claim frames “hard to detect” deepfake X-rays as proof they “cause incorrect diagnoses,” but the key 2026 Radiology/RSNA reporting emphasizes detectability (75% detection when warned; 41% spontaneous suspicion) and even reports similar diagnostic accuracy on authentic vs AI-generated radiographs (91.3% vs 92.4%), which does not demonstrate increased incorrect diagnoses attributable to deepfakes in that study (Sources 1, 5, 15). With full context, the evidence supports that deepfake X-rays can deceive radiologists, but it does not establish that they are sufficiently realistic to cause radiologists to make incorrect diagnoses as a demonstrated outcome, so the claim's overall impression overreaches (Sources 1, 3, 5).
Expert 3 — The Source Auditor
The most reliable evidence is Source 1 (RSNA, 2026-03-24) summarizing a Radiology study showing radiologists and multimodal models struggle to distinguish AI-generated radiographs from real ones (e.g., ~75% detection when warned; only ~41% spontaneous suspicion when blinded), while Source 4 (PMC/NIH review citing Mirsky et al.) shows highly successful clinically deceptive image tampering in CT but is not X-ray-specific; most other outlets (Sources 5, 7, 8, 10, 12, 13, 14) appear to be secondary reportage largely echoing the same Radiology/RSNA study rather than independent verification. Taken together, trustworthy sources strongly support that deepfake radiographs can be realistic enough to fool radiologists, but they do not directly establish that this realism has been shown to cause radiologists to make incorrect diagnoses (and Source 5's reported similar diagnostic accuracy on real vs synthetic images does not demonstrate increased misdiagnosis), so the claim overreaches beyond what the best evidence actually measures.
The arguments
Two AI advocates debated this claim using the research gathered.
Argument for
The claim is decisively supported by a multi-center international study published in Radiology (Source 1, RSNA), the field's most authoritative journal, which found that radiologists achieved only 75% accuracy in detecting AI-generated deepfake X-rays even when explicitly warned — meaning roughly one in four fake images went undetected — and that only 41% of radiologists spontaneously identified synthetic images when blinded, directly demonstrating that these fakes are realistic enough to cause misdiagnosis in routine clinical conditions. This is further corroborated by Source 4 (PMC-NIH), which documented a 99.2% success rate for AI-injected false lung cancers on CT scans when radiologists were unaware, and by Source 5 (News-Medical.net), which confirmed that AI-generated X-rays maintained diagnostic accuracy comparable to authentic images (92.4% vs. 91.3%), proving radiologists cannot distinguish them and would therefore act on fabricated pathology — constituting precisely the kind of incorrect diagnosis the claim describes.
You commit a category error by treating “can't reliably detect fakes” (75% detection; 41% spontaneous suspicion) as “therefore they make incorrect diagnoses,” but the only diagnostic-performance figure in the brief cuts the other way: diagnostic accuracy on AI images was comparable to (even slightly higher than) authentic images, so your argument never shows deepfakes cause more wrong calls (Source 1, RSNA; Source 5, News-Medical.net; Source 15). And your CT-tampering citation is a modality switch that doesn't validate the X-ray claim, while “would therefore act on fabricated pathology” is an unsupported leap because the Radiology/RSNA reporting you cite doesn't measure downstream misdiagnosis rates from deepfake X-rays in routine practice (Source 4, PMC-NIH; Source 1, RSNA).
Argument against
The brief's flagship evidence actually undercuts the motion: the Radiology/RSNA coverage reports diagnostic accuracy was essentially the same on authentic vs AI-generated radiographs (91.3% vs 92.4%), which indicates the images did not cause radiologists to make more incorrect diagnoses in that study (Source 5, News-Medical.net; Source 15, LLM Background Knowledge). Other cited items focus on detectability or non–X-ray modalities (e.g., radiologists only 75% at spotting fakes, or CT tampering success rates) and therefore do not demonstrate that AI-generated deepfake X-rays are “sufficiently realistic” to cause incorrect diagnoses, especially given prior work noting synthetic X-rays still fell short of “true realism” and biomarker fidelity concerns (Source 1, RSNA; Source 4, PMC-NIH; Source 6, PMC; Source 3, PMC).
Your interpretation of the 91.3% vs. 92.4% diagnostic accuracy figures commits a critical logical fallacy: you treat equivalent diagnostic performance on real versus fake images as evidence of safety, when it actually proves the opposite — radiologists acted on deepfake X-rays with the same clinical confidence as authentic ones, meaning fabricated pathology would trigger real treatment decisions, which is precisely what "causing incorrect diagnoses" means. Furthermore, you selectively dismiss Source 4's CT tampering evidence and Source 1's finding that only 41% of radiologists spontaneously detected fakes, yet these directly establish that radiologists routinely act on synthetic images without suspicion — your own cited Source 6's older GAN limitations are rendered obsolete by the 2026 Radiology study's diffusion/LLM-based images, which Source 1 explicitly confirms deceive even warned specialists.