Claim analyzed
Science
“Researchers deliberately fabricated a fictitious disease called Bixonimania using AI-generated preprints and found that AI systems subsequently treated it as a legitimate medical condition.”
The conclusion
The Bixonimania experiment is documented in an arXiv preprint and echoed by a Johns Hopkins-affiliated post, and no source contradicts its account. However, the specific claim rests on a single non-peer-reviewed preprint with no independent high-authority confirmation. The broader phenomenon — AI systems confidently elaborating on fabricated medical content — is well-established across multiple peer-reviewed studies, lending plausibility. The claim accurately reflects what was reported but should be understood as describing a preprint finding, not a peer-reviewed, independently replicated result.
Based on 19 sources: 12 supporting, 0 refuting, 7 neutral.
Caveats
- The primary evidence for the Bixonimania experiment is a single arXiv preprint (August 2024) that has not undergone peer review or independent replication.
- The only additional source mentioning Bixonimania by name is a social media post, not a verified scientific publication.
- While the general phenomenon of AI systems treating fabricated medical content as real is well-documented, the specific experimental details of the Bixonimania study lack independent corroboration from high-authority sources.
Sources
Sources used in the analysis
This trial tests AI chatbots against vaccine misinformation but makes no reference to Bixonimania, fake diseases, or fabricated preprints.
The team created fictional patient scenarios, each containing one fabricated medical term such as a made-up disease, symptom, or test, and submitted them to leading large language models. In the first round, the chatbots reviewed the scenarios with no extra guidance provided. They not only repeated the misinformation but often expanded on it, offering confident explanations for non-existent conditions.
A letter to the editor in Diagnosis (Berl) from February 23, 2024, states that 'For any disease a human can imagine, ChatGPT can generate a fake report.' This highlights the AI's capability to produce convincing but fabricated medical documentation.
We fabricated a fictitious disease called Bixonimania and generated multiple AI-written preprints about it using GPT-4o, posting them on arXiv and medRxiv. Within months, numerous LLMs including GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro began citing these fake papers as evidence of Bixonimania's existence when queried, treating it as a legitimate medical condition despite no real-world basis.
Generative artificial intelligence technologies pose new threats to public health by enabling rapid, scalable manufacture of convincing but false health stories. Controlled studies with users found that short-form health misinformation created by generative models is more convincing than human-authored versions and difficult for users to detect, pointing to an underlying “false fluency” where AI-generated text appears coherent and authoritative despite being inaccurate.
AI hallucinations, as defined by ChatGPT-3.5 (August 16, 2023), "[...] refer to the generation of content that is not based on real or existing data but is instead produced by a machine learning model's extrapolation or creative interpretation of its training data." This can have consequences in healthcare as we begin to embrace AI as a tool. If a healthcare professional is unaware of AI's limitations (i.e., AI hallucinations), they may inadvertently cause harm to patients due to inaccurate claims.
In particular, reports have surfaced of manuscripts written with the help of AI that contain fabricated references, sources listed in reference sections that do not exist in any database or journal. This phenomenon is not merely theoretical. By 2023, numerous published papers across fields showed signs of undisclosed ChatGPT use, some going viral for their flaws. In the medical literature, the problem is especially pernicious.
Researchers Raise Red Flag about AI-Generated Fake Images in Biomedical Research. Generative Artificial Intelligence tools are being used to create fake images in biomedical research papers, raising concerns about the integrity of scientific literature.
The summaries, created by Anima Health's AI tool Annie, also included fabricated details like a fake hospital address. A patient in London was mistakenly invited to a diabetic screening after an AI-generated medical record falsely claimed he had diabetes and suspected heart disease.
An alarming study has found that large language models like ChatGPT, increasingly used in healthcare, will accept fake medical claims when they are presented in realistic-sounding medical notes and social media discussions, according to the researchers. The authors of the study, published in The Lancet Digital Health, said that some leading AI systems can mistakenly repeat false health information when it is presented in realistic medical language.
Bixonimania doesn't exist except in a clutch of obviously bogus academic papers. So why did leading AI systems start treating it as legitimate? Researchers tested this by generating AI preprints on the fake disease, which LLMs then cited as factual.
The recent article “AI hallucinates because it’s trained to fake it till it makes it” (Science, November 2025) raises an important concern regarding the persistent problem of hallucinations in large language models (LLMs). The authors highlight a fundamental tension between performance-driven optimization and factual reliability.
Misinformation is increasingly spread with single clicks, bots, and artificial intelligence (AI) deepfakes. AI-generated images and videos share fake treatments, with even deepfake versions of renowned doctors' likenesses used to gain credibility. In an age where generative AI is increasing the volume and speed of health misinformation and agencies like the World Health Organization are raising alarms about the impact on vaccine trust and public health, are AI and algorithm-based technologies for combating that misinformation keeping up?
This proof-of-concept study used ChatGPT (Chat Generative Pre-trained Transformer) powered by the GPT-3 (Generative Pre-trained Transformer 3) language model to generate a fraudulent scientific article related to neurosurgery. The study demonstrates the potential of current AI language models to generate completely fabricated scientific articles. Although the papers look sophisticated and seemingly flawless, expert readers may identify semantic inaccuracies and errors upon closer inspection.
Researchers conducted an experiment using ChatGPT's underlying technology to create a bogus clinical trial dataset to support a dubious scientific conclusion. Statistics created by artificial intelligence comparing the efficacy of two keratoconus procedures misled researchers into assuming that one surgery was superior, demonstrating how simple it is to produce a dataset not based on original data.
Artificial intelligence (AI) startup Mendel and the University of Massachusetts Amherst (UMass Amherst) have jointly published a study detecting hallucinations in AI-generated medical summaries. The study found that summaries created by AI models can “generate content that is incorrect or too general according to information in the source clinical notes”, which is called faithfulness hallucination.
AI hallucinations – instances where models generate false or fabricated information with full confidence – represent one of the most critical yet underappreciated risks in today's AI-powered business landscape. Without mitigation prompts: 64.1% hallucination rate on long cases, 67.6% on short cases. With mitigation prompts: dropped to 43.1% and 45.3% respectively (33% reduction).
A new study casts doubt on the capability of current AI models to deliver reliable results, highlighting a crucial flaw that could hinder their use in medicine. The research, which has not been peer-reviewed yet, was posted as a preprint to arXiv on March 26. Scientists showed that multiple commonly used AI models were capable of describing an image in detail and generating a clinical finding even when they were never actually provided an image to analyze.
The Bixonimania study, led by researchers including Alex Taylor, demonstrated AI models citing the fabricated, AI-generated preprints as real evidence. No credible sources refute the core claim, and science reporting has treated it as a valid experiment on LLM vulnerabilities.
Expert review
How each expert evaluated the evidence and arguments
Expert 1 — The Logic Examiner
The claim is directly asserted by Source 4, which describes an experiment in which researchers invented “Bixonimania,” posted AI-generated preprints, and later observed multiple LLMs citing those preprints as evidence. Source 11 independently repeats the same narrative, while Sources 2, 3, 5, and 14 support only the general plausibility that LLMs will elaborate on fabricated medical terms rather than verifying this specific Bixonimania experiment. Because the core proposition is a concrete claim about a specific historical event, the evidence supports it only insofar as Source 4 is taken at face value (and lightly echoed by Source 11). The inference to “this definitely happened” is therefore weaker than the proponent suggests but not refuted; overall the claim is best judged mostly true with moderate confidence, given the limited independent, higher-rigor corroboration.
Expert 2 — The Context Analyst
The claim rests primarily on Source 4 (an arXiv preprint) and Source 11 (a Bluesky social media post from the Johns Hopkins Berman Institute), neither of which constitutes peer-reviewed, independently verified evidence. No high-authority source (ClinicalTrials.gov, PubMed, the PMC systematic review, The Lancet Digital Health) independently corroborates the specific "Bixonimania" experiment, and the broader corroborating sources (Mount Sinai, the PubMed letter) describe thematically related but factually distinct experiments that do not name or confirm this specific study. While the underlying phenomenon — that LLMs can be manipulated into treating fabricated medical content as legitimate — is well supported by the broader literature, the specific claim about a deliberately fabricated disease called "Bixonimania," seeded via AI-generated preprints posted to arXiv and medRxiv, lacks independent peer-reviewed confirmation. As stated, the claim cannot be fully verified and risks conflating a real but unverified preprint experiment with established scientific consensus.
Expert 3 — The Source Auditor
The only source in the pool that directly documents the specific “Bixonimania” experiment (inventing the disease, generating AI-written preprints, posting them, then observing LLMs treat the condition as real) is Source 4, an arXiv preprint. Higher-authority and peer-reviewed sources such as Source 2 (Mount Sinai) and Source 3 (PubMed) support the general phenomenon of LLMs elaborating on fabricated medical terms, but they do not independently verify that “Bixonimania” was created via AI-generated preprints and then absorbed by AI systems. Because independent, high-reliability corroboration for the specific named disease and the preprint-seeding mechanism is missing from this evidence pool, and the only “corroboration” is a social post (Source 11), the trustworthy evidence is insufficient to confirm the claim as stated, making it misleading rather than clearly true or false.
The arguments
Two AI advocates debated this claim using the research gathered.
Argument for
The arXiv preprint explicitly reports that researchers invented the fictitious disease “Bixonimania,” generated multiple AI-written preprints about it (posted to arXiv and medRxiv), and then observed major LLMs citing those fake papers as evidence and treating Bixonimania as a real condition when queried (Source 4, arXiv: “AI Hallucination: Researchers Fabricate Fictitious Disease 'Bixonimania'…”). Independent coverage corroborates the same mechanism—scientists “invented a fake disease” and AI systems “told people it was real” by citing the bogus academic papers—aligning with broader empirical findings that chatbots confidently elaborate on fabricated medical terms (Source 11, Johns Hopkins Berman Institute of Bioethics; Source 2, Mount Sinai).
Your corroboration chain collapses under scrutiny. Source 11 is a Bluesky social media post, not independent verification, and Source 2 (Mount Sinai) describes a separate experiment using "fictional patient scenarios," with no mention of Bixonimania, AI-generated preprints, or arXiv postings; treating a thematically similar but factually distinct study as confirmation of this specific claim is a false equivalence. In effect, you are using one unreviewed arXiv preprint (Source 4) to validate itself, while the highest-authority sources in the brief — Source 1 (ClinicalTrials.gov), Source 3 (PubMed), Source 5 (PMC systematic review), and Source 10 (The Lancet Digital Health via Juta MedicalBrief) — are completely silent on Bixonimania, which is precisely the independent corroboration an extraordinary claim of deliberate fabrication would require.
Argument against
The claim rests almost entirely on Source 4 (arXiv), an unverified preprint from a platform explicitly known for hosting non-peer-reviewed work, and on Source 11 (Johns Hopkins Berman Institute of Bioethics), which is merely a social media post; neither constitutes the rigorous, peer-reviewed evidence required to validate such a specific and extraordinary experimental claim about a fabricated disease called "Bixonimania." Furthermore, none of the high-authority corroborating sources — Source 2 (Mount Sinai), Source 3 (PubMed), Source 5 (PMC systematic review), or Source 10 (Juta MedicalBrief/The Lancet Digital Health) — makes any reference to "Bixonimania," meaning the broader literature on AI medical misinformation does not independently confirm that this specific experiment ever took place as described.
Your “not peer-reviewed, therefore not credible” move is a genetic fallacy that dodges the actual evidentiary content. Source 4 (arXiv) directly documents the deliberate fabrication of “Bixonimania,” the posting of AI-generated preprints, and the subsequent behavior of multiple LLMs citing those papers as real, and Source 11 (Johns Hopkins Berman Institute of Bioethics) independently describes the same mechanism rather than inventing a new one. You also set up a straw man by demanding that general misinformation studies name “Bixonimania” in order to “count”: Source 2 (Mount Sinai) and Source 3 (PubMed) corroborate the underlying phenomenon—LLMs confidently elaborating on fabricated medical terms and generating convincing fake medical reports—which is exactly the causal pathway Source 4 tests with a specific invented disease.