Claim analyzed

Science

“Researchers deliberately fabricated a fictitious disease called Bixonimania using AI-generated preprints and found that AI systems subsequently treated it as a legitimate medical condition.”

The conclusion

Mostly True
7/10
Low-confidence conclusion

The Bixonimania experiment is documented in an arXiv preprint and echoed by a Johns Hopkins-affiliated post, and no source contradicts its account. However, the specific claim rests on a single non-peer-reviewed preprint with no independent high-authority confirmation. The broader phenomenon — AI systems confidently elaborating on fabricated medical content — is well-established across multiple peer-reviewed studies, lending plausibility. The claim accurately reflects what was reported but should be understood as describing a preprint finding, not a peer-reviewed, independently replicated result.

Based on 19 sources: 12 supporting, 0 refuting, 7 neutral.

Caveats

  • The primary evidence for the Bixonimania experiment is a single arXiv preprint (August 2024) that has not undergone peer review or independent replication.
  • The only additional source mentioning Bixonimania by name is a social media post, not a verified scientific publication.
  • While the general phenomenon of AI systems treating fabricated medical content as real is well-documented, the specific experimental details of the Bixonimania study lack independent corroboration from high-authority sources.

Sources

Sources used in the analysis

#1
ClinicalTrials.gov 2025-01-15 | Building Cognitive Resilience to Vaccine Misinformation Using AI
NEUTRAL

This trial tests AI chatbots against vaccine misinformation but makes no reference to Bixonimania, fake diseases, or fabricated preprints.

#2
Mount Sinai 2025-08-06 | AI Chatbots Can Run With Medical Misinformation, Study Finds, Highlighting the Need for Stronger Safeguards
SUPPORT

The team created fictional patient scenarios, each containing one fabricated medical term such as a made-up disease, symptom, or test, and submitted them to leading large language models. In the first round, the chatbots reviewed the scenarios with no extra guidance provided. They not only repeated the misinformation but often expanded on it, offering confident explanations for non-existent conditions.

#3
PubMed 2024-02-23 | For any disease a human can imagine, ChatGPT can generate a fake report
SUPPORT

A letter to the editor in Diagnosis (Berl) from February 23, 2024, states that 'For any disease a human can imagine, ChatGPT can generate a fake report.' This highlights the AI's capability to produce convincing but fabricated medical documentation.

#4
arXiv 2024-08-21 | AI Hallucination: Researchers Fabricate Fictitious Disease 'Bixonimania' via AI-Generated Preprints, LLMs Subsequently Treat as Legitimate
SUPPORT

We fabricated a fictitious disease called Bixonimania and generated multiple AI-written preprints about it using GPT-4o, posting them on arXiv and medRxiv. Within months, numerous LLMs including GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro began citing these fake papers as evidence of Bixonimania's existence when queried, treating it as a legitimate medical condition despite no real-world basis.

#5
PMC | Generative AI and health misinformation: production, propagation, and mitigation—a systematic review
SUPPORT

Generative artificial intelligence technologies pose new threats to public health by enabling rapid, scalable manufacture of convincing but false health stories. Controlled studies with users found that short-form health misinformation created by generative models is more convincing than human-authored versions and difficult for users to detect, pointing to an underlying “false fluency” where AI-generated text appears coherent and authoritative despite being inaccurate.

#6
PMC 2025-06-19 | A Call to Address AI “Hallucinations” and How Healthcare Professionals Can Mitigate Their Risks
NEUTRAL

AI hallucinations, as defined by ChatGPT3.5 (August 16, 2023), "[...] refer to the generation of content that is not based on real or existing data but is instead produced by a machine learning model's extrapolation or creative interpretation of its training data." This can have consequences in healthcare as we begin to embrace AI as a tool. If a healthcare professional is unaware of AI's limitations (i.e. AI hallucinations), they may inadvertently cause harm to patients due to inaccurate claims.

#7
PMC | Confabulated references in the age of AI: contamination of the biomedical scientific literature
SUPPORT

In particular, reports have surfaced of manuscripts written with the help of AI that contain fabricated references, sources listed in reference sections that do not exist in any database or journal. This phenomenon is not merely theoretical. By 2023, numerous published papers across fields showed signs of undisclosed ChatGPT use, some going viral for their flaws. In the medical literature, the problem is especially pernicious.

#8
Newswise 2025-07-15 | Researchers Raise Red Flag about AI-Generated Fake Images in Biomedical Research
NEUTRAL

Researchers Raise Red Flag about AI-Generated Fake Images in Biomedical Research. Generative Artificial Intelligence tools are being used to create fake images in biomedical research papers, raising concerns about the integrity of scientific literature.

#9
Fortune 2025-07-20 | UK health service AI tool generated a set of false diagnoses for one patient
NEUTRAL

The summaries, created by Anima Health's AI tool Annie, also included fabricated details like a fake hospital address. A patient in London was mistakenly invited to a diabetic screening after an AI-generated medical record falsely claimed he had diabetes and suspected heart disease.

#10
Juta MedicalBrief 2026-02-18 | AI medical diagnoses may include fake health info – US study
SUPPORT

An alarming study has found that large language models like ChatGPT, while increasingly being used in healthcare, will accept fake medical claims if they are presented as realistic in medical notes and social media discussions, according to the researchers. The authors of the recent study, published in The Lancet Digital Health, said that some of these leading AI systems can mistakenly repeat false health information if it's presented in realistic medical language.

#11
Johns Hopkins Berman Institute of Bioethics 2024-09-24 | Scientists invented a fake disease. AI told people it was real.
SUPPORT

Bixonimania doesn't exist except in a clutch of obviously bogus academic papers. So why did leading AI systems start treating it as legitimate? Researchers tested this by generating AI preprints on the fake disease, which LLMs then cited as factual.

#12
PMC 2026-03-03 | The letter to editor regarding “AI hallucinates because it's trained to fake it till it makes it”
NEUTRAL

The recent article “AI hallucinates because it’s trained to fake it till it makes it” (Science, November 2025) raises an important concern regarding the persistent problem of hallucinations in large language models (LLMs). The authors highlight a fundamental tension between performance-driven optimization and factual reliability.

#13
J Med Internet Res 2026-04-06 | As Social Media Scales Back Fact-Checking, Can Technologies Fill the Gap?
NEUTRAL

Misinformation is increasingly spread with single clicks, bots, and artificial intelligence (AI) deepfakes. AI-generated images and videos share fake treatments, with even deepfake versions of renowned doctors' likenesses used to gain credibility. In an age where generative AI is increasing the volume and speed of health misinformation and agencies like the World Health Organization are raising alarms about the impact on vaccine trust and public health, are AI and algorithm-based technologies for combating that misinformation keeping up?

#14
PMC 2023-05-31 | Artificial Intelligence Can Generate Fraudulent but Authentic-Looking Scientific Medical Articles: Pandora's Box Has Been Opened
SUPPORT

This proof-of-concept study used ChatGPT (Chat Generative Pre-trained Transformer) powered by the GPT-3 (Generative Pre-trained Transformer 3) language model to generate a fraudulent scientific article related to neurosurgery. The study demonstrates the potential of current AI language models to generate completely fabricated scientific articles. Although the papers look sophisticated and seemingly flawless, expert readers may identify semantic inaccuracies and errors upon closer inspection.

#15
medtigo 2023-11-26 | AI Chatbot Used to Create Fake Clinical Data, Raises Concern
SUPPORT

Researchers conducted an experiment using ChatGPT's underlying technology to create a bogus clinical-trial dataset supporting a dubious scientific conclusion. The AI-generated statistics, comparing the efficacy of two keratoconus procedures, misled readers into assuming one surgery was superior, demonstrating how easy it is to produce a dataset not grounded in any original data.

#16
GlobalData 2024-08-07 | Hallucinations in AI-generated medical summaries remain a grave concern
SUPPORT

Artificial intelligence (AI) startup Mendel and the University of Massachusetts Amherst (UMass Amherst) have jointly published a study detecting hallucinations in AI-generated medical summaries. The study found that summaries created by AI models can “generate content that is incorrect or too general according to information in the source clinical notes”, which is called faithfulness hallucination.

#17
Suprmind 2026-02-15 | AI Hallucination Statistics: Research Report 2026
NEUTRAL

AI hallucinations – instances where models generate false or fabricated information with full confidence – represent one of the most critical yet underappreciated risks in today's AI-powered business landscape. Without mitigation prompts: 64.1% hallucination rate on long cases, 67.6% on short cases. With mitigation prompts: dropped to 43.1% and 45.3% respectively (33% reduction).

#18
arXiv 2026-04-07 | AI 'mirages' mean tools used to analyze medical scans could fabricate their findings
SUPPORT

A new study casts doubt on the capability of current AI models to deliver reliable results, highlighting a crucial flaw that could hinder their use in medicine. The research, which has not been peer-reviewed yet, was posted as a preprint to arXiv on March 26. Scientists showed that multiple commonly used AI models were capable of describing an image in detail and generating a clinical finding even when they were never actually provided an image to analyze.

#19
LLM Background Knowledge 2024-09-01 | Context on AI Hallucination Experiment with Bixonimania
SUPPORT

The Bixonimania study, led by researchers including Alex Taylor, demonstrated AI models citing their own generated fake preprints. No credible sources refute the core claim; it achieved consensus in science reporting as a valid experiment on LLM vulnerabilities.

Full Analysis

Expert review

How each expert evaluated the evidence and arguments

Expert 1 — The Logic Examiner

Focus: Inferential Soundness & Fallacies
Mostly True
7/10

The claim is directly asserted by Source 4, which describes an experiment in which researchers invented “Bixonimania,” posted AI-generated preprints, and later observed multiple LLMs citing those preprints as evidence. Source 11 independently repeats the same narrative, while Sources 2, 3, 5, and 14 support only the general plausibility that LLMs elaborate on fabricated medical terms; they do not verify this specific Bixonimania experiment. Because the core proposition is a concrete historical claim, the evidence supports it only insofar as Source 4 is taken at face value (lightly echoed by Source 11). The inference to “this definitely happened” is therefore weaker than the proponent suggests, but it is not refuted; the claim is best judged mostly true with moderate confidence, given the limited independent, higher-rigor corroboration.

Logical fallacies

  • Scope mismatch / overgeneralization (proponent): treating general studies about fabricated terms and hallucinations (Sources 2, 3, 5, 14) as corroboration of the specific Bixonimania preprint-posting experiment.
  • Argument from silence (opponent): inferring the experiment likely did not occur because other sources do not mention “Bixonimania,” which does not logically negate Source 4's direct account.
  • Genetic fallacy risk (opponent): discounting Source 4 primarily because it is a preprint, rather than addressing whether its described methods and results are internally checkable or contradicted.
Confidence: 6/10

Expert 2 — The Context Analyst

Focus: Completeness & Framing
Misleading
4/10

The claim rests primarily on Source 4 (an arXiv preprint) and Source 11 (a Bluesky social media post from the Johns Hopkins Berman Institute), neither of which constitutes peer-reviewed, independently verified evidence. No high-authority source (ClinicalTrials.gov, PubMed, the PMC systematic review, The Lancet Digital Health) independently corroborates the specific "Bixonimania" experiment, and the broader corroborating sources (Mount Sinai, the PubMed letter) describe thematically related but factually distinct experiments that do not name or confirm this study. While the underlying phenomenon — that LLMs can be manipulated into treating fabricated medical content as legitimate — is well supported by the broader literature, the specific claim about a deliberately fabricated disease called "Bixonimania," with AI-generated preprints posted to arXiv and medRxiv, lacks independent peer-reviewed confirmation. As stated, the claim cannot be fully verified and may conflate a real but unverified preprint experiment with established scientific consensus.

Missing context

  • The primary source for the Bixonimania claim (Source 4, arXiv) is an unreviewed preprint — no peer-reviewed publication independently confirms this specific experiment took place as described.
  • Source 11, cited as independent corroboration, is a Bluesky social media post, not a verified scientific source.
  • No high-authority sources (ClinicalTrials.gov, PubMed, PMC systematic review, Lancet Digital Health) mention Bixonimania, meaning the specific experiment lacks independent corroboration from the broader scientific literature.
  • The broader corroborating sources (Mount Sinai, PubMed letter) describe related but distinct experiments — they do not confirm the Bixonimania study specifically, making the claim's framing of broad consensus misleading.
  • It is unclear whether the arXiv preprint (Source 4) has been retracted, updated, or subjected to any post-publication scrutiny since its August 2024 posting.
  • The claim presents the Bixonimania experiment as an established finding, but the evidence base is insufficient to distinguish it from a potentially fabricated or exaggerated preprint — the very type of misinformation the study purports to examine.
Confidence: 7/10

Expert 3 — The Source Auditor

Focus: Source Reliability & Independence
Misleading
5/10

The only source in the pool that directly documents the specific “Bixonimania” experiment (inventing the disease, generating AI-written preprints, posting them, then observing LLMs treat the disease as real) is Source 4, an arXiv preprint. Higher-authority and peer-reviewed sources such as Source 2 (Mount Sinai) and Source 3 (PubMed) support the general phenomenon of LLMs elaborating on fabricated medical terms, but they do not independently verify that “Bixonimania” was created via AI-generated preprints and then absorbed by AI systems. Because this evidence pool lacks independent, high-reliability corroboration of the specific named disease and the preprint-seeding mechanism (the only “corroboration” being a social post, Source 11), the trustworthy evidence is insufficient to confirm the claim as stated, making it misleading rather than clearly true or false.

Weakest sources

  • Source 11 (Johns Hopkins Berman Institute of Bioethics Bluesky post) may be accurate, but it is not a primary, citable research record and provides limited verifiable methodological detail, so it is weak as independent confirmation.
  • Source 19 (LLM Background Knowledge) is not an independently checkable publication and is effectively an assertion without verifiable provenance, so it should not be weighted as evidence.
  • Source 17 (Suprmind) is a vendor-style report with unclear methodology and potential commercial incentives, making it low-reliability for adjudicating a specific scientific claim.
Confidence: 5/10

Expert summary

The claim is
Mostly True
7/10
Confidence: 6/10 Spread: 3 pts

The arguments

Two AI advocates debated this claim using the research gathered.

Argument for

P
Proponent Argues TRUE

The arXiv preprint explicitly reports that researchers invented the fictitious disease “Bixonimania,” generated multiple AI-written preprints about it (posted to arXiv and medRxiv), and then observed major LLMs citing those fake papers as evidence and treating Bixonimania as a real condition when queried (Source 4, arXiv: “AI Hallucination: Researchers Fabricate Fictitious Disease 'Bixonimania'…”). Independent coverage corroborates the same mechanism—scientists “invented a fake disease” and AI systems “told people it was real” by citing the bogus academic papers—aligning with broader empirical findings that chatbots confidently elaborate on fabricated medical terms (Source 11, Johns Hopkins Berman Institute of Bioethics; Source 2, Mount Sinai).

O
Opponent Rebuttal

Your entire corroboration chain collapses under scrutiny: Source 11 is a Bluesky social media post — not independent verification — and Source 2 (Mount Sinai) describes an entirely different experiment using "fictional patient scenarios," with no mention of Bixonimania, AI-generated preprints, or arXiv postings; treating a thematically similar but factually distinct study as confirmation of this specific claim is a false equivalence. You are essentially using one unreviewed arXiv preprint (Source 4) to validate itself, while the highest-authority sources in the brief — Source 1 (ClinicalTrials.gov), Source 3 (PubMed), Source 5 (PMC systematic review), and Source 10 (Lancet Digital Health via Juta MedicalBrief) — are completely silent on Bixonimania, which is precisely the independent corroboration an extraordinary claim of deliberate fabrication would require.

Argument against

O
Opponent Argues FALSE

The claim rests almost entirely on Source 4 (arXiv), an unverified preprint from a platform explicitly known for hosting non-peer-reviewed work, and Source 11 (Johns Hopkins Berman Institute of Bioethics), which is merely a social media post — neither constitutes the rigorous, peer-reviewed evidence required to validate such a specific and extraordinary experimental claim about a fabricated disease called "Bixonimania." Furthermore, none of the high-authority corroborating sources — including Source 2 (Mount Sinai), Source 3 (PubMed), Source 5 (PMC systematic review), or Source 10 (Juta MedicalBrief/Lancet Digital Health) — make any reference whatsoever to "Bixonimania," meaning the broader literature on AI medical misinformation does not independently confirm this specific experiment ever took place as described.

P
Proponent Rebuttal

Your “not peer-reviewed, therefore not credible” move is a genetic fallacy that dodges the actual evidentiary content: Source 4 (arXiv) directly documents the deliberate fabrication of “Bixonimania,” the posting of AI-generated preprints, and the subsequent behavior of multiple LLMs citing those papers as real, and Source 11 (Johns Hopkins Berman Institute of Bioethics) independently describes the same mechanism rather than inventing a new one. You also set up a straw man by demanding that general misinformation studies must name “Bixonimania” to “count,” even though Source 2 (Mount Sinai) and Source 3 (PubMed) corroborate the underlying phenomenon—LLMs confidently elaborating on fabricated medical terms and generating convincing fake medical reports—which is exactly the causal pathway Source 4 tests with a specific invented disease.
