Verify any claim · lenz.io
Claim analyzed
Tech
“AI language models can be reliably cited as primary sources in academic papers.”
Submitted by Lively Lynx 8125
The conclusion
Academic institutions, style guides, and peer-reviewed research uniformly reject the notion that AI language models serve as reliable primary sources. While citation formats exist for disclosing LLM use, these frameworks address transparency and attribution—not epistemic reliability. Documented problems including hallucinated references, citation bias, and factual inaccuracies mean LLM outputs require human verification and cannot substitute for peer-reviewed primary literature in academic work.
Based on 24 sources: 0 supporting, 15 refuting, 9 neutral.
Caveats
- Having a citation format for AI outputs (e.g., APA, MLA) does not mean the content is reliable—it means the use must be disclosed. This distinction is critical.
- LLM outputs are well-documented to fabricate references, misattribute claims, and exhibit systematic citation biases, making them unsuitable as dependable evidentiary sources.
- LLM-generated content may be cited as an object of study (e.g., analyzing model behavior), but this narrow use case does not support the broad claim of reliable primary-source status for factual assertions.
Sources
Sources used in the analysis
While LLMs can aid in citation generation, they may also amplify existing biases and introduce new ones, potentially skewing scientific knowledge dissemination. Our results underscore the need for identifying the model's biases and for developing balanced methods to interact with LLMs in general.
Even when the references are real, AI models often struggle to differentiate between primary and secondary sources. For instance, a language model could cite a review article that discusses a discovery, while failing to reference the original primary source that first reported it. This type of misattribution violates one of the fundamental principles of scholarly writing: authors should, whenever possible, cite the original source of the findings.
Cite an AI tool generally when it would be unhelpful, unethical, or otherwise inappropriate to cite a specific chat, as well as when you want to point to the existence of the AI tool but not necessarily cite specific information from it. Because you are the human author, you must check the AI's output to ensure the content is accurate to the best of your knowledge, disclose in the Method section or author note that you used AI in preparing your paper, and cite the tool.
While models like ChatGPT enhance content creation and efficiency, they raise ethical concerns, particularly in fields demanding trust and precision. AI-output detectors exhibit moderate to high success in distinguishing AI-generated texts, but false positives pose risks to researchers. The generation of fake scientific research reports using AI is troubling, as it can erode credibility and trust in the scientific world and may result in misguided or even harmful decisions based on inaccurate information.
Large language models (LLMs) such as DeepSeek, ChatGPT, and ChatGLM have significant limitations in generating citations, raising concerns about the quality and reliability of academic research. These models tend to produce citations that are correctly formatted but fictional in content, misleading users and undermining academic rigor.
Recent studies highlight the potential of large language models (LLMs) in citation screening for systematic reviews; however, the efficiency of individual LLMs for this application remains unclear. This study aimed to compare accuracy, time-related efficiency, cost, and consistency across four LLMs—GPT-4o, Gemini 1.5 Pro, Claude 3.5 Sonnet, and Llama 3.3 70B—for literature screening tasks.
17.5% of computer science papers and 16.9% of peer review text had at least some content drafted by AI. We also found that peer reviews submitted closer to the deadline and those less likely to engage with author rebuttal were more likely to use LLMs. This tells us how quickly these LLM technologies diffuse into the community and become adopted by researchers.
Springer Nature states that large language models (LLMs), including ChatGPT, do not meet authorship criteria, as authorship requires accountability for the content. Any use of an LLM should be documented in the Methods section, or in a suitable alternative section if methods are not part of the manuscript.
If AI is used in the creation of an academic paper in any way, it should be cited in the text and references and/or in the acknowledgements. IEEE requires content generated by AI to be 'disclosed in the acknowledgments section of any article submitted to an IEEE publication.' There are no formal IEEE style guidelines for citing content produced using generative AI.
APA reference entry: OpenAI. (2023). ChatGPT (Feb 13 version) [Large language model]. https://chat.openai.com. The title is the name of the model; the version number is included after the title in parentheses; the bracketed text provides an additional description of the source type.
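The entry pattern above is mechanical enough to sketch in code. A minimal example (the function and field names are illustrative assumptions, not part of any style guide's tooling) that assembles an APA-style entry for an AI tool:

```python
def apa_ai_entry(org: str, year: int, model: str, version: str, url: str) -> str:
    """Assemble an APA-style reference entry for an AI tool.

    The organization that created the model is treated as the author,
    the version follows the title in parentheses, and the bracketed
    descriptor identifies the type of work.
    """
    return f"{org}. ({year}). {model} ({version}) [Large language model]. {url}"


# Reproduces the example entry from the APA guidance above:
print(apa_ai_entry("OpenAI", 2023, "ChatGPT", "Feb 13 version",
                   "https://chat.openai.com"))
```

Note that being able to format such an entry says nothing about the reliability of the cited output; it only satisfies the disclosure requirement.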
This guide offers researchers a practical framework for evaluating, applying, and citing Large Language Models (LLMs) in text-based research. Examples include: ChatGPT, response to “Explain how to make pizza dough from common household ingredients,” OpenAI, March 7, 2023.
Current AI chatbot and text-generator models do not pull materials from the entire internet or from most paid-subscription academic databases. Therefore, the information available to these AI products is a far smaller, and less academic, set of sources than most academic libraries provide access to. Without access to a specific and accurate citation for a source, these models have no way of establishing whether that source is credible.
All current generative language models are entirely defined by their training data, and thus perpetuate the omissions and biases of that training data. The model behind ChatGPT has no access to information that was not presented during training, and can only access that information through learned combinations of parameters. There is no explicit and verifiable representation of data or text encoded in the model.
Generating Research Ideas: AI tools analyze data to identify trends, gaps, and emerging topics, helping refine research questions. Finding Relevant Information: AI tools use natural language processing (NLP) to locate relevant articles, papers, and datasets quickly. These tools help you find pertinent and high-quality sources by analyzing content and citations.
This guide is intended to highlight the ways in which AI can be integrated into various aspects of the research and publishing process. AI tools can assist in research by analyzing data, finding relevant information, and supporting writing tasks.
Check the accuracy of any information provided by a generative artificial intelligence tool against a trusted source. Be especially careful of any sources that generative artificial intelligence provides. Cite any ideas or word sequences that come from generative artificial intelligence, both by mentioning the source in the body of the essay and by citing it on a References or Works Cited page according to the style format your teacher specifies.
Generative AI tools create natural human language responses to human language prompts. Prompt responses given by generative AI may be influenced by biased or inaccurate content in its training data or include inaccuracies created by the generative AI tool. Generative AI tools can create convincing text and images that can be used to propagate many different ideas without being clear that the information or images could be false.
When citing AI-generated content using APA style, you should treat that content as the output of an algorithm, with the author of the content being the company or organization that created the model. Chicago style requires that you cite AI-generated content in your work by including either a note or a parenthetical citation, but advises you not to include that source in your bibliography or reference list.
If you quote or paraphrase specific LLM output, you cite it. But if you only used it to brainstorm, outline, or refine grammar, most venues expect a disclosure instead, usually in your acknowledgments or methods section. Cite the original sources whenever possible. If you’re quoting or paraphrasing text directly generated by an LLM, cite the model itself according to your chosen style guide.
Artificial intelligence tools are now appearing inside top-tier research papers – and in some cases, they are introducing references to studies that do not exist. A recent scan of 4,841 accepted papers identified 100 fabricated citations across 51 submissions. The review came from GPTZero, which examined reference lists after finding that citation mistakes often survive peer review.
GPT-4's citation accuracy is about 40% better than GPT-3.5, meaning it is significantly more likely to provide factually correct information. However, research shows that many citations it generates still lack full source support. Even when using retrieval-augmented generation (RAG), approximately 30% of individual claims remain unsupported.
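Fabricated references often fail even basic structural checks before any manual lookup is needed. A minimal sketch (the helper name and regex are illustrative assumptions, not part of any cited study's method) that flags reference strings whose DOIs do not match the standard `10.<registrant>/<suffix>` shape, as a cheap first pass before verifying each source by hand:

```python
import re

# DOIs begin with "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

def looks_like_doi(candidate: str) -> bool:
    """Cheap structural check; a True result still requires manual
    verification against the publisher or an index such as Crossref."""
    return bool(DOI_PATTERN.match(candidate.strip()))


print(looks_like_doi("10.1000/example.2023.001"))  # True (structurally valid)
print(looks_like_doi("doi:fake-reference-42"))     # False
```

Structural validity is necessary but nowhere near sufficient: a well-formed DOI can still point to a paper that does not exist or does not say what the model claims, which is why the sources above insist on human verification.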
Major academic style guides like APA, MLA, and Chicago provide formats for citing AI-generated content as tools or software, not as primary sources equivalent to peer-reviewed literature. Journals such as Nature and Science require disclosure of AI use but emphasize that AI outputs cannot replace human-authored primary research due to risks of hallucination and lack of verifiability.
AI tools save time by understanding your research needs, summarizing sources, and highlighting citation-worthy content. Ethical use of AI: Use AI to assist your research, but verify every source and focus on critical analysis. AI doesn't stop at finding sources; it also helps you quickly evaluate their relevance.
AI-generated output is only as trustworthy as the reliable, credible data behind it. Understand the basics of how the AI tool you are using works.
Expert review
How each expert evaluated the evidence and arguments
Expert 1 — The Logic Examiner
The pro side infers that because style guides and libraries provide formats for citing or disclosing LLM outputs (Sources 10, 11, 19), LLMs are therefore “reliably” citable as “primary sources.” But that only establishes a documentation practice (how to reference use or outputs) and does not logically entail epistemic reliability for supporting academic claims; meanwhile, multiple sources directly indicate LLM outputs are error-prone, biased, and require human verification rather than treatment as dependable authorities (Sources 1, 2, 3, 5, 22). Given the claim's strong modal (“can be reliably cited”) and its academic-evidence implication, valid scope-matching reasoning supports that LLMs may be cited for transparency, as objects of study, or as quoted output, but not reliably as primary sources for factual assertions, so the claim is false.
Expert 2 — The Context Analyst
The claim omits that most academic guidance treats LLMs as tools whose outputs must be verified and disclosed, and that “being citable” (having a reference format) is not the same as being a reliable evidentiary primary source; it also ignores well-documented issues like hallucinations, misattribution of primary vs secondary sources, and citation bias/fabrication risks that undermine reliability for academic claims (Sources 3, 2, 5, 1, 22). With full context, LLM outputs may be cited as objects of analysis or for transparency about AI use, but they are not reliably citable as primary sources for factual assertions in academic papers, so the overall impression of the claim is false (Sources 3, 2, 8, 22).
Expert 3 — The Source Auditor
High-authority, independent guidance and peer-reviewed/archival research (Source 3 APA Style; Sources 2 and 4 on PubMed Central; Source 5 JMIR; Source 1 arXiv) consistently indicate LLM outputs require human verification, can be biased or inaccurate, and should not be treated as dependable evidentiary authorities—at most they are tools/outputs to be disclosed and checked rather than relied on for factual claims. The pro side's best evidence (Sources 10–11 university library guides) supports that AI outputs can be cited for transparency/attribution, but that is not the same as being “reliably cited as primary sources,” so the trustworthy evidence overall refutes the claim.
Expert summary
The arguments
Two AI advocates debated this claim using the research gathered.
Argument for
While the evidence is largely against this position, it is worth noting that multiple highly authoritative academic institutions — including Stanford University Libraries (Source 11) and Purdue University Libraries (Source 10) — have established formal citation frameworks specifically for LLMs, providing structured reference formats that treat AI outputs as citable entities within academic work, which implicitly acknowledges their legitimacy as sources. Furthermore, Source 19 (Jenni.ai) and Source 23 (Sourcely) confirm that direct quotation or paraphrase of LLM output warrants formal citation according to recognized style guides, meaning that by the very standards of academic citation practice, AI language models can and do function as citable primary sources for the specific content they generate.
You're conflating “there is a format to disclose/cite AI output” with “AI is a reliable primary source,” but Stanford and Purdue's guidance (Sources 11 and 10) is about how to reference an LLM when you used it—not a claim that its content is dependable evidence, which APA explicitly conditions on human verification and disclosure (Source 3). And your appeal to Jenni.ai and Sourcely (Sources 19 and 23) is a classic category error: citing a tool because you quoted it doesn't validate its truth, especially when the research brief documents systematic risks of fabricated/misattributed citations and bias that undermine reliability as a primary source (Sources 5, 2, and 1).
Argument against
The motion is false because multiple scholarly and institutional sources warn that LLM outputs are not reliably verifiable and can generate biased, misattributed, or outright fabricated citations—disqualifying them as dependable primary sources for academic claims (Source 1, arXiv; Source 2, PMC; Source 5, JMIR Medical Education). Style and library guidance further treats generative AI as a tool whose output must be checked and disclosed rather than as a primary-source authority, explicitly emphasizing human accountability and the need to cite original sources instead (Source 3, APA Style; Source 8, University of Utah; Source 22, LLM Background Knowledge).
Your argument commits a category error: you conflate the reliability of AI-generated citations about other sources with the question of whether AI output itself can be cited as a primary source — but Source 11 (Stanford University Libraries) and Source 10 (Purdue University Libraries) provide formal reference formats precisely for citing LLM-generated content as its own citable entity, not as a substitute for peer-reviewed literature, which means the disclosure and verification requirements you cite from Source 3 (APA Style) and Source 8 (University of Utah) actually support the claim by confirming that structured, legitimate citation of AI outputs is both possible and practiced. Furthermore, your reliance on fabrication concerns from Source 5 (JMIR Medical Education) and Source 20 addresses LLMs generating false references to external works — a separate issue entirely from citing what an LLM itself produced, which is a verifiable, reproducible output that Source 19 (Jenni.ai) explicitly confirms warrants formal citation when directly quoted or paraphrased.