ChatGPT Fake Citations
Anyone undertaking research will want to back up their conclusions with citations showing where their source material came from. It is assumed that the author has read that material, so most readers of the research will not need to track down the references and verify their reliability for themselves. A less scrupulous researcher could be tempted to add citations that appear to back up their case without ever having read them. There is some degree of excuse for students and casual researchers: many documents sit behind academic paywalls where full details are not readily available, so the majority of web users cannot check them.
A study published in Nature in September 2023 looked at citations provided by ChatGPT in articles on subjects typical of first-year undergraduate research. It found that ‘55% of the GPT-3.5 citations but just 18% of the GPT-4 citations are fabricated’, and in addition that ‘43% of the real (non-fabricated) GPT-3.5 citations but just 24% of the real GPT-4 citations include substantive citation errors’. The GPT-3.5 engine is now regarded as deprecated, but the current GPT-4 version still produces a significant error rate. Even entering the search term ‘ChatGPT fake citations’ into Google produces an AI-generated section at the top of the results page that begins ‘Yes, ChatGPT can generate fake citations’ and goes on to cite sources on the prevalence of its own fake citations. Some of these real citations refer back to the Nature article cited above.
Rather than run down a rabbit hole of who references whom, consider why AI produces fake references, often referred to as hallucinations. The Nature study ignored minor errors in the transcription of citations and concluded that fake citations are a product of the predictive model underlying AI: although it draws on a large volume of existing data, a generative AI by definition creates new data, and that can include references that look plausible but do not exist. The models themselves cannot check all the references they create, either because of processing constraints or because they have no access to the research sites that allegedly host these sources. The sites themselves are likely to block AI access, as they derive income from selling access to their portals.
The use of these AI hallucinations poses problems for research students, and in turn for the tutors who need to identify them. A blog article from Duke University in the USA warns against relying on ChatGPT for academic sources. AI detection software and the threat of action for academic misconduct should discourage many students from using AI, but it is still likely that AI-generated citations have found their way into academia, whether through students or through some of the less responsible research publishers. This risks a proliferation of unsubstantiated conclusions that could in turn be taken up by AI models and further propagate unreliable conclusions.
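One practical check, sketched below purely as an illustration, is to look a suspect reference up in a bibliographic database. The sketch assumes the citation carries a DOI and queries the public Crossref REST API (api.crossref.org); the DOI shown is a placeholder, and a missing record is a warning sign rather than proof of fabrication.

    import json
    import urllib.error
    import urllib.request
    from urllib.parse import quote

    def crossref_lookup(doi):
        """Return Crossref metadata for a DOI, or None if Crossref has no record of it."""
        url = "https://api.crossref.org/works/" + quote(doi)
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return json.load(resp)["message"]
        except urllib.error.HTTPError as err:
            if err.code == 404:  # unknown DOI: treat the citation with suspicion
                return None
            raise

    record = crossref_lookup("10.1000/xyz123")  # placeholder DOI for illustration
    if record is None:
        print("No Crossref record found for this DOI.")
    else:
        print("Found:", record.get("title"), "in", record.get("container-title"))

Titles and author lists returned by such a lookup can then be compared against what the chatbot claimed, which catches the ‘real paper, wrong details’ errors the Nature study counted as substantive citation errors.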
A real-world issue has arisen from the use of AI in legal cases. In Mata v Avianca, heard in the United States District Court for the Southern District of New York, lawyers based their submissions on ChatGPT responses and citations that proved to be untrue. Their case was found to rely on nonsensical opinions attributed to real judges and backed up by false citations, all provided by ChatGPT. The UK Bar Standards Board does recognise that ChatGPT can be a useful tool, but only as a means of summarising information that a legal team could otherwise have worked out manually. In the legal profession, anyone providing information in support of a case takes full responsibility for its content and is assumed to have the professional knowledge to do so.
The language of citations has evolved to take account of engines such as ChatGPT, with acceptable formats now available for citing its use in areas such as academia and law. Because these are generative models, it is unlikely that the same prompt will create an identical response when repeated. If research or opinions are to be backed up by AI output, any conversation will therefore need to be not only referenced but also included as an appendix to the text; one solution is to export the conversation as a PDF. The user still needs to take responsibility for its content.
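As an illustration only, and subject to checking against the current edition of the relevant style guide, APA’s published guidance on citing ChatGPT suggests a reference along the lines of: OpenAI. (2023). ChatGPT (Mar 14 version) [Large language model]. https://chat.openai.com/chat. The prompt used would be described in the body of the text and the full transcript supplied as the appendix.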