Stanford Study Highlights Higher Hallucination Rates in Westlaw’s AI Compared to LexisNexis

Stanford University has released an updated version of its earlier study of the generative AI legal research tools offered by LexisNexis and Thomson Reuters, and the new results show that Westlaw's AI product hallucinates more often than LexisNexis' offering. The original study was criticized for omitting Thomson Reuters' AI-Assisted Research in Westlaw Precision, a product distinct from the Ask Practical Law AI tool that Stanford had tested. Following negotiations, Thomson Reuters granted the researchers access to its AI-Assisted Research tool, and Stanford published revised results.

The updated study reports that Westlaw's AI answered accurately only 42% of the time, compared with 65% for LexisNexis. More critically, Westlaw's tool hallucinated in 33% of cases, nearly double the 17% hallucination rate found for Lexis+ AI. The researchers attribute much of this gap to response length: Westlaw's answers average 350 words, while Lexis' average 219 words. Longer answers contain more falsifiable propositions, which raises the likelihood of a hallucination and makes verifying each answer more time-consuming.

Despite these issues, the study acknowledges that both systems can produce high-quality responses in certain contexts. For instance, Lexis correctly identified a false premise in a query and accurately stated the law, while Westlaw correctly handled a nuanced patent law question. However, the study underscores the importance of transparency and rigorous benchmarking of AI tools in the legal domain. The authors emphasize that these systems still struggle with basic legal comprehension, such as accurately describing a case's holding and distinguishing the arguments of litigants from the decision of the court.

According to the researchers, the study's most significant takeaway is the need for rigorous, transparent, and public evaluation of AI tools in the legal industry. They point in particular to the lack of systematic benchmarking information for legal AI tools, in contrast to more widely available general-purpose systems such as GPT-4.

Further details are available in the full study.
