Stanford Study Reveals Hallucinations in AI Legal Research Tools, Plans Further Evaluation

Stanford University plans to expand its recent study of generative AI legal research tools from LexisNexis and Thomson Reuters. The initial findings showed that these tools deliver hallucinated results in over 17% of queries, contradicting the companies’ claims of reliability. The preprint study, by Stanford’s RegLab and its Human-Centered Artificial Intelligence research center, noted substantial performance differences between the products: Lexis+ AI from LexisNexis answered 65% of queries accurately, whereas Thomson Reuters’ Ask Practical Law AI was accurate only 18% of the time. Critics argue, however, that the study compared dissimilar tools and that its methodology was not fully transparent.

Significantly, the study did not assess Thomson Reuters’ AI-Assisted Research in Westlaw Precision, prompting claims that comparing Ask Practical Law AI against Lexis+ AI was unfair. Representatives from LexisNexis and Thomson Reuters have responded to the findings, emphasizing their products’ use of retrieval-augmented generation (RAG) to minimize hallucinations. LexisNexis also noted that the study used a broad definition of hallucination, counting not only fabricated sources but also misgrounded or irrelevant ones.
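For readers unfamiliar with the technique, the RAG pattern the vendors describe can be illustrated with a minimal sketch. Everything here is hypothetical (the toy corpus, the keyword-overlap retriever, and the `generate_grounded` helper are illustrative stand-ins); commercial legal tools retrieve from proprietary databases and pass the retrieved passages to a large language model rather than quoting them directly.

```python
# Minimal sketch of retrieval-augmented generation (RAG).
# The corpus, retriever, and function names are hypothetical examples,
# not the vendors' implementations.

def retrieve(query, corpus, k=1):
    """Rank documents by naive keyword overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda doc: len(q_words & set(doc["text"].lower().split())),
        reverse=True,
    )
    return scored[:k]

def generate_grounded(query, corpus):
    """Answer from retrieved sources (with a citation) rather than from
    the model's parametric memory -- the step meant to curb hallucination."""
    sources = retrieve(query, corpus)
    if not sources:
        return "No supporting source found."
    top = sources[0]
    return f'{top["text"]} [source: {top["id"]}]'

corpus = [
    {"id": "case-001",
     "text": "The statute of limitations for breach of contract is four years."},
    {"id": "case-002",
     "text": "Punitive damages require a showing of malice."},
]

print(generate_grounded(
    "What is the statute of limitations for contract claims?", corpus))
```

The study’s broader point maps onto this sketch: RAG reduces fabricated citations by anchoring answers to retrieved text, but if the retriever returns an irrelevant or misgrounded source, the cited answer can still be wrong.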

In response to the criticisms of its methodology and fairness, Thomson Reuters has since provided its AI-Assisted Research tool to the Stanford researchers for evaluation. Stanford is working to update its results promptly, which could alter the study’s conclusions. Daniel E. Ho, one of the study’s authors, emphasized the need for transparency and benchmarking in evaluating legal AI tools, underscoring that this work should not fall solely to academic researchers.

The study concluded that while these AI tools have not eliminated hallucinations, they still offer considerable value over traditional legal research methods. Ultimately, the findings underscore the importance of rigorous, transparent benchmarking, and of systematic researcher access to AI tools built for legal professionals, to ensure responsible integration and oversight.

For further details, read the full initial report from LawNext.