Legal professionals face considerable challenges as they navigate the complex landscape of generative AI tools for legal research, according to a panel discussion during the American Association of Law Libraries annual conference in Portland, Ore., this week. The session, titled “AI in Legal Research: Measuring What Matters with Benchmarks and Rubrics,” featured industry experts Sean Harrington, Cindy Guyer, and Nick Hafen, with moderation by Debbie Ginsburg.
Harrington, Guyer, and Hafen addressed the inherent difficulties in benchmarking legal AI tools, a task complicated by the opaque nature of these systems and the multifaceted legal questions they handle. As Hafen pointed out, legal questions often do not have straightforward answers, which complicates benchmark standardization.
Studies presented during the panel, such as the Stanford study on hallucination rates among AI tools, highlighted both the potential and the limitations of current methodologies. The Stanford study found that tools from vendors such as LexisNexis and Thomson Reuters still struggle with accuracy, prompting concerns about their reliability.
Harrington reported that despite significant investments in AI, adoption rates remain low within firms. Professionals appear hesitant to rely on these tools fully, as illustrated by feedback from an Am Law 10 firm, where only 2% of attorneys considered themselves power users. The hesitation stems partly from the “black box” nature of these AI systems, which vendors are reluctant to open for thorough examination.
Guyer recounted her experience with an “AI Smackdown” conducted for the Southern California Association of Law Libraries, where various AI research platforms were evaluated based on criteria such as accuracy and depth of analysis. The exercise revealed significant strengths and weaknesses in the tools, emphasizing the need for reliability and relevance in outcomes.
The panelists recommended that firms focus on specific use cases for evaluation, involve actual users in the assessment process, and pressure vendors for greater transparency. They advocated for creating industry-standard benchmarks to bring about more meaningful evaluations of AI tools.
While vendors have been resistant to opening up their systems, the panelists suggested that larger law firms, like Kirkland & Ellis, may eventually drive the demand for more transparency and standardization in AI tool benchmarking.
This discussion at the AALL annual meeting marks a crucial step toward more rigorous and standardized assessments of AI tools in the legal sector, with the aim of ensuring that these technologies meet the evolving needs of legal professionals.
For further details about the panel discussion and studies addressed, read more on LawNext.