New Study Highlights Impact of AI Tools on Legal Workflows: Harvey and CoCounsel Lead the Charge

A first-of-its-kind benchmark study has provided detailed insights into the performance of generative artificial intelligence in legal workflows, revealing that AI tools are proving to be valuable assets for legal professionals. The Vals Legal AI Report (VLAIR) has undertaken a systematic evaluation of four prominent legal AI tools: Harvey, CoCounsel by Thomson Reuters, Vincent AI from vLex, and Oliver from Vecflow. These tools were assessed across seven core legal tasks, using real-world scenarios developed with input from Am Law 100 firms.

The study found that Harvey Assistant was the top performer, achieving the highest score in five of the six tasks it participated in, including a 94.8% accuracy rate for document Q&A. The tool consistently outperformed the study's lawyer baseline in several areas, such as document extraction and transcript analysis. Harvey draws on multiple large language models (LLMs) and custom fine-tuned models aligned with legal processes.

CoCounsel also performed strongly, particularly in document summarization, where it scored 77.2%. The tool ranked among the top performers in every task it participated in and consistently surpassed the lawyer baseline by more than 10 points in tasks such as document Q&A, where it scored 89.6%.

The study emphasized the efficiency and speed of these AI tools. On average, they delivered results six to 80 times faster than their human counterparts. This speed presents a compelling argument for integrating them into legal workflows to drive productivity, especially as a starting point for legal tasks.

Vincent AI, recognized for its design, was noted for deliberately declining to answer when data was insufficient, a trait that affected its scores but was considered praiseworthy for avoiding inaccuracies. Oliver was identified as the most effective tool in the benchmark for the EDGAR research task, showing potential in tasks that require comprehensive, iterative solutions.

The report mentions future iterations and expansions of the benchmark study, which aim to provide ongoing measurements of these tools’ efficacy and reflect a growing trend within the legal sector toward standard methodologies for AI evaluation. While concluding that AI tools already offer substantial utility for legal practitioners, the report acknowledges that there is room for improvement in both tool performance and evaluation methods, paving the way for broader adoption and development in the sector.