AI Models Surpass Human Performance in Legal Ethics Examination, Signaling New Era for Legal Sector

In a significant development for the legal sector, generative AI has passed a key test used to assess whether a candidate is fit to be licensed as a lawyer. After passing the bar exam in March, GPT-4, the AI model developed by OpenAI, achieved high scores on a simulated Multistate Professional Responsibility Examination (MPRE), the legal ethics exam required in almost all U.S. jurisdictions.

This test was administered by a team at LegalOn Technologies, led by Gabor Melli, their Vice President of Artificial Intelligence. The team concluded that two of the leading generative AI models – GPT-4 developed by OpenAI and Claude 2 developed by Anthropic – demonstrated capabilities crucial for passing the legal ethics examination.

LegalOn’s CEO Daniel Lewis acknowledged that while AI may not be explicitly ‘moral’, these results indicate its potential to support ethical decision-making in legal contexts. The test evaluated four leading generative AI models, OpenAI’s GPT-4 and GPT-3.5, Anthropic’s Claude 2, and Google’s PaLM 2 Bison, on their ability to correctly answer MPRE-style questions. GPT-4 performed best, answering 74% of questions correctly and outperforming the average human test-taker by an estimated 6 percentage points.

Both GPT-4 and Claude 2 surpassed the approximate passing threshold for the MPRE in every state that requires the exam, a threshold estimated at between 56% and 64% depending on the jurisdiction. The test used 500 simulated exam questions written by Dru Stevenson, a law professor at South Texas College of Law Houston. The questions followed the format and style of the current MPRE. All models were tested using a ‘zero-shot’ approach, meaning they received no preliminary training in legal ethics.
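For readers who want to see the arithmetic behind that comparison, here is a minimal sketch in Python. It is not LegalOn's scoring method: the function names are hypothetical, and the count of 370 correct answers is simply what a 74% score on 500 questions would imply, not a figure taken from the report.

```python
# Estimated MPRE passing range by jurisdiction, in percent (from the article).
PASS_RANGE = (56.0, 64.0)


def accuracy(correct: int, total: int) -> float:
    """Return the share of questions answered correctly, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


def clears_every_threshold(score_pct: float, thresholds=PASS_RANGE) -> bool:
    """True if the score meets the highest estimated passing threshold."""
    return score_pct >= max(thresholds)


# Illustrative only: 370 of 500 is the count implied by a 74% score.
gpt4_score = accuracy(370, 500)
print(f"GPT-4: {gpt4_score:.0f}%, clears all thresholds: "
      f"{clears_every_threshold(gpt4_score)}")
```

Checking against the top of the estimated range is the strictest reading of "passing in every state": a score that clears 64% clears every lower threshold as well.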

While GPT-4 showed remarkable proficiency overall, its results varied by subject area. It scored particularly high on questions about conflicts of interest and client relationships, but was less effective on topics such as the safekeeping of funds.

Stevenson described the achievement as a landmark moment for legal technology and for the practice of law, emphasizing the role AI can play in helping legal practitioners consistently meet high ethical standards.

The complete report is available to download here.
