In 2023, following the launch of ChatGPT, SCOTUSblog evaluated the AI’s aptitude for answering Supreme Court-related questions and found its performance lackluster: it answered only 21 of 50 questions correctly. Fast-forward to the present, and advancements in AI have prompted a re-evaluation to see whether its capabilities have improved.
As reported by SCOTUSblog, ChatGPT now demonstrates greater accuracy and depth in its responses. Questions about the Supreme Court’s original composition (it began with six seats) and the nuances of terms like “relisted” petitions now receive detailed answers. Notably, the AI now correctly attributes historical facts, such as Donald Trump appointing three justices during his first term, correcting previous errors.
- ChatGPT reflects a broadened understanding of legal doctrines, such as non-justiciability, now including examples like mootness, which were previously absent.
- The AI has shown improvement in factual precision, correctly describing the junior justice’s roles and the average number of oral arguments the Court hears.
- Attempts at confusing the AI with false claims showed its improved capability to debunk misinformation confidently.
In a more detailed model evaluation of three versions—4o, o3-mini, and o1—the discrepancies between models became apparent:
- Model 4o: Tends to over-elaborate, providing detailed narratives that could lead to misinformation if unchecked. It sometimes cites incorrect historical facts.
- Model o3-mini: While fast, it often delivers vague responses and commits errors such as misidentifying court locations.
- Model o1: Demonstrated a better balance, providing accurate and well-contextualized responses, marking it as the more proficient among the three.
Notably, the AI’s ability to analyze recent jurisprudence has improved, including accurate depictions of cases like the Second Amendment ruling in New York State Rifle & Pistol Association v. Bruen. Additionally, 4o was the only model to recognize the Supreme Court’s recent adoption of an ethics code, further emphasizing model discrepancies.
Despite these advancements, relying solely on AI remains problematic. While the models have improved substantially, with o1 achieving 90% accuracy on the 50 questions, verification and cross-referencing by legal professionals remain critical. The complete comparison and reflections on ChatGPT’s responses are available at SCOTUSblog.
The growing role of AI in legal contexts highlights potential but underlines the continuous need for human oversight in the nuanced arena of jurisprudence.