Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124

Last month, I wrote about a new benchmark from Mercor measuring the capabilities of AI agents on professional tasks such as law and corporate analysis. At the time, the results were very poor, with every major lab scoring below 25%, so we concluded that lawyers were safe from being displaced by AI, at least for now.
But AI capabilities can change a lot in a couple of weeks.
This week’s release of Anthropic’s Opus 4.6 shook up the leaderboards: the new Anthropic model scored approximately 30% in one-shot trials and averaged 45% when given additional attempts at each problem. It’s worth noting that the release included a bunch of new agentic features, including agent swarms, which may have helped with this type of multi-step problem solving.
Regardless, the result represents a huge leap from the previous state of the art, and a sign that progress in foundation models is not slowing down. “Jumping from 18.4% to 29.8% in just a few months is crazy,” said Mercor CEO Brendan Foody, who was particularly impressed.

A score of 30% is still a long way from 100%, so it doesn’t look like lawyers need to worry about being replaced by machines next week. But they should be a lot less confident than they were last month!