Verification Library — Verified Claims & Analysis

2 published verifications about Claude Opus 4.6

“Claude Opus 4.7 outperforms Claude Opus 4.6 on coding tasks according to measurable benchmarks.”

Claude Opus 4.7 does show clear, quantified improvements over Opus 4.6 on multiple coding-specific benchmarks, including SWE-bench Verified (80.8%→87.6%), SWE-bench Pro (53.4%→64.3%), and CursorBench (58%→70%). These figures are consistently reported across Anthropic's official documentation, the AWS News Blog, and numerous third-party writeups. The primary caveat is that the benchmark data originates from Anthropic's own reporting and has not yet been independently replicated by a third-party benchmark aggregator.

“Claude Opus 4.6 successfully built a working C compiler.”

Mostly True

Claude Opus 4.6 did produce a functional C compiler — a 100,000-line Rust codebase that compiles Linux 6.9, passes 99% of GCC's torture tests, and builds major projects like FFmpeg, Redis, and PostgreSQL. However, the claim omits important context: the compiler relies on GCC's assembler and linker for critical steps, independent testers found reliability issues with basic programs, it was built by 16 parallel AI agents (not one instance) with human oversight, and it cost ~$20,000 in API usage. It works, but with significant caveats.
