shipfeed · topic · evals
week 002 · month 002 · all-time 002
Benchmark releases and evaluation results
clusters this week · 2 active
CLAUDE · 08:01
Gemini 3 beats Claude on SWE-bench, 64.8% vs 61.2%.
via Google DeepMind
EVALS · 06:01
o4-mini posts SOTA on coding evals.
via OpenAI