Static LLM benchmark dashboard

Agentic model rankings with explicit coverage and quality gates.

Real local-model eval results for Hermes-style agent/tool-calling work. As the benchmark set grows, a model must have eligible data in every category to rank as an overall leader.

Current Leaders

Incomplete models can win category cards, but not the overall leaderboard.
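The coverage gate described above can be sketched in a few lines of Python. This is a minimal illustration, not the dashboard's actual implementation: the `CategoryResult` type and the function names are hypothetical, and the only data used are the Agentic Tool Use scores shown on this page. How an overall score would be computed once every category has data is not stated here, so the sketch only decides who is allowed to compete.

```python
from dataclasses import dataclass
from typing import Optional

# Categories shown on the dashboard's leader cards.
CATEGORIES = ["Agentic Tool Use", "Agentic Coding", "Long-Term Tasks", "Speed"]


@dataclass(frozen=True)
class CategoryResult:  # hypothetical record type, for illustration only
    model: str
    category: str
    score: Optional[float]  # None stands for "No data"
    eligible: bool


def category_leader(results, category):
    """Return the top-scoring eligible model in one category, or None."""
    rows = [r for r in results
            if r.category == category and r.eligible and r.score is not None]
    return max(rows, key=lambda r: r.score).model if rows else None


def overall_candidates(results):
    """Return models with eligible data in every category (the coverage gate)."""
    covered = {}
    for r in results:
        if r.eligible and r.score is not None:
            covered.setdefault(r.model, set()).add(r.category)
    return sorted(m for m, cats in covered.items() if cats == set(CATEGORIES))


# Agentic Tool Use scores as listed on this page; no other category has data.
RESULTS = [
    CategoryResult("Qwen3.6 35B A3B Q4 MLX", "Agentic Tool Use", 90.3, True),
    CategoryResult("Qwen3.6 35B A3B Q8 MLX", "Agentic Tool Use", 83.4, True),
    CategoryResult("GPT-5.5", "Agentic Tool Use", 81.3, True),
    CategoryResult("Ornith 1.0 35B 5bit MLX", "Agentic Tool Use", 74.8, True),
    CategoryResult("Qwen3.6 27B Q4 MLX", "Agentic Tool Use", 0.0, True),
]

if __name__ == "__main__":
    for category in CATEGORIES:
        print(category, "->", category_leader(RESULTS, category))
    print("Overall candidates:", overall_candidates(RESULTS))
```

Under these assumptions a model can lead the Agentic Tool Use card while the overall leaderboard stays empty, because no model yet has data in the other three categories.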

Best Agentic Tool Use: Eligible
Qwen3.6 35B A3B Q4 MLX
OMLX local / Qwen
Leader score: 90.3
2/2 benchmarks covered
Latest run: Jul 4, 2026

Best Agentic Coding: No data
No eligible model has results for this category.

Best Long-Term Tasks: No data
No eligible model has results for this category.

Best Speed: No data
No eligible model has results for this category.

Model Comparison

Switch categories and views without hiding dates, harnesses, coverage, or score status.

| Rank | Model | Provider | Overall | Status / coverage | Score rail | Latest run | Benchmark / harness | Pass rate | Latency / TPS | Cost |
|---|---|---|---|---|---|---|---|---|---|---|
| 01 | GPT-5.5 (Version gpt-5.5) | OpenAI Codex / GPT | No data | Incomplete, 2/2 benchmarks | Incomplete | Jul 4, 2026 | Hermes Tool Contract (Hermes Agent Evals hermes_tool_contract_v0); Hermes Tool Contract Hard v1 (Hermes Agent Evals hermes_tool_contract_hard_v1) | 83.5% | 8.41s / 13.7 tok/s | - |
| 02 | Ornith 1.0 35B 5bit MLX (Version Ornith-1.0-35B-5bit-mlx) | OMLX local / Ornith | No data | Incomplete, 2/2 benchmarks | Incomplete | Jul 4, 2026 | Hermes Tool Contract Hard v1 (Hermes Agent Evals hermes_tool_contract_hard_v1); Hermes Tool Contract (Hermes Agent Evals hermes_tool_contract_v0) | 76.0% | 11.4s / 22.8 tok/s | $0.00 |
| 03 | Qwen3.6 27B Q4 MLX (Version Qwen3.6-27B-UD-Q4_K_XL-mlx) | OMLX local / Qwen | No data | Incomplete, 2/2 benchmarks | Incomplete | Jul 4, 2026 | Hermes Tool Contract Hard v1 (Hermes Agent Evals hermes_tool_contract_hard_v1); Hermes Tool Contract (Hermes Agent Evals hermes_tool_contract_v0) | 0.0% | 6.59s / - | $0.00 |
| 04 | Qwen3.6 35B A3B Q4 MLX (Version Qwen3.6-35B-A3B-UD-Q4_K_XL-mlx) | OMLX local / Qwen | No data | Incomplete, 2/2 benchmarks | Incomplete | Jul 4, 2026 | Hermes Tool Contract (Hermes Agent Evals hermes_tool_contract_v0); Hermes Tool Contract Hard v1 (Hermes Agent Evals hermes_tool_contract_hard_v1) | 90.5% | 12.9s / 28.0 tok/s | $0.00 |
| 05 | Qwen3.6 35B A3B Q8 MLX (Version Brooooooklyn-Qwen3.6-35B-A3B-UD-Q8_K_XL-mlx) | OMLX local / Qwen | No data | Incomplete, 2/2 benchmarks | Incomplete | Jul 4, 2026 | Hermes Tool Contract Hard v1 (Hermes Agent Evals hermes_tool_contract_hard_v1); Hermes Tool Contract (Hermes Agent Evals hermes_tool_contract_v0) | 84.0% | 13.0s / 27.1 tok/s | $0.00 |
No data. No eligible rows are available for this chart.

Not charted

GPT-5.5: Incomplete
Ornith 1.0 35B 5bit MLX: Incomplete
Qwen3.6 27B Q4 MLX: Incomplete
Qwen3.6 35B A3B Q4 MLX: Incomplete
Qwen3.6 35B A3B Q8 MLX: Incomplete
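The split between plotted rows and the "Not charted" list above can be illustrated with a short sketch. It assumes, as the page's wording suggests, that only rows with an "Eligible" status and a numeric score are plotted and that every other row is listed with its status. The function name and the tuple layout are hypothetical.

```python
def split_for_chart(rows):
    """Partition (model, status, score) rows into charted and not-charted.

    Assumption: a row is charted only when its status is "Eligible" and it
    has a numeric score; all other rows are reported as "model: status".
    """
    charted = []
    not_charted = []
    for model, status, score in rows:
        if status == "Eligible" and score is not None:
            charted.append((model, score))
        else:
            not_charted.append(f"{model}: {status}")
    return charted, not_charted


# Overall view as shown on this page: every model is Incomplete.
OVERALL_ROWS = [
    ("GPT-5.5", "Incomplete", None),
    ("Ornith 1.0 35B 5bit MLX", "Incomplete", None),
    ("Qwen3.6 27B Q4 MLX", "Incomplete", None),
    ("Qwen3.6 35B A3B Q4 MLX", "Incomplete", None),
    ("Qwen3.6 35B A3B Q8 MLX", "Incomplete", None),
]

if __name__ == "__main__":
    charted, not_charted = split_for_chart(OVERALL_ROWS)
    print("Charted:", charted)
    print("Not charted:")
    for line in not_charted:
        print(" ", line)
```

With the Overall rows the charted list is empty, which corresponds to the "No eligible rows are available for this chart" message.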
| Rank | Model | Provider | Agentic Tool Use | Status / coverage | Score rail | Latest run | Benchmark / harness | Pass rate | Latency / TPS | Cost |
|---|---|---|---|---|---|---|---|---|---|---|
| 01 | Qwen3.6 35B A3B Q4 MLX (Version Qwen3.6-35B-A3B-UD-Q4_K_XL-mlx) | OMLX local / Qwen | 90.3 | Eligible, 2/2 benchmarks | Eligible, 90.3 | Jul 4, 2026 | Hermes Tool Contract (Hermes Agent Evals hermes_tool_contract_v0); Hermes Tool Contract Hard v1 (Hermes Agent Evals hermes_tool_contract_hard_v1) | 90.5% | 12.9s / 28.0 tok/s | $0.00 |
| 02 | Qwen3.6 35B A3B Q8 MLX (Version Brooooooklyn-Qwen3.6-35B-A3B-UD-Q8_K_XL-mlx) | OMLX local / Qwen | 83.4 | Eligible, 2/2 benchmarks | Eligible, 83.4 | Jul 4, 2026 | Hermes Tool Contract Hard v1 (Hermes Agent Evals hermes_tool_contract_hard_v1); Hermes Tool Contract (Hermes Agent Evals hermes_tool_contract_v0) | 84.0% | 13.0s / 27.1 tok/s | $0.00 |
| 03 | GPT-5.5 (Version gpt-5.5) | OpenAI Codex / GPT | 81.3 | Eligible, 2/2 benchmarks | Eligible, 81.3 | Jul 4, 2026 | Hermes Tool Contract (Hermes Agent Evals hermes_tool_contract_v0); Hermes Tool Contract Hard v1 (Hermes Agent Evals hermes_tool_contract_hard_v1) | 83.5% | 8.41s / 13.7 tok/s | - |
| 04 | Ornith 1.0 35B 5bit MLX (Version Ornith-1.0-35B-5bit-mlx) | OMLX local / Ornith | 74.8 | Eligible, 2/2 benchmarks | Eligible, 74.8 | Jul 4, 2026 | Hermes Tool Contract Hard v1 (Hermes Agent Evals hermes_tool_contract_hard_v1); Hermes Tool Contract (Hermes Agent Evals hermes_tool_contract_v0) | 76.0% | 11.4s / 22.8 tok/s | $0.00 |
| 05 | Qwen3.6 27B Q4 MLX (Version Qwen3.6-27B-UD-Q4_K_XL-mlx) | OMLX local / Qwen | 0.0 | Eligible, 2/2 benchmarks | Eligible, 0.0 | Jul 4, 2026 | Hermes Tool Contract Hard v1 (Hermes Agent Evals hermes_tool_contract_hard_v1); Hermes Tool Contract (Hermes Agent Evals hermes_tool_contract_v0) | 0.0% | 6.59s / - | $0.00 |
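The rank order in the Agentic Tool Use view above follows the category score from highest to lowest. A minimal sketch of such an ordering is given below. The tie-break and the handling of unscored rows are assumptions inferred from the page, where views without scores appear to list models in name order; the function name is hypothetical.

```python
def rank_rows(rows):
    """Rank (model, score) pairs and attach zero-padded rank labels.

    Assumed ordering: scored rows first, highest score first; rows with
    no score (None) last; model name breaks ties and orders unscored rows.
    """
    ordered = sorted(
        rows,
        key=lambda r: (r[1] is None, -(r[1] or 0.0), r[0]),
    )
    return [(f"{i:02d}", model, score)
            for i, (model, score) in enumerate(ordered, start=1)]


# Agentic Tool Use scores from this page, deliberately out of order.
TOOL_USE = [
    ("GPT-5.5", 81.3),
    ("Qwen3.6 27B Q4 MLX", 0.0),
    ("Qwen3.6 35B A3B Q8 MLX", 83.4),
    ("Ornith 1.0 35B 5bit MLX", 74.8),
    ("Qwen3.6 35B A3B Q4 MLX", 90.3),
]

if __name__ == "__main__":
    for rank, model, score in rank_rows(TOOL_USE):
        print(rank, model, score)
```

Applied to rows that all lack a score, the same key falls back to name order, which matches the row order shown in the views that have no data.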
| Rank | Model | Provider | Agentic Coding | Status / coverage | Score rail | Latest run | Benchmark / harness | Pass rate | Latency / TPS | Cost |
|---|---|---|---|---|---|---|---|---|---|---|
| 01 | GPT-5.5 (Version gpt-5.5) | OpenAI Codex / GPT | No data | No data, 0/0 benchmarks | No data | - | No data | - | - | - |
| 02 | Ornith 1.0 35B 5bit MLX (Version Ornith-1.0-35B-5bit-mlx) | OMLX local / Ornith | No data | No data, 0/0 benchmarks | No data | - | No data | - | - | - |
| 03 | Qwen3.6 27B Q4 MLX (Version Qwen3.6-27B-UD-Q4_K_XL-mlx) | OMLX local / Qwen | No data | No data, 0/0 benchmarks | No data | - | No data | - | - | - |
| 04 | Qwen3.6 35B A3B Q4 MLX (Version Qwen3.6-35B-A3B-UD-Q4_K_XL-mlx) | OMLX local / Qwen | No data | No data, 0/0 benchmarks | No data | - | No data | - | - | - |
| 05 | Qwen3.6 35B A3B Q8 MLX (Version Brooooooklyn-Qwen3.6-35B-A3B-UD-Q8_K_XL-mlx) | OMLX local / Qwen | No data | No data, 0/0 benchmarks | No data | - | No data | - | - | - |
No data. No eligible rows are available for this chart.

Not charted

GPT-5.5: No data
Ornith 1.0 35B 5bit MLX: No data
Qwen3.6 27B Q4 MLX: No data
Qwen3.6 35B A3B Q4 MLX: No data
Qwen3.6 35B A3B Q8 MLX: No data
| Rank | Model | Provider | Long-Term Tasks | Status / coverage | Score rail | Latest run | Benchmark / harness | Pass rate | Latency / TPS | Cost |
|---|---|---|---|---|---|---|---|---|---|---|
| 01 | GPT-5.5 (Version gpt-5.5) | OpenAI Codex / GPT | No data | No data, 0/0 benchmarks | No data | - | No data | - | - | - |
| 02 | Ornith 1.0 35B 5bit MLX (Version Ornith-1.0-35B-5bit-mlx) | OMLX local / Ornith | No data | No data, 0/0 benchmarks | No data | - | No data | - | - | - |
| 03 | Qwen3.6 27B Q4 MLX (Version Qwen3.6-27B-UD-Q4_K_XL-mlx) | OMLX local / Qwen | No data | No data, 0/0 benchmarks | No data | - | No data | - | - | - |
| 04 | Qwen3.6 35B A3B Q4 MLX (Version Qwen3.6-35B-A3B-UD-Q4_K_XL-mlx) | OMLX local / Qwen | No data | No data, 0/0 benchmarks | No data | - | No data | - | - | - |
| 05 | Qwen3.6 35B A3B Q8 MLX (Version Brooooooklyn-Qwen3.6-35B-A3B-UD-Q8_K_XL-mlx) | OMLX local / Qwen | No data | No data, 0/0 benchmarks | No data | - | No data | - | - | - |
No data. No eligible rows are available for this chart.

Not charted

GPT-5.5: No data
Ornith 1.0 35B 5bit MLX: No data
Qwen3.6 27B Q4 MLX: No data
Qwen3.6 35B A3B Q4 MLX: No data
Qwen3.6 35B A3B Q8 MLX: No data
| Rank | Model | Provider | Speed | Status / coverage | Score rail | Latest run | Benchmark / harness | Pass rate | Completion / TPS | Cost |
|---|---|---|---|---|---|---|---|---|---|---|
| 01 | GPT-5.5 (Version gpt-5.5) | OpenAI Codex / GPT | No data | No data, 0/0 benchmarks | No data | - | No data | - | - | - |
| 02 | Ornith 1.0 35B 5bit MLX (Version Ornith-1.0-35B-5bit-mlx) | OMLX local / Ornith | No data | No data, 0/0 benchmarks | No data | - | No data | - | - | - |
| 03 | Qwen3.6 27B Q4 MLX (Version Qwen3.6-27B-UD-Q4_K_XL-mlx) | OMLX local / Qwen | No data | No data, 0/0 benchmarks | No data | - | No data | - | - | - |
| 04 | Qwen3.6 35B A3B Q4 MLX (Version Qwen3.6-35B-A3B-UD-Q4_K_XL-mlx) | OMLX local / Qwen | No data | No data, 0/0 benchmarks | No data | - | No data | - | - | - |
| 05 | Qwen3.6 35B A3B Q8 MLX (Version Brooooooklyn-Qwen3.6-35B-A3B-UD-Q8_K_XL-mlx) | OMLX local / Qwen | No data | No data, 0/0 benchmarks | No data | - | No data | - | - | - |
No data. No eligible rows are available for this chart.

Not charted

GPT-5.5: No data
Ornith 1.0 35B 5bit MLX: No data
Qwen3.6 27B Q4 MLX: No data
Qwen3.6 35B A3B Q4 MLX: No data
Qwen3.6 35B A3B Q8 MLX: No data