Static LLM benchmark dashboard
Agentic model rankings with explicit coverage and quality gates.
Real eval results, mostly from locally hosted models, for Hermes-style agent and tool-calling work. A model can lead overall only if it has eligible data in every category, including categories added as the benchmark set grows.
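The overall gate can be sketched as follows. This is a minimal Python sketch, not the dashboard's actual code. The `ModelResults` type and the `overall_status` and `overall_score` helpers are hypothetical, and so is the unweighted mean used to combine category scores. Only the rule that every category needs eligible data comes from this page.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Category names taken from the dashboard's leader cards.
CATEGORIES = ("Agentic Tool Use", "Agentic Coding", "Long-Term Tasks", "Speed")


@dataclass
class ModelResults:
    """Hypothetical per-model record; a category with no eligible data is absent."""
    name: str
    category_scores: dict[str, float] = field(default_factory=dict)


def overall_status(model: ModelResults, categories=CATEGORIES) -> str:
    """Return "Eligible" only when every category has eligible data."""
    missing = [c for c in categories if c not in model.category_scores]
    return "Eligible" if not missing else "Incomplete"


def overall_score(model: ModelResults, categories=CATEGORIES) -> float | None:
    """Overall score, or None ("No data") while the model is incomplete.

    Assumption: categories are combined with an unweighted mean; the
    dashboard does not state its aggregation rule.
    """
    if overall_status(model, categories) != "Eligible":
        return None
    return sum(model.category_scores[c] for c in categories) / len(categories)


if __name__ == "__main__":
    # Tool-use score from the dashboard; the other three categories have no data yet.
    qwen_q4 = ModelResults("Qwen3.6 35B A3B Q4 MLX", {"Agentic Tool Use": 90.3})
    print(overall_status(qwen_q4))  # Incomplete
    print(overall_score(qwen_q4))   # None
```

This sketch explains why every row in the Overall view reads "No data" even though each model covers both tool-use benchmarks.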
Current Leaders
Incomplete models can win category cards, but not the overall leaderboard.
Best Agentic Tool Use (Eligible)
Qwen3.6 35B A3B Q4 MLX
OMLX local / Qwen
2/2 benchmarks covered
Best Agentic Coding (No data)
No eligible model has results for this category.
Best Long-Term Tasks (No data)
No eligible model has results for this category.
Best Speed (No data)
No eligible model has results for this category.
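The leader cards follow a looser rule than the overall board. Each card goes to the top-scoring model with eligible data in that category, and it shows "No data" when no such model exists. Below is a minimal sketch of that selection. The `CategoryRow` and `category_leader` names are hypothetical, and tie-breaking by list order is an assumption. The scores are the Agentic Tool Use values published in the comparison table.

```python
from typing import List, NamedTuple, Optional


class CategoryRow(NamedTuple):
    model: str
    status: str               # "Eligible" or "No data", as in the Status / coverage column
    score: Optional[float]    # None when the category has no results


def category_leader(rows: List[CategoryRow]) -> str:
    """Pick the top eligible scorer for one category card.

    Overall completeness is deliberately not checked: incomplete models
    can still win category cards. Ties go to the earlier row.
    """
    eligible = [r for r in rows if r.status == "Eligible" and r.score is not None]
    if not eligible:
        return "No data"
    return max(eligible, key=lambda r: r.score).model


# Agentic Tool Use scores as published on the dashboard.
tool_use = [
    CategoryRow("Qwen3.6 35B A3B Q4 MLX", "Eligible", 90.3),
    CategoryRow("Qwen3.6 35B A3B Q8 MLX", "Eligible", 83.4),
    CategoryRow("GPT-5.5", "Eligible", 81.3),
    CategoryRow("Ornith 1.0 35B 5bit MLX", "Eligible", 74.8),
    CategoryRow("Qwen3.6 27B Q4 MLX", "Eligible", 0.0),
]
# Agentic Coding currently has no results for any model.
coding = [CategoryRow(r.model, "No data", None) for r in tool_use]

print(category_leader(tool_use))  # Qwen3.6 35B A3B Q4 MLX
print(category_leader(coding))    # No data
```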
Model Comparison
Switch categories and views without hiding dates, harnesses, coverage, or score status.
Overall

| Rank | Model | Provider | Overall | Status / coverage | Latest run | Benchmark / harness | Pass rate | Latency / TPS | Cost |
|---|---|---|---|---|---|---|---|---|---|
| 01 | GPT-5.5 (version gpt-5.5) | OpenAI Codex / GPT | No data | Incomplete (2/2 benchmarks) | Jul 4, 2026 | Hermes Tool Contract (Hermes Agent Evals, hermes_tool_contract_v0); Hermes Tool Contract Hard v1 (Hermes Agent Evals, hermes_tool_contract_hard_v1) | 83.5% | 8.41s / 13.7 tok/s | - |
| 02 | Ornith 1.0 35B 5bit MLX (version Ornith-1.0-35B-5bit-mlx) | OMLX local / Ornith | No data | Incomplete (2/2 benchmarks) | Jul 4, 2026 | Hermes Tool Contract (Hermes Agent Evals, hermes_tool_contract_v0); Hermes Tool Contract Hard v1 (Hermes Agent Evals, hermes_tool_contract_hard_v1) | 76.0% | 11.4s / 22.8 tok/s | $0.00 |
| 03 | Qwen3.6 27B Q4 MLX (version Qwen3.6-27B-UD-Q4_K_XL-mlx) | OMLX local / Qwen | No data | Incomplete (2/2 benchmarks) | Jul 4, 2026 | Hermes Tool Contract (Hermes Agent Evals, hermes_tool_contract_v0); Hermes Tool Contract Hard v1 (Hermes Agent Evals, hermes_tool_contract_hard_v1) | 0.0% | 6.59s / - | $0.00 |
| 04 | Qwen3.6 35B A3B Q4 MLX (version Qwen3.6-35B-A3B-UD-Q4_K_XL-mlx) | OMLX local / Qwen | No data | Incomplete (2/2 benchmarks) | Jul 4, 2026 | Hermes Tool Contract (Hermes Agent Evals, hermes_tool_contract_v0); Hermes Tool Contract Hard v1 (Hermes Agent Evals, hermes_tool_contract_hard_v1) | 90.5% | 12.9s / 28.0 tok/s | $0.00 |
| 05 | Qwen3.6 35B A3B Q8 MLX (version Brooooooklyn-Qwen3.6-35B-A3B-UD-Q8_K_XL-mlx) | OMLX local / Qwen | No data | Incomplete (2/2 benchmarks) | Jul 4, 2026 | Hermes Tool Contract (Hermes Agent Evals, hermes_tool_contract_v0); Hermes Tool Contract Hard v1 (Hermes Agent Evals, hermes_tool_contract_hard_v1) | 84.0% | 13.0s / 27.1 tok/s | $0.00 |

Chart: No eligible rows are available.
Not charted: GPT-5.5 (Incomplete); Ornith 1.0 35B 5bit MLX (Incomplete); Qwen3.6 27B Q4 MLX (Incomplete); Qwen3.6 35B A3B Q4 MLX (Incomplete); Qwen3.6 35B A3B Q8 MLX (Incomplete).
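The chart notices suggest that only rows with an eligible score are plotted, and every other row is listed as not charted together with its status. Below is a minimal sketch of that filtering, not the dashboard's real rendering code. The `is_chartable`, `split_chart_rows` and `render_chart_summary` helpers and the (model, status, score) tuple shape are hypothetical.

```python
from __future__ import annotations


def is_chartable(status: str, score: float | None) -> bool:
    # Assumption: a row is plotted only if it is eligible and has a score.
    return status == "Eligible" and score is not None


def split_chart_rows(rows: list[tuple[str, str, float | None]]):
    """Split (model, status, score) rows into plotted points and skipped labels."""
    charted = [(model, score) for model, status, score in rows
               if is_chartable(status, score)]
    not_charted = [f"{model} ({status})" for model, status, score in rows
                   if not is_chartable(status, score)]
    return charted, not_charted


def render_chart_summary(rows: list[tuple[str, str, float | None]]) -> str:
    """Build the text notices shown alongside a chart."""
    charted, not_charted = split_chart_rows(rows)
    lines = []
    if not charted:
        lines.append("No eligible rows are available for this chart.")
    if not_charted:
        lines.append("Not charted: " + "; ".join(not_charted))
    return "\n".join(lines)


# Overall view: every model is Incomplete, so nothing is plotted.
overall_rows = [
    ("GPT-5.5", "Incomplete", None),
    ("Ornith 1.0 35B 5bit MLX", "Incomplete", None),
    ("Qwen3.6 27B Q4 MLX", "Incomplete", None),
    ("Qwen3.6 35B A3B Q4 MLX", "Incomplete", None),
    ("Qwen3.6 35B A3B Q8 MLX", "Incomplete", None),
]
print(render_chart_summary(overall_rows))
```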
Agentic Tool Use

| Rank | Model | Provider | Agentic Tool Use | Status / coverage | Latest run | Benchmark / harness | Pass rate | Latency / TPS | Cost |
|---|---|---|---|---|---|---|---|---|---|
| 01 | Qwen3.6 35B A3B Q4 MLX (version Qwen3.6-35B-A3B-UD-Q4_K_XL-mlx) | OMLX local / Qwen | 90.3 | Eligible (2/2 benchmarks) | Jul 4, 2026 | Hermes Tool Contract (Hermes Agent Evals, hermes_tool_contract_v0); Hermes Tool Contract Hard v1 (Hermes Agent Evals, hermes_tool_contract_hard_v1) | 90.5% | 12.9s / 28.0 tok/s | $0.00 |
| 02 | Qwen3.6 35B A3B Q8 MLX (version Brooooooklyn-Qwen3.6-35B-A3B-UD-Q8_K_XL-mlx) | OMLX local / Qwen | 83.4 | Eligible (2/2 benchmarks) | Jul 4, 2026 | Hermes Tool Contract (Hermes Agent Evals, hermes_tool_contract_v0); Hermes Tool Contract Hard v1 (Hermes Agent Evals, hermes_tool_contract_hard_v1) | 84.0% | 13.0s / 27.1 tok/s | $0.00 |
| 03 | GPT-5.5 (version gpt-5.5) | OpenAI Codex / GPT | 81.3 | Eligible (2/2 benchmarks) | Jul 4, 2026 | Hermes Tool Contract (Hermes Agent Evals, hermes_tool_contract_v0); Hermes Tool Contract Hard v1 (Hermes Agent Evals, hermes_tool_contract_hard_v1) | 83.5% | 8.41s / 13.7 tok/s | - |
| 04 | Ornith 1.0 35B 5bit MLX (version Ornith-1.0-35B-5bit-mlx) | OMLX local / Ornith | 74.8 | Eligible (2/2 benchmarks) | Jul 4, 2026 | Hermes Tool Contract (Hermes Agent Evals, hermes_tool_contract_v0); Hermes Tool Contract Hard v1 (Hermes Agent Evals, hermes_tool_contract_hard_v1) | 76.0% | 11.4s / 22.8 tok/s | $0.00 |
| 05 | Qwen3.6 27B Q4 MLX (version Qwen3.6-27B-UD-Q4_K_XL-mlx) | OMLX local / Qwen | 0.0 | Eligible (2/2 benchmarks) | Jul 4, 2026 | Hermes Tool Contract (Hermes Agent Evals, hermes_tool_contract_v0); Hermes Tool Contract Hard v1 (Hermes Agent Evals, hermes_tool_contract_hard_v1) | 0.0% | 6.59s / - | $0.00 |

Agentic Coding

| Rank | Model | Provider | Agentic Coding | Status / coverage | Latest run | Benchmark / harness | Pass rate | Latency / TPS | Cost |
|---|---|---|---|---|---|---|---|---|---|
| 01 | GPT-5.5 (version gpt-5.5) | OpenAI Codex / GPT | No data | No data (0/0 benchmarks) | - | No data | - | - | - |
| 02 | Ornith 1.0 35B 5bit MLX (version Ornith-1.0-35B-5bit-mlx) | OMLX local / Ornith | No data | No data (0/0 benchmarks) | - | No data | - | - | - |
| 03 | Qwen3.6 27B Q4 MLX (version Qwen3.6-27B-UD-Q4_K_XL-mlx) | OMLX local / Qwen | No data | No data (0/0 benchmarks) | - | No data | - | - | - |
| 04 | Qwen3.6 35B A3B Q4 MLX (version Qwen3.6-35B-A3B-UD-Q4_K_XL-mlx) | OMLX local / Qwen | No data | No data (0/0 benchmarks) | - | No data | - | - | - |
| 05 | Qwen3.6 35B A3B Q8 MLX (version Brooooooklyn-Qwen3.6-35B-A3B-UD-Q8_K_XL-mlx) | OMLX local / Qwen | No data | No data (0/0 benchmarks) | - | No data | - | - | - |

Chart: No eligible rows are available.
Not charted: GPT-5.5 (No data); Ornith 1.0 35B 5bit MLX (No data); Qwen3.6 27B Q4 MLX (No data); Qwen3.6 35B A3B Q4 MLX (No data); Qwen3.6 35B A3B Q8 MLX (No data).
Long-Term Tasks

| Rank | Model | Provider | Long-Term Tasks | Status / coverage | Latest run | Benchmark / harness | Pass rate | Latency / TPS | Cost |
|---|---|---|---|---|---|---|---|---|---|
| 01 | GPT-5.5 (version gpt-5.5) | OpenAI Codex / GPT | No data | No data (0/0 benchmarks) | - | No data | - | - | - |
| 02 | Ornith 1.0 35B 5bit MLX (version Ornith-1.0-35B-5bit-mlx) | OMLX local / Ornith | No data | No data (0/0 benchmarks) | - | No data | - | - | - |
| 03 | Qwen3.6 27B Q4 MLX (version Qwen3.6-27B-UD-Q4_K_XL-mlx) | OMLX local / Qwen | No data | No data (0/0 benchmarks) | - | No data | - | - | - |
| 04 | Qwen3.6 35B A3B Q4 MLX (version Qwen3.6-35B-A3B-UD-Q4_K_XL-mlx) | OMLX local / Qwen | No data | No data (0/0 benchmarks) | - | No data | - | - | - |
| 05 | Qwen3.6 35B A3B Q8 MLX (version Brooooooklyn-Qwen3.6-35B-A3B-UD-Q8_K_XL-mlx) | OMLX local / Qwen | No data | No data (0/0 benchmarks) | - | No data | - | - | - |

Chart: No eligible rows are available.
Not charted: GPT-5.5 (No data); Ornith 1.0 35B 5bit MLX (No data); Qwen3.6 27B Q4 MLX (No data); Qwen3.6 35B A3B Q4 MLX (No data); Qwen3.6 35B A3B Q8 MLX (No data).
Speed

| Rank | Model | Provider | Speed | Status / coverage | Latest run | Benchmark / harness | Pass rate | Completion / TPS | Cost |
|---|---|---|---|---|---|---|---|---|---|
| 01 | GPT-5.5 (version gpt-5.5) | OpenAI Codex / GPT | No data | No data (0/0 benchmarks) | - | No data | - | - | - |
| 02 | Ornith 1.0 35B 5bit MLX (version Ornith-1.0-35B-5bit-mlx) | OMLX local / Ornith | No data | No data (0/0 benchmarks) | - | No data | - | - | - |
| 03 | Qwen3.6 27B Q4 MLX (version Qwen3.6-27B-UD-Q4_K_XL-mlx) | OMLX local / Qwen | No data | No data (0/0 benchmarks) | - | No data | - | - | - |
| 04 | Qwen3.6 35B A3B Q4 MLX (version Qwen3.6-35B-A3B-UD-Q4_K_XL-mlx) | OMLX local / Qwen | No data | No data (0/0 benchmarks) | - | No data | - | - | - |
| 05 | Qwen3.6 35B A3B Q8 MLX (version Brooooooklyn-Qwen3.6-35B-A3B-UD-Q8_K_XL-mlx) | OMLX local / Qwen | No data | No data (0/0 benchmarks) | - | No data | - | - | - |

Chart: No eligible rows are available.
Not charted: GPT-5.5 (No data); Ornith 1.0 35B 5bit MLX (No data); Qwen3.6 27B Q4 MLX (No data); Qwen3.6 35B A3B Q4 MLX (No data); Qwen3.6 35B A3B Q8 MLX (No data).