Static LLM benchmark dashboard
Agentic model rankings with explicit coverage and quality gates.
Real local-model eval results for Hermes-style agent and tool-calling work. A model must have eligible data in every category to lead overall, and this requirement extends to each category added as the benchmark set grows.
Current Leaders
Models with incomplete coverage can win category cards, but not the overall leaderboard.
Best Agentic Tool Use: Eligible
Qwen3.6 35B A3B Q4 MLX
OMLX local / Qwen
2/2 benchmarks covered
Best Agentic Coding: No data
No eligible model has results for this category.
Best Long-Term Tasks: No data
No eligible model has results for this category.
Best Speed: No data
No eligible model has results for this category.
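A minimal Python sketch of the gating rule described above, assuming each model's results reduce to one score per category and that the overall score is the unweighted mean of the four category scores (the dashboard does not say how it aggregates). `ModelResults`, `overall_status`, `category_leader`, and `overall_leader` are hypothetical names, not part of any published harness.

```python
from dataclasses import dataclass, field
from typing import Optional

# The four categories shown on the dashboard (identifiers are hypothetical).
CATEGORIES = ["agentic_tool_use", "agentic_coding", "long_term_tasks", "speed"]


@dataclass
class ModelResults:
    name: str
    # Hypothetical shape: category -> score; a missing key means no eligible data.
    scores: dict[str, float] = field(default_factory=dict)


def overall_status(m: ModelResults) -> str:
    """Eligible only with data in every category; Incomplete if some are missing."""
    covered = sum(1 for c in CATEGORIES if c in m.scores)
    if covered == 0:
        return "No data"
    return "Eligible" if covered == len(CATEGORIES) else "Incomplete"


def category_leader(models: list[ModelResults], category: str) -> Optional[str]:
    """Any model with data in this category can win its card, complete or not."""
    eligible = [m for m in models if category in m.scores]
    if not eligible:
        return None  # the card shows "No data"
    return max(eligible, key=lambda m: m.scores[category]).name


def overall_leader(models: list[ModelResults]) -> Optional[str]:
    """Only models with eligible data in every category can lead overall."""
    complete = [m for m in models if overall_status(m) == "Eligible"]
    if not complete:
        return None
    # Assumption: overall score = unweighted mean of the category scores.
    return max(complete, key=lambda m: sum(m.scores[c] for c in CATEGORIES) / len(CATEGORIES)).name


if __name__ == "__main__":
    # Agentic Tool Use scores from the comparison table; no other category has data yet.
    models = [
        ModelResults("GPT-5.5", {"agentic_tool_use": 81.3}),
        ModelResults("Qwen3.6 35B A3B Q4 MLX", {"agentic_tool_use": 90.3}),
    ]
    print(category_leader(models, "agentic_tool_use"))  # Qwen3.6 35B A3B Q4 MLX
    print(category_leader(models, "agentic_coding"))    # None
    print(overall_leader(models))                       # None
    print(overall_status(models[0]))                    # Incomplete
```

Under these assumptions, a model with data only for Agentic Tool Use wins that card while the overall leader stays empty, which matches the current state of the dashboard.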
Model Comparison
Each table lists every model with its latest run date, benchmark harness, coverage, and score status, including models that have no eligible data in that category.

| Rank | Model | Provider | Overall | Status / coverage | Latest run | Benchmark / harness | Pass rate | Latency / TPS | Cost |
|---|---|---|---|---|---|---|---|---|---|
| 01 | GPT-5.5 (version gpt-5.5) | OpenAI Codex / GPT | No data | Incomplete (2/2 benchmarks) | Jul 4, 2026 | Hermes Tool Contract (Hermes Agent Evals hermes_tool_contract_v0); Hermes Tool Contract Hard v1 (Hermes Agent Evals hermes_tool_contract_hard_v1) | 83.5% | 8.41s / 13.7 tok/s | - |
| 02 | Qwen3.6 35B A3B Q4 MLX (version Qwen3.6-35B-A3B-UD-Q4_K_XL-mlx) | OMLX local / Qwen | No data | Incomplete (2/2 benchmarks) | Jul 4, 2026 | Hermes Tool Contract (Hermes Agent Evals hermes_tool_contract_v0); Hermes Tool Contract Hard v1 (Hermes Agent Evals hermes_tool_contract_hard_v1) | 90.5% | 12.9s / 28.0 tok/s | $0.00 |

Chart: no eligible rows to plot. Not charted: GPT-5.5 (Incomplete), Qwen3.6 35B A3B Q4 MLX (Incomplete).


| Rank | Model | Provider | Agentic Tool Use | Status / coverage | Latest run | Benchmark / harness | Pass rate | Latency / TPS | Cost |
|---|---|---|---|---|---|---|---|---|---|
| 01 | Qwen3.6 35B A3B Q4 MLX (version Qwen3.6-35B-A3B-UD-Q4_K_XL-mlx) | OMLX local / Qwen | 90.3 | Eligible (2/2 benchmarks) | Jul 4, 2026 | Hermes Tool Contract (Hermes Agent Evals hermes_tool_contract_v0); Hermes Tool Contract Hard v1 (Hermes Agent Evals hermes_tool_contract_hard_v1) | 90.5% | 12.9s / 28.0 tok/s | $0.00 |
| 02 | GPT-5.5 (version gpt-5.5) | OpenAI Codex / GPT | 81.3 | Eligible (2/2 benchmarks) | Jul 4, 2026 | Hermes Tool Contract (Hermes Agent Evals hermes_tool_contract_v0); Hermes Tool Contract Hard v1 (Hermes Agent Evals hermes_tool_contract_hard_v1) | 83.5% | 8.41s / 13.7 tok/s | - |
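
The row order in the Overall and Agentic Tool Use tables appears consistent with a simple rule: rows with a category score sort by score, highest first, and rows without one follow. The dashboard does not state a tie-break; ordering ties and unscored rows by model name is an assumption that happens to match the tables. A hypothetical Python sketch (`rank_rows` is an illustrative name):

```python
from typing import Optional

Row = tuple[str, Optional[float]]  # (model name, category score or None for "No data")


def rank_rows(rows: list[Row]) -> list[tuple[str, str, Optional[float]]]:
    """Scored rows first (highest score first), then unscored rows.

    Ties and unscored rows are ordered by model name, which is an assumption.
    Ranks are zero-padded to two digits, as in the tables.
    """

    def sort_key(row: Row) -> tuple[bool, float, str]:
        name, score = row
        # False sorts before True, so scored rows come first; negate for descending.
        return (score is None, -score if score is not None else 0.0, name)

    ordered = sorted(rows, key=sort_key)
    return [(f"{i:02d}", name, score) for i, (name, score) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    tool_use = [("GPT-5.5", 81.3), ("Qwen3.6 35B A3B Q4 MLX", 90.3)]
    for rank, name, score in rank_rows(tool_use):
        print(rank, name, score)
    # 01 Qwen3.6 35B A3B Q4 MLX 90.3
    # 02 GPT-5.5 81.3
```
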

| Rank | Model | Provider | Agentic Coding | Status / coverage | Latest run | Benchmark / harness | Pass rate | Latency / TPS | Cost |
|---|---|---|---|---|---|---|---|---|---|
| 01 | GPT-5.5 (version gpt-5.5) | OpenAI Codex / GPT | No data | No data (0/0 benchmarks) | - | No data | - | - | - |
| 02 | Qwen3.6 35B A3B Q4 MLX (version Qwen3.6-35B-A3B-UD-Q4_K_XL-mlx) | OMLX local / Qwen | No data | No data (0/0 benchmarks) | - | No data | - | - | - |

Chart: no eligible rows to plot. Not charted: GPT-5.5 (No data), Qwen3.6 35B A3B Q4 MLX (No data).


| Rank | Model | Provider | Long-Term Tasks | Status / coverage | Latest run | Benchmark / harness | Pass rate | Latency / TPS | Cost |
|---|---|---|---|---|---|---|---|---|---|
| 01 | GPT-5.5 (version gpt-5.5) | OpenAI Codex / GPT | No data | No data (0/0 benchmarks) | - | No data | - | - | - |
| 02 | Qwen3.6 35B A3B Q4 MLX (version Qwen3.6-35B-A3B-UD-Q4_K_XL-mlx) | OMLX local / Qwen | No data | No data (0/0 benchmarks) | - | No data | - | - | - |

Chart: no eligible rows to plot. Not charted: GPT-5.5 (No data), Qwen3.6 35B A3B Q4 MLX (No data).


| Rank | Model | Provider | Speed | Status / coverage | Latest run | Benchmark / harness | Pass rate | Completion / TPS | Cost |
|---|---|---|---|---|---|---|---|---|---|
| 01 | GPT-5.5 (version gpt-5.5) | OpenAI Codex / GPT | No data | No data (0/0 benchmarks) | - | No data | - | - | - |
| 02 | Qwen3.6 35B A3B Q4 MLX (version Qwen3.6-35B-A3B-UD-Q4_K_XL-mlx) | OMLX local / Qwen | No data | No data (0/0 benchmarks) | - | No data | - | - | - |

Chart: no eligible rows to plot. Not charted: GPT-5.5 (No data), Qwen3.6 35B A3B Q4 MLX (No data).