OMLX local

Qwen3.6 35B A3B Q8 MLX

Local OpenAI-compatible OMLX provider; Q8_K_XL MLX quantization.

Family: Qwen
Version: Brooooooklyn-Qwen3.6-35B-A3B-UD-Q8_K_XL-mlx
Context: -
Latest run: Jul 4, 2026

Score Summary

Overall eligibility requires eligible data in all four categories.

Overall

Incomplete

Incomplete for overall leaderboards

Agentic Tool Use

Eligible: 83.4

2/2 benchmarks covered

Agentic Coding

No data

0/0 benchmarks covered

Long-Term Tasks

No data

0/0 benchmarks covered

Speed

No data

0/0 benchmarks covered
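
The eligibility rule stated at the top of the Score Summary can be sketched as follows. This is only an illustration of the rule as worded on this page; the dictionary layout and the function name `overall_status` are hypothetical, not the dashboard's actual implementation.

```python
# Category statuses as shown on this page. The dict layout is an assumption
# made for illustration only.
page_status = {
    "Agentic Tool Use": "Eligible",
    "Agentic Coding": "No data",
    "Long-Term Tasks": "No data",
    "Speed": "No data",
}


def overall_status(status_by_category):
    """Return 'Eligible' only if all four categories are eligible."""
    all_eligible = len(status_by_category) == 4 and all(
        status == "Eligible" for status in status_by_category.values()
    )
    return "Eligible" if all_eligible else "Incomplete"


print(overall_status(page_status))  # Incomplete
```

With only Agentic Tool Use covered, the result matches the "Incomplete" overall status shown above.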

Latest Results

2 eval runs

Hermes Tool Contract Hard v1
Agentic Tool Use / Hermes Agent Evals (hermes_tool_contract_hard_v1)
Eligible, Jul 4, 2026
Raw score: 82.0
Normalized: 82.0
Pass rate: 82.0%
Latency: 16.2s
Completion / TPS: 811s / 25.4 tok/s
Cost: $0.00

Hermes tool-contract hard v1: 50 cases. Primary score is tool/state pass rate. tool_state=82%, strict=82%, final_text=94%, schema_valid=100%, required_tool=98%, hallucinated_tool=0%, forbidden_tool=0%, recovery=90.0%. avg_tool_calls=3.38, output_tps=25.4, input_tokens=126116, output_tokens=20599. non_text_failures: max_tool_calls=4, missing_required_tool=1, too_many_tool_calls=8. strict_failure_breakdown: final_answer_missing_expected_text=3, max_tool_calls=4, missing_required_tool=1, too_many_tool_calls=8. Local OMLX Q8 run.

Artifact/log: -
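
The page does not say how Latency and Completion time are derived. The figures for this run are consistent with completion time being output tokens divided by output TPS, and latency being completion time divided by the number of cases. That relationship is an inference from the reported numbers, not a documented definition; the function name below is hypothetical.

```python
def implied_timings(output_tokens, output_tps, cases):
    """Derive total completion time and per-case latency (both in seconds).

    Assumes completion time = output_tokens / output_tps and
    latency = completion time / cases. This is inferred, not documented.
    """
    completion_s = output_tokens / output_tps
    return completion_s, completion_s / cases


# Values reported for the hard v1 run above.
completion_s, per_case_s = implied_timings(20599, 25.4, 50)
print(round(completion_s), round(per_case_s, 1))  # 811 16.2
```

Both values agree with the reported 811s completion time and 16.2s latency after rounding.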

Hermes Tool Contract
Agentic Tool Use / Hermes Agent Evals (hermes_tool_contract_v0)
Eligible, Jul 4, 2026
Raw score: 86.0
Normalized: 86.0
Pass rate: 86.0%
Latency: 9.82s
Completion / TPS: 982s / 28.8 tok/s
Cost: $0.00

Hermes tool-contract v0: 100 cases. Primary score is tool/state pass rate. tool_state=86%, strict=82%, final_text=85%, schema_valid=100%, required_tool=95%, hallucinated_tool=0%, forbidden_tool=1%, recovery=50.0%. avg_tool_calls=1.46, output_tps=28.8, input_tokens=107163, output_tokens=28249. non_text_failures: final_state_wrong=3, forbidden_tool=1, max_tool_calls=10, missing_required_tool=5. strict_failure_breakdown: final_answer_missing_expected_text=15, final_state_wrong=3, forbidden_tool=1, max_tool_calls=10, missing_required_tool=5. Local OMLX Q8 run.

Artifact/log: -
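
The failure breakdowns in the run summaries are `key=value` lists, so they can be tallied mechanically. The sketch below parses the strict_failure_breakdown of the v0 run and compares the tag total with the number of failing cases implied by strict=82% over 100 cases. The parsing approach and helper name are illustrative assumptions, not part of the eval harness.

```python
import re

# strict_failure_breakdown copied from the v0 run summary above.
breakdown = (
    "final_answer_missing_expected_text=15, final_state_wrong=3, "
    "forbidden_tool=1, max_tool_calls=10, missing_required_tool=5"
)


def parse_counts(text):
    """Turn 'name=3, other=5' into {'name': 3, 'other': 5}."""
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", text)}


counts = parse_counts(breakdown)
total_tags = sum(counts.values())

cases = 100
strict_pct = 82  # strict pass rate reported for the v0 run
failing_cases = cases * (100 - strict_pct) // 100

print(total_tags, failing_cases)  # 34 18
```

The tag total (34) exceeds the number of strictly failing cases (18), which suggests that a single failing case can carry more than one failure tag. The same pattern appears in the hard v1 run (16 tags over 9 failing cases).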
