Qwen3.6 27B OptiQ 4bit MLX | Benchmaxxing model results

Score Summary

Overall eligibility requires eligible data in all four categories.

Overall

Incomplete

Incomplete for overall leaderboards

Agentic Tool Use

Eligible: 80.5

2/2 benchmarks covered

Agentic Coding

No data

0/0 benchmarks covered

Long-Term Tasks

No data

0/0 benchmarks covered

Speed

No data

0/0 benchmarks covered
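The overall-eligibility rule stated at the top of the summary can be sketched in Python as below. This is a hypothetical illustration, not the leaderboard's own code: `CATEGORY_SCORES` and `overall_status` are invented names, and a category is treated as eligible simply when it has a score, with the page's "No data" modeled as `None`.

```python
from typing import Dict, List, Optional, Tuple

# Category scores as shown in the Score Summary; None stands for "No data".
CATEGORY_SCORES: Dict[str, Optional[float]] = {
    "Agentic Tool Use": 80.5,
    "Agentic Coding": None,
    "Long-Term Tasks": None,
    "Speed": None,
}


def overall_status(scores: Dict[str, Optional[float]]) -> Tuple[str, List[str]]:
    """Return the overall status and the categories that block eligibility.

    Hypothetical rule: overall eligibility requires a score in every category.
    """
    missing = [name for name, score in scores.items() if score is None]
    return ("Eligible" if not missing else "Incomplete", missing)


status, missing = overall_status(CATEGORY_SCORES)
print(status, missing)  # Incomplete ['Agentic Coding', 'Long-Term Tasks', 'Speed']
```

With only Agentic Tool Use populated, three categories block the overall ranking, which matches the "Incomplete for overall leaderboards" status.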

Latest Results

2 eval runs

Hermes Tool Contract Hard v1 | Agentic Tool Use / Hermes Agent Evals | hermes_tool_contract_hard_v1

Eligible | Jul 5, 2026

Raw score: 76.0

Normalized: 76.0

Pass rate: 76.0%

Latency: 71.3s

Completion / TPS: 3563s / 5.7 tok/s

Cost: $0.00

Hermes tool-contract hard v1: 50 cases. Primary score is tool/state pass rate.
Rates: tool_state=76.0%, strict=76.0%, final_text=92.0%, schema_valid=100.0%, required_tool=92.0%, hallucinated_tool=0%, forbidden_tool=0%, recovery=100.0%.
Averages and token counts: avg_tool_calls=3.44, avg_turns=3.96, output_tps=5.7, input_tokens=135960, output_tokens=20353.
non_text_failures: final_state_wrong=4, max_tool_calls=4, missing_required_tool=4, too_many_tool_calls=12.
strict_failure_breakdown: final_answer_missing_expected_text=4, final_state_wrong=4, max_tool_calls=4, missing_required_tool=4, too_many_tool_calls=12.
Rerun after replacing the broken Qwen3.6-27B-UD-Q4_K_XL-mlx artifact with Qwen3.6-27B-OptiQ-4bit.

Artifact/log: -
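The reported columns for this run can be cross-checked against each other. The Python sketch below assumes that "Latency" is the mean wall-clock time per case and that "Completion" is the total run time in seconds; the page does not define either column, so both readings are inferences. `derived_metrics` is a hypothetical helper name.

```python
# Figures copied from the Hermes Tool Contract Hard v1 entry.
CASES = 50
COMPLETION_S = 3563      # assumption: total wall-clock seconds for the run
LATENCY_S = 71.3         # assumption: mean seconds per case
OUTPUT_TOKENS = 20353
AVG_TOOL_CALLS = 3.44
TOOL_STATE_RATE = 0.76


def derived_metrics() -> dict:
    """Recompute figures that should follow from the reported columns."""
    return {
        # Output throughput: tokens generated over total run time.
        "output_tps": OUTPUT_TOKENS / COMPLETION_S,
        # If latency is a per-case mean, cases * latency should approximate total time.
        "implied_total_s": CASES * LATENCY_S,
        # Total tool calls across all cases, from the per-case average.
        "total_tool_calls": round(CASES * AVG_TOOL_CALLS),
        # Number of cases passing the primary tool/state check.
        "tool_state_passes": round(CASES * TOOL_STATE_RATE),
    }


for name, value in derived_metrics().items():
    print(f"{name}: {value:.1f}" if isinstance(value, float) else f"{name}: {value}")
```

On these figures, 20353 / 3563 ≈ 5.7 tok/s matches the reported output_tps, and 50 × 71.3 s = 3565 s lies within a few seconds of the 3563 s completion time, which supports (but does not prove) the per-case reading of Latency. The 76.0% tool/state rate corresponds to 38 of 50 passing cases.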

Hermes Tool Contract | Agentic Tool Use / Hermes Agent Evals | hermes_tool_contract_v0

Eligible | Jul 5, 2026

Raw score: 89.0

Normalized: 89.0

Pass rate: 89.0%

Latency: 35.6s

Completion / TPS: 3561s / 7.1 tok/s

Cost: $0.00

Hermes tool-contract v0: 100 cases. Primary score is tool/state pass rate.
Rates: tool_state=89.0%, strict=80.0%, final_text=83.0%, schema_valid=100.0%, required_tool=96.0%, hallucinated_tool=1.0%, forbidden_tool=0%, recovery=83.33%.
Averages and token counts: avg_tool_calls=1.45, avg_turns=2.31, output_tps=7.1, input_tokens=109071, output_tokens=25429.
non_text_failures: final_state_wrong=3, hallucinated_tool=1, max_tool_calls=8, missing_required_tool=4.
strict_failure_breakdown: final_answer_missing_expected_text=17, final_state_wrong=3, hallucinated_tool=1, max_tool_calls=8, missing_required_tool=4.
Rerun after replacing the broken Qwen3.6-27B-UD-Q4_K_XL-mlx artifact with Qwen3.6-27B-OptiQ-4bit.

Artifact/log: -
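To see how the v0 rates relate to the failure counts, the Python sketch below converts each rate into a number of failing cases, assuming each rate is the fraction of the 100 cases that pass that check and that each breakdown count is a number of cases. `failing_cases` is a hypothetical helper name.

```python
CASES = 100
# Rates from the Hermes Tool Contract v0 entry (assumed: fraction of cases passing).
PASS_RATES = {"tool_state": 0.89, "strict": 0.80, "final_text": 0.83, "required_tool": 0.96}
# Reason counts from strict_failure_breakdown.
STRICT_BREAKDOWN = {
    "final_answer_missing_expected_text": 17,
    "final_state_wrong": 3,
    "hallucinated_tool": 1,
    "max_tool_calls": 8,
    "missing_required_tool": 4,
}


def failing_cases(rate: float, cases: int = CASES) -> int:
    """Convert a pass rate into a count of failing cases."""
    return round(cases * (1 - rate))


failures = {name: failing_cases(rate) for name, rate in PASS_RATES.items()}
reason_total = sum(STRICT_BREAKDOWN.values())
print(failures)
print("strict failing cases:", failures["strict"], "| failure reasons logged:", reason_total)
```

The 17 failing final_text cases line up with final_answer_missing_expected_text=17, and the 4 required_tool failures line up with missing_required_tool=4. The breakdown lists 33 reasons against 20 strict failures, so under the per-case assumption some cases must be counted under more than one reason.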

Historical Runs

2 eval runs (identical to the Hermes Tool Contract Hard v1 and Hermes Tool Contract runs listed under Latest Results)