Qwen3.6 35B A3B Q4 MLX | Benchmaxxing model results

Score Summary

Overall eligibility requires eligible data in all four categories.

Overall

Incomplete

Incomplete for overall leaderboards

Agentic Tool Use

Eligible: 90.3

2/2 benchmarks covered

Agentic Coding

No data

0/0 benchmarks covered

Long-Term Tasks

No data

0/0 benchmarks covered

Speed

No data

0/0 benchmarks covered
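
The eligibility rule stated at the top of the summary can be illustrated with a minimal sketch. The status strings are copied from the category cards above; the function name `overall_status` and the exact string comparison are assumptions made for illustration, not the leaderboard's actual implementation.

```python
# Category statuses as listed in the score summary.
CATEGORY_STATUS = {
    "Agentic Tool Use": "Eligible",
    "Agentic Coding": "No data",
    "Long-Term Tasks": "No data",
    "Speed": "No data",
}


def overall_status(statuses):
    """Hypothetical helper: overall is eligible only if all categories are."""
    if all(status == "Eligible" for status in statuses.values()):
        return "Eligible"
    return "Incomplete"


print(overall_status(CATEGORY_STATUS))
```

With only one of the four categories eligible, this sketch reports the model as incomplete, which matches the Overall card.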

Latest Results

2 eval runs

Hermes Tool Contract

Agentic Tool Use / Hermes Agent Evals hermes_tool_contract_v0

Eligible, Jul 4, 2026

Raw score: 91.0

Normalized: 91.0

Pass rate: 91.0%

Latency: 9.65s

Completion / TPS: 965s / 29.9 tok/s

Cost: $0.00

Hermes tool-contract v0: 100 cases. Primary score is tool/state pass rate excluding exact final-text phrasing. tool_state=91%, strict=86%, final_text=89%, schema_valid=100%, required_tool=97%, hallucinated_tool=0%, forbidden_tool=1%, recovery=66.7%. output_tps=29.9, input_tokens=107708, output_tokens=28803. non_text_failures: final_state_wrong=4, forbidden_tool=1, max_tool_calls=5, missing_required_tool=3. strict_failure_breakdown: final_answer_missing_expected_text=11, final_state_wrong=4, forbidden_tool=1, max_tool_calls=5, missing_required_tool=3. Local OMLX run.
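
The throughput and latency figures for this run can be cross-checked against the token and time totals. The sketch below assumes that output_tps is output tokens divided by total completion time and that latency is the mean completion time per case; the page does not state either definition, so both are assumptions, as are the helper names.

```python
# Figures copied from the hermes_tool_contract_v0 run summary.
CASES = 100
COMPLETION_SECONDS = 965
OUTPUT_TOKENS = 28803
REPORTED_TPS = 29.9
REPORTED_LATENCY_S = 9.65


def tokens_per_second(output_tokens, seconds):
    """Assumed definition: generated tokens over total completion time."""
    return output_tokens / seconds


def mean_latency(seconds, cases):
    """Assumed definition: total completion time averaged over cases."""
    return seconds / cases


derived_tps = tokens_per_second(OUTPUT_TOKENS, COMPLETION_SECONDS)
derived_latency = mean_latency(COMPLETION_SECONDS, CASES)

print(f"derived tps: {derived_tps:.2f} (reported {REPORTED_TPS})")
print(f"derived latency: {derived_latency:.2f}s (reported {REPORTED_LATENCY_S}s)")
```

Under these assumptions the derived latency matches the reported value, and the derived throughput lands within 0.1 tok/s of it. The small gap suggests the reported completion time is rounded or that throughput was measured over a slightly different interval.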

Artifact/log: -

Hermes Tool Contract Hard v1

Agentic Tool Use / Hermes Agent Evals hermes_tool_contract_hard_v1

Eligible, Jul 4, 2026

Raw score: 90.0

Normalized: 90.0

Pass rate: 90.0%

Latency: 16.1s

Completion / TPS: 804s / 26.2 tok/s

Cost: $0.00

Hermes tool-contract hard v1: 50 cases. Primary score is tool/state pass rate. tool_state=90%, strict=90%, final_text=94%, schema_valid=100%, required_tool=100%, hallucinated_tool=0%, forbidden_tool=0%, recovery=90.0%. avg_tool_calls=3.32, output_tps=26.2, input_tokens=126414, output_tokens=21073. non_text_failures: max_tool_calls=4, too_many_tool_calls=5. strict_failure_breakdown: final_answer_missing_expected_text=3, max_tool_calls=4, too_many_tool_calls=5. Local OMLX run.
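
The failure breakdowns list more failure reasons than there are failing cases, which is easy to miss when reading the summary line. The sketch below compares the two counts for this run. It rests on an inference, not on anything the page states: that each entry in a breakdown counts a reason tag, and that a single failing case may carry more than one tag. The helper names are hypothetical.

```python
# Figures copied from the hermes_tool_contract_hard_v1 run summary.
CASES = 50
TOOL_STATE_PASS_PCT = 90
STRICT_PASS_PCT = 90
NON_TEXT_FAILURES = {"max_tool_calls": 4, "too_many_tool_calls": 5}
STRICT_FAILURES = {
    "final_answer_missing_expected_text": 3,
    "max_tool_calls": 4,
    "too_many_tool_calls": 5,
}


def failing_cases(cases, pass_pct):
    """Number of cases that did not pass, using integer arithmetic."""
    return cases * (100 - pass_pct) // 100


def tag_total(breakdown):
    """Total number of failure-reason tags recorded in a breakdown."""
    return sum(breakdown.values())


for label, pct, breakdown in [
    ("tool_state", TOOL_STATE_PASS_PCT, NON_TEXT_FAILURES),
    ("strict", STRICT_PASS_PCT, STRICT_FAILURES),
]:
    print(label, "failing cases:", failing_cases(CASES, pct),
          "reason tags:", tag_total(breakdown))
```

Both pass rates imply fewer failing cases than recorded tags, so the breakdown counts should not be summed to estimate how many cases failed.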

Artifact/log: -
