OMLX local

Qwen3.6 35B A3B Q4 MLX

Local OpenAI-compatible OMLX provider; Q4_K_XL MLX quantization. First real Hermes tool-contract run.

Family: Qwen
Version: Qwen3.6-35B-A3B-UD-Q4_K_XL-mlx
Context: -
Latest run: Jul 4, 2026

Score Summary

Overall eligibility requires eligible data in all four categories.

Overall

Incomplete

Incomplete for overall leaderboards

Agentic Tool Use

Eligible: 90.3

2/2 benchmarks covered

Agentic Coding

No data

0/0 benchmarks covered

Long-Term Tasks

No data

0/0 benchmarks covered

Speed

No data

0/0 benchmarks covered
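
The eligibility rule stated in this summary (overall eligibility requires eligible data in all four categories) can be sketched as follows. This is a minimal illustration, not the dashboard's actual implementation: the `CategoryStatus` type, the `overall_status` function, and the "Eligible" label for the complete case are assumptions introduced here; only the four category names and the "Incomplete" result come from the page.

```python
from dataclasses import dataclass

# The four categories named in the Score Summary.
REQUIRED_CATEGORIES = (
    "Agentic Tool Use",
    "Agentic Coding",
    "Long-Term Tasks",
    "Speed",
)


@dataclass(frozen=True)
class CategoryStatus:
    """Hypothetical record: one category and whether it has eligible data."""
    name: str
    eligible: bool


def overall_status(statuses):
    """Return "Eligible" only if every required category is eligible.

    A category that is missing from `statuses` counts as not eligible.
    The "Eligible" label is an assumption; the page only shows "Incomplete".
    """
    by_name = {s.name: s.eligible for s in statuses}
    complete = all(by_name.get(name, False) for name in REQUIRED_CATEGORIES)
    return "Eligible" if complete else "Incomplete"


# Values as shown on this page: one eligible category, three with no data.
summary = [
    CategoryStatus("Agentic Tool Use", True),
    CategoryStatus("Agentic Coding", False),
    CategoryStatus("Long-Term Tasks", False),
    CategoryStatus("Speed", False),
]

print(overall_status(summary))  # prints "Incomplete"
```

With only Agentic Tool Use covered, the sketch reproduces the "Incomplete" status shown under Overall.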

Latest Results

2 eval runs

Hermes Tool Contract
Agentic Tool Use / Hermes Agent Evals (hermes_tool_contract_v0)
Eligible, Jul 4, 2026
Raw score: 91.0
Normalized: 91.0
Pass rate: 91.0%
Latency: 9.65s
Completion / TPS: 965s / 29.9 tok/s
Cost: $0.00

Hermes tool-contract v0: 100 cases. The primary score is the tool/state pass rate, which excludes exact final-text phrasing.

Rates: tool_state=91%, strict=86%, final_text=89%, schema_valid=100%, required_tool=97%, hallucinated_tool=0%, forbidden_tool=1%, recovery=66.7%.
Throughput and tokens: output_tps=29.9, input_tokens=107708, output_tokens=28803.
non_text_failures: final_state_wrong=4, forbidden_tool=1, max_tool_calls=5, missing_required_tool=3.
strict_failure_breakdown: final_answer_missing_expected_text=11, final_state_wrong=4, forbidden_tool=1, max_tool_calls=5, missing_required_tool=3.

Local OMLX run.

Artifact/log: -

Hermes Tool Contract Hard v1
Agentic Tool Use / Hermes Agent Evals (hermes_tool_contract_hard_v1)
Eligible, Jul 4, 2026
Raw score: 90.0
Normalized: 90.0
Pass rate: 90.0%
Latency: 16.1s
Completion / TPS: 804s / 26.2 tok/s
Cost: $0.00

Hermes tool-contract hard v1: 50 cases. The primary score is the tool/state pass rate.

Rates: tool_state=90%, strict=90%, final_text=94%, schema_valid=100%, required_tool=100%, hallucinated_tool=0%, forbidden_tool=0%, recovery=90.0%.
Throughput and tokens: avg_tool_calls=3.32, output_tps=26.2, input_tokens=126414, output_tokens=21073.
non_text_failures: max_tool_calls=4, too_many_tool_calls=5.
strict_failure_breakdown: final_answer_missing_expected_text=3, max_tool_calls=4, too_many_tool_calls=5.

Local OMLX run.

Artifact/log: -
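
The latency and throughput figures reported for the two runs can be cross-checked against the case counts, completion times, and output token counts. The page does not say how these figures are computed, so the sketch below rests on two assumptions: that latency is the mean completion time per case, and that TPS is output tokens divided by total completion time. The `runs` dictionary and the `derived_metrics` helper are hypothetical names used only for illustration; the numbers are copied from the run entries.

```python
# Figures copied from the two run entries on this page.
runs = {
    "hermes_tool_contract_v0": {
        "cases": 100,
        "completion_s": 965,
        "output_tokens": 28803,
        "reported_latency_s": 9.65,
        "reported_tps": 29.9,
    },
    "hermes_tool_contract_hard_v1": {
        "cases": 50,
        "completion_s": 804,
        "output_tokens": 21073,
        "reported_latency_s": 16.1,
        "reported_tps": 26.2,
    },
}


def derived_metrics(run):
    """Recompute latency and TPS under the assumptions stated above.

    latency = total completion time / number of cases  (assumed definition)
    tps     = output tokens / total completion time    (assumed definition)
    """
    latency_s = run["completion_s"] / run["cases"]
    tps = run["output_tokens"] / run["completion_s"]
    return latency_s, tps


for name, run in runs.items():
    latency_s, tps = derived_metrics(run)
    print(
        f"{name}: latency {latency_s:.2f}s "
        f"(reported {run['reported_latency_s']}s), "
        f"tps {tps:.2f} (reported {run['reported_tps']})"
    )
```

Under these assumptions the recomputed values land within about 0.1 of the reported ones. The small gaps are consistent with the displayed completion times being rounded to whole seconds, though the page does not confirm this.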
