OMLX local

Qwen3.6 27B Q4 MLX

Local OpenAI-compatible OMLX provider; Q4_K_XL MLX quantization. This run returned endpoint model errors for all cases.

Family: Qwen
Version: Qwen3.6-27B-UD-Q4_K_XL-mlx
Context: -
Latest run: Jul 4, 2026

Score Summary

Overall eligibility requires eligible data in all four categories.

Overall

Incomplete

Incomplete for overall leaderboards

Agentic Tool Use

Eligible, score 0.0

2/2 benchmarks covered

Agentic Coding

No data

0/0 benchmarks covered

Long-Term Tasks

No data

0/0 benchmarks covered

Speed

No data

0/0 benchmarks covered
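
The eligibility rule stated above (overall eligibility requires eligible data in all four categories) can be sketched in a few lines of Python. This is only an illustration: the `CategoryStatus` type, the `overall_status` function, and the assumption that a category counts as eligible when it has a score and at least one covered benchmark are hypothetical, not the dashboard's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CategoryStatus:
    """Hypothetical record for one score category on this page."""
    name: str
    score: Optional[float]  # None stands for "No data"
    covered: int            # benchmarks covered
    total: int              # benchmarks expected

    @property
    def eligible(self) -> bool:
        # Assumption: eligible means a score exists and at least one
        # benchmark is covered. The real rule may be stricter.
        return self.score is not None and self.covered > 0


def overall_status(categories: List[CategoryStatus]) -> str:
    """Return "Eligible" only if every category is eligible."""
    return "Eligible" if all(c.eligible for c in categories) else "Incomplete"


# Values copied from the Score Summary above.
categories = [
    CategoryStatus("Agentic Tool Use", 0.0, 2, 2),
    CategoryStatus("Agentic Coding", None, 0, 0),
    CategoryStatus("Long-Term Tasks", None, 0, 0),
    CategoryStatus("Speed", None, 0, 0),
]

print(overall_status(categories))
```

Under these assumptions, one eligible category out of four leaves the overall status at Incomplete, which matches what the page reports.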

Latest Results

2 eval runs

Hermes Tool Contract Hard v1
Agentic Tool Use / Hermes Agent Evals (hermes_tool_contract_hard_v1)
Status: Eligible
Date: Jul 4, 2026
Raw score: 0.00
Normalized: 0.0
Pass rate: 0.0%
Latency: 6.59s
Completion / TPS: 329s / -
Cost: $0.00

Hermes tool-contract hard v1: 50 cases. Primary score is tool/state pass rate.
tool_state=0%, strict=0%, final_text=60%, schema_valid=100%, required_tool=0%, hallucinated_tool=0%, forbidden_tool=0%, recovery=n/a.
avg_tool_calls=0.00, output_tps=n/a, input_tokens=null, output_tokens=null.
non_text_failures: final_state_wrong=35, missing_required_tool=50, model_error=50.
strict_failure_breakdown: final_answer_missing_expected_text=20, final_state_wrong=35, missing_required_tool=50, model_error=50.
Endpoint returned HTTP 500 model errors for every case; retained as failed availability result.

Artifact/log: -

Hermes Tool Contract
Agentic Tool Use / Hermes Agent Evals (hermes_tool_contract_v0)
Status: Eligible
Date: Jul 4, 2026
Raw score: 0.00
Normalized: 0.0
Pass rate: 0.0%
Latency: 6.59s
Completion / TPS: 659s / -
Cost: $0.00

Hermes tool-contract v0: 100 cases. Primary score is tool/state pass rate.
tool_state=0%, strict=0%, final_text=0%, schema_valid=100%, required_tool=16%, hallucinated_tool=0%, forbidden_tool=0%, recovery=n/a.
avg_tool_calls=0.00, output_tps=n/a, input_tokens=null, output_tokens=null.
non_text_failures: final_state_wrong=21, missing_required_tool=84, model_error=100.
strict_failure_breakdown: final_answer_missing_expected_text=100, final_state_wrong=21, missing_required_tool=84, model_error=100.
Endpoint returned HTTP 500 model errors for every case; retained as failed availability result.

Artifact/log: -
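
The run summaries above pack their metrics into `key=value` pairs, and the reported latency appears to be roughly the completion time divided by the case count. The sketch below shows one way to read those pairs and check that relationship. The helper names `parse_metrics` and `mean_seconds_per_case` are hypothetical, and treating latency as a per-case mean is an assumption, not something the page states.

```python
import re
from typing import Dict, Optional, Union

Metric = Optional[Union[float, str]]


def parse_metrics(summary: str) -> Dict[str, Metric]:
    """Extract key=value pairs from a run summary line.

    Percentages become fractions (60% -> 0.6); "null" and "n/a" become None;
    anything else is parsed as a float, or kept as text if that fails.
    """
    metrics: Dict[str, Metric] = {}
    for key, raw in re.findall(r"(\w+)=([^,\s]+)", summary):
        value = raw.rstrip(".;")  # drop sentence punctuation after the value
        if value in ("null", "n/a"):
            metrics[key] = None
        elif value.endswith("%"):
            metrics[key] = float(value[:-1]) / 100
        else:
            try:
                metrics[key] = float(value)
            except ValueError:
                metrics[key] = value
    return metrics


def mean_seconds_per_case(completion_seconds: float, cases: int) -> float:
    """Assumed relationship: latency is completion time averaged over cases."""
    return completion_seconds / cases


# Excerpt copied from the Hermes Tool Contract Hard v1 summary.
summary = (
    "tool_state=0%, strict=0%, final_text=60%, schema_valid=100%, "
    "required_tool=0%, avg_tool_calls=0.00, output_tps=n/a, input_tokens=null."
)

metrics = parse_metrics(summary)
print(metrics["final_text"], metrics["output_tps"])
print(round(mean_seconds_per_case(329, 50), 2))
print(round(mean_seconds_per_case(659, 100), 2))
```

Both per-case means come out within 0.01s of the 6.59s latency shown for each run, which is consistent with the assumption but does not confirm how the dashboard computes the figure.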
