
OMLX local

Ornith 1.0 35B 5bit MLX

Local OpenAI-compatible OMLX provider; 5-bit MLX quantization.

Family: Ornith
Version: Ornith-1.0-35B-5bit-mlx
Context: -
Latest run: Jul 4, 2026

Score Summary

Overall eligibility requires eligible data in all four categories.
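
The rule above can be sketched in a few lines. This is only an illustration of the stated requirement, not the dashboard's actual code: the function name `overall_status`, the `REQUIRED_CATEGORIES` tuple, and the status strings are assumptions, and the category statuses are copied from the summary on this page.

```python
# Hypothetical sketch of the stated rule: overall eligibility needs
# eligible data in all four categories. Names here are assumptions.
REQUIRED_CATEGORIES = (
    "Agentic Tool Use",
    "Agentic Coding",
    "Long-Term Tasks",
    "Speed",
)

# Statuses as shown in the Score Summary for this model.
ornith_statuses = {
    "Agentic Tool Use": "Eligible",
    "Agentic Coding": "No data",
    "Long-Term Tasks": "No data",
    "Speed": "No data",
}


def overall_status(statuses: dict[str, str]) -> str:
    """Return "Eligible" only if every required category is eligible."""
    if all(statuses.get(name) == "Eligible" for name in REQUIRED_CATEGORIES):
        return "Eligible"
    return "Incomplete"


print(overall_status(ornith_statuses))  # Incomplete
```

With three of the four categories reporting no data, the sketch returns "Incomplete", which matches the overall status shown for this model.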

Overall

Incomplete

Incomplete for overall leaderboards

Agentic Tool Use

Eligible: 74.8

2/2 benchmarks covered

Agentic Coding

No data

0/0 benchmarks covered

Long-Term Tasks

No data

0/0 benchmarks covered

Speed

No data

0/0 benchmarks covered

Latest Results

2 eval runs

Hermes Tool Contract Hard v1 (hermes_tool_contract_hard_v1)
Category: Agentic Tool Use / Hermes Agent Evals
Status: Eligible
Date: Jul 4, 2026
Raw score: 72.0
Normalized: 72.0
Pass rate: 72.0%
Latency: 14.7s
Completion / TPS: 736s / 21.5 tok/s
Cost: $0.00

Hermes tool-contract hard v1: 50 cases. Primary score is tool/state pass rate.

Rates: tool_state=72%, strict=72%, final_text=94%, schema_valid=100%, required_tool=96%, hallucinated_tool=0%, forbidden_tool=0%, recovery=100.0%.
Usage: avg_tool_calls=3.42, output_tps=21.5, input_tokens=127846, output_tokens=15856.
non_text_failures: final_state_wrong=1, max_tool_calls=3, missing_required_tool=2, too_many_tool_calls=12.
strict_failure_breakdown: final_answer_missing_expected_text=3, final_state_wrong=1, max_tool_calls=3, missing_required_tool=2, too_many_tool_calls=12.

Local OMLX Ornith 5-bit run.

Artifact/log: -
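
The throughput and latency figures for this run appear to be consistent with each other. The sketch below recomputes them under the assumption that output_tps is output tokens divided by total completion time, and that latency is completion time divided by the number of cases. The page does not confirm either definition, and the helper name `derived_metrics` is hypothetical.

```python
def derived_metrics(cases: int, completion_s: float, output_tokens: int) -> dict[str, float]:
    """Recompute throughput and mean per-case latency from run totals.

    Assumed definitions (not confirmed by the dashboard):
      output_tps     = output_tokens / completion_s
      mean_latency_s = completion_s / cases
    """
    return {
        "output_tps": round(output_tokens / completion_s, 1),
        "mean_latency_s": round(completion_s / cases, 2),
    }


# Totals reported for hermes_tool_contract_hard_v1.
hard_v1 = derived_metrics(cases=50, completion_s=736, output_tokens=15856)
print(hard_v1)  # {'output_tps': 21.5, 'mean_latency_s': 14.72}
```

Under these assumptions the recomputed throughput matches the reported 21.5 tok/s, and the mean latency of 14.72s rounds to the reported 14.7s.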

Hermes Tool Contract (hermes_tool_contract_v0)
Category: Agentic Tool Use / Hermes Agent Evals
Status: Eligible
Date: Jul 4, 2026
Raw score: 80.0
Normalized: 80.0
Pass rate: 80.0%
Latency: 8.11s
Completion / TPS: 811s / 24.0 tok/s
Cost: $0.00

Hermes tool-contract v0: 100 cases. Primary score is tool/state pass rate.

Rates: tool_state=80%, strict=76%, final_text=84%, schema_valid=100%, required_tool=94%, hallucinated_tool=2%, forbidden_tool=2%, recovery=83.3%.
Usage: avg_tool_calls=1.58, output_tps=24.0, input_tokens=108310, output_tokens=19479.
non_text_failures: final_state_wrong=5, forbidden_tool=2, hallucinated_tool=2, max_tool_calls=12, missing_required_tool=6.
strict_failure_breakdown: final_answer_missing_expected_text=16, final_state_wrong=5, forbidden_tool=2, hallucinated_tool=2, max_tool_calls=12, missing_required_tool=6.

Local OMLX Ornith 5-bit run.

Artifact/log: -
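
The failure counts in the v0 breakdown add up to more than the number of failed cases, which suggests that a single case can carry several failure tags. The sketch below checks this arithmetic using only the figures reported above. Reading the counts as per-case tags is an assumption, and the variable names are illustrative.

```python
# Figures reported for hermes_tool_contract_v0.
CASES = 100
STRICT_PASS_PERCENT = 76
FINAL_TEXT_PERCENT = 84

strict_failure_breakdown = {
    "final_answer_missing_expected_text": 16,
    "final_state_wrong": 5,
    "forbidden_tool": 2,
    "hallucinated_tool": 2,
    "max_tool_calls": 12,
    "missing_required_tool": 6,
}

# Cases that failed the strict check, using integer arithmetic.
failed_cases = CASES * (100 - STRICT_PASS_PERCENT) // 100

# Total failure tags across all categories.
total_tags = sum(strict_failure_breakdown.values())

# More tags than failed cases means some cases have multiple tags
# (assuming each tag is counted at most once per case).
tags_overlap = total_tags > failed_cases

# The missing-text count should agree with the final_text rate.
missing_text = strict_failure_breakdown["final_answer_missing_expected_text"]
final_text_consistent = (CASES - missing_text) * 100 // CASES == FINAL_TEXT_PERCENT

print(failed_cases, total_tags, tags_overlap, final_text_consistent)
# 24 43 True True
```

On these numbers, 43 tags are spread over 24 strictly failed cases, so the breakdown should not be read as a partition of the failures. The 16 missing-text failures do line up with final_text=84%.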

Historical Runs

2 eval runs, both from Jul 4, 2026. The entries are identical to those listed under Latest Results.