LLM Benchmark Scoring
Created: 6/20/2026
A deterministic benchmark protocol for scoring LLMs across cognition dimensions to determine the best model for each task type. It runs a fixed rubric against a candidate model set, produces a structured scorecard artifact, and recommends (but never autonomously sets) the per-task-type and base-tier model assignments that drive the downstream mindLLM routing table. A run is triggered when a provider offers a new model (Google/Gemini, AWS/OpenAI+Anthropic, Alibaba/QWEN via OpenRouter or Alibaba Cloud) or on a scheduled review cycle. The protocol mirrors the Tatum test-plan pattern: fixed inputs, fixed rubric, scored output, and no freehand judgment outside the rubric.
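The flow described above (fixed rubric, scored output, recommendation only) can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the dimension names, weights, model names, task types, and every function name here are hypothetical, and the real rubric and scorecard format are not specified in this listing.

```python
from dataclasses import dataclass

# Hypothetical rubric: dimension names and weights are illustrative assumptions.
RUBRIC_WEIGHTS = {"reasoning": 0.5, "instruction_following": 0.3, "latency": 0.2}


@dataclass(frozen=True)
class Recommendation:
    task_type: str
    model: str
    score: float


def weighted_score(dimension_scores: dict[str, float]) -> float:
    """Combine per-dimension scores using the fixed rubric weights."""
    missing = RUBRIC_WEIGHTS.keys() - dimension_scores.keys()
    if missing:
        # No freehand judgment: a model missing a dimension is an error, not a guess.
        raise ValueError(f"missing rubric dimensions: {sorted(missing)}")
    total = sum(RUBRIC_WEIGHTS[d] * dimension_scores[d] for d in RUBRIC_WEIGHTS)
    return round(total, 4)


def build_scorecard(results: dict) -> dict:
    """Map {task_type: {model: {dimension: score}}} to {task_type: {model: total}}."""
    return {
        task: {model: weighted_score(dims) for model, dims in sorted(models.items())}
        for task, models in sorted(results.items())
    }


def recommend(scorecard: dict) -> list[Recommendation]:
    """Pick the top model per task type; the result is advisory only."""
    recs = []
    for task, totals in scorecard.items():
        # Deterministic tie-break: highest score first, then model name A-Z.
        model, score = min(totals.items(), key=lambda kv: (-kv[1], kv[0]))
        recs.append(Recommendation(task, model, score))
    return recs


# Hypothetical raw rubric scores (0-10) for two placeholder models.
results = {
    "summarization": {
        "model-a": {"reasoning": 6, "instruction_following": 9, "latency": 8},
        "model-b": {"reasoning": 8, "instruction_following": 7, "latency": 5},
    },
    "code_review": {
        "model-a": {"reasoning": 7, "instruction_following": 8, "latency": 6},
        "model-b": {"reasoning": 9, "instruction_following": 8, "latency": 4},
    },
}

scorecard = build_scorecard(results)
recommendations = recommend(scorecard)

for rec in recommendations:
    # The sketch only reports; it never writes to a routing table.
    print(f"{rec.task_type}: recommend {rec.model} (score {rec.score})")
```

Sorting the task types and model names, and breaking ties by name, keeps the output identical across runs for identical inputs, which is the property the word "deterministic" implies. The sketch stops at a list of recommendations so that applying them to a routing table remains a separate, human-approved step.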
Overview
Skills are free to equip, and once equipped your AI Mind can use LLM Benchmark Scoring automatically while it works. LLM Benchmark Scoring is currently equipped on 1 Mind.
Frequently asked questions
What is LLM Benchmark Scoring?
LLM Benchmark Scoring is a deterministic benchmark protocol that scores LLMs across cognition dimensions using a fixed rubric, produces a structured scorecard, and recommends (but never autonomously sets) the per-task-type and base-tier model assignments for the mindLLM routing table.
How do I connect LLM Benchmark Scoring to my AI Mind?
Open LLM Benchmark Scoring in the Ethoswarm Bazaar and select Equip to add it to one of your AI Minds. Your Mind can then use it automatically.
Is LLM Benchmark Scoring free?
Yes. LLM Benchmark Scoring is free to equip on your AI Mind in the Ethoswarm Bazaar.