Platform · Evals

Know it's good, not just done

An agent always returns something. The hard part is knowing whether it is any good. Evals score every output against a rubric you define, so you get a number and a reason, not a gut feeling. Trust the work without re-reading every line.

Scored
against your rubric
Tracked
quality over time
Explained
a reason, not a verdict
PRD draft · scorecard Cleared the bar
82/100
Overall score, weighted across your rubric.
Completeness9/10
Clarity8/10
Followed the brief9/10
Edge cases covered6/10
One to fix: edge cases are thin. The empty state is not specified.
The Problem

"Done" tells you nothing about good

An agent will always hand you a finished-looking output. Whether it meets the bar is the question it cannot answer about itself, so the judgment falls back on you, every single time.

You re-read every word anyway

The agent saved you the writing, but you still proofread the whole thing to make sure it holds up. The review is now the slow part.

Good is a gut call

Was that PRD solid, or did it just look polished? Without a standard to check against, quality comes down to how you felt reading it.

Quality slips without warning

A prompt tweak or a model change makes outputs worse without anyone noticing, and you find out when a stakeholder points at the weak one.

What Evals Do

A rubric that turns taste into a score

You define what good means in criteria and weights. Evals apply that same standard to every output, so quality stops depending on whether you were paying close attention that day. For stopping bad output before it ships, see guardrails.

PRD rubric yours to edit
Completeness30%

Every required section is present

Clarity20%

A new reader follows it without help

Followed the brief30%

Matches the goal you set

Edge cases20%

Empty, error, and abuse states named

Your standard, applied consistently

The rubric is yours. Start from the default each agent provides, then tune it until the score matches what you would have said in a review. After that, every output gets the same fair read.

  • Score against the criteria you care about
  • Weight what matters more for this kind of work
  • Each agent ships with a sensible default rubric
  • Every score comes with a short reason you can read
How It Works

Four steps, running in the background

You set the rubric once. After that, scoring happens on every output without you asking, and the history builds itself.

01

Define the rubric

Pick the criteria that matter and what good looks like for each. Start from the agent default and adjust.

02

Score every run

On each output, every criterion gets a score and a short reason, so the result is explainable, not a black box.

03

Roll up to a result

Scores combine by your weights into one overall number, with a clear pass or needs-work against your bar.

04

Track over time

Every score is kept, so you can watch quality trend across runs and get flagged the moment it drops.

Quality Over Time

Catch the dip before a stakeholder does

Because every run is scored and kept, a drop in quality shows up as a dip on the chart, not as a surprise in a review. When a change makes outputs worse, the eval flags it while you can still fix it.

PRD agent · score per run regression caught
flagged for review12345678910
Frequently Asked Questions

Questions about evals

Evals are how Idam AI measures the quality of an agent output. Each output is scored against a rubric of criteria you care about, like completeness, clarity, and whether it followed your brief. Instead of guessing whether the work is good, you get a score and the reasoning behind it.
Get Started

Stop guessing. Start scoring

Put a number on every output and trust the work without re-reading it. 14 days free, cancel anytime.

Explore The Platform

What pairs with evals

Platform

Guardrails

Evals score how good an output is. Guardrails stop output that breaks your rules. Use both to ship work that is safe and strong.

Learn more →
Platform

Sub-agents

A multi-perspective review and an eval score answer different questions: what to fix, and how close to the bar you are.

Learn more →
Platform

Memory

When an eval flags a recurring miss, memory helps the agent avoid it next time, so scores climb instead of plateau.

Learn more →
Agent

AI PRD Writer

Every PRD it drafts gets scored against a rubric, so you know the draft is strong before you spend time reviewing it.

Learn more →