Question 1

What are AI evals?

Accepted Answer

Evals are how Idam AI measures the quality of an agent's output. Each output is scored against a rubric of criteria you care about, like completeness, clarity, and whether it followed your brief. Instead of guessing whether the work is good, you get a score and the reasoning behind it.

Question 2

How do evals score an output?

Accepted Answer

You define a rubric of criteria, each with a description of what good looks like. On every run, the output is checked against each criterion and given a score with a short explanation. The per-criterion scores roll up into an overall result, so you can see at a glance whether an output cleared your bar.
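To make the roll-up concrete, here is a minimal sketch in Python. It is not Idam AI's implementation or API: the `CriterionResult` type, the 0.0 to 1.0 score scale, the weighted average, and the 0.8 pass bar are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class CriterionResult:
    """One rubric criterion's result for a single run (hypothetical shape)."""
    name: str
    score: float        # assumed scale: 0.0 (worst) to 1.0 (best)
    weight: float       # relative importance of this criterion
    explanation: str    # short reasoning behind the score


def roll_up(results, pass_bar=0.8):
    """Combine per-criterion scores into a weighted average and a pass/fail flag."""
    total_weight = sum(r.weight for r in results)
    if total_weight <= 0:
        raise ValueError("rubric needs at least one positive weight")
    overall = sum(r.score * r.weight for r in results) / total_weight
    return overall, overall >= pass_bar


results = [
    CriterionResult("completeness", 0.9, 2.0, "Covers every section in the brief."),
    CriterionResult("clarity", 0.7, 1.0, "Two paragraphs are hard to follow."),
    CriterionResult("followed_brief", 1.0, 1.0, "Matches the requested format."),
]

overall, passed = roll_up(results)
print(f"overall={overall:.3f} passed={passed}")
```

A weighted average is only one way to roll scores up. A stricter variant could also require every individual criterion to clear its own minimum.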

Question 3

How is this different from guardrails?

Accepted Answer

Guardrails are preventive. They stop output that breaks your rules before it ships. Evals are measurement. They score how good an output is against a rubric. Guardrails keep agents inside the lines; evals tell you how well they did within them.

Question 4

Do I have to write the rubric myself?

Accepted Answer

No. Each agent ships with a sensible default rubric for its job, so evals work out of the box. You can adjust the criteria, change the weights, or add your own when you want the bar set to your standards.
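As a rough illustration of what adjusting a default rubric could look like, the Python sketch below merges overrides into a set of defaults and renormalizes the weights. The criterion names, the `DEFAULT_RUBRIC` dictionary, and the `customize` helper are hypothetical and do not reflect Idam AI's actual configuration format.

```python
# Hypothetical default rubric: criterion name -> relative weight.
DEFAULT_RUBRIC = {"completeness": 1.0, "clarity": 1.0, "followed_brief": 1.0}


def customize(default, overrides):
    """Merge overrides into the defaults and normalize weights to sum to 1.

    A weight of 0 (or less) in the overrides removes that criterion.
    """
    merged = {**default, **overrides}
    merged = {name: w for name, w in merged.items() if w > 0}
    total = sum(merged.values())
    if total == 0:
        raise ValueError("rubric must keep at least one criterion")
    return {name: w / total for name, w in merged.items()}


# Double the weight of clarity and add a custom criterion.
custom = customize(DEFAULT_RUBRIC, {"clarity": 2.0, "cites_sources": 1.0})
print(custom)
```

Normalizing the weights keeps the overall score on the same scale no matter how many criteria are added or removed.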

Question 5

Can evals catch quality regressions over time?

Accepted Answer

Yes. Scores are tracked across runs, so you can see how quality trends over time and get flagged when it drops. If a change makes outputs worse, the eval catches it before a stakeholder does.
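One simple way to flag a drop is to compare the latest score against the average of the runs just before it. The Python sketch below shows that idea under stated assumptions: the `flag_regression` function, the five-run window, and the 0.1 drop threshold are illustrative choices, not a description of how Idam AI detects regressions.

```python
from statistics import mean


def flag_regression(scores, window=5, max_drop=0.1):
    """Return True if the latest score fell more than max_drop below
    the mean of the `window` runs that came before it."""
    if len(scores) < window + 1:
        return False  # not enough history to form a baseline
    baseline = mean(scores[-(window + 1):-1])
    return baseline - scores[-1] > max_drop


stable_history = [0.88, 0.90, 0.87, 0.89, 0.91, 0.86]
dipped_history = [0.88, 0.90, 0.87, 0.89, 0.91, 0.72]

print(flag_regression(stable_history))
print(flag_regression(dipped_history))
```

A rolling baseline tolerates normal run-to-run noise while still reacting to a sudden dip. A slow decline would need a longer window or a fixed reference score to catch.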

Question 6

Is there a free trial?

Accepted Answer

Yes. Idam AI offers a 14-day free trial. A credit card is required to start, and you can cancel anytime before the trial ends. Evals run on your outputs during the trial.

Know it's good, not just done

"Done" tells you nothing about good

You re-read every word anyway

Good is a gut call

Quality slips without warning

A rubric that turns taste into a score

Your standard, applied consistently

Four steps, running in the background

Define the rubric

Score every run

Roll up to a result

Track over time

Catch the dip before a stakeholder does

Questions about evals

Stop guessing. Start scoring

What pairs with evals

Guardrails

Sub-agents

Memory

AI PRD Writer