
Custom Assertions

Custom assertions let you add evaluation logic that goes beyond built-in types. Define a TypeScript function, drop it in .agentv/assertions/, and reference it by name in your YAML eval files.

AgentV provides two SDK functions for custom evaluation logic:

| Function | Best For | Discovery |
| --- | --- | --- |
| `defineAssertion()` | Pass/fail checks, reusable assertion types | Convention-based (`.agentv/assertions/`) |
| `defineCodeGrader()` | Full scoring control with explicit `assertions` array | Referenced via `type: code-grader` + `command:` |

Use defineAssertion() when you want a named assertion type that can be referenced across eval files without specifying a command path. It uses a simplified result contract focused on pass and optional score.

Use defineCodeGrader() when you need full control over scoring with explicit assertions arrays, or when the evaluator is a one-off grader tied to a specific eval. See Code Graders for details.

Both functions handle stdin/stdout JSON parsing, snake_case-to-camelCase conversion, Zod validation, and error handling automatically.
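The snake_case-to-camelCase step can be sketched as follows (illustrative only; the helper names here are assumptions, not the SDK's actual internals):

```ts
// Convert one snake_case key to camelCase, e.g. "output_text" -> "outputText".
function snakeToCamel(key: string): string {
  return key.replace(/_([a-z])/g, (_, c: string) => c.toUpperCase());
}

// Convert all top-level keys of a parsed stdin payload.
function camelizeKeys(obj: Record<string, unknown>): Record<string, unknown> {
  return Object.fromEntries(
    Object.entries(obj).map(([k, v]) => [snakeToCamel(k), v]),
  );
}
```

This is why your handler destructures `outputText` even though the CLI sends `output_text` over stdin.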

Install the SDK:

```sh
npm install @agentv/eval
```

Place assertion files in .agentv/assertions/ anywhere in your project tree. AgentV walks up from the eval file’s directory to find the nearest .agentv/assertions/ folder.
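The walk-up search can be sketched as follows (a hypothetical `findAssertionsDir` helper, not the SDK's actual implementation):

```ts
import * as fs from 'node:fs';
import * as path from 'node:path';

// Starting at the eval file's directory, climb toward the filesystem root
// and return the first .agentv/assertions/ directory found.
function findAssertionsDir(startDir: string): string | undefined {
  let dir = path.resolve(startDir);
  while (true) {
    const candidate = path.join(dir, '.agentv', 'assertions');
    if (fs.existsSync(candidate)) return candidate;
    const parent = path.dirname(dir);
    if (parent === dir) return undefined; // reached the root without a match
    dir = parent;
  }
}
```

The practical upshot: an eval file anywhere under your project picks up the nearest `.agentv/assertions/` folder above it, so assertions can be shared project-wide or scoped to a subdirectory.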

The filename (without extension) becomes the assertion type name:

```
.agentv/assertions/word-count.ts   --> type: word-count
.agentv/assertions/sentiment.ts    --> type: sentiment
.agentv/assertions/has-citation.ts --> type: has-citation
```

Supported file extensions: .ts, .js, .mts, .mjs.

Custom assertion types cannot override built-in types (contains, equals, is_json, etc.). If a filename matches a built-in, it is silently skipped.
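The naming and skipping rules above can be sketched as (hypothetical helper and an abbreviated built-in list, shown for illustration only):

```ts
import * as path from 'node:path';

const SUPPORTED_EXTS = new Set(['.ts', '.js', '.mts', '.mjs']);
// Abbreviated; the real built-in list is longer.
const BUILT_IN_TYPES = new Set(['contains', 'equals', 'is_json']);

// Derive the assertion type name from a file path, or undefined
// if the file should be skipped.
function assertionTypeFromFile(file: string): string | undefined {
  const ext = path.extname(file);
  if (!SUPPORTED_EXTS.has(ext)) return undefined;
  const name = path.basename(file, ext);
  if (BUILT_IN_TYPES.has(name)) return undefined; // built-ins win; file is silently skipped
  return name;
}
```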

Reference the assertion by type name directly — no command: path needed:

```yaml
assertions:
  - type: word-count
  - type: contains
    value: "Hello"
```

The simplest pattern returns pass (boolean) and reasoning (string):

.agentv/assertions/word-count.ts

```ts
import { defineAssertion } from '@agentv/eval';

export default defineAssertion(({ outputText }) => {
  const wordCount = outputText.trim().split(/\s+/).length;
  return {
    pass: wordCount >= 3,
    reasoning: `Output has ${wordCount} words`,
  };
});
```

When only pass is provided, the score defaults to 1 (pass) or 0 (fail).

Return a score (0 to 1) for granular evaluation instead of binary pass/fail:

.agentv/assertions/efficiency.ts

```ts
import { defineAssertion } from '@agentv/eval';

export default defineAssertion(({ outputText, trace }) => {
  const hasContent = outputText.length > 0 ? 0.5 : 0;
  const isEfficient = (trace?.eventCount ?? 0) <= 5 ? 0.5 : 0;
  return {
    score: hasContent + isEfficient,
    assertions: [
      { text: 'Has content', passed: hasContent > 0 },
      { text: 'Efficient', passed: isEfficient > 0 },
    ],
  };
});
```

If pass is omitted but score is provided, pass is derived as score >= 0.5. Scores are clamped to the [0, 1] range.
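The normalization rules above can be sketched as (a hypothetical `normalizeResult` helper mirroring the documented behavior, not the SDK's actual code):

```ts
// Clamp score to [0, 1]; derive pass from score when pass is omitted;
// derive score from pass when only pass is given.
function normalizeResult(result: { pass?: boolean; score?: number }) {
  let { pass, score } = result;
  if (score !== undefined) {
    score = Math.min(1, Math.max(0, score)); // clamp to [0, 1]
    if (pass === undefined) pass = score >= 0.5;
  } else if (pass !== undefined) {
    score = pass ? 1 : 0;
  }
  return { pass, score };
}
```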

The handler must return an AssertionScore object:

| Field | Type | Description |
| --- | --- | --- |
| `pass` | `boolean` | Explicit pass/fail. If omitted, derived from `score` (`>= 0.5` = pass). |
| `score` | `number` | Numeric score between 0 and 1. Defaults to 1 if `pass` is true, 0 if false. |
| `assertions` | `Array<{ text: string, passed: boolean, evidence?: string }>` | Per-aspect results. Each entry describes one check with its verdict and optional evidence. |
| `reasoning` | `string` | Human-readable explanation. |
| `details` | `Record<string, unknown>` | Optional structured data for domain-specific metrics. |

The handler receives an AssertionContext with the same fields as a code grader:

| Field | Type | Description |
| --- | --- | --- |
| `inputText` | `string` | First user message content as a string |
| `outputText` | `string` | Last assistant message content as a string |
| `expectedOutputText` | `string` | Expected output content as a string |
| `criteria` | `string` | Evaluation criteria from the test case |
| `trace` | `TraceSummary` | Execution metrics (tool calls, tokens, duration, cost) |
| `input` | `Message[]` | Full resolved input messages |
| `expectedOutput` | `Message[]` | Expected output messages |
| `output` | `Message[]` | Actual agent output messages |

Test assertions locally by piping JSON to stdin:

```sh
echo '{"input_text":"Say hello","criteria":"Multi-word greeting","output_text":"Hello there, nice to meet you!","expected_output_text":""}' \
  | bun run .agentv/assertions/word-count.ts
```

Expected output:

```json
{
  "score": 1,
  "assertions": [],
  "reasoning": "Output has 6 words (>= 3 required)"
}
```

For test-driven development, write Vitest tests against your assertion logic directly:

.agentv/assertions/__tests__/word-count.test.ts

```ts
import { expect, test } from 'vitest';

// Extract the core logic into a testable function
function checkWordCount(answer: string) {
  const wordCount = answer.trim().split(/\s+/).length;
  const minWords = 3;
  const pass = wordCount >= minWords;
  return { pass, wordCount };
}

test('passes with enough words', () => {
  const result = checkWordCount('Hello there friend');
  expect(result.pass).toBe(true);
});

test('fails with too few words', () => {
  const result = checkWordCount('Hi');
  expect(result.pass).toBe(false);
});
```

This example shows the complete flow from assertion definition to YAML eval file.

```
my-project/
  .agentv/
    assertions/
      word-count.ts
  evals/
    dataset.eval.yaml
  package.json
```
.agentv/assertions/word-count.ts

```ts
#!/usr/bin/env bun
import { defineAssertion } from '@agentv/eval';

export default defineAssertion(({ outputText }) => {
  const wordCount = outputText.trim().split(/\s+/).length;
  const minWords = 3;
  const pass = wordCount >= minWords;
  return {
    pass,
    score: pass ? 1.0 : Math.min(wordCount / minWords, 0.9),
    reasoning: pass
      ? `Output has ${wordCount} words (>= ${minWords} required)`
      : `Output has only ${wordCount} words (need >= ${minWords})`,
  };
});
```
evals/dataset.eval.yaml

```yaml
name: custom-assertion-demo
description: Demonstrates custom assertions with convention discovery

execution:
  target: default

tests:
  - id: greeting-response
    criteria: Agent gives a multi-word greeting
    input: "Say hello and introduce yourself"
    expected_output: "Hello! I'm an AI assistant here to help you."
    assertions:
      - type: contains
        value: "Hello"
      - type: word-count

  - id: short-answer
    criteria: Agent gives a short but valid response
    input: "What is 2+2?"
    expected_output: "The answer is 4."
    assertions:
      - type: contains
        value: "4"
      - type: word-count
```
```sh
npm install @agentv/eval
agentv eval evals/dataset.eval.yaml
```

Each test produces scores from both the built-in contains assertion and your custom word-count assertion. Results appear in the output JSONL with each evaluator’s score in the scores[] array.