How to Build an Audit Trail for AI Agents (with Code)
An audit trail for an AI agent is not just a log file. It is the evidence chain that answers the question every regulator, auditor, and angry customer will eventually ask: exactly what did your AI do, when, on what inputs, and with what authority?
This guide walks through the architecture of a production-grade AI audit trail, the data structures you need, and the code to implement it. We will use AgentGate as the audit backend, but the design principles apply regardless of your stack.
Why AI Audit Trails Are Different from Application Logs
Application logs capture system events: errors, latencies, request paths. They are designed for debugging. An AI audit trail must capture decision-level evidence: what the agent was trying to accomplish, what data it processed, what it decided, and why. These are different things.
Three properties separate an audit trail from a log file:
- Tamper evidence — You must be able to prove that a record has not been altered after the fact. This requires hash chaining or a write-once store.
- Completeness — Missing events are as dangerous as falsified events. Your audit trail must capture every decision, including low-confidence ones that were routed to human review.
- Retention with integrity — Regulatory regimes specify retention periods (GDPR sets no fixed period, only as long as necessary for the purpose; SOX requires 7 years for audit records; PCI-DSS requires at least 1 year of audit log history, with the most recent 3 months immediately available). You must store records without allowing modification during the retention window.
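The tamper-evidence property is worth making concrete before we get to the full event schema. In this minimal sketch (plain Node.js `crypto`; the record shape is invented for illustration), each record commits to the hash of its predecessor, so editing an old record invalidates every later link:

```typescript
import { createHash } from 'crypto';

// Minimal illustration of tamper evidence: r1 commits to the hash of r0,
// so any after-the-fact edit to r0 breaks the link.
const sha256 = (s: string) => createHash('sha256').update(s).digest('hex');

const r0 = { data: 'approve loan #123', prev: '0'.repeat(64) };
const r1 = { data: 'flag txn #456', prev: sha256(JSON.stringify(r0)) };

console.log(sha256(JSON.stringify(r0)) === r1.prev); // true: chain intact

r0.data = 'deny loan #123'; // tamper with history
console.log(sha256(JSON.stringify(r0)) === r1.prev); // false: detected
```

Note that the chain only *detects* tampering; it does not prevent it. That is why the sections below pair chaining with a write-once store.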
The Audit Event Schema
Every audit event should contain the following fields:
```typescript
interface AuditEvent {
  // Identity
  event_id: string;                // UUID v4, globally unique
  agent_id: string;                // Which agent produced this event
  model_version: string;           // Exact model version/hash

  // Timing
  timestamp: string;               // ISO 8601 with milliseconds
  session_id: string;              // Groups events for one user session

  // Data references (never store raw PII in audit logs)
  subject_id: string;              // Hashed or pseudonymous user identifier
  input_hash: string;              // SHA-256 of the input payload

  // Decision
  decision_type: string;           // e.g. 'credit_approval', 'fraud_flag'
  output: Record<string, unknown>; // The agent's decision/output
  confidence: number;              // 0-1 confidence score
  reason_codes: string[];          // Human-readable decision factors

  // Governance
  policy_version: string;          // Which policy governed this decision
  human_review_required: boolean;
  human_review_outcome?: string;

  // Chain integrity
  previous_event_hash: string;     // SHA-256 of the previous event
  event_hash: string;              // SHA-256 of this event's content
}
```
The input_hash field stores a SHA-256 hash of the input rather than the input itself. This preserves your ability to verify what was processed without storing personal data in your audit log — a GDPR-friendly pattern. If you need to produce the original input for a specific investigation, it should come from your primary data store under appropriate access controls.
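A sketch of deriving these two privacy-preserving fields. The HMAC key is an assumption on my part (any secret held outside the audit store works) and is not part of the schema above:

```typescript
import { createHash, createHmac } from 'crypto';

// subject_id: a keyed hash (HMAC), so pseudonyms cannot be reversed by
// simply hashing candidate user IDs; the key lives outside the audit store.
function pseudonymizeSubject(userId: string, secretKey: string): string {
  return createHmac('sha256', secretKey).update(userId).digest('hex');
}

// input_hash: plain SHA-256 of the serialized input payload
function hashInput(input: unknown): string {
  return createHash('sha256').update(JSON.stringify(input)).digest('hex');
}
```

Deleting the HMAC key later renders every `subject_id` unlinkable, which is one way to honor an erasure request without touching the chain itself.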
Implementing SHA-256 Hash Chaining
Hash chaining works by including the hash of the previous event in each new event. If any event is altered, every subsequent hash becomes invalid — making tampering detectable.
```typescript
import { createHash } from 'crypto';

function hashEvent(event: Omit<AuditEvent, 'event_hash'>): string {
  // Sort top-level keys before serializing so the hash is deterministic.
  // (Avoid passing a key array as JSON.stringify's replacer: an array
  // replacer filters keys recursively and would drop nested output fields.)
  const sorted = Object.fromEntries(
    Object.entries(event).sort(([a], [b]) => a.localeCompare(b))
  );
  return createHash('sha256').update(JSON.stringify(sorted)).digest('hex');
}
```
```typescript
import { randomUUID } from 'crypto';

async function recordAuditEvent(
  params: Omit<
    AuditEvent,
    'event_id' | 'timestamp' | 'previous_event_hash' | 'event_hash'
  >
): Promise<AuditEvent> {
  // Fetch the hash of the last event in this agent's chain
  const lastHash = await getLastEventHash(params.agent_id);

  const eventWithChain = {
    ...params,
    event_id: randomUUID(),
    // Genesis events link to a well-known all-zero hash
    previous_event_hash: lastHash ?? '0'.repeat(64),
    timestamp: new Date().toISOString()
  };

  const event: AuditEvent = {
    ...eventWithChain,
    event_hash: hashEvent(eventWithChain)
  };

  await persistEvent(event);
  return event;
}
```
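The chaining above can be exercised end to end with in-memory stand-ins for the `getLastEventHash` and `persistEvent` helpers it assumes. This is an illustration only, with a simplified synchronous event shape so the sketch runs standalone; a real system would back these helpers with a write-once store:

```typescript
import { createHash, randomUUID } from 'crypto';

// Simplified event shape so this sketch is self-contained
type ChainEvent = {
  event_id: string;
  agent_id: string;
  payload: string;
  previous_event_hash: string;
  event_hash: string;
};

const store: ChainEvent[] = []; // in-memory stand-in for persistEvent's store
const sha256 = (s: string) => createHash('sha256').update(s).digest('hex');

function record(agent_id: string, payload: string): ChainEvent {
  // Stand-in for getLastEventHash: last event for this agent, if any
  const last = [...store].reverse().find(e => e.agent_id === agent_id);
  const partial = {
    event_id: randomUUID(),
    agent_id,
    payload,
    previous_event_hash: last?.event_hash ?? '0'.repeat(64) // genesis link
  };
  const event = { ...partial, event_hash: sha256(JSON.stringify(partial)) };
  store.push(event);
  return event;
}

const e1 = record('loan-agent', 'approve #123');
const e2 = record('loan-agent', 'flag #456');
console.log(e2.previous_event_hash === e1.event_hash); // true: chained
```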
Using AgentGate as Your Audit Backend
Implementing hash chaining, retention enforcement, and tamper detection from scratch is non-trivial. AgentGate provides this as an API so you can focus on instrumenting your agent rather than building audit infrastructure.
```typescript
import { createHash } from 'crypto';
import AgentGate from '@agentgate/sdk';

const gate = new AgentGate({ apiKey: process.env.AGENTGATE_API_KEY });

// Wrap your agent's decision function
async function makeDecision(agentId: string, input: unknown) {
  const inputHash = createHash('sha256')
    .update(JSON.stringify(input))
    .digest('hex');

  const modelOutput = await yourModel.predict(input); // your model client

  // Record to AgentGate — hash chaining and tamper evidence are automatic
  await gate.audit.record({
    agent_id: agentId,
    model_version: process.env.MODEL_VERSION!,
    input_hash: inputHash,
    decision_type: 'classification',
    output: modelOutput,
    confidence: modelOutput.confidence,
    policy_version: 'v1.2'
  });

  return modelOutput;
}
```
AgentGate stores events in a write-once append-only log with automatic hash chain verification. You can retrieve the chain verification status at any time via the audit API.
What Not to Store in Your Audit Trail
The most common mistake is storing raw personal data in audit logs. Audit logs typically have weaker access controls than primary data stores and often flow to third-party log aggregators. Storing a user's full name, date of birth, or financial details in an audit event creates a secondary PII exposure risk.
Store hashes of inputs, not inputs. Store pseudonymous identifiers, not names. Store reason codes, not the underlying feature values. If you need to reconstruct the full picture for a specific investigation, join the audit log record with your primary data store at query time.
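As a concrete contrast, here is a sketch of the same decision subject recorded both ways. All names, values, and the `PSEUDONYM_KEY` environment variable are invented for illustration:

```typescript
import { createHash, createHmac } from 'crypto';

// What NOT to record: raw PII leaks into the audit store
const badEvent = {
  subject: 'Jane Doe',       // real name
  dob: '1987-03-14',         // raw personal data
  account_balance: 10432.17  // underlying feature value
};

// What to record instead: a pseudonym, an input hash, and reason codes.
// PSEUDONYM_KEY is an assumed secret held outside the audit store.
const goodEvent = {
  subject_id: createHmac('sha256', process.env.PSEUDONYM_KEY ?? 'dev-key')
    .update('jane.doe@example.com')
    .digest('hex'),
  input_hash: createHash('sha256')
    .update(JSON.stringify(badEvent))
    .digest('hex'),
  reason_codes: ['INSUFFICIENT_HISTORY', 'HIGH_UTILIZATION']
};
```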
Retention, Archival, and Deletion
Design your retention policy before you write your first audit event. Key questions:
- What is the longest regulatory retention requirement for your jurisdiction and use case?
- Can you delete audit events when the underlying subject exercises a GDPR right to erasure? (Answer: only the personal data fields — the decision record itself may need to be retained for other legal obligations.)
- How will you archive events after the active retention window without losing hash chain integrity?
AgentGate's retention settings allow you to configure per-agent retention windows and automatic archival to cold storage. Deletion of pseudonymous identifiers is supported without breaking the hash chain. See the retention configuration docs.
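To make the questions above concrete, a retention policy might be captured in a shape like the following. These field names are my assumptions, not a documented AgentGate API; consult the retention configuration docs for the actual surface:

```typescript
// Hypothetical retention policy shape (field names are assumptions)
const retentionConfig = {
  agent_id: 'loan-agent',
  retention_days: 2555,             // ~7 years, e.g. a SOX-style requirement
  archive_after_days: 365,          // move to cold storage after 1 year
  erasure_policy: 'pseudonyms_only' // delete subject_id mappings, keep chain
};
```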
Verifying Chain Integrity
A hash chain is only useful if you verify it. Build a scheduled job that replays the chain and confirms every hash is valid:
```typescript
async function verifyChainIntegrity(agentId: string, fromDate: Date) {
  const events = await gate.audit.list({ agent_id: agentId, from: fromDate });

  for (let i = 0; i < events.length; i++) {
    // Recompute each event's own hash to detect in-place edits
    const { event_hash, ...content } = events[i];
    if (hashEvent(content) !== event_hash) {
      throw new Error(`Hash mismatch at event ${events[i].event_id}`);
    }
    // Confirm the link back to the previous event
    if (i > 0 && events[i].previous_event_hash !== events[i - 1].event_hash) {
      throw new Error(
        `Chain broken at event ${events[i].event_id}: ` +
        `expected prev_hash ${events[i - 1].event_hash}, ` +
        `got ${events[i].previous_event_hash}`
      );
    }
  }

  return { verified: true, events_checked: events.length };
}
```
Run this verification daily and alert your security team on any failure. A broken chain is a serious security event.
From Audit Trail to Evidence Package
When a regulator or auditor requests evidence, you need to produce a package that demonstrates the integrity of your records. This means exporting the event chain, the chain verification report, and the policy versions that governed each decision period. AgentGate's compliance export generates this package in one API call, formatted for common audit frameworks including SOC 2, ISO 27001, and PCI-DSS.
Get production-grade audit trails in minutes
AgentGate handles SHA-256 hash chaining, write-once storage, retention policies, and compliance export — so you can focus on building your agent, not the audit infrastructure around it.
Start free | Read the audit docs | See pricing