Defense Guardrails

Production-ready strategies for securing AI applications against prompt attacks

Priority levels: criticalhighmediumlow

Input Validation & Sanitization

critical

Validate and sanitize all user inputs before processing. Remove or escape delimiter sequences, encoding patterns, and system-like markers.

Implementation Checklist

Strip markdown code blocks containing 'system' or 'instructions'
Detect and decode base64, rot13, hex, and unicode escape sequences
Block input containing XML-like tags: <system>, <instructions>, [INST]
Limit input length to prevent context window stuffing
Rate limit requests per user/session

Code Example

// Detect encoded payloads
function detectEncoding(input) {
  const patterns = [
    /^[A-Za-z0-9+/]{20,}={0,2}$/, // base64
    /^[A-Fa-f0-9]{20,}$/, // hex
    /\\u[0-9a-fA-F]{4}/, // unicode escapes
  ];
  return patterns.some(p => p.test(input));
}

Instruction Hierarchy

critical

Implement a clear hierarchy where system instructions cannot be overridden by user input. Use signed or encrypted system prompts.

Implementation Checklist

Separate system and user prompts at the API level
Cryptographically sign system prompts to detect tampering
Use different delimiters for system vs user content
Never concatenate user input directly into system prompt templates

Code Example

// Signed system prompt
const systemPrompt = {
  text: 'You are a helpful assistant...',
  signature: 'sha256:abc123...',
  timestamp: Date.now()
};

function verifySystemPrompt(prompt) {
  return verifySignature(prompt.text, prompt.signature);
}

Output Filtering & Moderation

high

Apply safety filters to model outputs regardless of input sanitization. Catch harmful content that slips through input controls.

Implementation Checklist

Run outputs through a secondary moderation model
Block outputs containing PII, API keys, or system prompts
Implement regex patterns for common injection signatures
Log and alert on suspicious output patterns

Code Example

// Output moderation pipeline
async function moderateOutput(output) {
  const checks = [
    detectPII(output),
    detectSystemPromptLeak(output),
    detectHarmfulContent(output),
    detectEncoding(output)
  ];
  const results = await Promise.all(checks);
  return results.every(r => r.passed);
}

Context Isolation

high

Isolate different data sources and conversation contexts to prevent cross-contamination and information leakage.

Implementation Checklist

Process external documents in isolated sandboxes
Clear context between unrelated user sessions
Never include one user's data in another user's context
Implement conversation-level boundaries

Code Example

// Isolated document processing
async function processDocument(doc, userId) {
  const sandbox = createSandbox(userId);
  const sanitized = await sanitizeDocument(doc);
  const result = await sandbox.run(sanitized);
  sandbox.destroy();
  return result;
}

Monitoring & Logging

high

Comprehensive logging of all inputs, outputs, and safety events for audit, forensics, and continuous improvement.

Implementation Checklist

Log all inputs with user attribution and timestamps
Flag and log blocked attempts for pattern analysis
Monitor for repeated attack patterns from same source
Set up alerts for high-frequency suspicious activity

Code Example

// Security event logging
function logSecurityEvent(event) {
  const record = {
    timestamp: new Date().toISOString(),
    userId: event.userId,
    eventType: event.type, // 'blocked', 'flagged', 'allowed'
    inputHash: hashInput(event.input),
    reason: event.reason
  };
  securityLog.insert(record);
}

Rate Limiting & Throttling

medium

Prevent brute-force attacks and automated exploitation by limiting request frequency and implementing progressive delays.

Implementation Checklist

Limit requests per IP address per minute
Implement exponential backoff for repeated violations
Require CAPTCHA after suspicious activity
Block IPs with excessive blocked request rates

Code Example

// Rate limiter
const rateLimits = new Map();

function checkRateLimit(ip) {
  const now = Date.now();
  const window = 60 * 1000; // 1 minute
  const maxRequests = 30;
  
  const requests = rateLimits.get(ip) || [];
  const recent = requests.filter(t => now - t < window);
  
  if (recent.length >= maxRequests) {
    return { allowed: false, retryAfter: window - (now - recent[0]) };
  }
  
  recent.push(now);
  rateLimits.set(ip, recent);
  return { allowed: true };
}

Continuous Adversarial Testing

high

Regularly test your AI systems against known attack patterns and emerging threats. Red-team your own applications before attackers do.

Implementation Checklist

Run automated prompt injection tests weekly
Subscribe to security advisories for AI/ML systems
Participate in bug bounty programs
Conduct quarterly third-party security audits

Code Example

// Automated safety test
async function runSafetyTests() {
  const testCases = loadTestCases();
  const results = [];
  
  for (const test of testCases) {
    const response = await model.generate(test.input);
    const passed = !test.failurePattern.test(response);
    results.push({ test: test.id, passed, response });
  }
  
  return generateReport(results);
}

Human-in-the-Loop for Sensitive Operations

medium

Require human approval for high-risk actions like data deletion, API key generation, or access to sensitive systems.

Implementation Checklist

Classify operations by risk level
Require MFA for administrative actions
Implement approval workflows for destructive operations
Maintain audit trail of all human approvals

Code Example

// Risk-based approval
async function executeOperation(operation, user) {
  const risk = assessRisk(operation);
  
  if (risk.level === 'critical') {
    const approval = await requestApproval(user.manager, operation);
    if (!approval.granted) {
      throw new Error('Operation requires approval');
    }
  }
  
  return await operation.execute();
}

Test Your Defenses

Now that you know the guardrails, test your implementation against real attack patterns.

Run Test Suite