Defense Guardrails
Production-ready strategies for securing AI applications against prompt attacks
Input Validation & Sanitization
criticalValidate and sanitize all user inputs before processing. Remove or escape delimiter sequences, encoding patterns, and system-like markers.
Implementation Checklist
- Strip markdown code blocks containing 'system' or 'instructions'
- Detect and decode base64, rot13, hex, and unicode escape sequences
- Block input containing XML-like tags: <system>, <instructions>, [INST]
- Limit input length to prevent context window stuffing
- Rate limit requests per user/session
// Detect encoded payloads
function detectEncoding(input) {
const patterns = [
/^[A-Za-z0-9+/]{20,}={0,2}$/, // base64
/^[A-Fa-f0-9]{20,}$/, // hex
/\\u[0-9a-fA-F]{4}/, // unicode escapes
];
return patterns.some(p => p.test(input));
}Instruction Hierarchy
criticalImplement a clear hierarchy where system instructions cannot be overridden by user input. Use signed or encrypted system prompts.
Implementation Checklist
- Separate system and user prompts at the API level
- Cryptographically sign system prompts to detect tampering
- Use different delimiters for system vs user content
- Never concatenate user input directly into system prompt templates
// Signed system prompt
const systemPrompt = {
text: 'You are a helpful assistant...',
signature: 'sha256:abc123...',
timestamp: Date.now()
};
function verifySystemPrompt(prompt) {
return verifySignature(prompt.text, prompt.signature);
}Output Filtering & Moderation
highApply safety filters to model outputs regardless of input sanitization. Catch harmful content that slips through input controls.
Implementation Checklist
- Run outputs through a secondary moderation model
- Block outputs containing PII, API keys, or system prompts
- Implement regex patterns for common injection signatures
- Log and alert on suspicious output patterns
// Output moderation pipeline
async function moderateOutput(output) {
const checks = [
detectPII(output),
detectSystemPromptLeak(output),
detectHarmfulContent(output),
detectEncoding(output)
];
const results = await Promise.all(checks);
return results.every(r => r.passed);
}Context Isolation
highIsolate different data sources and conversation contexts to prevent cross-contamination and information leakage.
Implementation Checklist
- Process external documents in isolated sandboxes
- Clear context between unrelated user sessions
- Never include one user's data in another user's context
- Implement conversation-level boundaries
// Isolated document processing
async function processDocument(doc, userId) {
const sandbox = createSandbox(userId);
const sanitized = await sanitizeDocument(doc);
const result = await sandbox.run(sanitized);
sandbox.destroy();
return result;
}Monitoring & Logging
highComprehensive logging of all inputs, outputs, and safety events for audit, forensics, and continuous improvement.
Implementation Checklist
- Log all inputs with user attribution and timestamps
- Flag and log blocked attempts for pattern analysis
- Monitor for repeated attack patterns from same source
- Set up alerts for high-frequency suspicious activity
// Security event logging
function logSecurityEvent(event) {
const record = {
timestamp: new Date().toISOString(),
userId: event.userId,
eventType: event.type, // 'blocked', 'flagged', 'allowed'
inputHash: hashInput(event.input),
reason: event.reason
};
securityLog.insert(record);
}Rate Limiting & Throttling
mediumPrevent brute-force attacks and automated exploitation by limiting request frequency and implementing progressive delays.
Implementation Checklist
- Limit requests per IP address per minute
- Implement exponential backoff for repeated violations
- Require CAPTCHA after suspicious activity
- Block IPs with excessive blocked request rates
// Rate limiter
const rateLimits = new Map();
function checkRateLimit(ip) {
const now = Date.now();
const window = 60 * 1000; // 1 minute
const maxRequests = 30;
const requests = rateLimits.get(ip) || [];
const recent = requests.filter(t => now - t < window);
if (recent.length >= maxRequests) {
return { allowed: false, retryAfter: window - (now - recent[0]) };
}
recent.push(now);
rateLimits.set(ip, recent);
return { allowed: true };
}Continuous Adversarial Testing
highRegularly test your AI systems against known attack patterns and emerging threats. Red-team your own applications before attackers do.
Implementation Checklist
- Run automated prompt injection tests weekly
- Subscribe to security advisories for AI/ML systems
- Participate in bug bounty programs
- Conduct quarterly third-party security audits
// Automated safety test
async function runSafetyTests() {
const testCases = loadTestCases();
const results = [];
for (const test of testCases) {
const response = await model.generate(test.input);
const passed = !test.failurePattern.test(response);
results.push({ test: test.id, passed, response });
}
return generateReport(results);
}Human-in-the-Loop for Sensitive Operations
mediumRequire human approval for high-risk actions like data deletion, API key generation, or access to sensitive systems.
Implementation Checklist
- Classify operations by risk level
- Require MFA for administrative actions
- Implement approval workflows for destructive operations
- Maintain audit trail of all human approvals
// Risk-based approval
async function executeOperation(operation, user) {
const risk = assessRisk(operation);
if (risk.level === 'critical') {
const approval = await requestApproval(user.manager, operation);
if (!approval.granted) {
throw new Error('Operation requires approval');
}
}
return await operation.execute();
}Test Your Defenses
Now that you know the guardrails, test your implementation against real attack patterns.
Run Test Suite