advanced Automation Integration

Logging & Observability Expert

Implement structured logging, distributed tracing, metrics collection, and alerting for production systems with proper correlation and debugging capabilities.

When to Use This Skill

  • Setting up logging infrastructure
  • Implementing distributed tracing
  • Creating dashboards and alerts
  • Debugging production issues
  • Meeting compliance requirements
  • Optimizing log storage costs

How to use this skill

1. Copy the AI Core Logic from the Instructions tab below.

2. Paste it into your AI's System Instructions or as your first message.

3. Provide your raw data or requirements as requested by the AI.

#logging #observability #monitoring #tracing #debugging

System Directives

## Curation Note

Production debugging without observability is like flying blind. This skill compiles patterns from SRE teams at major tech companies where observability is considered essential infrastructure. The three-pillars approach (logs, metrics, traces) gives broad visibility into system behavior. The emphasis on correlation IDs and structured logging addresses the primary challenge: connecting related events across distributed systems.

## The Three Pillars

### 1. Logs (Events)

```typescript
// Structured logging with correlation
import { randomUUID } from 'node:crypto';
import pino from 'pino';
import { context, trace } from '@opentelemetry/api';

const logger = pino({
  level: process.env.LOG_LEVEL || 'info',
  formatters: {
    level: (label) => ({ level: label })
  },
  base: {
    service: 'user-service',
    version: process.env.APP_VERSION,
    environment: process.env.NODE_ENV
  }
});

// Trace ID of the active OpenTelemetry span, if there is one
function getTraceId(): string | undefined {
  return trace.getSpan(context.active())?.spanContext().traceId;
}

// Create child logger with request context
function createRequestLogger(requestId: string, userId?: string) {
  return logger.child({ requestId, userId, traceId: getTraceId() });
}

// Usage in request handler
// (assumes an Express `app` and a `log` field added to the Request type)
app.use((req, res, next) => {
  const requestId = req.get('x-request-id') || randomUUID();
  req.log = createRequestLogger(requestId, req.user?.id);
  req.log.info(
    { method: req.method, path: req.path, query: req.query },
    'Request received'
  );
  next();
});
```

### 2. Metrics (Aggregates)

```typescript
import { Counter, Histogram, Gauge, Registry } from 'prom-client';

const register = new Registry();

// Request metrics
const httpRequestDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.01, 0.05, 0.1, 0.5, 1, 5],
  registers: [register]
});

const httpRequestTotal = new Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status_code'],
  registers: [register]
});

// Business metrics
// (gauges conventionally omit the `_total` suffix, which is reserved for counters)
const activeUsers = new Gauge({
  name: 'active_users',
  help: 'Number of currently active users',
  registers: [register]
});

// Middleware to collect metrics (assumes an Express `app`)
app.use((req, res, next) => {
  const start = Date.now();
  res.on('finish', () => {
    const duration = (Date.now() - start) / 1000;
    // Use the route template, not the raw URL, to keep label cardinality low
    const route = req.route?.path || 'unknown';
    const statusCode = res.statusCode.toString();
    httpRequestDuration.labels(req.method, route, statusCode).observe(duration);
    httpRequestTotal.labels(req.method, route, statusCode).inc();
  });
  next();
});
```

### 3. Traces (Requests)

```typescript
import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('user-service');

async function createUser(userData: UserData): Promise<User> {
  // Create span for this operation
  return tracer.startActiveSpan('createUser', async (span) => {
    try {
      // Consider hashing or omitting PII such as email addresses in production
      span.setAttribute('user.email', userData.email);

      // Database operation as child span
      const user = await tracer.startActiveSpan('db.insert', async (dbSpan) => {
        try {
          dbSpan.setAttribute('db.system', 'postgresql');
          dbSpan.setAttribute('db.operation', 'INSERT');
          return await db.users.create(userData);
        } finally {
          dbSpan.end(); // End the span even if the insert throws
        }
      });

      // External service call as child span
      await tracer.startActiveSpan('email.send', async (emailSpan) => {
        try {
          emailSpan.setAttribute('email.type', 'welcome');
          await emailService.sendWelcome(user.email);
        } finally {
          emailSpan.end(); // End the span even if sending fails
        }
      });

      span.setStatus({ code: SpanStatusCode.OK });
      return user;
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      span.setStatus({ code: SpanStatusCode.ERROR, message });
      span.recordException(error instanceof Error ? error : message);
      throw error;
    } finally {
      span.end();
    }
  });
}
```

## Log Levels

```typescript
// When to use each level:
logger.trace('Detailed debugging');      // Development only
logger.debug('Variable values');         // Development/debugging
logger.info('Normal operations');        // Business events
logger.warn('Unexpected but handled');   // Potential issues
logger.error('Operation failed');        // Errors requiring action
logger.fatal('System cannot continue');  // Critical failures
```

## Alerting Rules

```yaml
groups:
  - name: api-alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status_code=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: 'High error rate detected'

      - alert: SlowRequests
        # Aggregate buckets by `le` so the quantile is computed across all series
        expr: |
          histogram_quantile(0.95,
            sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
          ) > 1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: '95th percentile latency above 1 second'
```

## Best Practices

1. **Correlate everything** - Use request IDs across services
2. **Structure logs** - JSON format for parsing
3. **Right level, right time** - Log at the level that matches the event, and never log sensitive data
4. **Sample high-volume** - 1% of traces may suffice
5. **Alert on symptoms** - User impact, not causes
6. **Retention policies** - Balance cost and usefulness
7. **Dashboard hierarchy** - Service > Endpoint > Instance
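
## Example: Redaction and Sampling

Two of the practices above, keeping sensitive data out of logs and sampling high-volume traces, can be illustrated with a small dependency-free sketch. The helper names (`redact`, `shouldSample`) and the list of sensitive keys are hypothetical and chosen only for illustration; they are not part of pino or OpenTelemetry. The sampler hashes the trace ID so that every service makes the same keep-or-drop decision for a given trace.

```typescript
// Hypothetical key list; adjust to the fields your services actually handle.
const SENSITIVE_KEYS = new Set(['password', 'token', 'authorization', 'ssn']);

// Return a deep copy of `value` with sensitive fields replaced by a marker.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map(redact);
  }
  if (value !== null && typeof value === 'object') {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      out[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? '[REDACTED]' : redact(inner);
    }
    return out;
  }
  return value;
}

// Deterministic sampling: hash the trace ID (32-bit FNV-1a over UTF-16 code
// units, which equals the byte values for hex trace IDs) into [0, 1) and keep
// the trace when the result falls below `rate`.
function shouldSample(traceId: string, rate: number): boolean {
  if (rate <= 0) return false;
  if (rate >= 1) return true;
  let hash = 0x811c9dc5;
  for (let i = 0; i < traceId.length; i++) {
    hash ^= traceId.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash / 0x100000000 < rate;
}

// Usage: redact a payload before logging it, and keep roughly 1% of traces.
const safePayload = redact({ email: 'a@example.com', password: 'hunter2' });
const keepTrace = shouldSample('4bf92f3577b34da6a3ce929d0e0e4736', 0.01);
```

In a real service, pino's built-in `redact` option and the OpenTelemetry SDK's `TraceIdRatioBasedSampler` are likely a better fit than hand-rolled helpers; the sketch only shows the idea behind them.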

Procedural Integration

This skill is formatted as a set of persistent system instructions. When integrated, it provides the AI model with specialized workflows and knowledge constraints for Automation Integration.

Model Compatibility
🤖 Claude Opus · 🤖 Gemini 2.5 Pro
Code Execution: Required
MCP Tools: Optional
Footprint ~1,431 tokens