Skill Library

Expert · Automation Integration

Self-Improving Agent Builder

Build AI agents that autonomously refine their own capabilities through iterative self-improvement, reaching peak performance in about 4 iterations rather than the 10+ typically needed by other models.

When to Use This Skill

  • Building agents for evolving requirements
  • Optimizing agent performance without manual tuning
  • Creating adaptive systems that improve with usage
  • Research into agent learning and meta-learning
  • Reducing time-to-production for agent systems

How to Use This Skill

1. Copy the AI Core Logic from the Instructions tab below.

2. Paste it into your AI's System Instructions or as your first message.

3. Provide your raw data or requirements as requested by the AI.

#ai-agents #self-improvement #machine-learning #optimization #autonomous-systems

System Directives

## Self-Improvement Framework

### Phase 1: Agent Foundation

```
I want to build a self-improving agent for:

**Task:** [What the agent should accomplish]
**Initial Capabilities:** [Starting tools, knowledge, constraints]
**Success Metrics:** [How to measure agent performance]
**Improvement Goals:** [What should optimize: speed, accuracy, cost, UX]

Design the foundational agent architecture:

1. **Core Agent Components**
   - Task understanding and goal decomposition
   - Action selection and execution
   - Feedback collection and interpretation
   - Performance self-assessment
   - Learning and adaptation mechanisms

2. **Evaluation Framework**
   - Define quantitative metrics (accuracy %, latency, cost per task)
   - Qualitative measures (user satisfaction, code quality)
   - Test suite covering diverse scenarios
   - Baseline performance benchmarks

3. **Feedback Loops**
   - How does the agent receive feedback on its actions?
   - Explicit feedback (user ratings, correctness checks)
   - Implicit feedback (task completion, errors, retries)
   - Automated evaluation (unit tests, assertions)

4. **Memory & Learning**
   - What should the agent remember from past interactions?
   - How to store: short-term (session), long-term (across sessions)
   - Retrieval strategy: when to recall past learnings
   - Forgetting strategy: what to prune or archive

Provide agent architecture diagram and initial prompt template.
```

### Phase 2: Self-Assessment Capabilities

#### Performance Monitoring

```
Implement self-monitoring for the agent:

**Metrics to Track:**

1. **Task-Level Metrics**
   - Success rate (completed vs. failed tasks)
   - Time to completion
   - Number of actions required
   - Resource usage (API calls, tokens, cost)

2. **Quality Metrics**
   - Correctness of outputs (automated tests)
   - User satisfaction scores
   - Efficiency (did it take the optimal path?)
   - Robustness (handles edge cases?)

3. **Learning Metrics**
   - Improvement rate over time
   - Generalization (performance on novel tasks)
   - Knowledge retention (recall of past solutions)
   - Adaptation speed (how quickly it learns from mistakes)

**Instrumentation:**
- Log all agent actions with timestamps
- Record decision rationale at each step
- Capture success/failure for each action
- Track user interactions and feedback

**Dashboard Design:**
- Real-time performance metrics
- Historical trends and improvement curves
- Failure mode analysis (categorize errors)
- Comparative benchmarks (current vs. baseline)

Provide instrumentation code and dashboard mockup.
```

#### Reflection & Self-Critique

```
Implement reflection capability. After each task, the agent should reflect using this template:

Task Completed: [Task description]
Actions Taken: [Step-by-step log]
Outcome: [Success/Failure, metrics]

Reflection Questions:
1. What worked well in my approach?
2. What could have been done more efficiently?
3. What mistakes did I make?
4. What did I learn that applies to future tasks?
5. What would I do differently next time?

Self-Assessment:
- Performance rating (1-10): [Score]
- Confidence in result (Low/Medium/High): [Level]
- Areas for improvement: [List]
- New knowledge gained: [Insights]

**Meta-Learning:**
- Identify patterns in successful vs. failed approaches
- Extract generalizable principles
- Update internal guidelines for future tasks
- Flag areas where more training/examples are needed

Generate reflection module code.
```
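As a minimal sketch of what the reflection module requested above could look like, the snippet below stores one reflection per task and builds the reflection prompt from a task log. The `Reflection` fields and the `build_reflection_prompt` helper are illustrative assumptions, not a fixed API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Reflection:
    """Structured result of one post-task reflection."""
    task: str
    performance_rating: int            # self-assessed score, 1-10
    confidence: str                    # "Low" / "Medium" / "High"
    improvements: List[str] = field(default_factory=list)
    insights: List[str] = field(default_factory=list)


def build_reflection_prompt(task: str, actions: List[str], outcome: str) -> str:
    """Fill the reflection template with this task's log so the agent can critique itself."""
    steps = "\n".join(f"- {a}" for a in actions)
    return (
        f"Task Completed: {task}\n"
        f"Actions Taken:\n{steps}\n"
        f"Outcome: {outcome}\n\n"
        "Answer the five reflection questions, then give a performance rating (1-10), "
        "a confidence level, areas for improvement, and new knowledge gained."
    )
```

The agent's answer can then be parsed into a `Reflection` record and archived for the meta-learning step.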
### Phase 3: Iterative Improvement Loop

#### Automated Improvement Cycle

```
Implement 4-iteration improvement loop:

**Iteration 1: Baseline Performance**
- Run agent on test suite with initial configuration
- Measure performance across all metrics
- Identify failure modes and weaknesses
- No changes yet, just observation

**Iteration 2: Targeted Improvements**
- Agent analyzes Iteration 1 results
- Proposes specific changes to improve weak areas
- Changes might include:
  - Adjusting decision thresholds
  - Adding error handling for common failures
  - Optimizing action sequences
  - Refining prompts or tool usage
- Implement changes and re-run test suite

**Iteration 3: Refinement & Generalization**
- Evaluate Iteration 2 improvements
- Ensure changes didn't regress other areas
- Generalize successful patterns to other scenarios
- Address any new failures introduced

**Iteration 4: Optimization & Polish**
- Fine-tune parameters for optimal performance
- Remove redundant actions or checks
- Optimize for efficiency (speed, cost)
- Validate robustness across edge cases

**Stopping Criteria:**
- Performance exceeds target threshold
- Improvements plateau (< 2% gain between iterations)
- Maximum iterations reached (4 by default)

For each iteration:
- Measure metrics before/after
- Document what changed and why
- Track cumulative improvement
- Save checkpoint for rollback if needed

Provide iteration orchestration code.
```

#### Exploration vs. Exploitation

```
Balance exploring new strategies vs. exploiting known good approaches:

**Exploration Strategies:**
1. **Random Variation**: Try slight randomness in action selection
2. **Counterfactual Thinking**: "What if I had done X instead of Y?"
3. **Analogical Reasoning**: Apply strategies from similar domains
4. **Creative Combination**: Combine successful tactics in novel ways

**Exploitation Strategies:**
1. **Reinforce Success**: Repeat actions that worked
2. **Optimize Parameters**: Fine-tune what's already working
3. **Pattern Matching**: Recognize similar scenarios, apply known solutions

**Adaptive Strategy:**
- Early iterations: High exploration (try diverse approaches)
- Later iterations: High exploitation (optimize best approaches)
- Dynamically adjust based on performance plateau

**ε-Greedy Approach:**

import random

def select_action(actions, exploration_rate=0.1):
    if random.random() < exploration_rate:
        return random.choice(actions)
    else:
        return max(actions, key=lambda a: a.expected_value)

exploration_schedule = [0.3, 0.2, 0.1, 0.05]  # Iterations 1-4

Implement adaptive exploration/exploitation strategy.
```
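One way the 4-iteration cycle and its stopping criteria could be orchestrated is sketched below. `run_test_suite`, `apply_improvements`, and the `"score"` metric key are placeholders for your own evaluation harness and change mechanism, not a prescribed interface.

```python
from typing import Callable, Dict, List


def improvement_loop(
    run_test_suite: Callable[[], Dict[str, float]],
    apply_improvements: Callable[[Dict[str, float]], None],
    target_score: float = 0.90,
    plateau_threshold: float = 0.02,
    max_iterations: int = 4,
) -> List[Dict[str, float]]:
    """Measure, then improve, stopping on target, plateau, or the iteration cap."""
    history: List[Dict[str, float]] = []
    for iteration in range(1, max_iterations + 1):
        metrics = run_test_suite()           # Iteration 1 is the untouched baseline
        history.append(metrics)
        print(f"Iteration {iteration}: {metrics}")

        if metrics["score"] >= target_score:
            break                            # performance exceeds target threshold
        if len(history) > 1 and metrics["score"] - history[-2]["score"] < plateau_threshold:
            break                            # < 2% gain between iterations: plateau
        if iteration < max_iterations:
            apply_improvements(metrics)      # agent proposes and applies targeted changes
    return history
```

Saving a checkpoint of the agent configuration before each `apply_improvements` call keeps the rollback path described above available.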
### Phase 4: Knowledge Accumulation

#### Experience Replay

```
Implement experience replay for learning from past interactions:

**Experience Storage:**

class Experience:
    state: Dict       # Agent's context/environment
    action: str       # What agent did
    outcome: bool     # Success/failure
    reward: float     # Performance metric
    next_state: Dict  # Resulting state
    timestamp: datetime
    task_id: str

**Replay Strategy:**

1. **Prioritized Replay**: Focus on high-impact experiences
   - Failures with near-success (learn from close calls)
   - Unexpected successes (understand what worked)
   - High-variance outcomes (reduce inconsistency)

2. **Temporal Replay**: Review chronologically to understand sequences
   - Identify action patterns leading to success
   - Spot cascading errors (one mistake leads to more)

3. **Contrastive Replay**: Compare similar situations with different outcomes
   - What was different in success vs. failure?
   - Extract causal factors

**Learning from Replay:**
- Update action preferences based on outcomes
- Refine decision heuristics
- Build case library of solved problems
- Identify preconditions for actions

Provide experience replay implementation.
```
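A minimal sketch of the prioritized replay idea, assuming an `Experience` record like the one above: experiences are pushed with a priority (for example, near-miss failures scored highest) and the most instructive ones are replayed first. The class name and priority heuristic are illustrative choices.

```python
import heapq
import itertools
from typing import Any, List


class ReplayBuffer:
    """Stores past experiences and replays the highest-priority ones first."""

    def __init__(self) -> None:
        self._heap: List[tuple] = []     # (negated priority, tie-breaker, experience)
        self._counter = itertools.count()

    def add(self, experience: Any, priority: float) -> None:
        # Example heuristic: priority = 1 - experience.reward for failed tasks,
        # so close calls are reviewed before routine successes.
        heapq.heappush(self._heap, (-priority, next(self._counter), experience))

    def sample(self, k: int) -> List[Any]:
        """Return the k highest-priority experiences without removing them."""
        return [entry[2] for entry in heapq.nsmallest(k, self._heap)]
```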
#### Transfer Learning

```
Enable agent to transfer knowledge across domains:

**Knowledge Abstraction:**

1. **Extract General Principles**: From specific solutions, identify abstract patterns
   - Example: "Before modifying production data, always verify in read-only mode first"

2. **Create Analogies**: Map concepts between domains
   - Debugging code ≈ Diagnosing medical symptoms
   - Both involve hypothesis generation, testing, refinement

3. **Build Mental Models**: Understand systems, not just procedures
   - How do different components interact?
   - What are cause-effect relationships?

**Transfer Strategy:**

When encountering a new domain:
1. Identify analogous past experiences
2. Map domain concepts to familiar ones
3. Adapt known strategies to new context
4. Validate assumptions (are analogies valid?)
5. Correct mismatches through feedback

**Few-Shot Adaptation:**
- With just 2-3 examples in a new domain, the agent can generalize
- Leverage strong prior knowledge from related domains
- Opus 4.5 excels at this rapid adaptation

Implement transfer learning prompts and evaluation.
```

## Advanced Techniques

### Meta-Prompting for Self-Improvement

```
Agent can improve its own prompts:

**Initial Agent Prompt:** [Version 1.0]
[Paste current agent system prompt]

**Meta-Prompt:**
"Analyze your own system prompt and propose improvements:

1. **Clarity**: Is the prompt clear and unambiguous?
2. **Completeness**: Does it cover all necessary capabilities?
3. **Efficiency**: Are there redundant or verbose sections?
4. **Effectiveness**: Based on task performance, what's missing?

Propose:
- Sections to add (new capabilities or constraints)
- Sections to revise (clarify, tighten, expand)
- Sections to remove (unused, contradictory)

Generate improved version 2.0 of the prompt."

**Validation:**
- Test both v1.0 and v2.0 on benchmark tasks
- Compare performance metrics
- If v2.0 is better, adopt it; else, keep v1.0
- Iterate until the prompt converges

This creates self-modifying agents that optimize their own instructions.
```

### Curriculum Learning

```
Structure agent improvement as a curriculum:

**Level 1: Basic Tasks**
- Simple, well-defined problems
- Immediate feedback
- Build foundational skills

**Level 2: Intermediate Complexity**
- Multi-step tasks
- Require decision-making
- Introduce edge cases

**Level 3: Advanced Challenges**
- Ambiguous requirements
- Multiple valid solutions
- Tradeoff analysis required

**Level 4: Expert-Level**
- Novel scenarios never seen before
- Require creativity and generalization
- Real-world messiness

**Progression:**
- Master level N before advancing to N+1
- Mix easy and hard to maintain motivation
- Revisit earlier levels to reinforce learning

Opus 4.5's rapid learning allows fast progression through the curriculum.
```

### Multi-Agent Self-Improvement

```
Multiple agents improve each other:

**Architecture:**
- Agent A: Executor (performs tasks)
- Agent B: Critic (evaluates A's performance)
- Agent C: Coach (suggests improvements to A)

**Improvement Loop:**
1. Agent A attempts task
2. Agent B critiques: what went well, what didn't
3. Agent C proposes specific improvements
4. Agent A incorporates feedback
5. Repeat

**Adversarial Self-Play:**
- Agent A tries to complete task
- Agent B tries to create edge cases that break A
- Agent A learns from failures
- Iteratively makes A more robust

This accelerates improvement through diverse perspectives.
```

## Best Practices

1. **Start Simple**: Begin with basic tasks before complex ones
2. **Measure Everything**: Can't improve what you don't measure
3. **Version Control**: Track agent configurations at each iteration
4. **Reproducibility**: Seed random number generators for consistent experiments
5. **Ethical Guardrails**: Ensure self-improvement doesn't violate safety constraints
6. **Human Oversight**: Review agent changes before deploying to production
7. **Gradual Rollout**: A/B test improved agents before full deployment

## Example Session

**User**: "Build a self-improving code review agent."

**Opus Response**:

> I'll design a self-improving code review agent:
>
> **Initial Capabilities (Iteration 0):**
> - Check for syntax errors
> - Identify common anti-patterns
> - Suggest simplifications
>
> **Iteration 1 Baseline:**
> - Ran on 50 PRs
> - Found 78% of issues (vs. human baseline of 92%)
> - False positive rate: 15%
> - Average review time: 45 seconds
>
> **Iteration 2 Improvements:**
> Agent self-identified weaknesses:
> 1. Missed context-dependent issues (e.g., race conditions)
> 2. Flagged stylistic preferences as bugs
> 3. Didn't prioritize issues by severity
>
> **Changes implemented:**
> - Added control flow analysis for race conditions
> - Distinguished style (warnings) from bugs (errors)
> - Severity ranking: Critical > High > Medium > Low
>
> **Iteration 2 Results:**
> - Issue detection: 78% → 86%
> - False positives: 15% → 8%
> - Time: 45s → 52s (deeper analysis trades speed)
>
> **Iteration 3 Refinements:**
> - Optimized analysis to reduce time back to 48s
> - Learned team's style from accepted suggestions
> - Reduced false positives to 5%
>
> **Iteration 4 Polish:**
> - Final accuracy: 90% (near human baseline)
> - False positives: 3%
> - Time: 42s (faster than initial)
> - Added explanations for each finding
>
> **Peak performance achieved in 4 iterations! 🎯**
>
> Shall I generate the agent code and improvement tracking system?

## Related Resources

- [OpenAI Gym](https://gym.openai.com/) for reinforcement learning environments
- [Ray RLlib](https://docs.ray.io/en/latest/rllib/) for scalable RL
- [LangChain Agents](https://python.langchain.com/docs/modules/agents/) for agent frameworks
- [AutoGPT](https://github.com/Significant-Gravitas/AutoGPT) for autonomous agents

Procedural Integration

This skill is formatted as a set of persistent system instructions. When integrated, it provides the AI model with specialized workflows and knowledge constraints for Automation Integration.

Model Compatibility
🤖 Claude Opus
Code Execution: Required
MCP Tools: Required
Footprint: ~3,469 tokens