## Workflow Design Framework
### Phase 1: Workflow Decomposition
```
I want to create an autonomous workflow for:
**Goal:** [High-level objective]
**Current Manual Process:** [How this is done today]
**Success Criteria:** [What defines successful completion]
**Constraints:** [Time limits, quality requirements, budget]
Help me decompose this into an agentic workflow:
1. **Break Down Into Phases**
- Identify major stages of the workflow
- Determine dependencies between stages
- Estimate duration for each phase
2. **Define Decision Points**
- Where does the agent need to make choices?
- What criteria guide those decisions?
- When should the agent escalate to a human?
3. **Identify Required Capabilities**
- What tools/APIs does the agent need access to?
- What data sources must it read from?
- What systems must it write to?
- What permissions are required?
4. **Design State Management**
- How should the agent track progress?
- What context must persist between steps?
- How to handle interruptions or failures?
- What checkpoints should be saved?
```
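Once phases and their dependencies are identified, a valid execution order can be derived mechanically. The following is a minimal sketch using Python's standard `graphlib`; the phase names and the `execution_order` helper are hypothetical placeholders, not part of any specific workflow.

```python
from graphlib import TopologicalSorter

# Hypothetical phases: each key maps to the set of phases it depends on.
phases = {
    "collect_inputs": set(),
    "validate": {"collect_inputs"},
    "process": {"validate"},
    "notify": {"process"},
    "archive": {"process"},
}


def execution_order(deps):
    """Return phases in an order that respects every dependency."""
    # static_order() raises graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(deps).static_order())


order = execution_order(phases)
print(order)
```

Phases with no dependency between them (here `notify` and `archive`) may appear in either order, which also marks them as candidates for parallel execution.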
### Phase 2: Agent Architecture
#### State Machine Design
```
Design a state machine for the following agentic workflow:
**Workflow:** [Description]
**States:** [List major states: Idle, Processing, Waiting, Complete, Failed]
**Transitions:** [What triggers each state change]
For each state, define:
1. **Entry Actions**
- What happens when entering this state?
- What initialization is needed?
2. **State Logic**
- What does the agent do while in this state?
- What decisions does it make?
- What external calls does it make?
3. **Exit Conditions**
- What triggers transition to next state?
- What validates successful completion?
4. **Error Handling**
- What can go wrong in this state?
- How to detect failures?
- Retry strategy or fallback to previous state?
5. **Timeout Handling**
- Maximum duration for this state?
- What to do if timeout is exceeded?
Generate a state transition diagram and implementation pseudocode.
```
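As a rough illustration of what the generated pseudocode might look like, here is a table-driven state machine over the five example states. The event names (`start`, `need_input`, and so on) are assumptions made for this sketch.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    PROCESSING = auto()
    WAITING = auto()
    COMPLETE = auto()
    FAILED = auto()


# Allowed transitions keyed by (current state, event). Event names are hypothetical.
TRANSITIONS = {
    (State.IDLE, "start"): State.PROCESSING,
    (State.PROCESSING, "need_input"): State.WAITING,
    (State.PROCESSING, "done"): State.COMPLETE,
    (State.PROCESSING, "error"): State.FAILED,
    (State.WAITING, "input_received"): State.PROCESSING,
    (State.WAITING, "timeout"): State.FAILED,
}


def next_state(current, event):
    """Return the state reached from `current` on `event`, or raise on an illegal move."""
    try:
        return TRANSITIONS[(current, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {current.name} on {event!r}") from None


state = State.IDLE
for event in ["start", "need_input", "input_received", "done"]:
    state = next_state(state, event)
print(state.name)
```

Keeping the transitions in one table makes illegal moves fail loudly and gives a single place to review when the workflow changes.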
#### Agent Prompt Template
```
Create a system prompt for an autonomous agent that will:
**Task:** [Agent's purpose]
**Capabilities:** [Tools, APIs, MCP servers available]
**Constraints:** [What agent must NOT do]
**Escalation Criteria:** [When to ask for human help]
The prompt should include:
1. **Role & Objectives**
- Clear identity and purpose
- Success criteria
- Key performance indicators
2. **Decision-Making Framework**
- How to prioritize competing objectives
- How to handle ambiguity
- When to explore vs. exploit known solutions
3. **Tool Usage Patterns**
- Which tools to use for which tasks
- Best practices for API calls
- Error handling for tool failures
4. **Context Management**
- What to remember between steps
- What to summarize and compress
- When to reference previous decisions
5. **Communication Protocol**
- When and how to log progress
- Format for status updates
- Escalation message templates
6. **Safety Guardrails**
- Checks before taking irreversible actions
- Limits on resource usage (API calls, costs)
- Human-in-the-loop requirements for high-stakes decisions
Format as a comprehensive system prompt ready to use.
```
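Guardrails stated in a prompt are easier to trust when they are also enforced in code. The sketch below shows one possible way to gate actions on a per-run budget and on human approval for irreversible actions; `RunBudget`, `may_execute`, and the action dictionary keys are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RunBudget:
    max_api_calls: int
    max_cost_usd: float
    api_calls: int = 0
    cost_usd: float = 0.0

    def charge(self, cost_usd):
        """Record one API call; return False if either limit would be exceeded."""
        if self.api_calls + 1 > self.max_api_calls:
            return False
        if self.cost_usd + cost_usd > self.max_cost_usd:
            return False
        self.api_calls += 1
        self.cost_usd += cost_usd
        return True


def may_execute(action, budget, approved_by_human):
    """Gate an action: irreversible actions need approval, all actions need budget."""
    if action["irreversible"] and not approved_by_human:
        return False
    return budget.charge(action["estimated_cost_usd"])


budget = RunBudget(max_api_calls=2, max_cost_usd=1.0)
read_only = {"irreversible": False, "estimated_cost_usd": 0.4}
delete_records = {"irreversible": True, "estimated_cost_usd": 0.4}
```

The approval check runs before the budget is charged, so a rejected irreversible action consumes no budget.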
### Phase 3: Workflow Orchestration
#### Multi-Agent Coordination
```
I need multiple agents working together on:
**Workflow:** [Complex task requiring specialization]
**Agents Needed:** [List agent roles]
Design a multi-agent system:
1. **Agent Roles & Responsibilities**
For each agent:
- Specialized capability
- Decision-making authority
- Data inputs and outputs
2. **Communication Protocol**
- How do agents share information?
- Message format and schema
- Synchronous vs. asynchronous communication
3. **Coordination Mechanism**
- Who orchestrates the overall workflow?
- How are tasks assigned to agents?
- How to handle agent unavailability?
4. **Conflict Resolution**
- What if agents disagree?
- Priority rules for resource contention
- Escalation to supervisor agent or human
5. **Handoff Protocol**
- What context must be passed between agents?
- Validation that upstream work is correct
- Acknowledgment and receipt confirmation
Provide architecture diagram and orchestration logic.
```
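The handoff protocol can be made concrete with a small message schema. This is a sketch only: the `Handoff` fields and the `REQUIRED_CONTEXT` keys are assumptions chosen for illustration, and a real system would likely add versioning and transport details.

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical context keys that every handoff must carry.
REQUIRED_CONTEXT = {"summary", "artifacts"}


@dataclass(frozen=True)
class Handoff:
    sender: str
    recipient: str
    task: str
    context: dict
    message_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def receive(handoff):
    """Validate an incoming handoff and return an acknowledgment for the sender."""
    missing = sorted(REQUIRED_CONTEXT - set(handoff.context))
    return {
        "ack_for": handoff.message_id,
        "accepted": not missing,
        "missing": missing,
    }


good = Handoff("researcher", "writer", "draft report",
               {"summary": "3 sources found", "artifacts": ["notes.md"]})
bad = Handoff("researcher", "writer", "draft report", {"summary": "incomplete"})
print(receive(good)["accepted"], receive(bad)["missing"])
```

Returning the `message_id` in the acknowledgment lets the sender match receipts to messages, which matters once communication is asynchronous.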
#### Tool & API Integration
```
Connect this agent workflow to the following systems:
**Tools Required:**
1. [Tool name]: [Purpose, API endpoints]
2. [Tool name]: [Purpose, API endpoints]
...
For each tool, design:
1. **Authentication Flow**
- How to obtain and refresh credentials
- Where to store secrets securely
- Error handling for auth failures
2. **API Call Patterns**
- Request format and required parameters
- Response parsing logic
- Rate limiting and retry strategy
3. **Error Handling**
- Expected error codes and meanings
- Retry vs. fail vs. escalate decision tree
- Fallback alternatives if tool is unavailable
4. **Data Mapping**
- Transform agent's internal data to API format
- Parse API response into agent's data model
- Handle missing or malformed data
5. **Idempotency & Consistency**
- Prevent duplicate operations
- Handle partial failures
- Rollback strategy for multi-step operations
Generate integration code (Python, TypeScript, or pseudocode).
```
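The retry vs. fail vs. escalate decision tree might be sketched as follows. The set of retryable status codes is an assumption to adjust per API, and `call` stands in for any hypothetical zero-argument function that returns a `(status, body)` pair.

```python
import random
import time

# Status codes treated as transient here; this is an assumption, adjust per API.
RETRYABLE = {429, 500, 502, 503, 504}


def classify(status):
    """Map an HTTP status code to 'ok', 'retry', or 'escalate'."""
    if 200 <= status < 300:
        return "ok"
    if status in RETRYABLE:
        return "retry"
    return "escalate"


def call_with_retry(call, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Invoke call() until it succeeds, returns a non-retryable status, or attempts run out."""
    body = None
    for attempt in range(max_attempts):
        status, body = call()
        decision = classify(status)
        if decision != "retry":
            return decision, body
        if attempt < max_attempts - 1:
            # Exponential backoff with full jitter between attempts.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    return "escalate", body
```

Injecting `sleep` as a parameter keeps the function testable without real delays. Retries like this are only safe when the underlying operation is idempotent, for example by sending the same idempotency key on every attempt where the API supports one.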
### Phase 4: Reliability & Monitoring
#### Failure Recovery Design
```
Design a failure recovery strategy for this workflow:
**Workflow Stages:** [List stages]
**Failure Modes:** [Anticipated ways it could fail]
For each stage, define:
1. **Health Checks**
- How to detect if this stage is stalled or failed?
- What metrics indicate healthy progress?
2. **Retry Logic**
- Transient failures: Exponential backoff? How many retries?
- Permanent failures: When to give up?
3. **Compensation Actions**
- If this stage fails after side effects, how to undo?
- Saga pattern for distributed transactions?
4. **Checkpoint & Resume**
- What state to save before each stage?
- How to resume from checkpoint after crash?
5. **Escalation Path**
- At what point should a human intervene?
- What context to provide to human for debugging?
Generate resilience architecture diagram.
```
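For compensation actions, a saga can be sketched as a list of `(action, compensate)` pairs that are undone in reverse order on failure. The step names below are hypothetical, and the failure is simulated.

```python
def run_saga(steps):
    """Run (action, compensate) pairs in order; on failure, undo completed steps in reverse.

    Returns True if every action succeeded, False if the saga was rolled back.
    """
    completed = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True


log = []


def charge_card():
    raise RuntimeError("payment declined")  # simulated failure


succeeded = run_saga([
    (lambda: log.append("reserve_inventory"), lambda: log.append("release_inventory")),
    (charge_card, lambda: log.append("refund_card")),
])
print(succeeded, log)
```

Only steps that completed are compensated, so the failed `charge_card` step never triggers a refund. A production version would also need to handle compensations that themselves fail.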
#### Monitoring & Observability
```
Set up monitoring for the autonomous workflow:
**Workflow:** [Name and description]
**SLAs:** [Performance and reliability targets]
Design monitoring system:
1. **Metrics to Track**
- Workflow completion rate and duration
- Stage-by-stage timing and success rate
- API call latency and error rate
- Cost per workflow run
2. **Logs to Capture**
- Structured logs for each decision point
- Input/output of each stage
- Error stack traces and context
- Audit trail of actions taken
3. **Alerts & Notifications**
- Define alert conditions (failure rate > X%, duration > Y)
- Notification channels (email, Slack, PagerDuty)
- Alert severity levels and escalation rules
4. **Dashboards**
- Real-time workflow status board
- Historical performance trends
- Error rate by failure type
- Cost and efficiency metrics
5. **Debugging Tools**
- Ability to replay workflow from checkpoint
- Visualize workflow state transitions
- Inspect agent's reasoning at each step
Provide dashboard mockup and instrumentation code.
```
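Instrumentation code for the items above could start as small as this sketch: one helper that emits structured JSON log lines and one that evaluates alert conditions over a batch of events. The field names and default thresholds are assumptions, not recommended values.

```python
import json
import time


def log_event(stage, status, duration_s, **context):
    """Serialize one workflow event as a JSON line."""
    record = {"ts": time.time(), "stage": stage, "status": status,
              "duration_s": duration_s, **context}
    return json.dumps(record, sort_keys=True)


def alert_reasons(events, max_failure_rate=0.2, max_duration_s=30.0):
    """Return the alert conditions triggered by a batch of event dicts."""
    if not events:
        return []
    reasons = []
    failures = sum(1 for e in events if e["status"] == "failed")
    if failures / len(events) > max_failure_rate:
        reasons.append("failure_rate")
    if max(e["duration_s"] for e in events) > max_duration_s:
        reasons.append("duration")
    return reasons


events = [
    {"stage": "fetch", "status": "ok", "duration_s": 1.2},
    {"stage": "transform", "status": "failed", "duration_s": 45.0},
]
print(log_event("fetch", "ok", 1.2, run_id="run-001"))
print(alert_reasons(events))
```

Routing the returned reasons to a notification channel (email, Slack, PagerDuty) is left out, since it depends on the integration in use.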
## Advanced Techniques
### Long-Context Workflow Management
```
For workflows that accumulate large amounts of context:
**Challenge:** Agent must maintain awareness of decisions made hours/days ago
**Solution:** Design hierarchical summarization
1. **Short-Term Memory** (last 10 interactions)
- Full detail, all reasoning
2. **Medium-Term Memory** (last 100 interactions)
- Summarized key decisions and outcomes
3. **Long-Term Memory** (entire workflow history)
- High-level milestones, critical errors, major decisions
**Prompt Strategy:**
At each step, provide the agent:
- Full short-term memory
- Summarized medium-term memory
- Indexed long-term memory (retrieve relevant past context as needed)
Design compression algorithm:
- What to keep verbatim (irreversible actions, user approvals)
- What to summarize (routine operations)
- What to discard (redundant logs)
```
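The three memory tiers can be approximated with bounded queues. In this simplified sketch, entries evicted from short-term memory are summarized into medium-term memory, and entries flagged as critical are kept verbatim in long-term memory. `TieredMemory` and its default truncating `summarize` function are hypothetical; in practice the summarizer would likely be a model call.

```python
from collections import deque


class TieredMemory:
    def __init__(self, short_size=10, medium_size=100, summarize=None):
        self.short = deque(maxlen=short_size)    # full detail
        self.medium = deque(maxlen=medium_size)  # summaries of evicted entries
        self.long = []                           # critical entries, kept verbatim
        # Placeholder summarizer: truncate. Replace with a real summarization step.
        self.summarize = summarize or (lambda text: text[:80])

    def add(self, text, critical=False):
        if len(self.short) == self.short.maxlen:
            # The oldest detailed entry is about to be evicted, so keep a summary of it.
            self.medium.append(self.summarize(self.short[0]))
        self.short.append(text)
        if critical:
            self.long.append(text)

    def context(self):
        """Return the memory to include in the agent's next prompt."""
        return {"short": list(self.short), "medium": list(self.medium),
                "long": list(self.long)}


memory = TieredMemory(short_size=2, medium_size=3)
for i in range(4):
    memory.add(f"step {i}", critical=(i == 0))
print(memory.context())
```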
### Human-in-the-Loop (HITL) Patterns
```
Incorporate human oversight without unnecessarily blocking agent progress:
**Pattern 1: Pre-Approval**
- Agent proposes action
- Waits for human approval
- Proceeds if approved, or adjusts plan
**Pattern 2: Post-Hoc Review**
- Agent takes action autonomously
- Logs action for human review
- Human can undo or adjust if needed
**Pattern 3: Exception-Based**
- Agent proceeds normally
- Escalates only if confidence < threshold
- Human provides guidance on edge cases
**Pattern 4: Supervision Dashboard**
- Human monitors agent in real-time
- Can pause or override at any time
- Agent proceeds unless interrupted
Choose pattern based on:
- Risk of incorrect action
- Cost of human interruption
- Latency requirements
Implement HITL workflow with appropriate pattern.
```
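Pattern 3 (exception-based) is the most direct to express in code. The sketch below runs an action autonomously unless confidence falls below a threshold, in which case it asks a human first. The `action` and `ask_human` callables and the default threshold of 0.8 are assumptions for illustration.

```python
def execute_with_escalation(action, confidence, ask_human, threshold=0.8):
    """Run `action` autonomously unless confidence is below `threshold`.

    `ask_human` receives the confidence value and returns True to approve.
    """
    if confidence >= threshold:
        return {"executed": True, "escalated": False, "result": action()}
    if ask_human(confidence):
        return {"executed": True, "escalated": True, "result": action()}
    return {"executed": False, "escalated": True, "result": None}


confident = execute_with_escalation(lambda: "email sent", 0.95, ask_human=lambda c: False)
rejected = execute_with_escalation(lambda: "email sent", 0.40, ask_human=lambda c: False)
approved = execute_with_escalation(lambda: "email sent", 0.40, ask_human=lambda c: True)
print(confident, rejected, approved)
```

The same wrapper becomes Pattern 1 (pre-approval) by setting the threshold above any achievable confidence, so every action goes through `ask_human`.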
### Self-Healing Workflows
```
Enable the workflow to diagnose and fix its own failures:
**Self-Diagnosis Loop:**
1. **Failure Detection**
- Monitor for errors, timeouts, unexpected outputs
2. **Root Cause Analysis**
- Agent examines logs and error messages
- Identifies likely cause (API down, bad data, logic error)
3. **Remediation Strategy**
- For known failure modes, apply predefined fix
- For novel failures, reason about potential solutions
4. **Test & Validate**
- Apply fix and re-attempt operation
- Verify fix resolved the issue
5. **Learn & Update**
- Log failure mode and successful remediation
- Update workflow logic to prevent recurrence
**Example:**
If API call fails with 429 (rate limit):
- Diagnosis: Too many requests
- Remediation: Back off exponentially, then reduce the request rate
- Validation: Retry succeeds
- Learning: Implement proactive rate limiting
Provide self-healing implementation pseudocode.
```
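A first pass at the self-diagnosis loop might map detected causes to predefined fixes and record each outcome. In this sketch the error dictionary shape, the cause labels, and the remediation names are all hypothetical, and novel failures are escalated to a human instead of being reasoned about automatically.

```python
# Hypothetical mapping from diagnosed cause to a predefined remediation.
KNOWN_REMEDIATIONS = {
    "rate_limited": "backoff_and_reduce_rate",
    "timeout": "retry_with_longer_timeout",
    "bad_data": "quarantine_record_and_continue",
}


def diagnose(error):
    """Classify an error dict into a likely cause."""
    if error.get("status") == 429:
        return "rate_limited"
    if error.get("timed_out"):
        return "timeout"
    if error.get("validation_failed"):
        return "bad_data"
    return "unknown"


def remediate(error, history):
    """Pick a fix for the error and log the failure mode for later review."""
    cause = diagnose(error)
    fix = KNOWN_REMEDIATIONS.get(cause, "escalate_to_human")
    history.append({"cause": cause, "fix": fix})
    return fix


history = []
print(remediate({"status": 429}, history))
print(remediate({"status": 500}, history))
```

The `history` list is the raw material for the "Learn & Update" step: recurring `unknown` causes indicate failure modes that deserve their own entry in the table.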
## Best Practices
1. **Start Simple**: Begin with a basic workflow and add complexity incrementally
2. **Fail Fast**: Detect and handle errors early in the workflow
3. **Idempotent Operations**: Ensure retries don't cause duplicate side effects
4. **Explicit State**: Make all workflow state visible and inspectable
5. **Graceful Degradation**: Let the workflow complete partially where possible instead of failing all-or-nothing
6. **Cost Controls**: Set limits on API calls, tokens, and time per workflow run
7. **Security**: Validate inputs, sanitize outputs, use least-privilege access
8. **Testing**: Simulate failure scenarios to validate recovery logic
## Example Session
**User**: "Design an autonomous workflow for onboarding new customers after they sign up."
**Opus Response**:
> I'll design a comprehensive autonomous onboarding workflow:
>
> **Workflow States:**
> 1. Trigger: User signs up → Enter "Onboarding_Started"
> 2. "Profile_Enrichment": Gather additional data from signup form, API calls
> 3. "Account_Provisioning": Create user account, database records, default settings
> 4. "Welcome_Email": Send personalized welcome email with next steps
> 5. "Setup_Guidance": Monitor user's first actions, provide contextual tips
> 6. "Activation_Tracking": Check if user completes key activation milestones
> 7. "Onboarding_Complete" or "Onboarding_Stalled" (after 7 days of inactivity)
>
> **Agent Capabilities Needed:**
> - Email API (SendGrid, Postmark)
> - Database access (create user, set preferences)
> - Analytics API (track user actions)
> - CRM API (HubSpot, Salesforce) to log customer lifecycle stage
>
> **Decision Points:**
> - If user completes profile → send congratulations email
> - If user inactive for 2 days → send reminder email
> - If user completes first key action → mark as "Activated", notify sales team
> - If user stalled after 7 days → escalate to human CSM for outreach
>
> **Human-in-the-Loop:**
> - Human approval required before sending first email (to review personalization)
> - Automatic thereafter unless user marks email as spam (then pause workflow)
>
> Would you like me to generate the full system prompt and state machine code?
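
The decision points in the response above could be encoded as a single function, sketched here. The action names and the priority order among the rules are assumptions; the example response does not specify which rule wins when several apply.

```python
def next_action(profile_complete, days_inactive, first_key_action_done):
    """Choose the next onboarding action from the user's current status."""
    if first_key_action_done:
        return "mark_activated_and_notify_sales"
    if days_inactive >= 7:
        return "escalate_to_csm"
    if days_inactive >= 2:
        return "send_reminder_email"
    if profile_complete:
        return "send_congratulations_email"
    return "wait"


print(next_action(profile_complete=True, days_inactive=0, first_key_action_done=False))
print(next_action(profile_complete=False, days_inactive=3, first_key_action_done=False))
```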
## Related Resources
- [LangChain Agents](https://python.langchain.com/docs/modules/agents/) for agent frameworks
- [Temporal](https://temporal.io/) for durable workflow orchestration
- [Apache Airflow](https://airflow.apache.org/) for data pipeline workflows
- [n8n](https://n8n.io/) for no-code workflow automation