Building SOP Retrieval for OpenClaw Skills Instead of Prompt-Stuffing
A design note on separating OpenClaw skill workflow from SOP retrieval, search quality, and business policy ownership.
Context
Today I explored a design problem that frequently appears when building enterprise AI systems: how to handle a large collection of customer service SOPs within an OpenClaw skill.
My initial assumption was that the skill definition itself should contain enough information for the agent to determine which SOP to apply. The more I thought about it, however, the more it felt like I was trying to turn SKILL.md into a miniature knowledge base.
That assumption became questionable once I considered scale. A handful of SOPs can fit comfortably into a skill. Hundreds or thousands cannot.
What Triggered the Investigation
The trigger was a practical question:
If a customer service skill contains a large number of SOPs, how does the agent consistently find the correct one?
I was particularly interested in avoiding brittle prompt engineering and reducing the amount of business logic hidden inside instructions.
Evolution of My Thinking
My original mental model looked like this:
SKILL.md
↓
Agent decides SOP
↓
Customer response
The problem is that SOP selection becomes an emergent behavior of the model rather than an explicit system capability.
Through the investigation, I arrived at a different architecture:
Customer Request
↓
OpenClaw Skill
↓
search_sop Tool
↓
SOP Repository
↓
Relevant SOPs
↓
Agent Validation
↓
Response or Escalation
In this model, the skill no longer owns SOP knowledge.
Instead:
- The skill owns workflow.
- The retrieval tool owns search quality.
- The SOP repository owns business policy.
This separation of concerns felt significantly cleaner, because each layer can change without rewriting the others.
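One way to make that ownership split concrete is to give each layer its own interface. The sketch below is illustrative Python, not OpenClaw code. `Sop`, `SopRepository`, `SopSearch`, `InMemoryRepository`, `MetadataSearch` and `run_skill` are hypothetical names. A real skill would invoke the tool through OpenClaw rather than call a Python object.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Sop:
    """One SOP as the repository stores it (hypothetical shape)."""
    sop_id: str
    metadata: dict[str, str]
    body: str


class SopRepository(Protocol):
    """Policy layer: owns the canonical SOP content."""
    def active_sops(self) -> list[Sop]: ...


class SopSearch(Protocol):
    """Retrieval layer: owns how case attributes become candidate SOPs."""
    def search(self, attributes: dict[str, str], limit: int) -> list[Sop]: ...


class InMemoryRepository:
    def __init__(self, sops: list[Sop]) -> None:
        self._sops = list(sops)

    def active_sops(self) -> list[Sop]:
        return list(self._sops)


class MetadataSearch:
    """Simplest retrieval: exact metadata match, in repository order."""
    def __init__(self, repo: SopRepository) -> None:
        self._repo = repo

    def search(self, attributes: dict[str, str], limit: int) -> list[Sop]:
        hits = [s for s in self._repo.active_sops()
                if all(s.metadata.get(k) == v for k, v in attributes.items())]
        return hits[:limit]


def run_skill(search: SopSearch, attributes: dict[str, str]) -> str:
    """Workflow layer: never reads SOPs directly, only through the search tool."""
    candidates = search.search(attributes, limit=3)
    return candidates[0].sop_id if candidates else "ESCALATE"
```

You can replace `MetadataSearch` with a smarter implementation. That changes search quality without touching `run_skill` or the repository.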
Technical Design
The key idea is introducing a dedicated OpenClaw custom tool:
search_sop
Rather than asking the model to reason over hundreds of SOPs, the skill enforces the following process:
- Extract structured case attributes.
- Call search_sop.
- Retrieve candidate SOPs.
- Validate applicability rules.
- Apply or escalate.
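The five steps can be sketched as a single function. In this sketch, control flow decides when to escalate, not the model's judgment. This is a Python sketch, not OpenClaw code, and it makes several assumptions:

- `extract` and `search_sop` are injected as plain callables.
- `Candidate` and `Outcome` are hypothetical types.
- Candidates arrive ranked best-first.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Candidate:
    sop_id: str
    score: float
    applies: Callable[[dict[str, str]], bool]  # the SOP's applicability rule


@dataclass(frozen=True)
class Outcome:
    action: str          # "apply" or "escalate"
    sop_id: str | None
    reason: str


def handle_case(
    message: str,
    extract: Callable[[str], dict[str, str]],
    search_sop: Callable[[dict[str, str], str], list[Candidate]],
) -> Outcome:
    attributes = extract(message)                  # 1. structured case attributes
    candidates = search_sop(attributes, message)   # 2-3. call the tool, get candidates
    for cand in candidates:                        # assumed ranked best-first
        if cand.applies(attributes):               # 4. validate applicability
            return Outcome("apply", cand.sop_id, "applicability rule passed")
    # 5. nothing applicable: escalate instead of improvising a policy
    return Outcome("escalate", None, "no applicable SOP among candidates")
```

A high-scoring candidate whose rule fails is skipped. An empty or fully rejected candidate list always ends in escalation.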
A sample search request might look like:
{
  "issue_type": "payment",
  "intent": "refund",
  "product": "crypto_wallet",
  "region": "global",
  "query": "Failed USDT deposit"
}
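Rejecting malformed requests at the tool boundary keeps bad attributes from silently widening or narrowing the search. Below is a minimal Python sketch. It assumes that all five fields in the sample are required non-empty strings and that unknown fields should be refused. `SearchSopRequest` and `from_json` are hypothetical names.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SearchSopRequest:
    """Typed form of the search_sop payload (field set taken from the sample)."""
    issue_type: str
    intent: str
    product: str
    region: str
    query: str

    @classmethod
    def from_json(cls, raw: str) -> "SearchSopRequest":
        data = json.loads(raw)
        if not isinstance(data, dict):
            raise ValueError("request must be a JSON object")
        expected = {f.name for f in fields(cls)}
        missing, unknown = expected - set(data), set(data) - expected
        if missing or unknown:
            raise ValueError(f"missing={sorted(missing)} unknown={sorted(unknown)}")
        for name in sorted(expected):
            value = data[name]
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"{name} must be a non-empty string")
        return cls(**data)
```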
The search layer can evolve over time:
Metadata Filters
↓
Keyword Search
↓
Vector Search
↓
Reranking
This creates a gradual migration path from a simple file-based implementation to a full enterprise knowledge retrieval system.
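Each stage takes a query and a candidate list and returns a narrower or reordered list. That means the migration path can be modeled as a list of interchangeable stages. This is a minimal sketch under these assumptions:

- SOPs carry a metadata dict.
- The token-overlap scorer is a deliberately crude stand-in for real keyword search such as BM25.
- Vector search and reranking are left as future stages using the same hypothetical `Stage` signature.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Sop:
    sop_id: str
    metadata: dict[str, str]
    text: str


# A stage narrows or reorders candidates; vector search and reranking fit here too.
Stage = Callable[[str, list[Sop]], list[Sop]]


def metadata_filter(filters: dict[str, str]) -> Stage:
    """Stage 1: hard filter on exact metadata matches."""
    def stage(query: str, sops: list[Sop]) -> list[Sop]:
        return [s for s in sops if all(s.metadata.get(k) == v for k, v in filters.items())]
    return stage


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_rank(query: str, sops: list[Sop]) -> list[Sop]:
    """Stage 2: rank by query-token overlap and drop SOPs with no overlap."""
    q = _tokens(query)
    scored = [(len(q & _tokens(s.text)), s) for s in sops]
    return [s for score, s in sorted(scored, key=lambda p: -p[0]) if score > 0]


def run_pipeline(query: str, sops: list[Sop], stages: list[Stage]) -> list[Sop]:
    for stage in stages:
        sops = stage(query, sops)
    return sops
```

Running the metadata filter first keeps the later, more expensive stages working on a small candidate set.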
Trade-offs
Option 1: SOPs Embedded in Skill
Pros:
- Simple implementation
- No external tooling
Cons:
- Doesn’t scale
- Difficult to maintain
- High prompt complexity
- Hard to audit
Option 2: Dedicated Retrieval Tool
Pros:
- Scales to large SOP collections
- Easier ownership and versioning
- Better observability
- Clear separation of responsibilities
Cons:
- Additional infrastructure
- Search quality becomes a system concern
- Requires metadata discipline
For enterprise deployments, where SOP collections are large, change often, and need clear owners, the second approach is significantly more sustainable.
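"Requires metadata discipline" can be enforced mechanically at ingestion time, before an SOP ever reaches the index. Below is a minimal sketch. It assumes a hypothetical controlled vocabulary (`ALLOWED`) and hypothetical required fields such as `version` and `owner`. These fields also support the versioning and auditability points above. The field names and values are illustrative, not taken from a real SOP schema.

```python
from __future__ import annotations

# Hypothetical controlled vocabulary; a real one would be owned alongside the SOPs.
ALLOWED = {
    "issue_type": {"payment", "account", "security"},
    "intent": {"refund", "status", "dispute"},
    "region": {"global", "eu", "us"},
}
# Hypothetical required fields: identity, versioning, ownership, plus every filter key.
REQUIRED = ("sop_id", "version", "owner", *ALLOWED)


def metadata_errors(meta: dict[str, str]) -> list[str]:
    """Return problems that would make this SOP hard to filter or audit."""
    errors = [f"missing {key}" for key in REQUIRED if not meta.get(key)]
    for key, allowed in ALLOWED.items():
        value = meta.get(key)
        if value and value not in allowed:
            errors.append(f"{key}={value!r} not in {sorted(allowed)}")
    return errors
```

An SOP with a typo such as `intent="refunds"` would silently fall out of every exact-match filter, so rejecting it at ingestion is cheaper than debugging a missed retrieval later.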
Key Insight
The biggest realization was that SOP selection should not be treated as a prompting problem.
It should be treated as a retrieval problem.
Once I reframed the challenge that way, the architecture became much clearer. The skill’s responsibility is not to know every SOP. Its responsibility is to enforce a process in which candidate SOPs are retrieved and validated before any decision is made.
Lessons Learned
- SKILL.md should contain workflow, not large knowledge bases.
- SOP retrieval deserves its own dedicated tool.
- Metadata filtering is often more important than vector search.
- Applicability rules are as important as retrieval quality.
- Separating workflow, retrieval, and policy ownership simplifies long-term maintenance.
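The claim that applicability rules matter as much as retrieval quality is easiest to see with a candidate that matches the query text but does not apply to this customer. Below is a minimal sketch. It assumes a hypothetical declarative rule format (`Rule` with `key`, `op`, `value`) stored alongside each SOP. Field names such as `amount_usd` are illustrative only.

```python
from __future__ import annotations

import operator
from dataclasses import dataclass

# Comparison operators supported by this hypothetical rule format.
OPS = {
    "eq": operator.eq,
    "in": lambda actual, allowed: actual in allowed,
    "lte": operator.le,
    "gte": operator.ge,
}


@dataclass(frozen=True)
class Rule:
    key: str
    op: str
    value: object


def applicable(rules: list[Rule], case: dict[str, object]) -> tuple[bool, list[str]]:
    """All rules must hold; a missing case field counts as a failure."""
    failures = []
    for rule in rules:
        actual = case.get(rule.key)
        if actual is None or not OPS[rule.op](actual, rule.value):
            failures.append(f"{rule.key} {rule.op} {rule.value!r} (got {actual!r})")
    return (not failures, failures)
```

The function returns the failed rules, not just a boolean. That gives the escalation path an auditable reason rather than an opaque "no match".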
Open Questions
- How should SOP versioning and deprecation be handled across multiple skills?
- Should retrieval confidence thresholds be enforced by the tool or by the skill?
- How should conflicting SOPs be resolved automatically?
- What metrics best measure SOP retrieval quality in production?
- At what scale does vector search become necessary versus metadata filtering alone?