The Problem Plugins Solve

An LLM without domain context is a brilliant generalist with the depth of a generic instruction manual. It knows a little about everything and a lot about nothing. Ask it to evaluate an IT vendor and you'll get something reasonable, vaguely correct, and completely useless for making a real decision — because it has no evaluation framework, no scoring criteria, no knowledge of the 17 regulatory screening questions that determine which compliance frameworks apply, and none of the 6 dimensions an experienced professional uses to structure an assessment.

Cowork plugins fix this by injecting structured domain knowledge directly into the agent's context. They're not code extensions in the traditional sense. They're executable knowledge packages: Markdown files that encode frameworks, protocols, checklists, decision criteria, and workflows that the agent reads, interprets, and executes conversationally.

The difference from a long prompt is architectural. A prompt is a monolithic blob that competes for the context window. A plugin is a modular system where each component loads on demand, based on what the user asks for. When someone types /vendor-assess, Cowork doesn't load all 17,000 lines into context. It loads the vendor-assess.md command, which tells it to read the vendor-assessment/SKILL.md skill, which in turn references references/evaluation-criteria.md and references/certifications-guide.md. Selective loading. On-demand composition. Context gets used for what matters, not for hauling dead weight.
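That loading chain can be made concrete with a small sketch. This is hypothetical code, not Cowork's actual loader, and the file contents are invented for illustration; it shows deterministic reference-following, where each file is loaded once and only the files it explicitly names are pulled in:

```python
import re

def resolve_context(entry: str, files: dict[str, str]) -> list[str]:
    """Walk the reference chain starting at a command, loading each file once.

    `files` maps plugin-relative paths to their Markdown content; in Cowork
    these would be real files under the plugin root.
    """
    loaded: list[str] = []
    queue = [entry]
    while queue:
        path = queue.pop(0)
        if path in loaded or path not in files:
            continue
        loaded.append(path)
        # Follow the explicit .md references mentioned in the file's text
        queue.extend(re.findall(r"[\w./-]+\.md", files[path]))
    return loaded
```

Note what is absent: no similarity search, no ranking. The command names its dependencies, so the context set is fully determined before anything is read.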

Plugin Structure: Four Layers

A Cowork plugin is a directory with a conventional structure. Here's the it-vendor-provision plugin (71 files, v0.6.0):

it-vendor-provision/
├── .claude-plugin/
│   └── plugin.json            # Plugin manifest
├── commands/                   # 11 executable commands
│   ├── vendor-assess.md
│   ├── contract-review.md
│   ├── license-audit.md
│   ├── shadow-audit.md
│   ├── audit-prep.md
│   ├── vendor-dashboard.md
│   ├── vendor-incident.md
│   ├── vendor-onboarding.md
│   ├── vendor-exit.md
│   ├── vendor-review.md
│   └── rfp-generate.md
├── skills/                     # 8 specialized skills
│   ├── vendor-assessment/
│   │   ├── SKILL.md
│   │   └── references/
│   │       ├── evaluation-criteria.md
│   │       └── certifications-guide.md
│   ├── it-contract-review/
│   │   ├── SKILL.md
│   │   └── references/
│   ├── risk-compliance/
│   │   ├── SKILL.md
│   │   └── references/
│   │       ├── checklist-gdpr.md
│   │       ├── checklist-nis2.md
│   │       ├── checklist-dora.md
│   │       ├── checklist-ens.md
│   │       ├── checklist-ai-act.md
│   │       ├── checklist-uk-gdpr.md
│   │       ├── checklist-fca.md
│   │       ├── checklist-hipaa.md
│   │       ├── checklist-ccpa-cpra.md
│   │       ├── checklist-nist-csf.md
│   │       ├── checklist-soc2.md
│   │       ├── checklist-pci-dss.md
│   │       ├── checklist-iso-27001.md
│   │       ├── data-classification-matrix.md
│   │       └── audit-evidence-map.md
│   ├── license-management/
│   ├── shadow-it-governance/
│   ├── contract-lifecycle/
│   ├── vendor-onboarding/
│   └── vendor-reporting/
├── connectors/                 # Configurable integrations
│   └── asset-management-mcp/
├── examples/                   # Usage examples
├── CONNECTORS.md               # Connector documentation
├── CHANGELOG.md
├── LICENSE                     # Apache 2.0
├── NOTICE                      # Attribution
└── README.md

The four layers: manifest, commands, skills (with references), and connectors.

Layer 1: The Manifest — plugin.json

The .claude-plugin/plugin.json file is the entry point. It defines metadata, keywords for discovery, and the plugin's identity:

{
  "name": "it-vendor-provision",
  "version": "0.6.0",
  "description": "End-to-end IT vendor software provisioning management:
    initial assessment with segmentation and benchmarking, contract and SLA
    review, vendor onboarding with security hardening and knowledge transfer,
    license management with FinOps and SaaS spend governance, shadow IT
    discovery and governance, multi-jurisdiction regulatory risk analysis
    (EU: GDPR/NIS2/AI Act/DORA/ENS; UK: UK GDPR/DPA 2018/FCA SS2-21;
    US: HIPAA/CCPA-CPRA/NIST CSF/SOC 2; Global: PCI DSS/ISO 27001)...",
  "author": {
    "name": "Ricardo Devis / Bilbao.AI",
    "url": "https://ricardodevis.com",
    "linkedin": "https://linkedin.com/in/devis",
    "company_url": "https://bilbao.ai"
  },
  "keywords": [
    "vendor-management", "compliance", "gdpr", "nis2",
    "dora", "shadow-it", "finops", "iso-27001", "soc2",
    "multi-jurisdiction", "audit-preparation", ...
  ],
  "repository": "https://github.com/ricardodevis/it-vendor-provision",
  "license": "Apache-2.0"
}

Cowork uses this file for three things: registering the plugin in the system, indexing keywords for automatic skill activation, and displaying metadata to the user. The description field effectively doubles as a context prompt: the more precise it is, the better the agent understands when and how to use the plugin.
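Keyword-based activation can be sketched as a simple scoring pass over installed manifests. This is a hypothetical illustration of the idea, not Cowork's documented matching algorithm:

```python
def match_plugins(user_message: str, manifests: list[dict]) -> list[str]:
    """Rank plugin names by how many manifest keywords appear in the message."""
    msg = user_message.lower()
    scored = []
    for m in manifests:
        # Count hits for both hyphenated and space-separated keyword forms
        hits = sum(
            1 for kw in m.get("keywords", [])
            if kw in msg or kw.replace("-", " ") in msg
        )
        if hits:
            scored.append((hits, m["name"]))
    return [name for hits, name in sorted(scored, reverse=True)]
```

This is why vague keywords hurt discovery: a keyword that never appears in how users actually phrase requests contributes nothing to the score.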

Layer 2: Commands — The User Interface

Commands are Markdown files in /commands/ that define workflows invocable with /name. Each command is an orchestration script that tells the agent which skill to load, what information to gather, and what output to produce.

Real example — commands/vendor-assess.md:

---
name: vendor-assess
description: >
  Evaluates an IT software vendor with scoring across 6 dimensions:
  technical capability, security, compliance, financial viability,
  support, and stability. Generates scorecard with recommendation.
user_instructions: >
  Provide the vendor name and software to evaluate.
  Optionally provide vendor documentation
  (commercial proposal, contract, trust center, etc.)
---

# Command: Evaluate IT Vendor

## Instructions

When executing this command:

1. **Load the skill** `vendor-assessment` by reading
   `${CLAUDE_PLUGIN_ROOT}/skills/vendor-assessment/SKILL.md`

2. **Gather vendor information:**
   - If the user provides documentation, analyze it
   - Search web: trust center, security page, pricing, technical docs
   - Search certifications: SOC 2, ISO 27001, ISO 42001
   - If connected document repository, search for vendor documentation

3. **Execute the evaluation framework:**
   - Phase 1: Quick screening (disqualifying criteria)
   - If passes screening → Phase 2: Detailed evaluation (6 dimensions)
   - Phase 3: Generate scorecard with scores and recommendation

4. **For certifications**, consult
   `${CLAUDE_PLUGIN_ROOT}/skills/vendor-assessment/references/certifications-guide.md`

5. **Generate report** in Markdown with:
   - Executive summary
   - Scorecard (table with dimensions, score 1-5, traffic light)
   - Analysis by dimension
   - Critical risks
   - Recommendation: APPROVE / APPROVE WITH CONDITIONS / REJECT
   - Next steps

6. **Save the report** in the user's folder

This isn't code. It's a structured instruction for the agent. When the user types /vendor-assess Salesforce, Cowork:

1. Reads this file
2. Follows the instructions: loads the referenced skill
3. Gathers vendor information (web, provided documents, connectors)
4. Runs the evaluation framework with 6-dimension scoring
5. Generates a structured report with a recommendation

The ${CLAUDE_PLUGIN_ROOT} variable resolves to the plugin's root directory. Commands reference skills and references via relative paths, not inline inclusion. This matters: content loads on demand, not at startup.
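The expansion is plain `${VAR}` substitution. A minimal sketch, assuming that semantics (Python's `string.Template` happens to use the same syntax; the paths are illustrative):

```python
from pathlib import Path
from string import Template

def resolve_plugin_path(reference: str, plugin_root: Path) -> Path:
    """Expand ${CLAUDE_PLUGIN_ROOT} in a command's file reference."""
    return Path(Template(reference).substitute(CLAUDE_PLUGIN_ROOT=str(plugin_root)))
```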

Layer 3: Skills — The Domain Knowledge

Skills are the intellectual core of the plugin. Each skill is a directory with a main SKILL.md and a references/ subdirectory for supporting material.

SKILL.md: Encoding Professional Judgment

A SKILL.md isn't documentation. It's a structured knowledge transfer to the agent. It contains the domain philosophy, analysis frameworks, decision criteria, workflows, and the exceptions and edge cases that an experienced professional knows.

Real excerpt from skills/vendor-assessment/SKILL.md:

# IT Vendor Evaluation

## Philosophy

Evaluating an IT vendor isn't an exercise in checking boxes. It's an
operational, regulatory, and strategic risk analysis that determines whether
your organization is going to depend on a third party for a critical function.
Treating vendor approval as a paperwork formality is the surest way to end up
trapped in a contract with a vendor that can't—or won't—deliver.

## Evaluation framework: 6 dimensions

Score the vendor across these 6 dimensions, rating each from 1 to 5:

### 1. Technical Capability
- Software architecture (monolithic vs. microservices, cloud-native, hybrid)
- Technology stack and visible technical debt
- Available APIs, technical documentation, SDKs
- Product roadmap and release cadence
- Integrability with the client's ecosystem

### 2. Security and Certifications
- Current certifications (SOC 2 Type II, ISO 27001, ISO 42001, ISO 27701)
- Recent penetration testing and vulnerability management
- Encryption policy (at rest, in transit)
- Access management and authentication (SSO, MFA, RBAC)
- Business continuity and disaster recovery plan

### 3. Regulatory Compliance
- GDPR: DPA existence, data location, international transfers
- NIS2: applicability and evidence of compliance (art. 21)
- AI Regulation (2024/1689): if software incorporates AI, risk classification
- Sector-specific regulation (financial, healthcare, legal)
...

Notice the tone. It doesn't say "evaluate the vendor across these dimensions." It says "Evaluating an IT vendor isn't an exercise in checking boxes. It's an operational, regulatory, and strategic risk analysis." That's not stylistic decoration. It's agent calibration. The tone of the SKILL.md determines the tone of the output. If the skill reads like a bureaucratic form, the agent produces bureaucracy. If it reads like an analytical professional with judgment, the agent produces analysis with judgment.

This is the single most underappreciated design decision in plugin architecture: the voice of your SKILL.md is the voice of your agent's output. Write it like a senior practitioner explaining things to a competent colleague, and that's exactly what you'll get back.

References: Granular Supporting Material

The references/ subdirectory contains specialized files that the skill loads when it needs depth. This is pure selective loading: the agent doesn't read checklist-hipaa.md if the vendor doesn't process PHI.

Example from the risk-compliance skill — the screening table that determines which regulations apply:

## Step 1: Identify Applicable Regulations

| Question | If "Yes" | Applicable Regulation | Jurisdiction |
|----------|---------|---------------------|--------------|
| Will the vendor process personal data of EU residents? | Evaluate controller/processor | GDPR | EU |
| Will the vendor process personal data of UK residents? | Evaluate UK controller/processor | UK GDPR / DPA 2018 | UK |
| Will the vendor process PHI? | BAA required | HIPAA | US |
| Is the client in EU-regulated financial sector? | Critical ICT third-party | DORA | EU |
| Does the software incorporate AI components? | Classify risk per AI Act | EU AI Act | EU |
| Is the client subject to Spanish ENS? | Evaluate applicable category | ENS | Spain |
...

17 questions. Each affirmative answer triggers one or more specific checklists (each an independent file in references/). The agent doesn't need to know the regulation; the regulation is encoded in the structure. All the agent has to do is follow the decision logic and load the right files.
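That decision logic is trivially mechanizable. A hypothetical sketch, with question keys invented for illustration and file names taken from the references/ directory:

```python
# Hypothetical encoding of the screening table: each question maps to the
# checklist files its "yes" answer triggers (names mirror references/).
SCREENING = {
    "processes_eu_personal_data": ["checklist-gdpr.md"],
    "processes_uk_personal_data": ["checklist-uk-gdpr.md"],
    "processes_phi": ["checklist-hipaa.md"],
    "client_is_eu_regulated_financial": ["checklist-dora.md"],
    "software_uses_ai": ["checklist-ai-act.md"],
    "client_subject_to_ens": ["checklist-ens.md"],
}

def checklists_to_load(answers: dict[str, bool]) -> list[str]:
    """Return the reference files triggered by affirmative screening answers."""
    files: list[str] = []
    for question, triggered in SCREENING.items():
        if answers.get(question):
            files.extend(f for f in triggered if f not in files)
    return files
```

The agent plays the role of `answers` here: it asks the screening questions conversationally, then loads only the checklists the answers trigger.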

Layer 4: Connectors — Tool Abstraction

Cowork plugins need to work with any tech stack. The plugin can't assume the user runs ServiceNow, or Jira, or SAP. The solution is the ~~connector notation:

### ~~document repository
**Function**: Document repository where contracts, DPAs, and vendor
documentation are stored.
**Tool examples**: iManage Work, SharePoint, Google Drive, Box, Notion

### ~~asset management tool
**Function**: IT asset management tool (ITAM/CMDB).
**Tool examples**: ServiceNow CMDB, Snipe-IT, Snow Software, Flexera

### ~~identity provider
**Function**: Identity and SSO provider managing user access.
**Tool examples**: Okta, Azure AD (Entra ID), Google Workspace, JumpCloud

### ~~finance tool
**Function**: Financial or procurement tool where invoices and subscriptions
are recorded.
**Tool examples**: SAP, Oracle Financials, NetSuite, Coupa

Inside skills and commands, references to external tools always use the notation. For example, a skill might say "query ~~document repository for the current contract" or "verify in ~~asset management tool the installed license inventory." When the user configures their actual tools, the agent replaces the generic reference with the real one. The mechanism is conversational — you tell Cowork "my document repository is SharePoint" and the agent maps the reference accordingly.
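The binding step is just text rewriting. A hypothetical sketch of how configured tools substitute for the generic references:

```python
def bind_connectors(text: str, bindings: dict[str, str]) -> str:
    """Swap ~~connector references for the user's configured tool names."""
    for connector, tool in bindings.items():
        text = text.replace(f"~~{connector}", tool)
    return text
```

Unbound references are simply left in place, which is the right behavior: the agent can still ask the user which tool fills that role.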

This design makes the plugin stack-agnostic without losing operational specificity. The 12 connectors in this plugin cover: document repository, CMDB/ITAM, identity provider, finance tool, monitoring, ticketing, security scanner, GRC platform, BI tool, email, calendar, and communication tool.

Connectors can also be implemented as MCP (Model Context Protocol) servers — the connectors/asset-management-mcp/ directory contains a working TypeScript example with an MCP server that exposes the asset inventory to the agent as an invocable tool:

connectors/asset-management-mcp/
├── dist/
│   ├── index.js          # Compiled MCP server
│   ├── index.d.ts        # Type definitions
│   └── mock-data.js      # Test data
├── src/
│   └── index.ts          # TypeScript source
├── package.json
└── tsconfig.json
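Conceptually, such a connector is a server that answers tool calls with inventory data. A hypothetical stdlib-only sketch of the dispatch core (the real example in connectors/asset-management-mcp/ uses the TypeScript MCP SDK; the mock inventory below is invented, in the spirit of its mock-data.js):

```python
import json

# Hypothetical mock inventory standing in for a real ITAM/CMDB backend
ASSETS = [
    {"vendor": "Salesforce", "product": "Sales Cloud", "licenses": 120},
    {"vendor": "Atlassian", "product": "Jira", "licenses": 85},
]

def handle_tool_call(request: dict) -> dict:
    """Dispatch a tools/call-style request to the asset-inventory handler."""
    if request.get("method") != "tools/call":
        return {"error": "unsupported method"}
    args = request.get("params", {}).get("arguments", {})
    vendor = args.get("vendor")
    # No vendor filter means return the whole inventory
    hits = [a for a in ASSETS if vendor is None or a["vendor"] == vendor]
    return {"content": [{"type": "text", "text": json.dumps(hits)}]}
```

From the agent's side this is indistinguishable from any other tool: it sends a structured call, gets structured inventory back, and can cite it in a license audit.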

Design Patterns That Matter

1. Composition Over Monolith

Every piece is independent. A command references one or more skills. A skill references zero or more references. Nothing depends on everything. If tomorrow I need to add a business-continuity-planning skill, I create the directory, write the SKILL.md and its references, and add a command to orchestrate it. I don't touch anything that already exists.

This is the Unix philosophy applied to AI knowledge systems: small, focused components that compose well.

2. Selective Context Loading

An LLM has a finite context window. Loading 17,000 lines into context isn't just inefficient — it degrades output quality. Cowork's architecture loads only what it needs. /vendor-assess loads ~400 lines of skill + ~200 of references. /audit-prep loads a different ~500. They never compete. They never interfere.

If you've worked with RAG systems, you'll recognize this as retrieval with deterministic routing instead of embedding-based similarity. The command tells the agent exactly which files to load. No hallucinated retrievals. No missed context. Predictable, every time.

3. Tone as a Calibration Parameter

I mentioned this above, but it deserves emphasis. The tone of a SKILL.md is not cosmetic. It's an implicit instruction to the model. "Evaluating an IT vendor isn't an exercise in checking boxes" produces qualitatively different outputs than "Complete the following vendor evaluation form." The first generates analysis. The second generates filled-out forms.

Think of it this way: the SKILL.md is not just what the agent knows — it's who the agent becomes when it runs that skill. You're not writing docs. You're writing a persona with expertise.

4. Tool Abstraction

The ~~connector notation is elegant in its simplicity. It requires no API. No code. It's a textual convention that the agent interprets conversationally. And it works because Cowork has access to MCP servers, APIs, and sandbox tools — when the user connects their actual tool, the agent knows where to look.

If you're building a plugin that needs to integrate with external systems, resist the temptation to hardcode tool names. Abstract them. Your plugin will survive the user's next migration from Jira to Linear, from SharePoint to Notion, from SAP to NetSuite.

5. Checklists as Compiled Regulatory Knowledge

Each checklist in references/ is the result of reading the original regulation, extracting the requirements applicable to IT vendors, and encoding them in a format the agent can execute point by point. This isn't a loose interpretation. It's a verifiable compilation: each item traces back to its article in the original statute. The agent doesn't invent compliance — it executes it.

For anyone building in regulated domains (finance, healthcare, government, critical infrastructure): this is the pattern. Don't ask the agent to reason about regulation from training data. Give it the compiled checklist and let it apply it systematically.
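The traceability pattern is easy to encode. A hypothetical sketch in which each compiled item carries its source citation, so every failing check reports the article it traces to (the two GDPR items are illustrative examples, not the plugin's actual checklist):

```python
from dataclasses import dataclass

@dataclass
class ChecklistItem:
    """One compiled requirement, traceable to its source article."""
    requirement: str
    source: str  # e.g. "GDPR art. 28(3)" — the statute article it derives from

GDPR_DPA_ITEMS = [
    ChecklistItem("DPA exists and is signed by both parties", "GDPR art. 28(3)"),
    ChecklistItem("Processor acts only on documented instructions", "GDPR art. 28(3)(a)"),
]

def run_checklist(items: list[ChecklistItem], evidence: dict[str, bool]) -> list[str]:
    """Return failing items, each tagged with the article it traces to."""
    return [
        f"{item.requirement} [{item.source}]"
        for item in items
        if not evidence.get(item.requirement, False)
    ]
```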

Plugin Metrics

To give a sense of the scope:

| Component | Count |
|-----------|-------|
| Total files | 71 |
| Skills | 8 |
| Commands | 11 |
| Connectors | 12 |
| Regulatory checklists | 15 |
| Jurisdictions | 4 (EU, UK, US, Global) |
| Screening questions | 17 |
| Lines of code/content | 17,000+ |
| Regulatory frameworks covered | GDPR, NIS2, DORA, EU AI Act, ENS, UK GDPR, FCA SS2/21, HIPAA, CCPA/CPRA, NIST CSF/CMMC, SOC 2, PCI DSS, ISO 27001, ISO 20000, ISO 22301 |

Lessons for Plugin Builders

If you're thinking about building a Cowork plugin for your domain — legal, financial, medical, engineering, whatever — here's what I wish I'd known from day one:

Start with skills, not commands. The skill is the knowledge. The command is the interface. If the skill is solid, the command writes itself. If the skill is weak, no command can save it.

Encode judgment, not procedure. A step-by-step checklist produces mechanical outputs. A framework with philosophy, evaluation criteria, and edge cases produces analysis. The agent can reason — give it material to reason with, not forms to fill out.

Use references for anything heavy. If a regulatory checklist has 80 items, don't put it in the SKILL.md. Put it in references/ and have the skill load it when needed. Your SKILL.md should contain the decision logic, not the detail of every regulation.

Connectors are contracts, not implementations. Define what you need (a document repository, a CMDB), not what tool it is (SharePoint, ServiceNow). The user will bring their own. Your plugin needs to work with any of them.

The skill's tone is the output's tone. Write like you'd want a senior practitioner in your domain to explain things to a competent colleague. Not bureaucratic. Not casual. Precise, opinionated, and able to tell the difference between what matters and what doesn't.

Test with real cases, not toy examples. This plugin works because I've tested it evaluating real vendors, reviewing real contracts, running real audits. Edge cases don't show up in sample data — they show up in production.

Availability

The it-vendor-provision plugin is published on GitHub under Apache 2.0.

Requires Claude Cowork (Anthropic's desktop agent for Mac and Windows).
