Playbook Protocol (v0.1)

The test-driven, cost-aware standard for declarative workflows

Playbook Protocol is an open, language-agnostic specification for defining, validating, coordinating, and optimizing any multi-step process as a declarative playbook.


Why We’re Here

Modern automation lives everywhere—from cloud APIs and databases to IoT sensors, AI agents, and robotic arms. These systems are powerful, but coordinating them introduces significant challenges: validating intermediate state, tracing failures across steps, and accounting for resource costs.

Playbook Protocol addresses these gaps by providing a standard way to define, execute, and understand complex workflows.


What Playbook Protocol Offers

  1. Declarative Playbooks (as DAGs): Define your process as a Directed Acyclic Graph (DAG) of named stages. This structure naturally supports complex dependencies and parallel execution paths. Each stage clearly defines:
    • Expectations (expect:): Testable conditions that must be true for the stage to succeed. Acts as built-in validation.
    • Metrics (metrics:): Track performance, quality, or any quantifiable outcome (e.g., latency, success rate, accuracy).
    • Costs (cost:): Monitor resource usage: API credits, LLM tokens, compute time, estimated dollar amounts, and so on.
  2. Inspect & Advance RPCs: A minimal RPC interface (inspect, advance, evaluate) allows diverse engines (code, CI, AI agents, low-code tools) to interact with and drive playbook execution based on the current state and defined logic.
  3. Deep Observability & Provenance: Automatically capture detailed state snapshots, logs, metrics, and costs for every stage transition across all runs. This rich execution history enables:
    • Visualization & Debugging: Understand exactly how a workflow executed, pinpoint failures, and analyze bottlenecks.
    • Programmatic Analysis: Query the state and history of any run or artifact.
    • (Planned) Graph Utilities: Leverage the DAG history for diffing execution branches, finding common ancestors (LCA), and understanding artifact provenance, which is crucial for managing parallel work and enabling intelligent merging/conflict resolution.
  4. Bulk Simulation & Optimization: Run playbooks at scale on real or synthetic inputs. Evaluate runs against reward-per-cost functions to tune parameters, compare strategies (e.g., different LLM prompts), and optimize for business KPIs.
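The three-RPC surface described in point 2 can be sketched as a minimal in-memory engine. All class and method signatures below are illustrative assumptions; the v0.1 spec does not yet fix a wire format or API shape, and a real engine would expose these as RPCs rather than direct method calls:

```python
from dataclasses import dataclass, field

@dataclass
class RunState:
    """Hypothetical snapshot of one playbook run."""
    stage: str                                  # current stage name
    variables: dict = field(default_factory=dict)
    history: list = field(default_factory=list)  # stages completed so far

class PlaybookEngine:
    """Illustrative engine: stages map to {'expect': [predicates], 'next': stage}."""

    def __init__(self, stages: dict):
        self.stages = stages

    def inspect(self, run: RunState) -> dict:
        """Return a read-only snapshot of the run without mutating it."""
        return {"stage": run.stage, "variables": dict(run.variables)}

    def evaluate(self, run: RunState) -> bool:
        """Check every expect: predicate defined for the current stage."""
        checks = self.stages[run.stage].get("expect", [])
        return all(check(run.variables) for check in checks)

    def advance(self, run: RunState) -> str:
        """Move to the next stage only if all expectations hold."""
        if not self.evaluate(run):
            raise RuntimeError(f"expectations failed in stage {run.stage!r}")
        run.history.append(run.stage)
        run.stage = self.stages[run.stage]["next"]
        return run.stage
```

Because the engine only ever reads state via inspect and mutates it via advance, any driver—a CI job, an AI agent, a low-code tool—can participate in the same run without knowing how the others work.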

Core Principles


Example Playbook (YAML)

# Simple example: Invoice Collection Process
id: invoice-collection-v1
description: Records an invoice and attempts payment within a time limit.
variables:
  invoice_id: { type: string, required: true }
  invoice_data: { type: object } # Populated by fetch tool
  payment_status: { type: string, default: 'pending' } # Updated by charge tool
stages:
  - name: recorded
    description: Ensure invoice data is fetched and marked as recorded in our system.
    expect:
      - invoice_data.id == variables.invoice_id
      - invoice_data.system_status == 'recorded'
    tools:
      - fetch-invoice-from-source # Input: invoice_id -> Output: invoice_data
      - mark-invoice-recorded   # Input: invoice_data
    metrics:
      fetch_latency_ms: { type: duration }
    cost:
      fetch_api_call: { type: count, unit_cost: 0.01 } # Cost per fetch

  - name: attempt_payment
    description: Charge the payment method associated with the invoice.
    depends_on: [recorded] # Only run if recorded stage succeeded
    expect:
      # Check variable possibly updated by the tool
      - payment_status == 'paid'
      # Could also check external state if tool updates it
      # - external_payment_gateway.check_status(variables.invoice_id) == 'paid'
    tools:
      - charge_customer_payment # Input: invoice_data -> Output: updates payment_status variable
    metrics:
      time_to_payment_attempt_secs: { type: duration, max: 3600 } # Metric with threshold
    cost:
      payment_api_call: { type: count, unit_cost: 0.02 } # Cost per payment attempt
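Bulk simulation (point 4 above) scores batches of completed runs by reward per unit cost. A minimal sketch against the example playbook, assuming each run reports its cost counters and final payment_status (the reward function and record fields are illustrative, not part of the spec):

```python
def run_cost(costs: dict, unit_costs: dict) -> float:
    """Total dollar cost of one run: each counter times its declared unit_cost."""
    return sum(count * unit_costs[name] for name, count in costs.items())

def reward_per_cost(runs: list, unit_costs: dict, reward_fn) -> float:
    """Total reward divided by total cost across a batch of runs."""
    total_reward = sum(reward_fn(r) for r in runs)
    total_cost = sum(run_cost(r["costs"], unit_costs) for r in runs)
    return total_reward / total_cost if total_cost else float("inf")

# Unit costs mirror the cost: blocks in the example playbook above.
UNIT_COSTS = {"fetch_api_call": 0.01, "payment_api_call": 0.02}

# Illustrative KPI: reward 1.0 for a paid invoice, 0 otherwise.
def paid(run: dict) -> float:
    return 1.0 if run["payment_status"] == "paid" else 0.0
```

Comparing two strategies (say, two retry policies or two LLM prompts) then reduces to running each at scale and picking the one with the higher reward-per-cost score.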

Use Cases


Get Involved

This project is in its early days (v0.1). We welcome feedback, ideas, and contributions.


Roadmap & Next Steps

  1. v1.0 Spec Finalization: Lock down schemas for expect, metrics, cost, and core RPCs.
  2. Reference Engines: Implementations (e.g., Node, Rust) of inspect/advance/evaluate.
  3. SDK Releases: Clients for TypeScript, Python, Rust, Swift.
  4. CLI & Inspector: playctl commands and a web UI for running, inspecting, visualizing DAG executions, and utilizing (planned) graph analysis utilities (diff, LCA).
  5. Tooling Ecosystem: Integrations with common platforms (CI/CD, Cloud Providers, AI Frameworks).
  6. Industry Playbook Libraries: Curated examples for common use cases.
  7. Bulk Simulation Engine: Support for parallel runs and built-in optimization loops.

© 2025 Playbook Protocol • BSD-3-Clause • playbookprotocol.org | GitHub