boss/docs/message-protocol-and-state-machine.md
2026-03-23 12:43:39 +08:00


Boss Message Protocol and Task State Machine

Last updated: 2026-03-23

Design Goals

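This document defines the minimal viable internal message model for Boss. The model must let the system support:

  • Real-time conversation
  • Splitting work into subtasks
  • Changing requirements mid-task
  • Pausing and resuming
  • Approvals
  • Event replay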

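Core Principles

  • Every significant change is recorded as an event
  • Session state is derived from the event stream plus state snapshots
  • User messages and system actions share a single project timeline
  • Worker results must be structured, not just plain text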

Session Model

Hierarchy

Project Session
  -> Conversation Thread
  -> Task Tree
  -> Worker Assignments
  -> Approval Requests
  -> Artifacts

Object definitions

project_session

{
  "id": "ps_001",
  "title": "Fix login and cache issues",
  "status": "active",
  "active_objective": "Find the root cause first, then decide whether to change code",
  "created_at": "2026-03-23T12:00:00Z"
}

message

{
  "id": "msg_001",
  "session_id": "ps_001",
  "role": "user",
  "channel": "web",
  "content": "Investigate the login failure first; don't rush into code changes",
  "created_at": "2026-03-23T12:00:05Z"
}

task

{
  "id": "task_001",
  "session_id": "ps_001",
  "parent_task_id": null,
  "title": "Find the root cause of the login failure",
  "kind": "investigation",
  "status": "planning",
  "priority": "high",
  "risk_level": "medium"
}

worker_assignment

{
  "id": "wa_001",
  "task_id": "task_002",
  "worker_id": "worker_win_a",
  "status": "assigned"
}
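The four objects can be sketched as Python dataclasses whose field names mirror the JSON examples above. This is only a sketch: the Python types are assumptions. So are the hypothetical `children_of` helper, which walks the task tree through `parent_task_id`, and the `task_002` subtask, which was chosen to match the assignment example. None of these are part of the protocol.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class ProjectSession:
    id: str
    title: str
    status: str            # e.g. "active"
    active_objective: str
    created_at: str        # ISO 8601 UTC timestamp


@dataclass
class Message:
    id: str
    session_id: str
    role: str              # e.g. "user"
    channel: str           # e.g. "web"
    content: str
    created_at: str


@dataclass
class Task:
    id: str
    session_id: str
    parent_task_id: Optional[str]  # None marks a root of the task tree
    title: str
    kind: str              # e.g. "investigation", "implementation"
    status: str            # one of the task state machine states
    priority: str
    risk_level: str


@dataclass
class WorkerAssignment:
    id: str
    task_id: str
    worker_id: str
    status: str            # e.g. "assigned"


def children_of(tasks: List[Task], parent_id: Optional[str]) -> List[Task]:
    """Direct children of parent_id in the task tree; parent_id=None returns the roots."""
    return [t for t in tasks if t.parent_task_id == parent_id]


# Usage: the root task from the example plus a hypothetical subtask.
root = Task("task_001", "ps_001", None, "Find the root cause of the login failure",
            "investigation", "planning", "high", "medium")
sub = Task("task_002", "ps_001", "task_001", "Run login-related tests",
           "investigation", "queued", "high", "low")
assignment = WorkerAssignment("wa_001", "task_002", "worker_win_a", "assigned")
```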

Event Model

Unified event format

{
  "id": "evt_001",
  "session_id": "ps_001",
  "task_id": "task_002",
  "source": "worker",
  "type": "task.step.started",
  "timestamp": "2026-03-23T12:10:00Z",
  "payload": {
    "step": "run_tests",
    "summary": "Started running login-related tests"
  }
}

Fields:

  • source is one of user, manager, system, worker
  • type is the event name, a dotted identifier such as task.step.started
  • payload is the structured content
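The following is a minimal sketch of validating the unified event format at ingestion, assuming events arrive as JSON strings. `Event` and `parse_event` are hypothetical names. Making `task_id` optional for session-level events is also an assumption.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# The four allowed values of "source", as listed above.
ALLOWED_SOURCES = {"user", "manager", "system", "worker"}


@dataclass
class Event:
    id: str
    session_id: str
    source: str
    type: str                       # dotted event name, e.g. "task.step.started"
    timestamp: str                  # ISO 8601 UTC timestamp
    task_id: Optional[str] = None   # assumption: session-level events may omit it
    payload: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject unknown producers and unstructured payloads at the boundary.
        if self.source not in ALLOWED_SOURCES:
            raise ValueError(f"unknown event source: {self.source!r}")
        if not isinstance(self.payload, dict):
            raise TypeError("payload must be a JSON object")


def parse_event(raw: str) -> Event:
    """Decode one JSON event; unexpected keys raise TypeError from the constructor."""
    return Event(**json.loads(raw))


raw = """{
  "id": "evt_001", "session_id": "ps_001", "task_id": "task_002",
  "source": "worker", "type": "task.step.started",
  "timestamp": "2026-03-23T12:10:00Z",
  "payload": {"step": "run_tests", "summary": "Started running login-related tests"}
}"""
evt = parse_event(raw)
```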

Recommended Event Types

Session events

  • session.created
  • session.objective.updated
  • session.message.added

Planning events

  • plan.created
  • plan.updated
  • plan.diff.generated

Task events

  • task.created
  • task.assigned
  • task.started
  • task.progress
  • task.blocked
  • task.paused
  • task.resumed
  • task.cancelled
  • task.completed
  • task.failed

Worker events

  • worker.heartbeat
  • worker.capabilities.updated
  • worker.assignment.accepted
  • worker.assignment.rejected

Execution step events

  • task.step.started
  • task.step.finished
  • task.step.failed
  • tool.call.requested
  • tool.call.finished

Approval events

  • approval.requested
  • approval.approved
  • approval.rejected

Task State Machine

Main state machine

stateDiagram-v2
    [*] --> planning
    planning --> queued
    queued --> assigned
    assigned --> running
    running --> blocked
    running --> paused
    running --> waiting_approval
    running --> completed
    running --> failed
    running --> cancelled
    blocked --> running
    paused --> running
    waiting_approval --> running
    waiting_approval --> cancelled
    failed --> queued
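One way for the Task Service to enforce the main state machine is an explicit transition table. The sketch below copies every edge of the diagram. The names `transition`, `InvalidTransition`, and `is_terminal` are hypothetical. Note that `failed` is not terminal, because of the `failed --> queued` retry edge.

```python
from typing import Dict, Set

# Allowed transitions, copied edge by edge from the stateDiagram-v2 above.
TRANSITIONS: Dict[str, Set[str]] = {
    "planning": {"queued"},
    "queued": {"assigned"},
    "assigned": {"running"},
    "running": {"blocked", "paused", "waiting_approval",
                "completed", "failed", "cancelled"},
    "blocked": {"running"},
    "paused": {"running"},
    "waiting_approval": {"running", "cancelled"},
    "failed": {"queued"},           # retry path
    "completed": set(),             # terminal
    "cancelled": set(),             # terminal
}
INITIAL_STATE = "planning"


class InvalidTransition(Exception):
    """Raised when a requested status change is not an edge of the diagram."""


def transition(current: str, target: str) -> str:
    if current not in TRANSITIONS:
        raise KeyError(f"unknown state: {current}")
    if target not in TRANSITIONS[current]:
        raise InvalidTransition(f"{current} -> {target} is not allowed")
    return target


def is_terminal(state: str) -> bool:
    return not TRANSITIONS[state]


# Usage: walk the happy path from creation to completion.
state = INITIAL_STATE
for nxt in ["queued", "assigned", "running", "completed"]:
    state = transition(state, nxt)
```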

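State descriptions

| State | Meaning |
|---|---|
| planning | The manager is generating the task plan |
| queued | Created and waiting to be scheduled |
| assigned | Assigned to a worker |
| running | A worker is executing the task |
| blocked | Stuck because an external precondition is missing |
| paused | Paused by the user or the system |
| waiting_approval | Waiting for user approval |
| completed | Finished successfully |
| failed | Execution failed |
| cancelled | Cancelled |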

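Requirement Change Protocol

Why a separate protocol

When a user changes requirements mid-conversation, the system should not treat it as just another chat message. It needs to know whether the message will:

  • Change the current objective
  • Invalidate existing subtasks
  • Add new tasks
  • Trigger an approval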

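Suggested flow

  1. The user message enters the session
  2. The manager decides whether the message is an objective change
  3. If it is, the manager generates a plan.diff
  4. The Task Service applies the status transitions that the diff describes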

plan diff example

{
  "session_id": "ps_001",
  "change_reason": "User asked to fix the API retry logic first and leave the cache alone",
  "cancel_tasks": ["task_cache_001"],
  "pause_tasks": ["task_ui_003"],
  "continue_tasks": ["task_api_002"],
  "create_tasks": [
    {
      "title": "Fix the API retry logic",
      "kind": "implementation",
      "priority": "high"
    }
  ]
}
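Step 4 of the flow can be sketched as follows, assuming the Task Service keeps a map from task id to status. `apply_plan_diff` and the `task_new_NNN` id generator are hypothetical. The sketch only records the resulting statuses and timeline events. It assumes that the legality of each transition is checked separately against the state machine. New tasks start in `planning`, the diagram's initial state.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class PlanDiff:
    session_id: str
    change_reason: str
    cancel_tasks: List[str] = field(default_factory=list)
    pause_tasks: List[str] = field(default_factory=list)
    continue_tasks: List[str] = field(default_factory=list)
    create_tasks: List[dict] = field(default_factory=list)


def apply_plan_diff(
    statuses: Dict[str, str],
    diff: PlanDiff,
    new_task_id: Callable[[], str],
) -> Tuple[Dict[str, str], List[dict]]:
    """Return (updated statuses, events for the session timeline); input is not mutated."""
    listed = diff.cancel_tasks + diff.pause_tasks + diff.continue_tasks
    if len(listed) != len(set(listed)):
        raise ValueError("a task appears in more than one diff group")
    unknown = set(listed) - set(statuses)
    if unknown:
        raise KeyError(f"unknown tasks in diff: {sorted(unknown)}")

    updated = dict(statuses)
    events: List[dict] = []

    def emit(event_type: str, task_id: str, payload: dict) -> None:
        events.append({"type": event_type, "session_id": diff.session_id,
                       "task_id": task_id, "payload": payload})

    for tid in diff.cancel_tasks:
        updated[tid] = "cancelled"
        emit("task.cancelled", tid, {"reason": diff.change_reason})
    for tid in diff.pause_tasks:
        updated[tid] = "paused"
        emit("task.paused", tid, {"reason": diff.change_reason})
    # Tasks in continue_tasks keep their current status, so no event is emitted.
    for spec in diff.create_tasks:
        tid = new_task_id()
        updated[tid] = "planning"  # initial state of the task state machine
        emit("task.created", tid, dict(spec))
    return updated, events


# Usage with the example diff; the starting statuses are hypothetical.
counter = itertools.count(1)
diff = PlanDiff(
    session_id="ps_001",
    change_reason="User asked to fix the API retry logic first and leave the cache alone",
    cancel_tasks=["task_cache_001"],
    pause_tasks=["task_ui_003"],
    continue_tasks=["task_api_002"],
    create_tasks=[{"title": "Fix the API retry logic",
                   "kind": "implementation", "priority": "high"}],
)
before = {"task_cache_001": "running", "task_ui_003": "running", "task_api_002": "running"}
after, events = apply_plan_diff(before, diff, lambda: f"task_new_{next(counter):03d}")
```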

Approval Protocol

Approval request format

{
  "id": "apr_001",
  "session_id": "ps_001",
  "task_id": "task_009",
  "kind": "dangerous_command",
  "summary": "About to run rm -rf build-cache",
  "risk_level": "high",
  "status": "pending"
}

Approval actions

Supported actions:

  • approve
  • reject
  • approve_once
  • approve_for_session

For the MVP, implement only:

  • approve
  • reject
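With only the MVP actions, resolving an approval can be sketched as a pure function. It updates the request and returns the task's next status. The sketch relies on the two `waiting_approval` edges in the state diagram. Two things are assumptions: the function name, and the choice to map `reject` to `cancelled` rather than to some other follow-up.

```python
from dataclasses import dataclass, replace
from typing import Dict, Tuple

# MVP actions only: action -> (new approval status, next task status).
# Assumption: a rejection cancels the task (waiting_approval -> cancelled).
MVP_ACTIONS: Dict[str, Tuple[str, str]] = {
    "approve": ("approved", "running"),
    "reject": ("rejected", "cancelled"),
}


@dataclass(frozen=True)
class ApprovalRequest:
    id: str
    session_id: str
    task_id: str
    kind: str
    summary: str
    risk_level: str
    status: str = "pending"


def resolve_approval(req: ApprovalRequest, action: str,
                     task_status: str) -> Tuple[ApprovalRequest, str, dict]:
    """Return (updated request, next task status, event to append)."""
    if action not in MVP_ACTIONS:
        raise ValueError(f"action not supported in the MVP: {action!r}")
    if req.status != "pending":
        raise ValueError(f"approval {req.id} is already {req.status}")
    if task_status != "waiting_approval":
        raise ValueError(f"task {req.task_id} is not waiting for approval")
    approval_status, next_task_status = MVP_ACTIONS[action]
    event = {
        "type": f"approval.{approval_status}",   # approval.approved / approval.rejected
        "session_id": req.session_id,
        "task_id": req.task_id,
        "payload": {"approval_id": req.id},
    }
    return replace(req, status=approval_status), next_task_status, event


req = ApprovalRequest("apr_001", "ps_001", "task_009", "dangerous_command",
                      "About to run rm -rf build-cache", "high")
updated, task_status, event = resolve_approval(req, "approve", "waiting_approval")
```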

Worker Protocol

Worker registration

{
  "worker_id": "worker_mac_001",
  "hostname": "mac-studio",
  "os": "macos",
  "shell": "zsh",
  "capabilities": [
    "git",
    "terminal",
    "playwright",
    "xcode"
  ]
}

heartbeat

{
  "worker_id": "worker_mac_001",
  "status": "idle",
  "current_task_id": null,
  "load": 0.25,
  "timestamp": "2026-03-23T12:15:00Z"
}
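Registration and heartbeat data together are enough for a simple scheduler. The sketch below picks the least-loaded idle worker that meets three conditions: its latest heartbeat is recent, it reports status `idle`, and its registered capabilities cover the task's needs. The 30-second heartbeat timeout, the least-load rule, and the `worker_win_a` capability set are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import FrozenSet, List, Optional

HEARTBEAT_TIMEOUT = timedelta(seconds=30)  # assumption: not specified by the protocol


def parse_ts(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing 'Z' into an aware datetime."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


@dataclass
class WorkerState:
    worker_id: str
    capabilities: FrozenSet[str]   # from the registration message
    status: str                    # from the latest heartbeat, e.g. "idle"
    load: float                    # from the latest heartbeat
    last_heartbeat: datetime


def is_online(w: WorkerState, now: datetime) -> bool:
    return now - w.last_heartbeat <= HEARTBEAT_TIMEOUT


def pick_worker(workers: List[WorkerState], required: FrozenSet[str],
                now: datetime) -> Optional[WorkerState]:
    """Least-loaded idle, online worker that has every required capability."""
    candidates = [w for w in workers
                  if w.status == "idle" and is_online(w, now) and required <= w.capabilities]
    return min(candidates, key=lambda w: w.load, default=None)


mac = WorkerState("worker_mac_001", frozenset({"git", "terminal", "playwright", "xcode"}),
                  "idle", 0.25, parse_ts("2026-03-23T12:15:00Z"))
win = WorkerState("worker_win_a", frozenset({"git", "terminal"}),   # hypothetical worker
                  "idle", 0.10, parse_ts("2026-03-23T12:15:05Z"))
```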

Task execution request

{
  "task_id": "task_102",
  "session_id": "ps_001",
  "workspace": {
    "repo": "git@github.com:org/repo.git",
    "branch": "boss/task-102",
    "worktree_path": "/workers/worktrees/task-102"
  },
  "execution_mode": "independent",
  "goal": "Investigate the login failure and report the root cause and a suggested fix",
  "constraints": [
    "Do not change code yet",
    "Prefer running tests and reading logs"
  ]
}

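Progress Summary Protocol

Neither workers nor the manager should reply with long free-form text alone.

Instead, maintain a structured summary:

{
  "task_id": "task_102",
  "summary": "Found that API retry loops forever on 401",
  "progress_percent": 60,
  "current_step": "analyze_retry_logic",
  "next_step": "Draft a minimal fix and run tests",
  "risk": "medium"
}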

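This structured summary can be fed directly to:

  • The right-hand panel of the web console
  • Status updates in the chat entry point
  • The manager's aggregator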
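The sketch below validates the summary and renders it as a one-line update for the chat entry point. The range check on `progress_percent` and the exact line format are assumptions. The web panel would more likely consume the fields directly.

```python
from dataclasses import dataclass


@dataclass
class ProgressSummary:
    task_id: str
    summary: str
    progress_percent: int
    current_step: str
    next_step: str
    risk: str

    def __post_init__(self) -> None:
        if not 0 <= self.progress_percent <= 100:
            raise ValueError("progress_percent must be between 0 and 100")


def to_status_line(s: ProgressSummary) -> str:
    """One-line rendering for chat status updates (format is hypothetical)."""
    return (f"[{s.task_id}] {s.progress_percent}% {s.current_step} "
            f"(next: {s.next_step}, risk: {s.risk}) {s.summary}")


s = ProgressSummary("task_102", "Found that API retry loops forever on 401", 60,
                    "analyze_retry_logic", "Draft a minimal fix and run tests", "medium")
line = to_status_line(s)
```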

Collaborative Development Protocol

Mode field

A task can declare its execution_mode as one of:

  • independent
  • collaborative
  • research_only

Additional fields in collaborative mode

{
  "shared_context_refs": [
    "artifact_001",
    "task_202.summary"
  ],
  "handoff_expected": true
}

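These fields support:

  • A research task handing its conclusions to an implementation task
  • A backend task handing API changes to a frontend task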
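The sketch below validates the mode field and the collaborative-only fields, assuming requests arrive as plain dicts. Two rules here are interpretations of the example, not stated requirements: requiring a non-empty `shared_context_refs` list, and reading `task_202.summary` as `<task id>.<field>`. Both helper names are hypothetical.

```python
from typing import List, Optional, Tuple

EXECUTION_MODES = {"independent", "collaborative", "research_only"}


def validate_execution_request(req: dict) -> List[str]:
    """Return a list of problems; an empty list means the request is acceptable."""
    errors: List[str] = []
    mode = req.get("execution_mode")
    if mode not in EXECUTION_MODES:
        errors.append(f"unknown execution_mode: {mode!r}")
    if mode == "collaborative":
        refs = req.get("shared_context_refs")
        if not isinstance(refs, list) or not refs:
            errors.append("collaborative mode needs a non-empty shared_context_refs list")
        if not isinstance(req.get("handoff_expected"), bool):
            errors.append("collaborative mode needs a boolean handoff_expected")
    return errors


def split_ref(ref: str) -> Tuple[str, Optional[str]]:
    """'task_202.summary' -> ('task_202', 'summary'); 'artifact_001' -> ('artifact_001', None)."""
    obj_id, sep, attr = ref.partition(".")
    return obj_id, (attr if sep else None)
```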

UI Real-Time Subscription Model

The console should subscribe to three streams:

  • session_feed
  • task_feed
  • worker_feed

Splitting the streams this way keeps a single oversized stream from carrying everything.
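The sketch below splits one ordered event stream into the three feeds by event-name prefix. The prefix-to-feed mapping is an assumption, not something the protocol specifies. For example, it sends `plan.*` to `session_feed` and `approval.*` to `task_feed`.

```python
from typing import Dict, List

# Assumed mapping from event-name prefix to console stream.
FEED_BY_PREFIX: Dict[str, str] = {
    "session": "session_feed",
    "plan": "session_feed",
    "task": "task_feed",
    "tool": "task_feed",
    "approval": "task_feed",
    "worker": "worker_feed",
}


def feed_for(event_type: str) -> str:
    prefix = event_type.split(".", 1)[0]
    if prefix not in FEED_BY_PREFIX:
        raise ValueError(f"no feed for event type {event_type!r}")
    return FEED_BY_PREFIX[prefix]


def fan_out(events: List[dict]) -> Dict[str, List[dict]]:
    """Split one ordered event list into per-feed lists, preserving order within each feed."""
    feeds: Dict[str, List[dict]] = {"session_feed": [], "task_feed": [], "worker_feed": []}
    for event in events:
        feeds[feed_for(event["type"])].append(event)
    return feeds
```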

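Failure Handling Strategy

Retryable failures

  • Network jitter
  • A worker briefly going offline
  • Tool call timeouts

Non-retryable failures

  • Insufficient permissions
  • Conflicting task objectives
  • Explicit cancellation by the user

Recommended policy

  • Record a retry_count on every task
  • Every retry must record its reason
  • Repeated failures automatically escalate the task to blocked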
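The recommended policy can be sketched as a single decision function. The reason codes, the limit of 3 retries, and the `FailureDecision` type are assumptions. A retry follows the `failed --> queued` edge. The state diagram has no edge from `failed` to `blocked`, so this sketch only returns the escalation as a decision for the Task Service to apply. It does not check that decision against the state machine.

```python
from dataclasses import dataclass

# Reason codes are hypothetical labels for the failure classes listed above.
RETRYABLE = {"network_jitter", "worker_offline", "tool_call_timeout"}
NON_RETRYABLE = {
    "permission_denied": "failed",
    "objective_conflict": "failed",
    "user_cancelled": "cancelled",
}
MAX_RETRIES = 3  # assumption: the document does not fix a number


@dataclass(frozen=True)
class FailureDecision:
    next_status: str
    retry_count: int
    reason: str


def on_failure(retry_count: int, reason: str) -> FailureDecision:
    """Decide what happens to a task after a failure with the given reason."""
    if not reason:
        raise ValueError("a failure must carry a reason")
    if reason in NON_RETRYABLE:
        return FailureDecision(NON_RETRYABLE[reason], retry_count, reason)
    if reason not in RETRYABLE:
        raise ValueError(f"unclassified failure reason: {reason!r}")
    if retry_count >= MAX_RETRIES:
        # Repeated failures escalate instead of retrying forever.
        return FailureDecision("blocked", retry_count, reason)
    return FailureDecision("queued", retry_count + 1, reason)
```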

Minimal MVP Protocol Set

The first version only needs to support these events:

  • session.message.added
  • plan.created
  • task.created
  • task.assigned
  • task.started
  • task.progress
  • task.paused
  • task.cancelled
  • task.completed
  • task.failed
  • worker.heartbeat
  • approval.requested
  • approval.approved
  • approval.rejected
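Session state is derived from the event stream plus snapshots, so the MVP event set is enough to rebuild task statuses by replay. This sketch makes two assumptions. First, `task.created` moves a task straight to `queued`, because the MVP set has no separate planning-to-queued event. Second, `approval.rejected` cancels the task. The name `replay` is hypothetical.

```python
from typing import Dict, Iterable, Optional

# Status each status-changing MVP event leaves its task in.
STATUS_AFTER: Dict[str, str] = {
    "task.created": "queued",          # assumption: no planning->queued event in the MVP
    "task.assigned": "assigned",
    "task.started": "running",
    "task.paused": "paused",
    "task.cancelled": "cancelled",
    "task.completed": "completed",
    "task.failed": "failed",
    "approval.requested": "waiting_approval",
    "approval.approved": "running",
    "approval.rejected": "cancelled",  # assumption: rejection cancels the task
}
# MVP events that do not change any task's status.
NON_STATUS_EVENTS = {"session.message.added", "plan.created", "task.progress", "worker.heartbeat"}
MVP_EVENTS = set(STATUS_AFTER) | NON_STATUS_EVENTS


def replay(events: Iterable[dict], snapshot: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Rebuild task_id -> status by folding events over an optional snapshot."""
    statuses = dict(snapshot or {})
    for event in events:
        event_type = event["type"]
        if event_type not in MVP_EVENTS:
            raise ValueError(f"event not in the MVP set: {event_type!r}")
        if event_type in STATUS_AFTER:
            statuses[event["task_id"]] = STATUS_AFTER[event_type]
    return statuses


log = [
    {"type": "session.message.added"},
    {"type": "task.created", "task_id": "task_102"},
    {"type": "task.assigned", "task_id": "task_102"},
    {"type": "task.started", "task_id": "task_102"},
    {"type": "task.progress", "task_id": "task_102"},
    {"type": "approval.requested", "task_id": "task_102"},
    {"type": "approval.approved", "task_id": "task_102"},
    {"type": "task.completed", "task_id": "task_102"},
]
```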

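One-Sentence Summary

The core of the Boss protocol is not "sending and receiving messages" but "keeping conversations, tasks, workers, and approvals in one traceable state machine."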