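# Boss Message Protocol and Task State Machine
Last updated: 2026-03-23
## Design goals
This document defines the minimum viable internal message model for Boss, ensuring that the system supports:
- Real-time conversation
- Splitting work into subtasks
- Changing requirements midway
- Pause and resume
- Approval
- Event replay
## Core principles
- Every key change is recorded as an event
- Session state is derived from the event stream and state snapshots
- User messages and system actions enter the same project timeline
- Worker responses must be structured, not plain text only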
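The second principle, deriving session state from the event stream plus snapshots, can be sketched as a fold over events. This is only an illustration: the `apply_event` and `replay` helpers, the state shape (`task_status`, `last_event_id`), and the event-to-status mapping are assumptions, not part of the protocol. Only the event names come from this document.

```python
from typing import Any

# Assumed mapping from task event type to the resulting task status.
# The event names are from this document; the mapping is hypothetical.
STATUS_BY_EVENT = {
    "task.created": "planning",
    "task.assigned": "assigned",
    "task.started": "running",
    "task.paused": "paused",
    "task.resumed": "running",
    "task.cancelled": "cancelled",
    "task.completed": "completed",
    "task.failed": "failed",
}


def apply_event(state: dict[str, Any], event: dict[str, Any]) -> dict[str, Any]:
    """Return a new state with one event applied; the input is not mutated."""
    tasks = dict(state.get("task_status", {}))
    status = STATUS_BY_EVENT.get(event["type"])
    if status is not None and event.get("task_id"):
        tasks[event["task_id"]] = status
    return {"task_status": tasks, "last_event_id": event["id"]}


def replay(snapshot: dict[str, Any], events: list[dict[str, Any]]) -> dict[str, Any]:
    """Rebuild state by applying events, in order, on top of a snapshot."""
    state = snapshot
    for event in events:
        state = apply_event(state, event)
    return state
```

Starting from a recent snapshot instead of an empty state keeps replay cheap; events that do not affect task status (for example `worker.heartbeat`) only advance `last_event_id`.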
## Session model
### Hierarchy
```text
Project Session
-> Conversation Thread
-> Task Tree
-> Worker Assignments
-> Approval Requests
-> Artifacts
```
### Object definitions
#### project_session
```json
{
  "id": "ps_001",
  "title": "Fix login and cache issues",
  "status": "active",
  "active_objective": "Locate the root cause first, then decide whether to change code",
  "created_at": "2026-03-23T12:00:00Z"
}
```
#### message
```json
{
  "id": "msg_001",
  "session_id": "ps_001",
  "role": "user",
  "channel": "web",
  "content": "Investigate the login failure first; do not rush to change code",
  "created_at": "2026-03-23T12:00:05Z"
}
```
#### task
```json
{
  "id": "task_001",
  "session_id": "ps_001",
  "parent_task_id": null,
  "title": "Locate the root cause of the login failure",
  "kind": "investigation",
  "status": "planning",
  "priority": "high",
  "risk_level": "medium"
}
```
#### worker_assignment
```json
{
  "id": "wa_001",
  "task_id": "task_002",
  "worker_id": "worker_win_a",
  "status": "assigned"
}
```
## Event model
### Unified event format
```json
{
  "id": "evt_001",
  "session_id": "ps_001",
  "task_id": "task_002",
  "source": "worker",
  "type": "task.step.started",
  "timestamp": "2026-03-23T12:10:00Z",
  "payload": {
    "step": "run_tests",
    "summary": "Started running login-related tests"
  }
}
```
Field notes:
- `source` is one of `user`, `manager`, `system`, `worker`
- `type` is the event name
- `payload` is the structured content
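The envelope above could be modeled as a small validated type. The sketch below is a minimal illustration, assuming a hypothetical `Event` dataclass; treating `task_id` as optional (for session-level events) is also an assumption, since the source only shows an event that has one.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# The four sources listed in the field notes.
ALLOWED_SOURCES = {"user", "manager", "system", "worker"}


@dataclass(frozen=True)
class Event:
    id: str
    session_id: str
    source: str
    type: str
    timestamp: str
    payload: dict[str, Any] = field(default_factory=dict)
    # Assumption: session-level events may carry no task_id.
    task_id: Optional[str] = None

    def __post_init__(self) -> None:
        if self.source not in ALLOWED_SOURCES:
            raise ValueError(f"unknown source: {self.source!r}")
        if not isinstance(self.payload, dict):
            raise TypeError("payload must be a structured object, not plain text")
```

Rejecting a non-dict `payload` at construction time is one way to enforce the principle that worker responses must be structured.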
### Recommended event types
#### Session events
- `session.created`
- `session.objective.updated`
- `session.message.added`
#### Planning events
- `plan.created`
- `plan.updated`
- `plan.diff.generated`
#### Task events
- `task.created`
- `task.assigned`
- `task.started`
- `task.progress`
- `task.blocked`
- `task.paused`
- `task.resumed`
- `task.cancelled`
- `task.completed`
- `task.failed`
#### Worker events
- `worker.heartbeat`
- `worker.capabilities.updated`
- `worker.assignment.accepted`
- `worker.assignment.rejected`
#### Execution step events
- `task.step.started`
- `task.step.finished`
- `task.step.failed`
- `tool.call.requested`
- `tool.call.finished`
#### Approval events
- `approval.requested`
- `approval.approved`
- `approval.rejected`
## Task state machine
### Main state machine
```mermaid
stateDiagram-v2
[*] --> planning
planning --> queued
queued --> assigned
assigned --> running
running --> blocked
running --> paused
running --> waiting_approval
running --> completed
running --> failed
running --> cancelled
blocked --> running
paused --> running
waiting_approval --> running
waiting_approval --> cancelled
failed --> queued
```
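### State descriptions
| State | Meaning |
|---|---|
| `planning` | The manager is generating the task plan |
| `queued` | Created and waiting to be scheduled |
| `assigned` | Assigned to a worker |
| `running` | The worker is executing it |
| `blocked` | Stuck because an external condition is missing |
| `paused` | Paused by the user or the system |
| `waiting_approval` | Waiting for user approval |
| `completed` | Finished successfully |
| `failed` | Execution failed |
| `cancelled` | Cancelled |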
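The state diagram can be encoded as a transition table and checked before any status change is written. The sketch below copies the edges from the diagram as drawn; the `transition` helper and the `InvalidTransition` exception are hypothetical names.

```python
# Allowed transitions, copied edge by edge from the state diagram.
TRANSITIONS: dict[str, set[str]] = {
    "planning": {"queued"},
    "queued": {"assigned"},
    "assigned": {"running"},
    "running": {
        "blocked", "paused", "waiting_approval",
        "completed", "failed", "cancelled",
    },
    "blocked": {"running"},
    "paused": {"running"},
    "waiting_approval": {"running", "cancelled"},
    "failed": {"queued"},
    "completed": set(),  # terminal
    "cancelled": set(),  # terminal
}


class InvalidTransition(Exception):
    """Raised when a status change is not an edge in the diagram."""


def transition(current: str, target: str) -> str:
    """Return the target status if the move is allowed, else raise."""
    if target not in TRANSITIONS.get(current, set()):
        raise InvalidTransition(f"{current} -> {target} is not allowed")
    return target
```

Under the diagram as drawn, `paused -> cancelled` and `failed -> blocked` are rejected by this table; whether those edges should exist is a design decision this sketch does not make.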
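## Requirement change protocol
### Why a separate protocol is needed
When a user changes requirements in conversation, the change should not be treated as just another chat message. The system needs to know whether the message will:
- Change the current objective
- Discard existing subtasks
- Add new tasks
- Trigger an approval
### Suggested flow
1. The user message enters the session
2. The manager decides whether the message is an `objective change`
3. If so, it generates a `plan.diff`
4. The Task Service performs state transitions according to the diff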
### Plan diff example
```json
{
  "session_id": "ps_001",
  "change_reason": "User asked to fix API retry first and leave the cache alone",
  "cancel_tasks": ["task_cache_001"],
  "pause_tasks": ["task_ui_003"],
  "continue_tasks": ["task_api_002"],
  "create_tasks": [
    {
      "title": "Fix API retry logic",
      "kind": "implementation",
      "priority": "high"
    }
  ]
}
```
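A Task Service could turn such a diff into status changes and events roughly as follows. This is a sketch under several assumptions: `apply_plan_diff` is a hypothetical helper, new tasks are given placeholder IDs and start in `queued`, emitted events use `source: "system"`, and no transition validation is performed.

```python
from typing import Any


def apply_plan_diff(
    statuses: dict[str, str], diff: dict[str, Any]
) -> tuple[dict[str, str], list[dict[str, Any]]]:
    """Return (new task statuses, events to emit) without mutating the input."""
    updated = dict(statuses)
    events: list[dict[str, Any]] = []
    reason = diff["change_reason"]

    def emit(event_type: str, task_id: str, payload: dict[str, Any]) -> None:
        events.append({
            "session_id": diff["session_id"],
            "task_id": task_id,
            "source": "system",  # assumption: the Task Service acts as "system"
            "type": event_type,
            "payload": payload,
        })

    for task_id in diff.get("cancel_tasks", []):
        updated[task_id] = "cancelled"
        emit("task.cancelled", task_id, {"reason": reason})
    for task_id in diff.get("pause_tasks", []):
        updated[task_id] = "paused"
        emit("task.paused", task_id, {"reason": reason})
    # continue_tasks need no transition; they are listed so the diff is explicit.
    for index, spec in enumerate(diff.get("create_tasks", []), start=1):
        task_id = f"task_new_{index:03d}"  # hypothetical ID scheme
        updated[task_id] = "queued"  # assumption: already-planned tasks skip planning
        emit("task.created", task_id, dict(spec))
    return updated, events
```

Returning the events alongside the new statuses keeps every change on the project timeline, in line with the principle that all key changes are recorded as events.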
## Approval protocol
### Approval request format
```json
{
  "id": "apr_001",
  "session_id": "ps_001",
  "task_id": "task_009",
  "kind": "dangerous_command",
  "summary": "About to run rm -rf build-cache",
  "risk_level": "high",
  "status": "pending"
}
```
### Approval actions
Supported:
- `approve`
- `reject`
- `approve_once`
- `approve_for_session`
For the MVP, implement only:
- `approve`
- `reject`
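Resolving a pending request with the two MVP actions might look like the sketch below. The `resolve_approval` helper is hypothetical, and mapping a rejection to the `cancelled` task status is an assumption; the state diagram only says that `waiting_approval` can move to `running` or `cancelled`.

```python
from typing import Any

# MVP actions and the approval status each one produces.
STATUS_BY_ACTION = {"approve": "approved", "reject": "rejected"}
# Assumed follow-up task status for each outcome.
TASK_STATUS_AFTER = {"approved": "running", "rejected": "cancelled"}


def resolve_approval(
    request: dict[str, Any], action: str
) -> tuple[dict[str, Any], str, str]:
    """Return (updated request, event type to emit, next task status)."""
    if request["status"] != "pending":
        raise ValueError(f"approval {request['id']} is already {request['status']}")
    if action not in STATUS_BY_ACTION:
        raise ValueError(f"unsupported action: {action!r}")
    status = STATUS_BY_ACTION[action]
    updated = {**request, "status": status}
    return updated, f"approval.{status}", TASK_STATUS_AFTER[status]
```

Refusing to resolve a request that is no longer `pending` keeps a duplicate click or a replayed message from flipping an earlier decision.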
## Worker protocol
### Worker registration
```json
{
  "worker_id": "worker_mac_001",
  "hostname": "mac-studio",
  "os": "macos",
  "shell": "zsh",
  "capabilities": [
    "git",
    "terminal",
    "playwright",
    "xcode"
  ]
}
```
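One use of the registration payload is matching a task's tool needs against `capabilities`. A minimal sketch, assuming a hypothetical `eligible_workers` helper and that a task states its needs as a set of capability names (the source does not define how tasks express requirements):

```python
from typing import Any


def eligible_workers(
    registrations: list[dict[str, Any]], required: set[str]
) -> list[str]:
    """Return IDs of workers whose capabilities cover every required name."""
    return [
        reg["worker_id"]
        for reg in registrations
        if required <= set(reg["capabilities"])
    ]
```

A scheduler would likely combine this filter with the heartbeat `status` and `load` fields before choosing a worker.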
### heartbeat
```json
{
  "worker_id": "worker_mac_001",
  "status": "idle",
  "current_task_id": null,
  "load": 0.25,
  "timestamp": "2026-03-23T12:15:00Z"
}
```
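Heartbeats are typically used to decide whether a worker is still reachable. The sketch below assumes a 30-second timeout and hypothetical `parse_timestamp` and `is_online` helpers; the source does not specify a heartbeat interval or timeout.

```python
from datetime import datetime, timedelta
from typing import Any

# Assumed timeout; the protocol does not define one.
HEARTBEAT_TIMEOUT = timedelta(seconds=30)


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as 2026-03-23T12:15:00Z."""
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def is_online(heartbeat: dict[str, Any], now: datetime) -> bool:
    """True if the last heartbeat is no older than the timeout."""
    return now - parse_timestamp(heartbeat["timestamp"]) <= HEARTBEAT_TIMEOUT
```

`now` is passed in rather than read from the clock so the check stays deterministic in tests and during event replay; it must be timezone-aware to be comparable with the parsed timestamp.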
### Task execution request
```json
{
  "task_id": "task_102",
  "session_id": "ps_001",
  "workspace": {
    "repo": "git@github.com:org/repo.git",
    "branch": "boss/task-102",
    "worktree_path": "/workers/worktrees/task-102"
  },
  "execution_mode": "independent",
  "goal": "Investigate the login failure and report the root cause and suggested fix",
  "constraints": [
    "Do not change code yet",
    "Prioritize running tests and reading logs"
  ]
}
```
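## Progress summary protocol
Neither the worker nor the manager should reply with long free text only.
Maintaining a structured summary is recommended: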
```json
{
  "task_id": "task_102",
  "summary": "Found that API retry enters an infinite loop on 401",
  "progress_percent": 60,
  "current_step": "analyze_retry_logic",
  "next_step": "Add a minimal fix and run tests",
  "risk": "medium"
}
```
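This structured summary can be fed directly to:
- The right-hand panel of the web console
- Status broadcasts in the chat entry point
- The manager aggregator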
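Because the summary is structured, each consumer can render it in its own way. As one illustration, a chat status broadcast could be a single formatted line; the `render_status_line` helper and its output format are assumptions, not part of the protocol.

```python
from typing import Any

REQUIRED_FIELDS = ("task_id", "summary", "progress_percent", "current_step", "risk")


def render_status_line(summary: dict[str, Any]) -> str:
    """Format a progress summary as a one-line status message."""
    missing = [name for name in REQUIRED_FIELDS if name not in summary]
    if missing:
        raise ValueError(f"summary is missing fields: {missing}")
    # Clamp so a malformed report cannot display below 0% or above 100%.
    percent = max(0, min(100, int(summary["progress_percent"])))
    return (
        f"[{summary['task_id']}] {percent}% {summary['current_step']}: "
        f"{summary['summary']} (risk: {summary['risk']})"
    )
```

The web console panel or the manager aggregator could read the same dictionary without parsing any free text.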
## Collaborative development protocol
### Mode field
A task can declare one of:
- `independent`
- `collaborative`
- `research_only`
### Additional fields in collaborative mode
```json
{
  "shared_context_refs": [
    "artifact_001",
    "task_202.summary"
  ],
  "handoff_expected": true
}
```
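These fields support:
- A research task handing its conclusions to an implementation task
- A backend task handing API changes to a frontend task
## UI real-time subscription model
The console should subscribe to three streams: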
- `session_feed`
- `task_feed`
- `worker_feed`
This avoids having one oversized stream carry everything.
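Routing events to the three feeds could be done by event-type prefix. The mapping below is an assumption (the source names the feeds but not which events go where), and `feed_for` is a hypothetical helper.

```python
# Assumed routing from event-type prefix to feed name.
FEED_BY_PREFIX = {
    "session": "session_feed",
    "plan": "session_feed",
    "approval": "session_feed",
    "task": "task_feed",
    "tool": "task_feed",
    "worker": "worker_feed",
}


def feed_for(event_type: str) -> str:
    """Return the feed that should carry an event of the given type."""
    prefix = event_type.split(".", 1)[0]
    try:
        return FEED_BY_PREFIX[prefix]
    except KeyError:
        raise ValueError(f"no feed configured for event type {event_type!r}") from None
```

Failing loudly on an unknown prefix surfaces new event families at publish time instead of silently dropping them.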
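## Failure handling strategy
### Retryable failures
- Network jitter
- Worker briefly offline
- Tool call timeout
### Non-retryable failures
- Insufficient permissions
- Conflicting task objectives
- Explicit cancellation by the user
### Recommended strategy
- Record a `retry_count` for each task
- Every retry must carry a reason
- Repeated failures should automatically escalate to `blocked`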
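The recommended strategy can be sketched as a small decision function. The reason codes, the `MAX_RETRIES` limit of 3, and the `handle_failure` helper are all assumptions; the source only lists the failure categories in prose. The sketch also sets `blocked` directly, although the state diagram has no `failed -> blocked` edge.

```python
from typing import Any

# Hypothetical reason codes for the retryable failures listed above.
RETRYABLE_REASONS = {"network_jitter", "worker_offline", "tool_timeout"}
MAX_RETRIES = 3  # assumed limit; the source does not give a number


def handle_failure(task: dict[str, Any], reason: str) -> dict[str, Any]:
    """Return an updated task dict after a failure with the given reason."""
    if reason not in RETRYABLE_REASONS:
        # Non-retryable: leave the task failed and record why.
        return {**task, "status": "failed", "failure_reason": reason}
    count = task.get("retry_count", 0)
    if count >= MAX_RETRIES:
        # Repeated failures escalate instead of retrying again.
        return {**task, "status": "blocked", "failure_reason": reason}
    return {
        **task,
        "status": "queued",
        "retry_count": count + 1,
        "last_retry_reason": reason,  # every retry carries its reason
    }
```

Requeueing on retry follows the `failed --> queued` edge in the state diagram.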
## MVP minimum protocol set
The first version only needs to support these events:
- `session.message.added`
- `plan.created`
- `task.created`
- `task.assigned`
- `task.started`
- `task.progress`
- `task.paused`
- `task.cancelled`
- `task.completed`
- `task.failed`
- `worker.heartbeat`
- `approval.requested`
- `approval.approved`
- `approval.rejected`
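## Summary in one sentence
The core of the Boss protocol is not "sending and receiving messages" but "putting conversation, tasks, workers, and approvals into one traceable state machine".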