feat: bootstrap boss control plane prototype

Codex
2026-03-23 12:43:39 +08:00
commit 0ab83990b2
24 changed files with 5534 additions and 0 deletions

@@ -0,0 +1,419 @@
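# Boss Message Protocol and Task State Machine
Updated: 2026-03-23
## Design goals
This document defines the minimal viable internal message model for Boss, so that the system supports:
- Real-time conversation
- Subtask decomposition
- Mid-course requirement changes
- Pause and resume
- Approval
- Event replay
## Core principles
- Every key change is recorded as an event
- Session state is derived from the event stream plus state snapshots
- User messages and system actions enter the same project timeline
- Worker reports must be structured, not plain text only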
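
The second principle (state derived from the event stream plus snapshots) can be sketched as a small fold over events. This is only an illustration under stated assumptions: `SessionState`, `apply_event`, `replay`, the `STATUS_BY_EVENT` mapping, and the idea that a `session.message.added` payload carries a `content` field are all hypothetical and not defined by this document.

```python
import copy
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical mapping from task events to the status they imply.
# The document lists these event names but does not define the mapping.
STATUS_BY_EVENT = {
    "task.created": "queued",
    "task.assigned": "assigned",
    "task.started": "running",
    "task.blocked": "blocked",
    "task.paused": "paused",
    "task.resumed": "running",
    "task.cancelled": "cancelled",
    "task.completed": "completed",
    "task.failed": "failed",
}


@dataclass
class SessionState:
    """Assumed shape of a state snapshot for one project session."""
    task_status: dict = field(default_factory=dict)
    messages: list = field(default_factory=list)
    last_event_id: Optional[str] = None


def apply_event(state, event):
    """Apply a single event to the state in place and return the state."""
    etype = event["type"]
    if etype == "session.message.added":
        # Assumption: the message text travels in payload["content"].
        state.messages.append(event["payload"]["content"])
    elif etype in STATUS_BY_EVENT:
        state.task_status[event["task_id"]] = STATUS_BY_EVENT[etype]
    state.last_event_id = event["id"]
    return state


def replay(events, snapshot=None):
    """Rebuild state from a snapshot (or from scratch) plus later events."""
    state = copy.deepcopy(snapshot) if snapshot is not None else SessionState()
    for event in events:
        apply_event(state, event)
    return state
```

Replaying the full stream and replaying from a snapshot plus the remaining events should yield the same state, which is what makes event replay usable for recovery.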
## Session model
### Hierarchy
```text
Project Session
-> Conversation Thread
-> Task Tree
-> Worker Assignments
-> Approval Requests
-> Artifacts
```
### Object definitions
#### project_session
```json
{
  "id": "ps_001",
  "title": "Fix login and cache issues",
  "status": "active",
  "active_objective": "Find the root cause first, then decide whether to change code",
  "created_at": "2026-03-23T12:00:00Z"
}
```
#### message
```json
{
  "id": "msg_001",
  "session_id": "ps_001",
  "role": "user",
  "channel": "web",
  "content": "Investigate the login failure first; do not rush to change code",
  "created_at": "2026-03-23T12:00:05Z"
}
```
#### task
```json
{
  "id": "task_001",
  "session_id": "ps_001",
  "parent_task_id": null,
  "title": "Find the root cause of the login failure",
  "kind": "investigation",
  "status": "planning",
  "priority": "high",
  "risk_level": "medium"
}
```
#### worker_assignment
```json
{
  "id": "wa_001",
  "task_id": "task_002",
  "worker_id": "worker_win_a",
  "status": "assigned"
}
```
## Event model
### Unified event format
```json
{
  "id": "evt_001",
  "session_id": "ps_001",
  "task_id": "task_002",
  "source": "worker",
  "type": "task.step.started",
  "timestamp": "2026-03-23T12:10:00Z",
  "payload": {
    "step": "run_tests",
    "summary": "Started running login-related tests"
  }
}
```
Field notes:
- `source` is one of `user`, `manager`, `system`, `worker`
- `type` is the event name
- `payload` is the structured content
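
A minimal sketch of checking an incoming event against the envelope above. `validate_event`, `REQUIRED_FIELDS`, and the error strings are hypothetical. Treating `task_id` as optional is also an assumption, made because session-level events may not belong to a task.

```python
from datetime import datetime

ALLOWED_SOURCES = {"user", "manager", "system", "worker"}
# Assumption: task_id is optional, every other envelope field is required.
REQUIRED_FIELDS = ("id", "session_id", "source", "type", "timestamp", "payload")


def validate_event(event):
    """Return a list of problems; an empty list means the envelope looks valid."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in event]
    if errors:
        return errors
    if event["source"] not in ALLOWED_SOURCES:
        errors.append(f"unknown source: {event['source']}")
    if not isinstance(event["type"], str) or "." not in event["type"]:
        errors.append("type must be a dotted event name")
    if not isinstance(event["payload"], dict):
        errors.append("payload must be an object")
    try:
        # Older Python versions do not parse a trailing "Z", so normalize it.
        datetime.fromisoformat(str(event["timestamp"]).replace("Z", "+00:00"))
    except ValueError:
        errors.append("timestamp is not ISO 8601")
    return errors
```

Returning a list instead of raising on the first problem lets the receiver log every issue with a malformed worker report at once.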
### Recommended event types
#### Session events
- `session.created`
- `session.objective.updated`
- `session.message.added`
#### Planning events
- `plan.created`
- `plan.updated`
- `plan.diff.generated`
#### Task events
- `task.created`
- `task.assigned`
- `task.started`
- `task.progress`
- `task.blocked`
- `task.paused`
- `task.resumed`
- `task.cancelled`
- `task.completed`
- `task.failed`
#### Worker events
- `worker.heartbeat`
- `worker.capabilities.updated`
- `worker.assignment.accepted`
- `worker.assignment.rejected`
#### Execution step events
- `task.step.started`
- `task.step.finished`
- `task.step.failed`
- `tool.call.requested`
- `tool.call.finished`
#### Approval events
- `approval.requested`
- `approval.approved`
- `approval.rejected`
## Task state machine
### Main state machine
```mermaid
stateDiagram-v2
[*] --> planning
planning --> queued
queued --> assigned
assigned --> running
running --> blocked
running --> paused
running --> waiting_approval
running --> completed
running --> failed
running --> cancelled
blocked --> running
paused --> running
waiting_approval --> running
waiting_approval --> cancelled
failed --> queued
```
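### State descriptions
| State | Meaning |
|---|---|
| `planning` | The manager is generating the task plan |
| `queued` | Created and waiting to be scheduled |
| `assigned` | Assigned to a worker |
| `running` | The worker is executing the task |
| `blocked` | Stuck because an external condition is missing |
| `paused` | Paused by the user or the system |
| `waiting_approval` | Waiting for user approval |
| `completed` | Finished successfully |
| `failed` | Execution failed |
| `cancelled` | Cancelled |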
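
The main state machine can be encoded as a transition table and checked before any status change is persisted. The edges below are copied from the diagram; the names `TRANSITIONS`, `InvalidTransition`, and `transition` are hypothetical.

```python
# Allowed transitions, copied edge by edge from the state diagram.
TRANSITIONS = {
    "planning": {"queued"},
    "queued": {"assigned"},
    "assigned": {"running"},
    "running": {
        "blocked", "paused", "waiting_approval",
        "completed", "failed", "cancelled",
    },
    "blocked": {"running"},
    "paused": {"running"},
    "waiting_approval": {"running", "cancelled"},
    "failed": {"queued"},
    "completed": set(),   # terminal
    "cancelled": set(),   # terminal
}


class InvalidTransition(Exception):
    """Raised when a requested status change is not an edge in the diagram."""


def transition(current, target):
    """Return the new status, or raise if the diagram has no such edge."""
    if target not in TRANSITIONS.get(current, set()):
        raise InvalidTransition(f"{current} -> {target} is not allowed")
    return target
```

For example, `transition("running", "paused")` returns `"paused"`, while `transition("paused", "cancelled")` raises because the diagram only lets a paused task return to `running`.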
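## Requirement change protocol
### Why a separate protocol is needed
When a user changes requirements in the conversation, that should not be treated as just another chat message. The system needs to know whether the message will:
- Change the current objective
- Invalidate existing subtasks
- Add new tasks
- Trigger an approval
### Suggested flow
1. The user message enters the session
2. The manager decides whether the message is an `objective change`
3. If so, it generates a `plan.diff`
4. The Task Service applies state transitions according to the diff
### plan diff example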
```json
{
  "session_id": "ps_001",
  "change_reason": "User asked to fix API retry first and leave the cache alone",
  "cancel_tasks": ["task_cache_001"],
  "pause_tasks": ["task_ui_003"],
  "continue_tasks": ["task_api_002"],
  "create_tasks": [
    {
      "title": "Fix API retry logic",
      "kind": "implementation",
      "priority": "high"
    }
  ]
}
```
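
A sketch of how a Task Service might apply such a diff. `apply_plan_diff` and `id_factory` are hypothetical names. Starting new tasks in `queued`, and not checking each change against the main state machine, are simplifying assumptions rather than anything this document specifies.

```python
def apply_plan_diff(tasks, diff, id_factory):
    """Apply a plan diff to a {task_id: task} mapping.

    Returns (updated_tasks, events). The input mapping is not modified.
    id_factory is a hypothetical callable that returns a fresh task id.
    """
    updated = {tid: dict(task) for tid, task in tasks.items()}
    events = []

    changes = (
        ("cancel_tasks", "cancelled", "task.cancelled"),
        ("pause_tasks", "paused", "task.paused"),
    )
    for key, status, event_type in changes:
        for tid in diff.get(key, []):
            if tid not in updated:
                raise KeyError(f"plan diff references unknown task: {tid}")
            updated[tid]["status"] = status
            events.append({"type": event_type, "task_id": tid})

    # Tasks listed under continue_tasks are intentionally left untouched.

    for spec in diff.get("create_tasks", []):
        tid = id_factory()
        updated[tid] = {
            **spec,
            "id": tid,
            "session_id": diff["session_id"],
            "parent_task_id": None,
            "status": "queued",  # assumption: the plan already exists
        }
        events.append({"type": "task.created", "task_id": tid})

    return updated, events
```

Emitting one event per change keeps the diff itself on the project timeline, so a later replay reproduces the same task states.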
## Approval protocol
### Approval request format
```json
{
  "id": "apr_001",
  "session_id": "ps_001",
  "task_id": "task_009",
  "kind": "dangerous_command",
  "summary": "About to run rm -rf build-cache",
  "risk_level": "high",
  "status": "pending"
}
```
### Approval actions
Supported:
- `approve`
- `reject`
- `approve_once`
- `approve_for_session`
For the MVP, implementing only these two is recommended:
- `approve`
- `reject`
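
A sketch of resolving an approval with the two MVP actions. `resolve_approval` is a hypothetical name, and the approval status values `approved` and `rejected` are assumed from the event names. Mapping `reject` to a `cancelled` task is also an assumption: the state diagram allows `waiting_approval --> cancelled`, but the document does not say that a rejection triggers it.

```python
MVP_ACTIONS = ("approve", "reject")


def resolve_approval(approval, task, action):
    """Return (new_approval, new_task, event_type) without mutating the inputs."""
    if action not in MVP_ACTIONS:
        raise ValueError(f"unsupported action in MVP: {action}")
    if approval["status"] != "pending":
        raise ValueError("approval has already been resolved")
    if task["status"] != "waiting_approval":
        raise ValueError("task is not waiting for approval")

    approved = action == "approve"
    new_approval = {**approval, "status": "approved" if approved else "rejected"}
    # Assumption: approve resumes the task, reject cancels it.
    new_task = {**task, "status": "running" if approved else "cancelled"}
    event_type = "approval.approved" if approved else "approval.rejected"
    return new_approval, new_task, event_type
```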
## Worker protocol
### Worker registration
```json
{
  "worker_id": "worker_mac_001",
  "hostname": "mac-studio",
  "os": "macos",
  "shell": "zsh",
  "capabilities": [
    "git",
    "terminal",
    "playwright",
    "xcode"
  ]
}
```
### heartbeat
```json
{
  "worker_id": "worker_mac_001",
  "status": "idle",
  "current_task_id": null,
  "load": 0.25,
  "timestamp": "2026-03-23T12:15:00Z"
}
```
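
One possible use of heartbeats is detecting workers that have gone quiet. The sketch below is an assumption, not part of the protocol: the 30-second `HEARTBEAT_TIMEOUT` and the names `parse_ts` and `offline_workers` are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed timeout; the document does not specify a heartbeat interval.
HEARTBEAT_TIMEOUT = timedelta(seconds=30)


def parse_ts(value):
    """Parse the ISO 8601 timestamps used in the examples (trailing Z = UTC)."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def offline_workers(latest_heartbeats, now):
    """Return sorted ids of workers whose latest heartbeat is older than the timeout."""
    return sorted(
        hb["worker_id"]
        for hb in latest_heartbeats
        if now - parse_ts(hb["timestamp"]) > HEARTBEAT_TIMEOUT
    )
```

`now` is passed in rather than read from the clock so the check stays deterministic in tests and during event replay.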
### Task execution request
```json
{
  "task_id": "task_102",
  "session_id": "ps_001",
  "workspace": {
    "repo": "git@github.com:org/repo.git",
    "branch": "boss/task-102",
    "worktree_path": "/workers/worktrees/task-102"
  },
  "execution_mode": "independent",
  "goal": "Investigate the login failure and report the root cause and fix recommendations",
  "constraints": [
    "Do not change code yet",
    "Prioritize running tests and reading logs"
  ]
}
```
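## Progress summary protocol
Neither the worker nor the manager should reply with long free text only.
Maintaining a structured summary is recommended: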
```json
{
  "task_id": "task_102",
  "summary": "Found that API retry enters an infinite loop on 401",
  "progress_percent": 60,
  "current_step": "analyze_retry_logic",
  "next_step": "Add a minimal fix and run the tests",
  "risk": "medium"
}
```
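This structured summary can be fed directly to:
- The right-hand panel of the web console
- Status announcements in the chat entry point
- The manager's aggregator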
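
A sketch of the summary as a typed record that different consumers can render in their own way. `ProgressSummary` and `status_line` are hypothetical. The set of risk values is an assumption: the document shows `medium` and `high`, and `low` is added here only for illustration.

```python
from dataclasses import dataclass

# Assumption: "low" is a valid risk value alongside the documented ones.
RISK_LEVELS = ("low", "medium", "high")


@dataclass(frozen=True)
class ProgressSummary:
    task_id: str
    summary: str
    progress_percent: int
    current_step: str
    next_step: str
    risk: str

    def __post_init__(self):
        if not 0 <= self.progress_percent <= 100:
            raise ValueError("progress_percent must be between 0 and 100")
        if self.risk not in RISK_LEVELS:
            raise ValueError(f"unknown risk level: {self.risk}")

    def status_line(self):
        """One-line rendering, e.g. for a chat status announcement."""
        return (
            f"[{self.task_id}] {self.progress_percent}% "
            f"{self.summary} (next: {self.next_step})"
        )
```

Because the record is built from the same keys as the JSON example, a parsed payload can be passed as `ProgressSummary(**payload)`.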
## Collaborative development protocol
### Mode field
A task can declare one of:
- `independent`
- `collaborative`
- `research_only`
### Additional fields in collaborative mode
```json
{
  "shared_context_refs": [
    "artifact_001",
    "task_202.summary"
  ],
  "handoff_expected": true
}
```
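This supports:
- A research task handing its conclusions to an implementation task
- A backend task handing API changes to a frontend task
## UI real-time subscription model
The console should subscribe to three streams: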
- `session_feed`
- `task_feed`
- `worker_feed`
This avoids having one oversized stream carry everything.
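
One way to split events across the three streams is to route by the first segment of the event type. The document does not define this mapping, so `FEED_BY_PREFIX` and `feed_for` are assumptions, including the choice to send `plan.*` and `approval.*` events to `session_feed` and `tool.*` events to `task_feed`.

```python
# Hypothetical routing table: event-type prefix -> feed name.
FEED_BY_PREFIX = {
    "session": "session_feed",
    "plan": "session_feed",
    "approval": "session_feed",
    "task": "task_feed",
    "tool": "task_feed",
    "worker": "worker_feed",
}


def feed_for(event_type):
    """Return the feed an event should be published to."""
    prefix = event_type.split(".", 1)[0]
    try:
        return FEED_BY_PREFIX[prefix]
    except KeyError:
        raise ValueError(f"no feed configured for event type: {event_type}") from None
```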
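## Failure handling strategy
### Retryable failures
- Network jitter
- Worker briefly offline
- Tool call timeout
### Non-retryable failures
- Insufficient permissions
- Conflicting task objectives
- Explicit cancellation by the user
### Recommended strategy
- Record a `retry_count` for each task
- Every retry must carry a reason
- Repeated failures should be escalated to `blocked` automatically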
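
A sketch of the recommended strategy. The reason codes in `RETRYABLE`, the `MAX_RETRIES` limit of 3, the `last_retry_reason` field, and `handle_failure` itself are all hypothetical. Note also that the main state diagram has no `failed --> blocked` edge, so the escalation step below assumes such a transition is permitted.

```python
MAX_RETRIES = 3  # assumed limit; the document only says "repeated failures"

# Hypothetical reason codes for the retryable failures listed above.
RETRYABLE = {"network_error", "worker_offline", "tool_timeout"}


def handle_failure(task, reason):
    """Decide what happens to a failed task; returns an updated copy."""
    updated = dict(task)
    retries = task.get("retry_count", 0)

    if reason not in RETRYABLE:
        # Non-retryable: the task stays failed and waits for a human decision.
        updated["status"] = "failed"
    elif retries >= MAX_RETRIES:
        # Too many attempts: escalate instead of looping forever.
        updated["status"] = "blocked"
    else:
        updated["retry_count"] = retries + 1
        updated["last_retry_reason"] = reason  # every retry carries a reason
        updated["status"] = "queued"           # failed --> queued in the diagram
    return updated
```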
## Minimal MVP protocol set
The first version only needs to support these events:
- `session.message.added`
- `plan.created`
- `task.created`
- `task.assigned`
- `task.started`
- `task.progress`
- `task.paused`
- `task.cancelled`
- `task.completed`
- `task.failed`
- `worker.heartbeat`
- `approval.requested`
- `approval.approved`
- `approval.rejected`
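
A first implementation might guard its event intake with this list. In the sketch below the set is copied from the list above, while `partition_events` and the choice to set aside (rather than reject) unsupported events are assumptions.

```python
MVP_EVENT_TYPES = frozenset({
    "session.message.added",
    "plan.created",
    "task.created",
    "task.assigned",
    "task.started",
    "task.progress",
    "task.paused",
    "task.cancelled",
    "task.completed",
    "task.failed",
    "worker.heartbeat",
    "approval.requested",
    "approval.approved",
    "approval.rejected",
})


def partition_events(events):
    """Split events into (supported, unsupported) by MVP event type."""
    supported, unsupported = [], []
    for event in events:
        bucket = supported if event["type"] in MVP_EVENT_TYPES else unsupported
        bucket.append(event)
    return supported, unsupported
```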
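## One-sentence summary
The core of the Boss protocol is not "sending and receiving messages" but "putting conversation, tasks, workers, and approvals into one traceable state machine."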