Boss Edge Reliability Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
Goal: Add the first production reliability shell for Boss task execution without changing the deployment topology.
Architecture: Keep Boss Cloud and the current local-agent, but make local-agent behave like a lightweight Boss Edge by adding a durable outbox and explicit task phases. Cloud-side task APIs keep leases and add watchdog cleanup so APP progress never stays ambiguous forever.
Tech Stack: Next.js API routes, file-backed Boss state, Node local-agent, Codex App Server runner, Node test runner.
Task 1: Task Phase Contract
Files:
- Modify: `src/lib/boss-data.ts`
- Test: `src/lib/boss-data-reliability.test.ts`

- [ ] Add `MasterAgentTaskPhase` and normalized fields on `MasterAgentTask`: `phase`, `lastProgressAt`, `lastErrorCode`, `recoverable`, `nextRetryAt`.
- [ ] Update task normalization so old state files default `queued -> queued`, `running -> claimed`, terminal states preserve terminal phase.
- [ ] Update execution progress card generation to derive step status from phase when available.
- [ ] Test that `executor_starting`, `turn_started`, `awaiting_reply`, `completing`, and `recoverable_failed` map to visible progress steps.
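
The normalization and progress-step rules above could be sketched roughly as follows. This is a minimal illustration, not the planned implementation: `normalizeTaskPhase`, `deriveProgressSteps`, the `status` field, the step ordering, and the `done` / `active` / `pending` / `warning` status labels are all assumptions; only the phase names come from this plan.

```javascript
// Phases the plan expects to appear as visible progress steps.
// The left-to-right ordering is an assumption of this sketch.
const VISIBLE_PHASES = ["executor_starting", "turn_started", "awaiting_reply", "completing"];

// Hypothetical helper: fill in `phase` for a task loaded from an older state file.
// Assumes the legacy lifecycle value lives in a `status` field.
function normalizeTaskPhase(task) {
  if (task.phase) return task.phase;              // already carries a phase
  if (task.status === "queued") return "queued";  // queued -> queued
  if (task.status === "running") return "claimed"; // running -> claimed
  return task.status;                             // terminal states keep their terminal phase
}

// Hypothetical helper: derive step statuses for the progress card from a phase.
// `queued` and `claimed` yield all-pending steps; terminal phases are not handled here.
function deriveProgressSteps(phase) {
  if (phase === "recoverable_failed") {
    // Assumption: shown as a single warning step, because the phase alone
    // does not say where execution stopped.
    return [{ phase, status: "warning" }];
  }
  const current = VISIBLE_PHASES.indexOf(phase);
  return VISIBLE_PHASES.map((name, index) => ({
    phase: name,
    status: index < current ? "done" : index === current ? "active" : "pending",
  }));
}
```

Keeping the mapping in one pure function would let the reliability test assert on every phase without building a full task record.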
Task 2: Local Agent Durable Outbox
Files:
- Create: `local-agent/reliable-outbox.mjs`
- Modify: `local-agent/server.mjs`
- Test: `local-agent/reliable-outbox.test.mjs`

- [ ] Implement JSONL-backed outbox with append, list pending, mark sent, and compaction.
- [ ] Wrap `postMasterAgentTaskProgress`, `completeMasterAgentTask`, and `postAppLog` so payloads are persisted before network send.
- [ ] Replay pending records on startup and every heartbeat loop.
- [ ] Preserve idempotency keys using `taskId + event kind + phase + createdAt`.
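
One possible shape for the outbox is an append-only JSONL log in which both new events and "sent" markers are appended as lines, and the pending set is whatever has no marker yet. The sketch below assumes that design; `createOutbox`, `idempotencyKey`, the record layout, and the `:` separator are hypothetical and not taken from the existing code.

```javascript
// Loaded lazily so the sketch works whether the file is run as ESM or CommonJS.
const fsReady = import("node:fs/promises");

// Key format from the plan: taskId + event kind + phase + createdAt.
function idempotencyKey({ taskId, kind, phase, createdAt }) {
  return [taskId, kind, phase, createdAt].join(":");
}

// Hypothetical factory; `filePath` is the JSONL file that backs the outbox.
function createOutbox(filePath) {
  async function readEntries() {
    const fs = await fsReady;
    let text = "";
    try {
      text = await fs.readFile(filePath, "utf8");
    } catch (error) {
      if (error.code !== "ENOENT") throw error; // a missing file is an empty outbox
    }
    const entries = [];
    for (const line of text.split("\n")) {
      if (!line.trim()) continue;
      try {
        entries.push(JSON.parse(line));
      } catch {
        // Skip a torn line left behind by a crash in the middle of a write.
      }
    }
    return entries;
  }

  // Persist the event before any network send. appendFile does not fsync,
  // so a stricter version would flush explicitly.
  async function append(event) {
    const fs = await fsReady;
    const key = idempotencyKey(event);
    await fs.appendFile(filePath, JSON.stringify({ type: "event", key, event }) + "\n");
    return key;
  }

  async function markSent(key) {
    const fs = await fsReady;
    await fs.appendFile(filePath, JSON.stringify({ type: "sent", key }) + "\n");
  }

  // Pending = events that have no later "sent" marker.
  async function listPending() {
    const pending = new Map();
    for (const entry of await readEntries()) {
      if (entry.type === "event") pending.set(entry.key, entry);
      else if (entry.type === "sent") pending.delete(entry.key);
    }
    return [...pending.values()];
  }

  // Rewrite the log with only pending events, then swap it in with a rename.
  async function compact() {
    const fs = await fsReady;
    const pending = await listPending();
    const tmpPath = filePath + ".tmp";
    await fs.writeFile(tmpPath, pending.map((entry) => JSON.stringify(entry) + "\n").join(""));
    await fs.rename(tmpPath, filePath);
  }

  return { append, markSent, listPending, compact };
}
```

Under this design a wrapped sender would call `append` first, attempt the network request, and call `markSent` only after the cloud acknowledges it. Replay on startup and in the heartbeat loop then amounts to iterating `listPending()` and resending each record with its stored key.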
Task 3: Cloud Watchdog
Files:
- Modify: `src/lib/boss-data.ts`
- Test: `src/lib/boss-data-reliability.test.ts`

- [ ] Add a lightweight watchdog function invoked during claim, progress, complete, and heartbeat-derived writes.
- [ ] Expire stale user conversation tasks older than 1 hour while still queued.
- [ ] Convert stale running tasks without progress into `recoverable_failed` if turn has not started, otherwise `timed_out`.
- [ ] Ensure late complete cannot overwrite terminal states.
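
The watchdog rules above can be expressed as a pure function over one task, which keeps them easy to call from every write path. The following is a sketch under several assumptions: timestamps are epoch milliseconds, user conversation tasks are marked with `kind: "user_conversation"`, the terminal phase names `completed`, `failed`, and `expired` are placeholders, and the 10-minute stale threshold for running tasks is invented for illustration (the plan only fixes the 1-hour limit for queued tasks).

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Assumed terminal phase names; only `timed_out` is named in the plan.
const TERMINAL_PHASES = new Set(["completed", "failed", "timed_out", "expired"]);
// Phases in which the turn has not started yet.
const PRE_TURN_PHASES = new Set(["claimed", "executor_starting"]);
const RUNNING_PHASES = new Set([...PRE_TURN_PHASES, "turn_started", "awaiting_reply", "completing"]);

// Hypothetical watchdog step: returns the task unchanged or a copy with a new phase.
function applyWatchdog(task, now, staleRunningMs = 10 * 60 * 1000) {
  if (TERMINAL_PHASES.has(task.phase)) return task;

  if (task.phase === "queued") {
    // Queued user conversation tasks expire after one hour.
    const expired = task.kind === "user_conversation" && now - task.createdAt > HOUR_MS;
    return expired ? { ...task, phase: "expired" } : task;
  }

  if (RUNNING_PHASES.has(task.phase) && now - task.lastProgressAt > staleRunningMs) {
    // No progress for too long: recoverable if the turn never started, otherwise timed out.
    return PRE_TURN_PHASES.has(task.phase)
      ? { ...task, phase: "recoverable_failed", recoverable: true }
      : { ...task, phase: "timed_out", recoverable: false };
  }

  return task;
}

// Hypothetical completion guard: a late complete never overwrites a terminal phase.
function applyCompletion(task, result) {
  if (TERMINAL_PHASES.has(task.phase)) return task;
  return { ...task, phase: "completed", result };
}
```

Passing `now` in as an argument, rather than reading the clock inside the function, lets the reliability tests simulate stale tasks without waiting.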
Task 4: Executor Health Grading
Files:
- Modify: `src/lib/boss-data.ts`
- Modify: `local-agent/codex-app-server-runner.mjs`
- Test: `src/lib/boss-data-reliability.test.ts`

- [ ] Derive `codexAppServerHealth` as `available / degraded / unavailable` from heartbeat metadata and recent errors.
- [ ] Allow GUI-preferred task claim only when health is not `unavailable`.
- [ ] Mark app-server stdio closed and timeout errors as degraded for the next heartbeat.
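
As a rough illustration of the grading rule, the sketch below derives the three health values from a heartbeat record. The heartbeat fields `receivedAt` and `recentErrorCodes`, the error code strings, and the 60-second freshness window are all assumptions made for this example; the plan does not specify them.

```javascript
// Hypothetical error codes for "stdio closed" and "timeout" reported by the runner.
const DEGRADING_ERROR_CODES = new Set(["app_server_stdio_closed", "app_server_timeout"]);

// Hypothetical grading function.
// Assumes `heartbeat.receivedAt` is epoch milliseconds and
// `heartbeat.recentErrorCodes` is an array of strings.
function deriveCodexAppServerHealth(heartbeat, now, maxHeartbeatAgeMs = 60_000) {
  // No heartbeat, or one that is too old, means the executor cannot be trusted at all.
  if (!heartbeat || now - heartbeat.receivedAt > maxHeartbeatAgeMs) return "unavailable";
  const errors = heartbeat.recentErrorCodes ?? [];
  // Stdio-closed and timeout errors downgrade health without blocking claims.
  return errors.some((code) => DEGRADING_ERROR_CODES.has(code)) ? "degraded" : "available";
}

// GUI-preferred tasks may be claimed unless health is `unavailable`.
function canClaimGuiPreferredTask(health) {
  return health !== "unavailable";
}
```

In this sketch a `degraded` executor can still claim GUI-preferred work, which matches the rule above that only `unavailable` blocks the claim.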
Task 5: Verification
Files:
- Modify: `docs/architecture/current_runtime_and_deploy_status_cn.md`

- [ ] Run `node --test local-agent/reliable-outbox.test.mjs local-agent/master-task-timeout.test.mjs`.
- [ ] Run `npx eslint src/lib/boss-data.ts local-agent/server.mjs local-agent/codex-app-server-runner.mjs local-agent/reliable-outbox.mjs`.
- [ ] Run `npm run build`.
- [ ] Run `npm run lint`.
- [ ] Document the B+ reliability shell and the local Edge direction in the runtime status doc.