Generation Pipeline Framework

Role

The generation pipeline framework runs the artifact-producing phases defined by Report Generation. It owns task ordering, dependency readiness, concurrency limits, and common artifact checks. It does not own evidence extraction, project synthesis, or daily synthesis semantics.

Generation remains artifact-first: every phase invocation consumes the prepared workspace plus durable prerequisite artifacts, writes its own durable outputs, and returns success only after those outputs exist.

Task Model

The framework models phase invocations as task nodes:

Task kind	Scope	Durable outputs
`evidence_extraction`	one `(project_key, session_ref)`	`projects/<project_key>/evidence/<session_ref>.json`
`project_synthesis`	one `project_key`	`projects/<project_key>/project-synthesis.json`
`daily_synthesis`	the prepared workspace	`daily-report.json`
`rendering`	the prepared workspace	`report.md`, `report.notion.json`

This is a real DAG, not only three coarse phase barriers. Project synthesis for one project depends only on that project’s evidence tasks. Daily synthesis depends on all project synthesis tasks.
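The dependency shape described above can be sketched in a few lines. This is an illustration only, not the framework's TaskSpec or GenerationPlan: the Task dataclass, the build_plan function, and the task id format are hypothetical, and the rendering-after-daily-synthesis edge is an assumption based on rendering reading daily-report.json.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical stand-in for a task node; not the real TaskSpec."""
    task_id: str
    kind: str
    depends_on: tuple[str, ...] = ()


def build_plan(sessions_by_project: dict[str, list[str]]) -> dict[str, Task]:
    """Build a task graph from a mapping of project_key -> session_refs."""
    plan: dict[str, Task] = {}
    synthesis_ids: list[str] = []
    for project_key, session_refs in sessions_by_project.items():
        evidence_ids: list[str] = []
        for session_ref in session_refs:
            task_id = f"evidence_extraction:{project_key}:{session_ref}"
            plan[task_id] = Task(task_id, "evidence_extraction")
            evidence_ids.append(task_id)
        # Project synthesis depends only on this project's evidence tasks.
        synthesis_id = f"project_synthesis:{project_key}"
        plan[synthesis_id] = Task(
            synthesis_id, "project_synthesis", tuple(evidence_ids)
        )
        synthesis_ids.append(synthesis_id)
    # Daily synthesis depends on every project synthesis task.
    plan["daily_synthesis"] = Task(
        "daily_synthesis", "daily_synthesis", tuple(synthesis_ids)
    )
    # Assumption: rendering runs after daily synthesis.
    plan["rendering"] = Task("rendering", "rendering", ("daily_synthesis",))
    return plan


plan = build_plan({"alpha": ["S0001", "S0002"], "beta": ["S0001"]})
```

With this shape, project_synthesis:alpha can start as soon as the two alpha evidence tasks finish, without waiting for beta.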

APIs

TaskSpec records the stable task id, kind, project/session scope, dependencies, expected inputs, and expected outputs. GenerationPlan is the immutable task graph built from the prepared workspace indexes.

Generation workflow APIs take a prepared workspace path. CLI and preparation code own date and reports-root resolution and the mapping to <reports-root>/work/<YYYY-MM-DD>; the generation package only inspects the workspace and its durable artifacts. The reports root is resolved once at the CLI boundary by prompt_diary.config.resolve_reports_root (--reports-root over PROMPT_DIARY_HOME over the stored config over the per-user data directory, the last supplied by prompt_diary.paths.platform_data_dir).
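The precedence order and the workspace mapping can be illustrated as below. This is a sketch of the described ordering only, not the body of prompt_diary.config.resolve_reports_root: the function names pick_reports_root and workspace_for and their parameters are hypothetical, and treating an empty string as unset is an assumption.

```python
from pathlib import Path
from typing import Mapping, Optional


def pick_reports_root(
    cli_value: Optional[str],
    env: Mapping[str, str],
    stored_value: Optional[str],
    platform_default: Path,
) -> Path:
    """Return the first configured source in precedence order."""
    # --reports-root, then PROMPT_DIARY_HOME, then the stored config value.
    for candidate in (cli_value, env.get("PROMPT_DIARY_HOME"), stored_value):
        if candidate:  # Assumption: empty strings count as unset.
            return Path(candidate)
    # Last resort: the per-user data directory.
    return platform_default


def workspace_for(reports_root: Path, date: str) -> Path:
    """Map a reports root and a YYYY-MM-DD date to the workspace path."""
    return reports_root / "work" / date
```

Resolving once at the CLI boundary means the generation package only ever receives the final workspace path.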

Dependencies normally require successful prerequisite tasks. Project synthesis is the exception: it waits for every evidence extraction attempt in that project to finish, whether it succeeded or failed, and then checks that each expected evidence card exists before starting. A failed extraction can continue into project synthesis only when it wrote a durable evidence card that represents the gap.
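A readiness check of this kind might look like the following sketch. The function name, the status strings, and the return shape are hypothetical; only the evidence card path layout comes from the task table.

```python
from pathlib import Path
from typing import Mapping, Sequence

# Hypothetical status names for finished extraction attempts.
FINISHED = {"succeeded", "failed"}


def project_synthesis_ready(
    workspace: Path,
    project_key: str,
    session_refs: Sequence[str],
    extraction_status: Mapping[str, str],
) -> tuple[bool, str]:
    """Return (ready, reason) for one project's synthesis task."""
    # Step 1: every extraction attempt must have finished, pass or fail.
    unfinished = [
        ref for ref in session_refs
        if extraction_status.get(ref) not in FINISHED
    ]
    if unfinished:
        return False, f"waiting for extraction: {unfinished}"
    # Step 2: every expected evidence card must exist on disk.
    evidence_dir = workspace / "projects" / project_key / "evidence"
    missing = [
        ref for ref in session_refs
        if not (evidence_dir / f"{ref}.json").is_file()
    ]
    if missing:
        return False, f"missing evidence cards: {missing}"
    return True, "ready"
```

Note that a failed attempt does not block the project by itself; only a missing card does.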

PhaseRunner is the narrow phase execution protocol:

async def run(*, workspace_path: Path, task: TaskSpec) -> TaskResult: ...

Each real phase implementation should live in its phase package and implement this protocol. The runner may use Codex, MCP tools, deterministic code, or mocks. The framework calls it only after dependencies are complete.
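As an illustration, the protocol could be expressed with typing.Protocol and satisfied by a mock runner. This is a sketch under assumptions: the TaskSpec and TaskResult shapes here are reduced, hypothetical stand-ins, TouchRunner is invented for the example, and the real run signature may carry additional parameters.

```python
import asyncio
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Protocol


@dataclass(frozen=True)
class TaskSpec:
    """Hypothetical, reduced TaskSpec: only the fields this sketch needs."""
    task_id: str
    outputs: tuple[str, ...]


@dataclass(frozen=True)
class TaskResult:
    """Hypothetical result shape."""
    task_id: str
    ok: bool
    error: Optional[str] = None


class PhaseRunner(Protocol):
    async def run(self, *, workspace_path: Path, task: TaskSpec) -> TaskResult: ...


class TouchRunner:
    """Mock runner: writes an empty JSON object to each declared output."""

    async def run(self, *, workspace_path: Path, task: TaskSpec) -> TaskResult:
        for relative in task.outputs:
            target = workspace_path / relative
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text("{}")
        # Report success only after the outputs exist on disk.
        return TaskResult(task.task_id, ok=True)
```

Because the protocol is structural, a mock like TouchRunner needs no base class, which keeps framework tests independent of any agent backend.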

The three agent phase runners hold an injected AgentSessionFactory but do not own backend lifecycle. Backend ownership lives at the run scope: GenerateWorkspaceWorkflow enters one shared factory once per run (inside asyncio.run), and every agent task mints its own conversation off that shared backend via factory.runner(config). The composition root cmds/generate.py::build_generation_workflow() constructs one CodexAgentSessionFactory, wraps it with the Prompt Diary content-language injector, passes the wrapper to the three agent phase runners, and sets it as the workflow’s agent_factory; the rendering runner is deterministic and takes no agent factory. The wrapper writes the generated workspace AGENTS.md and appends the same rendered language norm to every AgentConfig.developer_instructions before minting a conversation. GeneratePipelineRunner itself is agent-agnostic — it schedules tasks and calls PhaseRunner.run; backend and agent wiring are the workflow’s concern.

A phase runner therefore does not need to be an async context manager to obtain its backend: the shared AgentSessionFactory is entered once at the workflow scope, above the pipeline. The pipeline still enters any phase runner that is an async context manager (once per run), but that mechanism now serves only a runner’s own additional resources, not the agent backend.
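One way to enter only those runners that are async context managers, once per run, is contextlib.AsyncExitStack. The sketch below is an assumption about how such a mechanism could work, not the pipeline's actual code; PlainRunner, ResourceRunner, and run_with_runners are hypothetical names.

```python
import asyncio
from contextlib import AsyncExitStack


class PlainRunner:
    """Runner with no resources of its own."""


class ResourceRunner:
    """Runner that owns an extra resource, so it is an async context manager."""

    def __init__(self) -> None:
        self.events: list[str] = []

    async def __aenter__(self) -> "ResourceRunner":
        self.events.append("enter")
        return self

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        self.events.append("exit")
        return False  # Do not swallow exceptions.


async def run_with_runners(runners, body):
    """Enter context-manager runners once, run body, then exit them."""
    async with AsyncExitStack() as stack:
        for runner in runners:
            cls = type(runner)
            if hasattr(cls, "__aenter__") and hasattr(cls, "__aexit__"):
                await stack.enter_async_context(runner)
        return await body()
```

Runners without their own resources pass through untouched, and every entered runner is exited in reverse order when the run ends.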

GenerateWorkspaceWorkflow is the shared workspace executor for both the full pipeline and one standalone phase task. run_generation_task is the lower-level task API used after declared prerequisites exist, which keeps phase development and debugging independent from the full pipeline.

GeneratePipelineRunner runs a full GenerationPlan. It schedules ready tasks, applies per-kind concurrency limits, marks dependents blocked after failed prerequisites, and validates that a successful task produced its declared outputs.
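The scheduling behavior can be approximated with asyncio primitives. The sketch below is not GeneratePipelineRunner: the graph shape, the state names, and run_plan are hypothetical, it applies the strict "dependencies must succeed" rule everywhere (ignoring the project synthesis exception), and it assumes run_task already returns False when declared outputs are missing.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical graph shape: task_id -> (kind, dependency ids).
Graph = dict[str, tuple[str, tuple[str, ...]]]


async def run_plan(
    graph: Graph,
    run_task: Callable[[str], Awaitable[bool]],
    limits: dict[str, int],
) -> dict[str, str]:
    """Run every task once its dependencies succeed; return final states."""
    state: dict[str, str] = {}
    semaphores = {kind: asyncio.Semaphore(n) for kind, n in limits.items()}
    done = {task_id: asyncio.Event() for task_id in graph}

    async def worker(task_id: str) -> None:
        kind, deps = graph[task_id]
        # Wait until every prerequisite reaches a final state.
        for dep in deps:
            await done[dep].wait()
        if any(state[dep] != "succeeded" for dep in deps):
            # A failed or blocked prerequisite blocks this task.
            state[task_id] = "blocked"
        else:
            # The per-kind semaphore enforces the concurrency limit.
            async with semaphores[kind]:
                try:
                    ok = await run_task(task_id)
                except Exception:
                    ok = False
            state[task_id] = "succeeded" if ok else "failed"
        done[task_id].set()

    await asyncio.gather(*(worker(task_id) for task_id in graph))
    return state
```

Each task gets its own coroutine that waits on its prerequisites, so independent projects proceed in parallel up to the per-kind limit.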

The scheduler does not retry failed tasks. Codex-backed phase runners own same-process agent retry inside a task through generate/agent_retry.py: they keep the current AgentRunner, re-read durable artifacts after every turn, whether it succeeded or failed, and send a phase-specific resume prompt when the artifact shows more work is needed. The default policy permits three consecutive no-progress attempts with exponential backoff from 1s up to 60s. If that budget is exhausted, the phase returns a failed task with an "agent made no progress ..." error. Deterministic rendering and non-agent failures remain outside this helper.
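A no-progress retry loop of this kind could be sketched as follows. This is not the contents of generate/agent_retry.py: the function names and parameters are hypothetical, the doubling factor is an assumption (the text only says the backoff is exponential from 1s up to 60s), and sleeping only after no-progress attempts is also an assumption.

```python
import asyncio
from typing import Awaitable, Callable


def backoff_delay(no_progress_count: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay after N consecutive no-progress attempts (N >= 1)."""
    # Assumption: the delay doubles each time and is capped at `cap`.
    return min(cap, base * 2 ** (no_progress_count - 1))


async def drive_until_done(
    attempt: Callable[[], Awaitable[None]],
    remaining_work: Callable[[], int],
    max_no_progress: int = 3,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> bool:
    """Repeat attempts until remaining_work() reaches 0 or the budget is spent."""
    no_progress = 0
    before = remaining_work()
    while before > 0:
        try:
            await attempt()
        except Exception:
            pass  # A failed turn may still have written progress; inspect below.
        after = remaining_work()  # Re-read the durable artifact, not the reply.
        if after < before:
            no_progress = 0  # Progress resets the consecutive counter.
        else:
            no_progress += 1
            if no_progress >= max_no_progress:
                return False  # Budget exhausted: the caller reports failure.
            await sleep(backoff_delay(no_progress))
        before = after
    return True
```

The injectable sleep parameter exists only so the loop can be exercised in tests without real delays.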

A full pipeline run succeeds when terminal deliverables succeed. Non-terminal tolerated failures, such as failed extraction attempts that still wrote durable evidence cards for project synthesis, remain visible on the run result without making the final report command fail.

CLI

report generate runs the full pipeline for a target date, preparing the workspace first when it is missing.

Standalone phase commands require an existing prepared workspace and run one task after checking its declared prerequisites:

report generate evidence --date YYYY-MM-DD --project-key <project_key> --session-ref S0001
report generate project --date YYYY-MM-DD --project-key <project_key>
report generate daily --date YYYY-MM-DD
report generate render --date YYYY-MM-DD
report generate render --date YYYY-MM-DD --notion

The phase commands do not rerun earlier phases or prepare missing workspaces. They are development and repair entrypoints that respect the phase boundary rule: each command runs only its own phase against prerequisite artifacts that already exist. generate render writes the views from an existing daily-report.json; generate render --notion renders and then publishes to Notion.

Evidence Extraction Runner

The evidence extraction phase runner drives one agent conversation per session. It sends the full extractor prompt on the first turn; each subsequent turn carries the prior committed result via the next-turn prompt. Turns are driven in indexed order until the session is complete.

After each turn the runner verifies the result by reading the evidence card from the workspace directly. It never trusts the assistant’s text response. An uncommitted turn — one where the card on disk does not reflect the expected turn — is retried on the same agent conversation until that turn is committed or the no-progress budget is exhausted. The retry counter is scoped to the current assigned turn and resets when the runner advances to the next committed turn.

At the start of every task run the runner deletes any existing evidence card and re-extracts all turns from scratch. This reset means a re-run is always clean and never encounters write_evidence’s duplicate-turn rejection. Within that task run, retries never delete the active partial card. A run that fails partway may leave a partial card on disk; project synthesis treats an incomplete card as an evidence gap, and that handling is outside the scope of this phase.

The runner builds a workspace-aware agent factory once per run. For the Codex backend the factory registers the package MCP server (report mcp serve) with the prepared workspace path in the PROMPT_DIARY_WORKSPACE environment variable. A Codex-spawned stdio MCP server does not inherit the calling thread’s working directory, so the MCP write_evidence tool resolves its workspace from that variable, falling back to cwd. The agent runs non-interactively (approval_mode="auto_review", sandbox="workspace-write") using the system codex binary on PATH.
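The environment-variable lookup with a cwd fallback can be sketched as below. The function name resolve_workspace is hypothetical and this is not the MCP tool's actual code; only the PROMPT_DIARY_WORKSPACE variable name and the fallback order come from the text.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_workspace(env: Optional[Mapping[str, str]] = None) -> Path:
    """Prefer PROMPT_DIARY_WORKSPACE; fall back to the current directory."""
    env = os.environ if env is None else env
    configured = env.get("PROMPT_DIARY_WORKSPACE")
    # Assumption: an empty value is treated the same as an unset one.
    return Path(configured) if configured else Path.cwd()
```

Passing the path through the environment avoids relying on the spawned server's working directory.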

Project And Daily Agent Retry

Project synthesis uses the same helper with the current uncovered-turn count as its progress marker. A retry continues on the same runner with the current uncovered-turn list; progress means that list strictly shrinks, and completion means every indexed turn is covered. The runner deletes a pre-existing project-synthesis.json only once at task start, never between retry turns.
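The progress marker can be illustrated with a small classifier. The function names and the integer turn indexes are hypothetical; the sketch only encodes the rule that progress means the uncovered-turn list strictly shrinks and completion means it is empty.

```python
def uncovered_turns(indexed_turns: list[int], covered_turns: set[int]) -> list[int]:
    """Turns listed in the index that the synthesis does not yet cover."""
    return [turn for turn in indexed_turns if turn not in covered_turns]


def classify_attempt(before: list[int], after: list[int]) -> str:
    """Compare the uncovered-turn lists from before and after one attempt."""
    if not after:
        return "complete"  # Every indexed turn is covered.
    if len(after) < len(before):
        return "progress"  # The list strictly shrank.
    return "no_progress"  # Counts against the no-progress budget.
```

For example, going from three uncovered turns to one is progress, while staying at three is a no-progress attempt.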

Daily synthesis still uses one fresh agent conversation per pass: each project summary, report title, engagement assessment, and team-learning pass gets its own runner. A pass retries on that same runner until its target slot is written in daily-report.json or the no-progress budget is exhausted. If a turn fails after writing the slot, the artifact inspection treats the pass as complete.

Progress

The scheduler emits TaskStarted/TaskFinished events and threads a ProgressReporter into each phase runner’s run(...); the evidence runner emits TurnAdvanced per turn. See Progress Reporting.

Boundaries

The framework checks only generic output existence. Phase-local validation belongs to the phase runner before it returns success. For example, evidence extraction should validate evidence card structure, daily synthesis should validate daily-report.json, and the rendering phase should validate the rendered views.

Failed extraction may become a durable evidence card that project synthesis accounts for as a gap. An absent evidence card is a missing prerequisite artifact and prevents the project task from starting. Other failed dependencies block their dependent tasks.
