Moving from SAE L3 → SAE L4

Before you start: the L3 → L4 readiness test

You’re ready to move up when all three are true:

Your IDE workflows are repeatable (not heroic).
You can name your failure modes (and what catches them).
You already use protocols to plug in context and tools (not just “chat in editor”).

SAE L3 mastery path (E-P-I-A-S: Explorer, Practitioner, Integrator, Architect, Steward) that leads into L4

❶ L3 Explorer → ❷ L3 Practitioner

Goal: Make multi-step runs reliable in the IDE.

What to master:

A stable “run loop”: plan → generate → review → revise
Consistent context injection (rules, constraints, repo references)
Lightweight evals (lint/tests/checklists) as default

Concrete upgrades:

Standard run template (same steps every time)
“Stop & review” gates at predictable points
Use MCP-style tools/resources to make context/tool access explicit rather than implicit
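The run loop and its review gate can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: RunState, run_loop, and the shape of the generate callback are hypothetical names, and the checks stand in for whatever lint, tests, or checklists you already use.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RunState:
    task: str
    context: list[str]  # rules, constraints, repo references injected on every run
    plan: str = ""
    draft: str = ""
    history: list[str] = field(default_factory=list)  # ordered record of steps taken


def run_loop(
    state: RunState,
    generate: Callable[[RunState, list[str]], str],  # hypothetical: wraps your IDE agent call
    checks: dict[str, Callable[[str], bool]],        # lightweight evals, keyed by name
    max_revisions: int = 2,
) -> list[str]:
    """Run plan -> generate -> review -> revise with the same steps every time.

    Returns the names of checks that still fail; an empty list means the draft
    is ready for the human "stop & review" gate.
    """
    state.plan = f"{state.task} | context: {'; '.join(state.context)}"
    state.history.append("plan")
    failures: list[str] = []
    for attempt in range(max_revisions + 1):
        # The previous failures are passed back in so the revision is targeted.
        state.draft = generate(state, failures)
        state.history.append("generate" if attempt == 0 else "revise")
        failures = [name for name, check in checks.items() if not check(state.draft)]
        state.history.append("review")
        if not failures:
            break
    return failures
```

Because the context list and the checks are explicit arguments, a teammate running the same loop in the same repo gets the same inputs, which is what makes the result reproducible.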

Signal you’ve mastered this step:

“If I give this workflow to another designer in the same repo, they can reproduce the result.”

❷ L3 Practitioner → ❸ L3 Integrator

Goal: Decide what AI runs vs what humans approve—predictably.

What to master:

Clear ownership boundaries (AI executes; human approves)
Failure mode taxonomy (the top 10 ways the workflow breaks)
Escalation triggers (“if X, stop and ask”)

Concrete upgrades:

“Approval gates” in the workflow (diff review, spec check, UI pass)
Documented failure modes + fixes
Stronger evals than checklists: structured checks, deterministic tests, or acceptance criteria
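One way to make escalation triggers explicit is to write each “if X, stop and ask” rule as a named predicate. The sketch below is illustrative only: the Change fields, the trigger names, and the 20-file threshold are assumptions, not values from any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    files_touched: int
    touches_design_tokens: bool
    tests_passed: bool


# Hypothetical triggers: each entry is one documented "if X, stop and ask" rule.
ESCALATION_TRIGGERS = {
    "tests failed": lambda c: not c.tests_passed,
    "design tokens modified": lambda c: c.touches_design_tokens,
    "large diff (>20 files)": lambda c: c.files_touched > 20,
}


def next_action(change: Change) -> tuple[str, list[str]]:
    """AI executes; a human approves. Any fired trigger stops the run first."""
    reasons = [name for name, rule in ESCALATION_TRIGGERS.items() if rule(change)]
    if reasons:
        return "stop_and_ask", reasons
    return "proceed_to_approval", []
```

Keeping the triggers in one table doubles as the documented failure mode list: every known exception class has a name that shows up in the escalation message.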

Signal:

“I trust the workflow unless it hits a known exception class.”

❸ L3 Integrator → ❹ L3 Architect

Goal: Build shared workflows others can run—still IDE-centric.

What to master:

Modular context libraries (brand voice, design system rules, constraints)
Reusable Skills/MCP tools
Reusable eval templates and runbooks

Concrete upgrades:

A team “workflow pack” (prompt + context + tool wiring + evals)
Shared MCP servers/tools to standardize tool access and reduce bespoke integrations
Shared workflow agents inside IDE sessions (multi-step orchestration patterns)

Signal:

“Teammates run my workflows and get comparable quality without me coaching live.”

❹ L3 Architect → ❺ L3 Steward

Goal: Standardize IDE-based agentic work safely.

What to master:

Org norms for safety, quality, traceability
Governance for tool access (especially MCP/tooling permissions)
Coaching judgment, not tricks

Concrete upgrades:

“Allowed tools / disallowed tools” policy for agents
Review norms (what must be inspected; what can be trusted)
Security posture awareness for tool-extended agents (tool access increases risk surface)
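An allowed/disallowed tools policy can start as a small default-deny lookup. The following is a sketch under assumptions: ToolPolicy and the tool names are hypothetical, and a real setup would enforce this wherever your agent's tool calls are actually dispatched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolPolicy:
    allowed: frozenset[str]
    disallowed: frozenset[str] = frozenset()
    needs_review: frozenset[str] = frozenset()  # allowed, but output must be inspected

    def decide(self, tool: str) -> str:
        # An explicit deny wins over everything else.
        if tool in self.disallowed:
            return "deny"
        # Default-deny: a tool nobody listed is treated as not permitted.
        if tool not in self.allowed:
            return "deny"
        if tool in self.needs_review:
            return "allow_with_review"
        return "allow"


# Hypothetical tool names, for illustration only.
policy = ToolPolicy(
    allowed=frozenset({"read_repo", "run_tests", "write_files"}),
    disallowed=frozenset({"deploy_production"}),
    needs_review=frozenset({"write_files"}),
)
```

Default-deny is the design choice worth noting here: since each added tool widens the risk surface, new tools stay off until someone lists them on purpose.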

Signal:

“People trust IDE-agent work here because expectations and review gates are explicit.”

The Transition: L3 → L4 (The Infrastructure Shift)

At L3, the IDE is the workspace.

At L4, the harness becomes the workspace.

What changes at L4:

Work is triggered by events (ticket, PR, schedule), not by you typing in the IDE
The system runs eval → retry → escalate on its own
You focus on harness design, not step-by-step execution
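Event-triggered work can be pictured as a dispatcher that maps event types to workflows. This is a minimal sketch with hypothetical event names and payload fields; a real harness would receive these events from your ticketing system, code host, or scheduler.

```python
from typing import Callable

# Registry of event type -> handler. All names here are illustrative assumptions.
HANDLERS: dict[str, Callable[[dict], str]] = {}


def on(event_type: str):
    """Decorator that registers a workflow for one event type."""
    def register(fn: Callable[[dict], str]) -> Callable[[dict], str]:
        HANDLERS[event_type] = fn
        return fn
    return register


@on("pr_opened")
def review_pr(event: dict) -> str:
    return f"run 'component review' on PR #{event['number']}"


@on("schedule")
def nightly(event: dict) -> str:
    return "run 'nightly regression sweep'"


def dispatch(event: dict) -> str:
    handler = HANDLERS.get(event["type"])
    if handler is None:
        # Unknown events go to a human instead of being dropped silently.
        return f"escalate: no workflow registered for '{event['type']}'"
    return handler(event)
```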

Mental shift:

From “run this workflow with me” → “run this workflow without me, and alert me only when needed”

How to move from L3 → L4 in practical steps

Step 1: Extract your “best L3 workflow” into a runnable spec

Pick one workflow you already trust (e.g., “generate component + tests + docs”).

Turn it into:

Inputs (requirements, constraints)
Steps (the run sequence)
Gates (approval points)
Outputs (artifacts produced)

Why: L4 requires the workflow to exist independently of your presence.
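As a rough illustration, the four parts of a runnable spec fit in a small data structure that can be validated and serialized. WorkflowSpec and every field value below are hypothetical examples, not a standard format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class WorkflowSpec:
    name: str
    inputs: tuple[str, ...]   # requirements, constraints
    steps: tuple[str, ...]    # the run sequence
    gates: tuple[str, ...]    # approval points
    outputs: tuple[str, ...]  # artifacts produced

    def problems(self) -> list[str]:
        """List missing sections; an empty list means the spec is complete."""
        return [
            f"{section} is empty"
            for section in ("inputs", "steps", "gates", "outputs")
            if not getattr(self, section)
        ]


# Example values are assumptions for illustration.
spec = WorkflowSpec(
    name="generate component + tests + docs",
    inputs=("component requirements", "design system constraints"),
    steps=("plan", "generate", "review", "revise"),
    gates=("diff review", "spec check", "UI pass"),
    outputs=("component", "tests", "docs"),
)

# A serialized spec can live in the repo and be run by something other than you.
serialized = json.dumps(asdict(spec), indent=2)
```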

Step 2: Add eval gates that decide “pass / retry / escalate”

This is the heart of L4 harness behavior.

Minimum viable gate set:

Structure gate (did it produce the right artifacts?)
Quality gate (tests/lint/a11y/basic heuristics)
Regression gate (diff sanity, snapshot checks)

This maps to the “agentic coding harness” idea: the developer becomes a supervisor, so the system’s logs, diffs, and rollback traces are what matter.
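The minimum gate set can be sketched as three predicates feeding one pass / retry / escalate decision. The RunResult fields, required artifact names, and thresholds are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    PASS = "pass"
    RETRY = "retry"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class RunResult:
    artifacts: frozenset[str]
    tests_passed: bool
    lint_errors: int
    changed_lines: int


REQUIRED_ARTIFACTS = frozenset({"component", "tests", "docs"})  # hypothetical
MAX_CHANGED_LINES = 400                                         # hypothetical


def structure_gate(r: RunResult) -> bool:
    return REQUIRED_ARTIFACTS <= r.artifacts  # every required artifact is present


def quality_gate(r: RunResult) -> bool:
    return r.tests_passed and r.lint_errors == 0


def regression_gate(r: RunResult) -> bool:
    return r.changed_lines <= MAX_CHANGED_LINES  # crude diff sanity check


GATES = {"structure": structure_gate, "quality": quality_gate, "regression": regression_gate}


def evaluate(result: RunResult, attempt: int, max_attempts: int = 3) -> tuple[Verdict, list[str]]:
    """Return the verdict plus the names of the gates that failed."""
    failed = [name for name, gate in GATES.items() if not gate(result)]
    if not failed:
        return Verdict.PASS, failed
    if attempt < max_attempts:
        return Verdict.RETRY, failed
    return Verdict.ESCALATE, failed
```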

Step 3: Implement automatic retries with corrective prompts

L4 means the system does the boring part:

If gate fails → apply corrective instruction → retry
If repeated failure → escalate with a crisp report

Key change vs L3:

In L3, you notice and fix.
In L4, the harness notices and fixes until it can’t.
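A retry loop with corrective prompts might look like the sketch below. The agent and check callables are placeholders for your own model call and gate runner, and the CORRECTIONS table is a hypothetical mapping from failed gate to instruction.

```python
from typing import Callable

# Hypothetical corrective instructions, keyed by the gate that failed.
CORRECTIONS = {
    "structure": "Produce every required artifact: component, tests, docs.",
    "quality": "Fix failing tests and lint errors before finishing.",
}


def run_with_retries(
    prompt: str,
    agent: Callable[[str], str],        # placeholder for the model/agent call
    check: Callable[[str], list[str]],  # returns names of failed gates
    max_attempts: int = 3,
) -> dict:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    trace = []
    current = prompt
    failed: list[str] = []
    for attempt in range(1, max_attempts + 1):
        output = agent(current)
        failed = check(output)
        trace.append({"attempt": attempt, "failed": failed})
        if not failed:
            return {"status": "passed", "output": output, "trace": trace}
        # Gate failed: append a corrective instruction and try again.
        fixes = [CORRECTIONS.get(g, f"Address the failed '{g}' gate.") for g in failed]
        current = prompt + "\n\nCorrections:\n- " + "\n- ".join(fixes)
    # Repeated failure: stop and hand a short report to a human.
    report = f"{max_attempts} attempts failed; last failures: {', '.join(failed)}"
    return {"status": "escalated", "report": report, "trace": trace}
```

The trace records every attempt, so the escalation report can show what was tried as well as the final failure.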

Step 4: Introduce “background execution” + auditability

Your harness should be able to run:

While you’re in meetings
Overnight
As part of CI-like automation

But it must leave:

Logs
Diffs
Decision traces
Rollback plan

This is where “agentic IDEs” stop being enough and “systems” begin. (SoftwareSeni [7])
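The four things a background run must leave behind can be captured in one record. This is a sketch, assuming a hypothetical RunRecord shape; where you store it (files, a database, CI artifacts) is up to your setup.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class RunRecord:
    run_id: str
    trigger: str                                         # e.g. "schedule", "pr_opened"
    logs: list[str] = field(default_factory=list)
    diff: str = ""
    decisions: list[dict] = field(default_factory=list)  # the decision trace
    rollback: str = ""                                   # how to undo this run

    def decide(self, gate: str, verdict: str, reason: str) -> None:
        """Append one timestamped entry to the decision trace."""
        self.decisions.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "gate": gate,
            "verdict": verdict,
            "reason": reason,
        })

    def is_auditable(self) -> bool:
        # A run counts as auditable only if all four pieces are present.
        return all([self.logs, self.diff, self.decisions, self.rollback])

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)
```

A harness could refuse to mark a run as complete until is_auditable() returns True, which keeps unattended runs from leaving changes nobody can explain or undo.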

Step 5: Make it operable by others (and by the org)

L4 is not personal power—it’s shared infrastructure.

To graduate:

Others can trigger runs
Others can interpret failures
The harness is maintained like a product

Tooling patterns that often show up here:

Workflow orchestration (graphs, pipelines)
Evaluation observability
Shared tool servers (MCP) for consistent tool access

L3 vs L4 “tell”

L3: If you close your laptop, the system stops.
L4: Work completes while you’re away; you only handle exceptions.