How to Choose an AI Development Framework


GitHub Spec Kit crossed 90,000 GitHub stars within eight months of its open-source release in September 2025. BMAD-METHOD has built one of the largest communities in the category. GSD, built natively for Claude Code, is described by its practitioners as the most rigorous execution framework available. Yet most development teams using AI coding tools are prompting ad hoc, without a structured workflow of any kind.

That gap is closing, and the decision about how to close it has become a genuine architectural choice. A new category — spec-driven development — has emerged from the failure modes of unstructured AI coding: agents that produce plausible code that drifts from intent, hallucinates APIs, and degrades in quality as projects grow. The four frameworks compared here — GSD, BMAD, OpenSpec, and GitHub Spec Kit — each claim to solve that problem. They do not solve it in the same way, and the differences determine which one is right for a given organisation.

What You Are Really Choosing Between

The question "which framework should we use?" usually conceals a more fundamental one: where in the development process does your biggest quality risk sit? The answer should drive the choice — not feature lists.

Spec-driven development is not a single thing. It is a family of approaches that share one principle but diverge sharply in scope, ceremony, and what they optimise for. GSD is primarily a context management system. BMAD is a full-lifecycle agent orchestration platform. OpenSpec is a change documentation standard. GitHub Spec Kit is a cross-agent workflow normalisation tool. Choosing without understanding that distinction leads to either over-engineering — deploying BMAD to write a microservice — or under-engineering — using Spec Kit for a twelve-month brownfield migration with audit requirements.

The choice also carries commitment. These frameworks shape how specifications are written, how implementation tasks are broken down, and what artefacts are produced. Switching mid-project is expensive. The right time to make this decision is before the first sprint, not after the third.

The productivity evidence supports that caution. Our 59-study review of AI-assisted software development finds larger reported gains as work moves from assistance to specified delegation, but confidence falls and verification becomes a larger part of the operating model.

When to Use Which Framework

The variables that matter most when choosing a spec-driven framework are not feature lists. They are team size, project type, and compliance requirements. The matrix below maps the most common scenarios to a recommended starting point.

Scenario	Team size	Project type	Compliance	Framework
Pre-PMF startup MVP	1–3	Greenfield	Low	OpenSpec or GSD
Funded startup, scaling team	4–15	Greenfield product	Medium	GitHub Spec Kit
Complex enterprise greenfield	10–15	New platform or major product	Medium–High	BMAD
Brownfield modernisation	Any	Legacy refactor	Medium	OpenSpec + selective BMAD
Regulated industry (FinTech, healthcare)	Any	Either	High — SOC 2, HIPAA, EU AI Act	BMAD
Solo developer, fast iteration	1	Anything small	Low	GSD
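
The matrix can also be read as a simple lookup table. The sketch below is one way to encode it in Python; the `Scenario` record and the `recommend` helper are hypothetical names introduced here for illustration, and the rows are copied from the table above.

```python
from typing import NamedTuple, Optional


class Scenario(NamedTuple):
    """One row of the matrix. Field names are illustrative, not from any framework."""
    name: str
    team_size: str
    project_type: str
    compliance: str
    framework: str


MATRIX = [
    Scenario("Pre-PMF startup MVP", "1–3", "Greenfield", "Low", "OpenSpec or GSD"),
    Scenario("Funded startup, scaling team", "4–15", "Greenfield product", "Medium", "GitHub Spec Kit"),
    Scenario("Complex enterprise greenfield", "10–15", "New platform or major product", "Medium–High", "BMAD"),
    Scenario("Brownfield modernisation", "Any", "Legacy refactor", "Medium", "OpenSpec + selective BMAD"),
    Scenario("Regulated industry (FinTech, healthcare)", "Any", "Either", "High", "BMAD"),
    Scenario("Solo developer, fast iteration", "1", "Anything small", "Low", "GSD"),
]


def recommend(name: str) -> Optional[str]:
    """Return the recommended starting point for a named scenario, or None if unknown."""
    for row in MATRIX:
        if row.name.lower() == name.lower():
            return row.framework
    return None


print(recommend("Brownfield modernisation"))
```

Keeping the matrix as data rather than as branching logic makes it easy to add a row when a team's situation does not match any of the six scenarios.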

GSD: Maximum execution rigour for Claude Code environments

GSD — Get Stuff Done — is a specification and execution framework built entirely on Claude Code's native capabilities: slash commands, CLAUDE.md files, hooks, and agent spawning. Its core argument is that AI output quality degrades as context windows fill. GSD's own documentation describes the pattern: at 50 percent context, Claude starts cutting corners; at 70 percent, hallucinations and forgotten requirements become common. The solution is aggressive atomicity.

Each plan is broken into tasks sized to occupy roughly 50 percent of a fresh context window. GSD then spawns isolated subagents for each task, so task 50 in a project gets the same model quality as task one. The workflow spans six commands: project initialisation, phase discussion, phase planning, parallel execution, verification, and milestone archiving. Each produces structured artefacts — PLAN.md, SPEC.md, VERIFY.md — that accumulate into a coherent record of the project.
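
The sizing rule can be pictured as a packing problem. The sketch below illustrates the idea only and is not GSD's implementation: the 200,000-token window, the per-step token estimates, and the names `CONTEXT_WINDOW`, `TASK_BUDGET`, and `pack_tasks` are all assumptions made for the example.

```python
CONTEXT_WINDOW = 200_000            # assumed window size in tokens; not taken from GSD
TASK_BUDGET = CONTEXT_WINDOW // 2   # each task targets roughly 50 percent of a fresh window


def pack_tasks(step_estimates: list[int], budget: int = TASK_BUDGET) -> list[list[int]]:
    """Group consecutive step estimates into tasks that each fit within the budget.

    A single step larger than the budget still gets a task of its own,
    which is a signal that it should be split further by hand.
    """
    tasks: list[list[int]] = []
    current: list[int] = []
    used = 0
    for estimate in step_estimates:
        # Close the current task when the next step would push it over budget.
        if current and used + estimate > budget:
            tasks.append(current)
            current, used = [], 0
        current.append(estimate)
        used += estimate
    if current:
        tasks.append(current)
    return tasks


# Hypothetical token estimates for six implementation steps.
steps = [30_000, 45_000, 40_000, 60_000, 20_000, 90_000]
for number, task in enumerate(pack_tasks(steps), start=1):
    print(f"task {number}: {sum(task):,} tokens")
```

Each resulting task would then run in its own fresh subagent, which is what keeps the last task from inheriting a context already filled by the earlier ones.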

The constraints are clear. GSD is Claude Code-specific. It does not port to GitHub Copilot, Cursor, or Gemini CLI. Teams running mixed AI coding environments will find it creates a two-tier workflow. The structured artefact generation also adds ceremony that is proportionate on complex multi-phase projects and disproportionate on small feature work.

For enterprise teams standardised on Claude Code who are building substantial applications, GSD offers the most rigorous execution pipeline of any framework in this comparison.

BMAD: Full agile team simulation for greenfield builds

BMAD — Breakthrough Method for Agile AI-Driven Development — is the most architecturally ambitious framework in this space. Where other frameworks augment a single developer's workflow, BMAD simulates an entire agile software team using 12 or more specialised AI agents: Business Analyst, Product Manager, Architect, UX Designer, Scrum Master, Developer, QA Engineer, Technical Writer, and others. Each agent operates from a defined persona with specific responsibilities, handoff protocols, and review gates.

The appeal for enterprise is that BMAD mirrors organisational structures that large development teams already use. Projects managed through it produce the full set of artefacts one would expect from a staffed programme: requirements documents, architecture specifications, test strategies, and release notes — all generated and reviewed through structured agent interactions. For organisations that need a development process AI can participate in rather than replace, BMAD provides the scaffolding.

The cost is real and measurable. In our own internal testing — a single representative CRM-dashboard build run through each framework, not a controlled multi-trial study — BMAD averaged roughly 31,600 tokens per workflow run, with a large project consuming an estimated 230 million tokens across a week of active development. That translates to monthly API costs of $800 to more than $2,000 per developer, with a peak observed spend of $3,200 in one run on Claude Opus. Wall-clock time told the same story: the same dashboard build took 12 minutes under OpenSpec, 90 minutes under Spec Kit, and 5.5 hours under BMAD. Treat the exact figures as directional rather than a statistically powered benchmark — the thoroughness that makes BMAD valuable on complex greenfield builds is the same property that makes it expensive on anything smaller.
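
The ratios implied by those figures are worth making explicit. The short calculation below uses only the numbers quoted above; the variable names are ours, and no per-token price is assumed.

```python
TOKENS_PER_RUN = 31_600          # average tokens per BMAD workflow run
TOKENS_PER_WEEK = 230_000_000    # estimated weekly tokens on a large project

# Wall-clock time for the same dashboard build, in minutes.
MINUTES = {"OpenSpec": 12, "Spec Kit": 90, "BMAD": 5.5 * 60}

runs_per_week = TOKENS_PER_WEEK / TOKENS_PER_RUN
slowdown = {name: minutes / MINUTES["OpenSpec"] for name, minutes in MINUTES.items()}

print(f"Implied workflow runs per week: {runs_per_week:,.0f}")
for name, factor in slowdown.items():
    print(f"{name}: {factor:.1f}x the OpenSpec build time")
```

On these figures the BMAD build took 27.5 times as long as the OpenSpec build, and a 230-million-token week corresponds to roughly 7,300 workflow runs.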

BMAD is best suited to new product development where the full lifecycle overhead pays for itself in structured output and where the development team is experienced enough to maintain the workflow under pressure.

OpenSpec: Brownfield change management with an audit trail

OpenSpec occupies a narrower position than the others. Rather than orchestrating the full development lifecycle, it addresses one specific failure mode: changes to existing systems that are poorly documented, making it difficult to understand what changed, why, and what the intended state was before and after. Its proposal-centred workflow uses delta markers — ADDED, MODIFIED, REMOVED — to track precisely what each change introduces relative to the current system state.

The openspec/ directory structure separates stable current-state documentation from active proposals. Each proposal carries its own proposal.md, tasks.md, and delta specifications. Changes require documentation before implementation begins. OpenSpec is also tool-agnostic: it does not require a specific AI coding assistant and imposes less process overhead than BMAD or GSD.
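
To make the delta idea concrete, here is a minimal sketch that groups the lines of a delta specification under its ADDED, MODIFIED, and REMOVED markers. The file layout is an assumption made for illustration: the code treats any line starting with `## ` followed by a marker as the start of a section, which may not match OpenSpec's actual format, and `group_deltas` is a hypothetical helper rather than part of the tool.

```python
MARKERS = ("ADDED", "MODIFIED", "REMOVED")


def group_deltas(text: str) -> dict[str, list[str]]:
    """Collect non-empty lines under the most recent marker heading.

    Assumes headings look like '## ADDED ...'; this format is illustrative only.
    """
    groups: dict[str, list[str]] = {marker: [] for marker in MARKERS}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            words = line[3:].split()
            first = words[0] if words else ""
            # Headings that are not delta markers end the current section.
            current = first if first in MARKERS else None
        elif line and current is not None:
            groups[current].append(line)
    return groups


# Made-up proposal content, not taken from a real project.
sample = """\
## ADDED
- Export dashboard as CSV
## MODIFIED
- Session timeout raised from 15 to 30 minutes
## REMOVED
- Legacy XML export
"""

for marker, items in group_deltas(sample).items():
    print(marker, items)
```

A reviewer reading a proposal in this shape can see at a glance what the change introduces, alters, and retires, without diffing the whole specification.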

For organisations with formal change approval processes — regulated industries, public sector bodies, or teams where architecture review boards must sign off on technical modifications — that audit trail is not optional. It is what enables the AI coding workflow to exist within governance constraints. OpenSpec is the only framework in this comparison designed with that requirement explicitly in mind.

The limitation is scope. OpenSpec does not address context management, agent orchestration, or quality gates beyond documentation. Teams using it will need a separate execution approach for the implementation work itself — typically Spec Kit or a tool-specific workflow alongside it.

GitHub Spec Kit: Cross-agent standardisation at scale

GitHub's Spec Kit, open-sourced in September 2025, reached more than 90,000 GitHub stars and over 8,000 forks by May 2026, making it the most widely adopted framework in this comparison by that measure. It implements a four-phase workflow — Spec, Plan, Tasks, Implement — through slash commands that work across 29 named AI coding agent integrations, including Claude Code, GitHub Copilot, Cursor, Windsurf, and Gemini CLI, with a generic option covering others.

The positioning is deliberately broad. Spec Kit is not the deepest framework for any single AI tool, but it is the only one that works the same way regardless of which tool a team is using. For organisations running mixed AI coding environments — which describes most large enterprises with varied team preferences or evolving procurement relationships — that portability is the primary differentiator. A standard specification format that travels across tools means the governance and review process does not need to change when a team switches assistants.

A growing ecosystem of 70 or more community extensions adds integrations with Jira, Azure DevOps, and GitHub Issues, along with quality gates for security, testing, and specification drift detection. GitHub's direct involvement provides reasonable confidence in long-term maintenance — a consideration that matters when a framework is being embedded into development standards across a large organisation.

The trade-off is depth. Cross-agent compatibility requires Spec Kit to operate at a level of abstraction that tool-specific frameworks exceed. Teams using Claude Code exclusively will find GSD more rigorous. Teams with complex greenfield builds will find BMAD more structured. Spec Kit earns its position for environments where standardisation across tools and teams matters more than optimisation for any single one.

Five Questions That Determine the Right Framework

Framework features are the wrong starting point. The right starting point is your organisation's actual constraints. These five questions, answered honestly, surface the constraints that matter most.

1. Greenfield or brownfield? Brownfield work tilts toward OpenSpec as the primary framework, with selective BMAD involvement for components being rebuilt from scratch. Greenfield opens up the full range: BMAD for complex new platforms, Spec Kit for teams that need cross-agent portability, GSD for Claude Code environments where execution rigour is the priority.

2. What is your team size and structure? A solo developer or a small pair working at speed is better served by GSD or OpenSpec, both of which add structure without adding ceremony. A scaling team of four to fifteen, especially one that has recently hired its first product manager or architect, benefits from Spec Kit's standardised workflow, which functions regardless of which AI tool each person uses. Multi-team enterprise environments with explicit role separation are where BMAD's agent personas (Business Analyst, Architect, QA Engineer) map onto real organisational structure rather than simulating one.

3. What are your compliance requirements? Two distinct scenarios produce different answers. For internal governance — architecture review boards, change approval workflows, teams where a technical change must be documented before implementation — OpenSpec's delta specification approach is the right fit; it treats the change record as the primary output. For externally mandated compliance (SOC 2, HIPAA, EU AI Act, or the Colorado AI Act, now delayed to January 2027), BMAD is the stronger choice. Its comprehensive agent-generated documentation — requirements, architecture decisions, test strategy, release notes — provides the kind of structured evidence trail that external auditors expect. OpenSpec documents what changed; BMAD documents why every decision was made and against what requirements.

4. What is your API cost tolerance? In our internal testing, BMAD cost $800 to more than $2,000 per developer per month, a range that makes it a non-starter for many teams. If that range is beyond what the project budget supports, BMAD is off the table regardless of its other merits. OpenSpec and GSD carry the lowest API overhead: they add structure without increasing model call frequency. Spec Kit sits in the middle. Run the numbers before choosing, not after.

5. How tied are you to a specific AI coding tool? GSD is the only framework in this comparison with a hard tool dependency: it requires Claude Code. If your teams use multiple AI coding assistants, or if that is likely to change, Spec Kit's 29 named integrations or BMAD V6's cross-platform compatibility with Claude Code, Cursor and Codex are the practical options. OpenSpec, like the other non-GSD frameworks, is tool-agnostic and imposes no tooling constraint.
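
Questions 4 and 5 act as hard filters, which makes them easy to express in code. The sketch below uses a hypothetical `shortlist` function with made-up parameter names. It removes BMAD when the monthly budget per developer falls below the lower end of the cost range quoted in question 4, and removes GSD when the team is not standardised on Claude Code. The first three questions need judgement and are deliberately left out.

```python
FRAMEWORKS = ["GSD", "BMAD", "OpenSpec", "GitHub Spec Kit"]
BMAD_MIN_MONTHLY_COST = 800  # USD per developer, lower end of the range quoted in question 4


def shortlist(monthly_budget_per_dev: float, claude_code_only: bool) -> list[str]:
    """Apply the two hard constraints and return the frameworks still in play."""
    remaining = list(FRAMEWORKS)
    if monthly_budget_per_dev < BMAD_MIN_MONTHLY_COST:
        remaining.remove("BMAD")   # budget cannot cover BMAD's observed API spend
    if not claude_code_only:
        remaining.remove("GSD")    # GSD requires Claude Code
    return remaining


print(shortlist(monthly_budget_per_dev=500, claude_code_only=False))
print(shortlist(monthly_budget_per_dev=2500, claude_code_only=True))
```

Running the filters first narrows the field before the harder conversations about project type, team structure, and compliance begin.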

How Organisations Move Between Frameworks

Because these are methodologies and not deeply coupled platforms, migration is relatively painless; in our own workflows we switch often. The artefacts (PRDs, specs, stories) port across tools because the underlying data is just Markdown and JSON.

OpenSpec to BMAD. As a brownfield modernisation succeeds and the team starts building new features on top, OpenSpec's delta-only model can feel thin. Carry the archived spec forward as the BMAD Architect agent's input document.
Spec Kit to BMAD. Common when a startup raises a Series A, hires its first PM, and needs explicit role separation. The Spec Kit constitution maps cleanly onto BMAD's master agent prompts.
BMAD to GSD. The reverse direction is real. When a complex BMAD project ships and the team enters maintenance mode, GSD's lean two-prompt loop is often better suited to the bug-fix-and-small-feature cadence that follows.

Frequently asked questions

What is spec-driven development?

A family of approaches that add structure to AI coding to stop agents drifting from intent, hallucinating APIs and degrading as projects grow. The four frameworks compared here — GSD, BMAD, OpenSpec and GitHub Spec Kit — share that goal but differ sharply in scope and ceremony.

How do GSD, BMAD, OpenSpec and Spec Kit differ?

GSD is a context-management and execution system built for Claude Code; BMAD is a full-lifecycle platform that simulates an agile team with 12 or more agents; OpenSpec is a change-documentation standard with an audit trail for brownfield work; and GitHub Spec Kit is a cross-agent workflow standard that works across many AI coding assistants.

Which framework should we choose?

It depends on where your biggest quality risk sits, not on feature lists. Answer five questions: greenfield or brownfield; team size and structure; compliance requirements; API cost tolerance; and how tied you are to a specific AI coding tool.

Why is BMAD more expensive than the others?

In our internal testing (a single representative build, not a statistically powered benchmark), BMAD averaged roughly 31,600 tokens per run and $800 to more than $2,000 per developer per month, and a build that took 12 minutes under OpenSpec took 5.5 hours under BMAD. Its thoroughness on complex greenfield work is the same property that makes it costly on smaller tasks.
