AIDE (Agent-Informed Development Engineering) -- A Software Development Methodology for the Agentic Era v1.0¶
Author: CTO (20+ years of architecture experience, 3 years of hands-on AI agent experience)
Based on: GPT/Claude/Gemini triple deep research + 2 reports from Team Alpha (Integrationists) + 1 report from Team Beta (Radicals)
Date: 2026-02-18
Part 1: AIDE Core Principles (10)¶
Principle 1: Context Budget Principle -- The Context Budget Is a First-Class Design Constraint¶
"Just as memory determined programming languages, the context window determines architecture."
Background and Rationale¶
This is the only Tier-1 principle on which all three research reports reached complete consensus:
- GPT report: "Context budget as a design input" (Core Principle #2, P0 requirement)
- Claude report: "The context window is the new CPU"
- Gemini report: "Context engineering is the new scarce resource"
Even with a 1-million-token context window, performance does not scale linearly. In Chroma's study measuring 18 LLMs, performance became unstable as input length increased, and the Lost in the Middle phenomenon caused information loss at middle positions. Tool definitions alone consume tens of thousands of tokens, degrading both reasoning quality and cost.
Specific Guidelines¶
| Item | Recommended | Upper Limit | Rationale |
|---|---|---|---|
| File size | 200-300 lines | 500 lines | 300 lines = ~5,400 tokens, safe even combined with system prompt + conversation history |
| Function size | 30 lines (parsers/policies/wrappers) | 50 lines | Fully comprehensible within a single reasoning turn |
| Line length | 100 characters | 120 characters | Diff review convenience |
| Meta files (CLAUDE.md) | 200 lines | 300 lines | Instruction compliance rate decreases linearly as instruction count increases |
| Files loaded per feature modification | 1-2 | 3 | Minimize indirection cost |
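The file-size guideline above can be enforced mechanically. The sketch below is a hypothetical CI helper, not part of AIDE itself; it uses the rough 18-tokens-per-line heuristic implied by the table (300 lines ≈ 5,400 tokens), whereas a real check would use a model-specific tokenizer.

```typescript
// Hypothetical CI helper: flag files that exceed the context budget.
// Thresholds come from the table above; the token estimate is a
// deliberately crude lines * 18 heuristic.
const TOKENS_PER_LINE = 18
const WARN_FILE_LINES = 300 // recommended ceiling
const MAX_FILE_LINES = 500  // hard upper limit

type BudgetReport = {
  file: string
  lines: number
  estimatedTokens: number
  status: 'ok' | 'warn' | 'fail'
}

const check_context_budget = (file: string, source: string): BudgetReport => {
  const lines = source.split('\n').length
  return {
    file,
    lines,
    estimatedTokens: lines * TOKENS_PER_LINE,
    status:
      lines > MAX_FILE_LINES ? 'fail' :
      lines > WARN_FILE_LINES ? 'warn' : 'ok',
  }
}

// Example: a 320-line file exceeds the recommended ceiling but not the hard limit
const report = check_context_budget('features/user-auth/logic.ts', 'x\n'.repeat(319) + 'x')
```

Such a check fits naturally into the pre-commit hooks and CI gates described under Principle 7.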
Team Alpha/Beta Discussion¶
Both teams reached complete consensus on this principle. The only difference was in implementation intensity:
- Team Alpha: "This is an extension of existing SRP and cohesion concepts. The quantitative basis of token cost has simply been added."
- Team Beta: "This should be the starting point for all architectural decisions. Maximum 3 files loaded per feature modification, maximum 1 level of indirection."
CTO Judgment: Team Beta's specific metrics (file load count, indirection depth) are adopted as recommended guidelines, but not enforced as upper limits. This is because infrastructure-layer DIP implementations may require up to 2 levels of indirection in some cases.
Principle 2: Locality of Behavior -- Co-location Takes Priority Over Abstraction¶
"All code related to a single feature should be physically co-located."
Background and Rationale¶
The traditional layered structure where an agent must navigate 8 files (Controller, Service, Repository, Entity, DTO, Mapper, Interface, Validator) to modify a single feature causes context fragmentation. Factory.ai's research shows that AI agents experience dramatic performance degradation in multi-hop reasoning (reasoning that follows references across multiple files).
This principle does not abolish "Separation of Concerns." It changes the axis of separation:
- Traditional: Separation by technical role (presentation / business / data)
- AIDE: Separation by feature/domain (user-auth / payment / order)
Specific Guidelines¶
// AIDE Pattern: Feature-Based Structure
features/
user-auth/
types.ts -- Feature-specific type/schema definitions
logic.ts -- Pure function business logic
handler.ts -- HTTP/event handlers (side-effect boundary)
store.ts -- Data store access (side-effect boundary)
user-auth.test.ts -- All tests for this feature
AGENTS.md -- Domain context for agents (Tier 2)
- Each Feature directory is self-contained: an agent can understand the entire feature by reading only that folder
- Code shared across Features goes in shared/, kept to a minimum
- Logical layers within a Feature (pure logic / side-effect boundaries) are separated at the file level
Team Alpha/Beta Discussion¶
- Team Alpha: "Feature-based structure is compatible with Clean Architecture's Vertical Slice Architecture. Keep the dependency rule but reduce the physical layers."
- Team Beta: "A fat file is better than beautiful abstraction. The moment you separate interface from implementation, the agent must load two files."
CTO Judgment: Feature-based structure is adopted as the default. As Team Alpha pointed out, this does not conflict with Clean Architecture and is a natural extension of Vertical Slice Architecture. However, the file separation of types/logic/handler/store within a Feature is maintained -- three 100-line files with distinct roles are clearer for agents than a single 300-line file containing everything. The key is to minimize indirection crossing Feature boundaries.
Principle 3: Functional Core, Structural Shell -- Pure Function Core + Explicit Side-Effect Shell¶
"Business logic in pure functions, side effects handled at explicit boundaries."
Background and Rationale¶
All three reports agreed that the functional paradigm provides structural advantages for AI agents:
- Pure functions always produce the same output for the same input, enabling agents to reason perfectly from a single function block
- Immutable data allows understanding data flow as a chain without tracking state
- Strong type systems serve as guardrails that catch agent hallucinations at compile time
// [1] Data: Defined as immutable structs
type User = Readonly<{
id: string
email: string
name: string
role: 'admin' | 'member' | 'viewer'
}>
// [2] Pure logic: Input -> Output, no side effects
const promote_user_to_admin = (user: User): User => ({
...user,
role: 'admin'
})
// [3] Side-effect boundary: Dependency injection, explicit error handling
const handle_promote_user = async (
userId: string,
deps: { db: Database; logger: Logger }
): Promise<Result<User, Error>> => {
const user = await deps.db.findUser(userId)
if (!user) return err(new UserNotFoundError(userId))
const promoted = promote_user_to_admin(user)
await deps.db.saveUser(promoted)
deps.logger.info({ event: 'user_promoted', userId })
return ok(promoted)
}
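The handler above relies on `Result`, `ok`, `err`, and `UserNotFoundError`, which are not shown. A minimal sketch of what they might look like (libraries such as neverthrow provide richer versions):

```typescript
// Minimal discriminated-union Result type in the style the handler assumes.
type Ok<T> = { ok: true; value: T }
type Err<E> = { ok: false; error: E }
type Result<T, E> = Ok<T> | Err<E>

const ok = <T>(value: T): Ok<T> => ({ ok: true, value })
const err = <E>(error: E): Err<E> => ({ ok: false, error })

// A class is acceptable here: error types belong to the infrastructure
// layer, not business logic (see the paradigm table below).
class UserNotFoundError extends Error {
  constructor(userId: string) {
    super(`user not found: ${userId}`)
    this.name = 'UserNotFoundError'
  }
}
```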
Specific Guidelines¶
| Area | Recommended Paradigm | Class Usage |
|---|---|---|
| Business logic | Pure functions | Prohibited |
| Domain model | Immutable data structures + types | Replaced with immutable Record/Type |
| Infrastructure/IO layer | Functions first, classes when necessary | Allowed (DB connections, sockets, resource management) |
| Policies/Validation | Functional pipelines | Prohibited |
| Domain boundary definition | DDD Bounded Context (type + function composition) | Not needed |
Team Alpha/Beta Discussion¶
This principle was the point of most heated debate between the two teams:
- Team Alpha: "Functional Core + OOP Shell + DDD. DDD's Aggregate, Entity, and Value Object are still valid for structuring domain knowledge. Implement them as immutable, but keep the OOP concepts."
- Team Beta: "FP-only. Classes hide state and increase agent cognitive load. Do not use classes for business logic."
CTO Judgment: The practical difference is smaller than it seems. Both teams agree that "business logic should be pure functions, data should be immutable." The difference lies in whether DDD concepts like Aggregate Root are expressed as classes or as type+function compositions. AIDE recommends the type+function composition approach. DDD's domain modeling concepts (Bounded Context, Aggregate, Value Object) are preserved, but the implementation is shifted to immutable types + pure functions. This satisfies both Alpha's DDD values and Beta's FP values.
Inheritance is limited to a maximum of 1 level, and deep inheritance trees are not allowed under any circumstances. Composition over Inheritance applies even more strongly in the AI era.
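The "type + function composition" implementation of DDD concepts that the CTO judgment recommends can be sketched with a hypothetical Money Value Object; the names and rules here are illustrative only:

```typescript
// Hypothetical example: the DDD Value Object "Money" as an immutable
// type plus pure functions, instead of a class with methods.
type Money = Readonly<{ amount: number; currency: 'USD' | 'EUR' }>

// Construction enforces the invariant a class constructor would hold
const make_money = (amount: number, currency: Money['currency']): Money => {
  if (amount < 0) throw new RangeError('amount must be >= 0')
  return { amount, currency }
}

// Behavior lives in pure functions, returning new values
const add_money = (a: Money, b: Money): Money => {
  if (a.currency !== b.currency) throw new Error('currency mismatch')
  return { amount: a.amount + b.amount, currency: a.currency }
}

// Value Objects compare by value, not identity
const money_equals = (a: Money, b: Money): boolean =>
  a.amount === b.amount && a.currency === b.currency
```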
Principle 4: Knowledge DRY, Code WET-tolerant -- Knowledge Is DRY, Code Trades Off with Locality¶
"Business rules must live in exactly one place. Duplication of utility code is tolerated for the sake of locality."
Background and Rationale¶
The reinterpretation of the DRY principle showed the widest spectrum of opinions across the three reports:
- GPT: Manage duplication through structural solutions (cataloging)
- Claude: "DRY is not dead but transformed" -- apply the AHA (Avoid Hasty Abstractions) principle
- Gemini: Actively embrace WET/DAMP -- "5 lines of logic repeated in 10 places is OK"
The self-contradiction discovered by the Claude report is the key insight: "Allow duplication -> AI generates more code -> Context window exceeded -> DRY is needed after all." Unlimited duplication tolerance is self-defeating.
Specific Guidelines¶
| Level | Strategy | Example | Duplication Tolerance |
|---|---|---|---|
| Business rules | Strict DRY | "Discount rate calculation formula," "pricing policy" | 0 (must have a single source of truth) |
| Domain types | Allow re-declaration at Feature boundaries | Feature-local subset of shared User type | Reference via interface or partial re-declaration |
| Utility code | AHA principle | Email validation, date formatting | 2-3 duplications allowed; review extraction at 4+ |
| Boilerplate | Structured duplication allowed | try-catch patterns, logging patterns | Unlimited (serves as pattern anchors) |
Duplication Management Framework:
- Conscious Duplication: When duplicating, state the reason in a comment
- Drift Detection: Automate agent-based duplicate code drift detection in CI
- Periodic Review: Verify consistency of duplicate code on a quarterly basis
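In practice, "Conscious Duplication" might look like the snippet below. The annotation format (`DUPLICATED:` comment) and file path are illustrative, not a prescribed AIDE standard:

```typescript
// DUPLICATED: a copy also exists in features/order/logic.ts.
// Reason: keeping this small email check local preserves the Feature's
// self-containment; per AHA, review extraction at the 4th copy.
const is_valid_email = (email: string): boolean =>
  /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email)
```

A machine-greppable marker like `DUPLICATED:` also gives the CI drift-detection agent a concrete anchor to search for.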
Team Alpha/Beta Discussion¶
- Team Alpha: "Knowledge DRY + Code AHA. Duplication is consciously allowed, but visible management is a prerequisite. Gemini's '5-line duplication in 10 places is OK' is extreme."
- Team Beta: "Aggressively WET/DAMP. Self-containment of each file is the top priority. Sharing through abstraction carries indirection costs that should be minimized."
CTO Judgment: Team Alpha's "Knowledge DRY + Code AHA" is adopted. Key arguments:
1. Beta effectively acknowledges that unlimited code duplication eventually exceeds the context window, creating a self-contradiction
2. However, as Alpha also acknowledges, excessive abstraction (extracting every 3-line utility into a shared module) creates harmful indirection for agents
3. Therefore, "business knowledge is DRY; utility code allows conscious duplication under AHA guidelines" is the balance point
Principle 5: Test as Specification -- Tests Are a Specification Language¶
"Tests are not verification tools but specification documents that communicate intent to agents. Apply the triple framework of TDG + PBT + EDD."
Background and Rationale¶
Key insight from the Claude report: "TDD becomes more important in the AI era. Tests become prompt engineering." Academic validation by Matthews & Nagappan confirmed that presenting problems alongside tests improves code generation quality for both GPT-4 and Llama 3.
The revolutionary effect of Property-Based Testing (PBT) (Claude report):
- 23.1-37.3% relative improvement over TDD
- On Hard tasks: direct code generation 1.1% accuracy vs. property-based verification 48.9% accuracy
- LLMs are far better at defining correctness properties than generating correct code
Specific Guidelines¶
Triple Test Framework:
+------------------+
| Human Review | Architecture, security, domain knowledge
+------------------+
+--------------------+
| Eval Suites (EDD) | Scenario/dataset-based behavioral evaluation
+--------------------+
+------------------------+
| Integration Tests | Integration verification of AI-generated code
+------------------------+
+----------------------------+
| Property-Based Tests (PBT) | Invariant property verification
+----------------------------+
+--------------------------------+
| Unit Tests (TDD) | Deterministic code: parsers, policies, tool wrappers
+--------------------------------+
| Test Type | Target | Tools | Author |
|---|---|---|---|
| Unit (TDD) | Deterministic code -- parsers, policies, state transitions, tool wrappers | Jest/Vitest/pytest | Human spec -> AI implementation |
| PBT | Business invariant properties -- "total is always >= 0," "order preserved after sort" | fast-check/Hypothesis | Humans define properties, AI generates |
| Integration | Integration scenarios of AI-generated code -- cross-Feature coordination, data flow verification | Custom test framework | AI-generated, human-reviewed |
| Eval (EDD) | Model output quality -- accuracy, safety, usefulness | Custom eval framework | Human-designed + production failure incorporation |
| Security | Security vulnerabilities in AI-generated code (XSS, SQL Injection, logic errors) | OWASP-based scenarios + Security linters | Security team designs, automated execution |
Confirmation Bias Prevention Is Mandatory: When AI writes both tests and implementation, there is a risk of creating "tests that verify bugs." Use different models for test writing and code writing, or have humans review the test specifications.
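The "order preserved after sort" invariant from the table can be sketched as a property check. The hand-rolled generator below only illustrates the idea; in practice fast-check (TypeScript) or Hypothesis (Python), as listed in the table, would generate and shrink inputs for you:

```typescript
// Property: for ANY integer array, sorting yields a non-decreasing
// array containing the same multiset of elements.
const sort_numbers = (xs: readonly number[]): number[] =>
  [...xs].sort((a, b) => a - b)

// Crude random input generator (fast-check's fc.array(fc.integer()) in spirit)
const random_array = (): number[] =>
  Array.from({ length: Math.floor(Math.random() * 20) },
    () => Math.floor(Math.random() * 200) - 100)

const count = (xs: readonly number[]): Map<number, number> => {
  const m = new Map<number, number>()
  for (const x of xs) m.set(x, (m.get(x) ?? 0) + 1)
  return m
}

const same_multiset = (a: readonly number[], b: readonly number[]): boolean => {
  if (a.length !== b.length) return false
  const ca = count(a), cb = count(b)
  return [...ca].every(([k, v]) => cb.get(k) === v)
}

const holds_sort_invariants = (xs: readonly number[]): boolean => {
  const sorted = sort_numbers(xs)
  const nonDecreasing = sorted.every((v, i) => i === 0 || sorted[i - 1] <= v)
  return nonDecreasing && same_multiset(xs, sorted)
}

// Run the property over many random inputs instead of a few hand-picked examples
for (let i = 0; i < 1000; i++) {
  const input = random_array()
  if (!holds_sort_invariants(input)) {
    throw new Error(`property violated for ${JSON.stringify(input)}`)
  }
}
```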
Team Alpha/Beta Discussion¶
- Team Alpha: "TDG (Test-Driven Generation) + PBT + EDD extension. Don't discard TDD; extend it for the AI era."
- Team Beta: "Dual framework -- Traditional TDD for deterministic code, EDD for probabilistic behavior. Actively adopt PBT."
CTO Judgment: These are practically identical proposals. The test strategies from both teams are merged into the triple framework above. The only difference was in naming.
Principle 6: Progressive Disclosure of Information¶
"Do not give agents all information at once. Provide only what is needed, when it is needed."
Background and Rationale¶
GPT report's progressive skill loading, Claude report's 3-Tier Progressive Disclosure, and Gemini report's dynamic information loading all express the same principle: Like virtual memory in an operating system, do not load everything into physical memory; load it when needed.
Specific Guidelines¶
Meta File 3-Tier System:
| Tier | File | Role | Size Limit | Loading Method |
|---|---|---|---|---|
| Tier 1: Constitution | CLAUDE.md / AGENTS.md (root) | Project identity, absolute rules, architecture map | 300 lines max | Always loaded |
| Tier 2: Local Laws | AGENTS.md in subdirectories | Component-specific patterns, domain context | 200 lines max | Lazy loaded when working in that directory |
| Tier 3: Technical Manuals | .agents/skills/*/SKILL.md | Procedural knowledge, workflow guides | YAML frontmatter + body | On-demand loading |
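Tier 3's "YAML frontmatter + body" shape can be illustrated with a hypothetical SKILL.md. The frontmatter keys and the task shown here are illustrative only, not a fixed schema:

```markdown
---
name: add-db-migration
description: How to add a PostgreSQL schema migration in this repository
trigger: task involves changing the database schema
---

# Adding a migration

1. Create a new SQL file under migrations/ following the existing naming pattern
2. Write both the forward statements and the rollback statements
3. Run the migration test suite before committing
```

The frontmatter is cheap to scan (a few dozen tokens), so the agent can decide whether to load the body at all, which is the point of on-demand loading.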
Progressive Provision of Dependency/Library Information:
When conveying information about external libraries and internal shared modules used in the project to agents, a progressive approach is also needed:
| Level | Information Provided | Purpose |
|---|---|---|
| Summary | Library name + version + one-line purpose description | Agent grasps the overall technology stack |
| API Signatures | Only signatures of functions/types in use | Agent writes code that integrates with the library |
| Detailed Documentation | Example code, configuration methods, caveats | Agent builds new integrations or troubleshoots |
The key is to "never load the full documentation of every library into the context." Providing only the needed depth of information at the needed time allows efficient use of the context budget.
Team Alpha/Beta Discussion¶
Both teams reached complete consensus. Implementation details were also nearly identical. The 3-Tier meta file system and the progressive information provision principle were combined to establish the system above.
Principle 7: Deterministic Guardrails -- Deterministic Guardrails for Probabilistic Generation¶
"Trust the AI agent, but verify. And verification must be deterministic."
Background and Rationale¶
Security status of AI-generated code (Claude report, Veracode 2025):
- Approximately 45% of generated code contains security flaws
- Logic error rate 1.75x that of humans; XSS vulnerabilities 2.74x
- Independent of model size -- smarter models do not produce safer code
This data clearly demonstrates that "prompting agents to 'do well'" is insufficient. Deterministic tools must verify agent output.
Specific Guidelines¶
Probabilistic Generation (AI) --> Deterministic Verification --> Pass/Fail
|
+-- TypeScript strict mode (type verification)
+-- ESLint/Prettier (style enforcement)
+-- Zod/io-ts (runtime type verification)
+-- Pre-commit hooks (automatic execution)
+-- Security linters (security verification)
+-- CI test suite (regression prevention)
Absolute Rule: "Never send an LLM to do a linter's job" (Claude report). Style enforcement, type verification, and security pattern detection are all delegated to deterministic tools.
Self-Healing Loop (Gemini report's Reflexion Pattern):
For this loop to work effectively, error messages must be provided in a machine-readable structured format (JSON).
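The Reflexion-style loop can be sketched as follows. `generate` and `run_checks` are stand-ins, assumed for this sketch, for an LLM call and the deterministic toolchain above; the point is that feedback flows back as structured data, not free-form text:

```typescript
// Sketch of a self-healing loop: probabilistic generation is retried
// with machine-readable error feedback until deterministic checks pass
// or attempts run out.
type CheckError = { tool: string; message: string; line?: number }
type CheckResult = { passed: boolean; errors: CheckError[] }

const self_healing_loop = (
  generate: (feedback: CheckError[]) => string, // stands in for an LLM call
  run_checks: (code: string) => CheckResult,    // stands in for lint/typecheck/tests
  maxAttempts = 3
): { code: string; attempts: number } | null => {
  let feedback: CheckError[] = []
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const code = generate(feedback)
    const result = run_checks(code)
    if (result.passed) return { code, attempts: attempt }
    // Feed errors back as structured JSON, not prose
    feedback = result.errors
  }
  return null // escalate to a human after repeated failure
}

// Toy usage: the "model" fixes its output once it sees the error
const fixed = self_healing_loop(
  (fb) => (fb.length === 0 ? 'var x = 1' : 'const x = 1'),
  (code) => code.startsWith('const')
    ? { passed: true, errors: [] }
    : { passed: false, errors: [{ tool: 'eslint', message: 'no-var' }] }
)
```

Bounding the retry count and escalating to a human on exhaustion keeps the loop from burning tokens on an unfixable error.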
Team Alpha/Beta Discussion¶
Both teams reached complete consensus. Team Beta emphasized this principle most strongly, presenting the intuitive expression "trust me but verify," and Team Alpha agreed.
Principle 8: Observability as Structure -- Observability Is Part of the Structure¶
"AI-generated code must include structured logging and tracing by default. Observability is a first-class citizen."
Background and Rationale¶
All three reports reached complete consensus: If you cannot trace "why this behaves this way" for code that AI generates rapidly, operations and debugging become impossible. AI agents must structurally embed observability when generating code.
- GPT report: Include Observability as a cross-cutting concern in architecture
- Claude report: Tracing ON by default, traces mandatory from development stage
- Gemini report: Adopt semantic logging (JSON-LD) standard
Specific Guidelines¶
// Structured log format -- Must be included in all code generated by AI
type StructuredLog = {
level: 'info' | 'warning' | 'error' | 'critical'
timestamp: string // ISO 8601
service: string // Service/Feature identifier
event: string // Business event name
trace_id: string // Distributed tracing ID
span_id: string // Current work unit ID
data: Record<string, unknown> // Structured supplementary data
}
// Usage example: E-commerce payment processing
const handle_payment = async (
order_id: string,
deps: { db: Database; pg: PaymentGateway; logger: Logger }
): Promise<Result<PaymentResult, Error>> => {
deps.logger.info({
event: 'payment_initiated',
data: { order_id }
})
const result = await deps.pg.charge(order_id)
if (result.success) {
deps.logger.info({
event: 'payment_completed',
data: { order_id, transaction_id: result.transaction_id }
})
} else {
deps.logger.error({
event: 'payment_failed',
data: { order_id, reason: result.error }
})
}
return result
}
Key Guidelines:
- Distributed tracing by default: Track request flow with trace_id -> span_id. Leverage standards such as OpenTelemetry
- ON by default from the development stage: Activate structured logging not only in production but also in local development
- Cost/performance metrics: Track API response time, DB query count, and external API call count in real time
- Mandate observability in AI-generated code: When requesting code from agents, specify in CLAUDE.md: "Include structured logging in all handlers"
Team Alpha/Beta Discussion¶
Both teams reached complete consensus. There was no disagreement that observability is fundamental to software operations and becomes even more critical in AI-generated code.
Principle 9: Security by Structure -- Structural Security Verification¶
"45% of AI-generated code has security flaws. Security verification must be structurally embedded."
Background and Rationale¶
As AI has become the primary producer of code, the nature of security threats has changed. Security vulnerabilities in AI-generated code itself are the core threat:
- Veracode 2025: Approximately 45% of AI-generated code contains security flaws
- XSS vulnerabilities 2.74x humans, logic errors 1.75x
- Model size does not correlate with security quality
Specific Guidelines¶
Threat-Control Mapping:
| Threat | Representative Scenario | Defense Point | Recommended Control |
|---|---|---|---|
| SQL Injection | AI generates string concatenation instead of parameterized queries | Security linter + Code review | Linter rules to detect raw query usage, enforce ORM/Query Builder |
| XSS | AI omits user input escaping | Security linter + Template engine | Enforce auto-escaping frameworks, DOMPurify, etc. |
| Logic errors | Missing authorization checks, unhandled boundary conditions | PBT + Integration test | Verify invariant properties with Property-Based Testing |
| Auth/AuthZ flaws | AI omits authentication middleware | Architecture enforcement | Apply authentication middleware by default at router level, allow explicit opt-out only |
| Dependency vulnerabilities | AI adds packages with vulnerable versions | SCA (Software Composition Analysis) | Automated scanning with npm audit, Snyk, etc. |
Three Security Principles:
1. Automated security verification: Automatically run security linters in CI for all AI-generated code
2. Mandatory security review: Code changes involving authentication, payments, and personal data must undergo security review
3. Audit trail: All sensitive data access and state changes are recorded in structured logs
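The SQL Injection row above contrasts string concatenation with parameterized queries; a minimal illustration follows. The `db.query(text, values)` signature mimics node-postgres-style drivers and is an assumption of this sketch:

```typescript
// Illustration of the SQL Injection row: the pattern a security linter
// should reject vs. the parameterized form it should enforce.
type Db = { query: (text: string, values?: unknown[]) => Promise<unknown> }

// BAD -- user input concatenated into SQL; a linter rule should flag this
const find_user_unsafe = (db: Db, email: string) =>
  db.query(`SELECT * FROM users WHERE email = '${email}'`)

// GOOD -- placeholder + values array; the driver escapes the input,
// so "x' OR '1'='1" arrives as a literal string, not as SQL
const find_user_safe = (db: Db, email: string) =>
  db.query('SELECT * FROM users WHERE email = $1', [email])
```

A linter rule that rejects template literals or string concatenation inside `db.query` calls makes the unsafe form structurally impossible to merge.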
Team Alpha/Beta Discussion¶
Both teams reached complete consensus. Security is an area with no room for compromise.
Principle 10: Meta-Code as a First-Class Citizen¶
"AGENTS.md, CLAUDE.md, and Skills files are version-controlled and tested with the same rigor as source code."
Background and Rationale¶
- AGENTS.md is used in 60,000+ open-source projects (managed by the Agentic AI Foundation under the Linux Foundation)
- Research shows that practices failing to preserve prompts/context weaken reproducibility
- A single-line change in a meta file can alter the agent's entire behavior, meaning it can have higher impact than code
Specific Guidelines¶
Meta-Code Management Principles:
1. Version control: Same workflow as code in Git -- PR, code review, changelogs, release tags
2. Run evals on change: Meta file changes automatically trigger eval suite execution in CI (behavioral regression detection)
3. Size monitoring: CI warns/blocks when Tier 1 files exceed 300 lines
4. Use negative instructions: "Do not do X" is often clearer, and violations are easier to detect
5. Example-based instructions: Concrete code examples improve agent output quality far more than abstract principles
Lock Down Full Configuration with manifest.yaml:
# manifest.yaml
spec_version: "1.0"
project_name: "my-ecommerce"
project_type: "backend"
tech_stack:
language: "typescript"
runtime: "node"
framework: "express"
database: "postgresql"
cache: "redis"
ai_development:
primary_model: "claude-opus-4-6"
instruction_files:
tier1: ["CLAUDE.md", "AGENTS.md"]
tier2_pattern: "src/features/*/AGENTS.md"
code_standards:
max_file_lines: 300
max_function_lines: 50
paradigm: "functional-core"
type_strictness: "strict"
testing:
unit: "vitest"
property: "fast-check"
e2e: "playwright"
observability:
logging: "structured_json"
tracing: true
Team Alpha/Beta Discussion¶
Both teams reached complete consensus. Both teams accepted Gemini's "Meta-Control Plane" concept.