Knowledge as Code
A development pattern for building knowledge bases that verify themselves, resist decay, and serve both humans and machines from plain text files.
Status: Working title. This pattern emerged from building AI Tool Watch. We're documenting it as we go and actively looking for prior art. Join the discussion.
The pattern
Knowledge as Code applies software engineering practices to knowledge management. The knowledge lives in version-controlled plain text files. It is validated by automated processes. It produces multiple outputs from a single source. And it actively resists becoming outdated.
This site is an instance of the pattern. The page you're reading was built by it.
Six properties
| Property | What it means | In this project |
|---|---|---|
| Plain text canonical | Knowledge lives in human-readable, version-controlled files. No database, no CMS, no vendor lock-in. | Markdown and YAML files in data/ |
| Self-healing | Automated verification detects when the knowledge has drifted from reality. The system flags decay before humans notice it. | Multi-model cascade cross-checks all data twice weekly, opens GitHub issues for human review |
| Multi-output | One source produces every format needed — human-readable, machine-readable, agent-queryable, search-optimized. | HTML site, JSON API, MCP server, 125 SEO bridge pages, sitemap, llms.txt |
| Zero-dependency | No external packages. The build uses only language built-ins. Nothing breaks when you come back in a year. | One Node.js script, no package.json, no node_modules |
| Git-native | Git is the collaboration layer, the audit trail, the deployment trigger, and the contribution workflow. | Issues, PRs, CI/CD, version history — all through Git |
| Ontology-driven | A vendor-neutral taxonomy of concepts maps to vendor-specific implementations. The structure is the data model. | 18 capabilities map to 72 implementations across 12 products |
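As a rough illustration of the ontology-driven property, the sketch below models vendor-neutral capabilities that map to vendor-specific implementations. The capability ids, product names, field names, and the `buildOntology` helper are all hypothetical; they are not taken from the project's data files.

```javascript
// Hypothetical ontology: capabilities are vendor-neutral concepts,
// implementations are how specific products realize them.
const capabilities = [
  { id: 'image-generation', label: 'Image generation' },
  { id: 'voice-mode', label: 'Voice mode' },
];

const implementations = [
  { capability: 'image-generation', product: 'Product A', name: 'A Images' },
  { capability: 'image-generation', product: 'Product B', name: 'B Canvas' },
  { capability: 'voice-mode', product: 'Product A', name: 'A Voice' },
];

// Group implementations under their capability so the structure itself
// can drive pages, navigation, and API responses.
function buildOntology(capabilityList, implementationList) {
  const byCapability = new Map(
    capabilityList.map((c) => [c.id, { ...c, implementations: [] }])
  );
  for (const impl of implementationList) {
    const cap = byCapability.get(impl.capability);
    if (!cap) {
      // An implementation that points at no known capability is a data error.
      throw new Error(`Unknown capability: ${impl.capability}`);
    }
    cap.implementations.push(impl);
  }
  return byCapability;
}

const ontology = buildOntology(capabilities, implementations);
console.log(ontology.get('image-generation').implementations.map((i) => i.product));
```

Rejecting an implementation that references an unknown capability is one design choice among several; it keeps the taxonomy as the single place where concepts are defined.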
Why these choices compound
Any one of these properties is a reasonable design choice. The value is in the combination:
- Plain text + Git means anyone can contribute with no dev environment. Edit a file, open a PR.
- Plain text + zero-dependency build means the project still builds in five years. Nothing to update, nothing to break.
- Ontology + multi-output means one correction fixes the site, the API, the MCP server, and every bridge page at once.
- Self-healing + Git means verification results are tracked as issues with a full audit trail. Nothing is silently changed.
- Zero-dependency + self-healing means maintenance cost stays low even as the knowledge grows. The system scales through automation, not staffing.
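To make the "one correction fixes every output" point concrete, here is a minimal sketch in which a single in-memory record is rendered to both HTML and JSON using only language built-ins. The record shape and the function names (`escapeHtml`, `renderHtml`, `renderJson`) are assumptions for illustration; the project's actual build script may work differently.

```javascript
// Hypothetical source record; in the pattern this would be parsed
// from a plain text file under version control.
const record = {
  id: 'voice-mode',
  label: 'Voice mode',
  summary: 'Spoken conversation with the assistant <beta>',
};

// Escape text for safe inclusion in HTML without any external package.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Human-readable output.
function renderHtml(rec) {
  return (
    `<article id="${escapeHtml(rec.id)}">` +
    `<h2>${escapeHtml(rec.label)}</h2>` +
    `<p>${escapeHtml(rec.summary)}</p>` +
    `</article>`
  );
}

// Machine-readable output from the same record.
function renderJson(rec) {
  return JSON.stringify({ id: rec.id, label: rec.label, summary: rec.summary });
}

const outputs = { html: renderHtml(record), json: renderJson(record) };
console.log(outputs.html);
console.log(outputs.json);
```

Because both renderers read the same record, editing `summary` once changes every output on the next build.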
Standing on shoulders
This pattern didn't appear from nowhere. It draws from established ideas and the people who developed them:
File over app
"If you want to create digital artifacts that last, they must be files you can control, in formats that are easy to retrieve and read." — Steph Ango, 2023
Also: Derek Sivers on plain text permanence, and the permacomputing movement on resilient, minimal-dependency software.
Docs as code
The practice of managing documentation with the same tools as software — version control, pull requests, CI, plain text formats. Popularized by the Write the Docs community. Key figures: Tom Preston-Werner (Jekyll, 2008), Eric Holscher (Read the Docs), Anne Gentle (Docs Like Code, 2017), Andrew Etter (Modern Technical Writing, 2016), Riona MacNamara (docs-as-code at Google).
Living documentation
Cyrille Martraire's Living Documentation (2019) argues that documentation should evolve at the same pace as the system it describes. His framework generates docs from code annotations and tests. This project extends the idea: the knowledge isn't derived from code, and verification uses AI models rather than test suites.
GitOps
Coined by Weaveworks (2017). Git as single source of truth, with automated agents that detect drift between declared state and actual state, then reconcile. Originally for infrastructure — but the pattern maps directly to knowledge:
| GitOps (infrastructure) | Knowledge as Code |
|---|---|
| YAML declares desired state | Markdown declares what's true |
| Controller detects drift | AI cascade detects drift from reality |
| Auto-reconciliation or alert | GitHub issues for human review |
| Git as single source of truth | Git as single source of truth |
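The mapping in the table can be sketched as a small reconcile step: compare what the files declare with what a verification run observed, and draft an issue for each mismatch instead of changing anything automatically. The field names, the `observed` input, and the `draftIssues` function are hypothetical; how the project actually gathers observations is not shown here.

```javascript
// Declared state: what the plain text files currently say (hypothetical fields).
const declared = [
  { id: 'voice-mode', field: 'availability', value: 'paid plans' },
  { id: 'image-generation', field: 'availability', value: 'all plans' },
];

// Observed state: what a verification run reported (hypothetical input).
const observed = {
  'voice-mode:availability': 'all plans',
  'image-generation:availability': 'all plans',
};

// Return one issue draft per declared fact that disagrees with the observation.
// Facts with no observation are skipped rather than treated as drift.
function draftIssues(declaredFacts, observedFacts) {
  const drafts = [];
  for (const fact of declaredFacts) {
    const key = `${fact.id}:${fact.field}`;
    if (!Object.prototype.hasOwnProperty.call(observedFacts, key)) continue;
    if (observedFacts[key] !== fact.value) {
      drafts.push({
        title: `Possible drift: ${fact.id} ${fact.field}`,
        body:
          `Files say "${fact.value}", verification saw "${observedFacts[key]}". ` +
          'Needs human review.',
      });
    }
  }
  return drafts;
}

const issues = draftIssues(declared, observed);
console.log(issues);
```

The function only produces drafts. Filing them as GitHub issues and deciding what to change stays with a human reviewer, which matches the "alert" side of the GitOps comparison.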
Anti-entropy
In distributed systems (Dynamo, Cassandra), anti-entropy is a background process that detects and repairs divergence between replicas. The scheduled verification cascade plays a similar role for knowledge: it finds where reality has moved away from what the files say and flags the gap.
Multi-model verification
Academic foundations for using multiple AI models as cross-checking judges:
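- Zheng et al., "Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena" (NeurIPS 2023): established LLMs as evaluators and identified their biases
- Verga et al., "Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models" (2024): introduces PoLL, a Panel of LLM evaluators, and reports that a panel of smaller models can outperform a single large judge
- Du et al., "Improving Factuality and Reasoning in Language Models through Multiagent Debate" (2023): LLM instances debate to reduce hallucinations
- Huang et al., "Large Language Models Cannot Self-Correct Reasoning Yet" (ICLR 2024): finds that a model struggles to correct its own reasoning without external feedback, which motivates checking one model's output against others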
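One way to picture cross-checking with several models is a simple majority vote over independent verdicts. The sketch below uses stub verdicts in place of real model calls; the verdict labels, the `tallyVerdicts` function, and the strict-majority threshold are assumptions, not a description of the project's actual cascade.

```javascript
// Stub verdicts standing in for responses from different models.
// Each judge says whether a stored claim still looks 'accurate' or 'outdated'.
const verdicts = [
  { judge: 'model-a', verdict: 'outdated' },
  { judge: 'model-b', verdict: 'outdated' },
  { judge: 'model-c', verdict: 'accurate' },
];

// Count verdicts and flag the claim only when a strict majority
// of judges calls it outdated. Ties and minorities are not flagged.
function tallyVerdicts(judgeVerdicts) {
  const counts = { accurate: 0, outdated: 0 };
  for (const { verdict } of judgeVerdicts) {
    if (!Object.prototype.hasOwnProperty.call(counts, verdict)) {
      throw new Error(`Unexpected verdict: ${verdict}`);
    }
    counts[verdict] += 1;
  }
  const flagged = counts.outdated * 2 > judgeVerdicts.length;
  return { counts, flagged };
}

const result = tallyVerdicts(verdicts);
console.log(result);
```

A strict majority is a conservative choice: a single dissenting model does not open an issue on its own, and a tie leaves the claim unflagged.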
Wiki philosophy
Ward Cunningham's original wiki (1995) established the idea of knowledge that evolves incrementally, maintained by a community, with every change tracked. Mike Caulfield's "The Garden and the Stream" (2015) distinguished gardens (networked, iterative knowledge) from streams (chronological feeds). The digital gardens movement extended this into personal knowledge management. This project is a tended garden with AI gardeners.
What we think is new
We haven't found prior art for the following specific applications. If you know of any, please tell us.
- "Knowledge as Code" as a named pattern: the "-as-code" lineage (infrastructure, policy, docs, everything) is well-established, but this specific application to maintained knowledge bases doesn't appear to be named.
- AI verification cascades for documentation: multi-model evaluation exists in the academic literature, but we have not seen it applied as a scheduled process to maintain a knowledge base's factual accuracy.
- Multi-format output from the same plain text: HTML, JSON API, MCP endpoints, and SEO bridge pages, all from Markdown/YAML, with zero dependencies.
- Ontology-driven static site generation: using a formal taxonomy to drive site structure, navigation, and programmatic pages.
Try it
The entire project is open source. Beyond Git and Node.js, there is nothing to install.
git clone https://github.com/snapsynapse/ai-tool-watch.git
cd ai-tool-watch
node scripts/build.js
open docs/index.html   # macOS; on Linux, use xdg-open instead
To understand the architecture: design/ARCHITECTURE_PATTERNS.md
To understand the data model: design/ONTOLOGY.md
To understand the verification system: VERIFICATION.md
To contribute: CONTRIBUTING.md
This page is itself an instance of the pattern. It lives in a Git repo, gets deployed by CI, and will be verified alongside everything else. If something here is wrong or incomplete, open an issue or start a discussion.