Project Context – Intelligence Engine
What It Does
Intelligence Engine is a domain-agnostic platform for building, searching, and visualizing knowledge graphs from structured data. It started as a code intelligence tool – parsing source code into typed knowledge graphs – and evolved into a general-purpose engine where any domain can be defined through a YAML schema.
In concrete terms:
- Define a domain via a YAML schema (entity types, relationships, properties, search profiles)
- Index data using Tree-sitter (for code) or custom extractors (for any other format)
- Query through hybrid search (keyword + semantic + graph), Cypher queries, or the REST API
- Explore via an interactive web UI with force-directed graph visualization
- Integrate with AI assistants through 15 MCP tools
- Analyze with AI-powered summaries, Q&A, code quality metrics, and community detection
Core functionality runs entirely locally. No data leaves the machine unless an optional LLM provider is configured to use a remote service.
Why It Exists
Managing a portfolio of 100+ projects creates a specific set of problems:
- “What calls what?” – Understanding call chains, dependency flow, and the blast radius of a change across a large codebase
- “Where is this pattern used?” – Finding implementations across dozens of projects that share conventions but aren’t formally linked
- “What’s the quality landscape?” – Measuring complexity, documentation coverage, and coupling across the entire portfolio
- “What connects to what?” – Understanding structural relationships that grep and text search can’t reveal
Existing tools (IDE search, grep, static analyzers) work well within a single project but don’t scale to portfolio-level analysis. Intelligence Engine fills that gap by building a unified knowledge graph that spans all indexed projects.
The domain generalization step (v0.21.0) extended this beyond code. The same infrastructure now handles archaeological data – and any future domain that can be described by entities, relationships, and properties.
Current Status
- Version: 0.21.0
- Tests: 1261+ passing
- Maturity: Production-ready for personal use
What Works
- Full pipeline from parsing through search, visualization, and AI integration
- 8 programming languages parsed via Tree-sitter
- 1 non-code domain (archaeology) validated via custom extractor
- Shared database mode supporting 100+ projects with cross-project queries
- 15 MCP tools for AI assistant integration
- 33 REST endpoints
- React web UI with graph explorer, Cypher console, and 6-tab dashboard
- AI summaries and Q&A via 4 LLM providers
- Incremental indexing with git diff detection
- AI data preservation across re-indexing and migration
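As a rough illustration of the incremental indexing item above, the sketch below turns the text output of `git diff --name-status <old>..<new>` into two sets: files to re-index and files whose entities should be removed. The function name `split_changes` and the way the result is used are assumptions made for illustration, not the project's actual implementation. Unknown status codes raise an error rather than being skipped, in keeping with the "fail loudly" principle.

```python
from typing import Set, Tuple


def split_changes(name_status_output: str) -> Tuple[Set[str], Set[str]]:
    """Parse `git diff --name-status` output into (to_reindex, to_remove).

    Hypothetical helper: each input line is tab-separated, e.g.
    "M\tsrc/a.py" or "R100\tsrc/old.py\tsrc/new.py".
    """
    to_reindex: Set[str] = set()
    to_remove: Set[str] = set()
    for line in name_status_output.splitlines():
        if not line.strip():
            continue
        status, *paths = line.split("\t")
        code = status[0]  # rename/copy codes carry a similarity score, e.g. R100
        if code in ("A", "M", "T"):
            # Added, modified, or type-changed: index the current content.
            to_reindex.add(paths[0])
        elif code == "D":
            # Deleted: drop everything previously indexed for this path.
            to_remove.add(paths[0])
        elif code == "R":
            # Renamed: remove the old path, index the new one.
            to_remove.add(paths[0])
            to_reindex.add(paths[-1])
        elif code == "C":
            # Copied: the source is unchanged, only the copy is new.
            to_reindex.add(paths[-1])
        else:
            # Fail loudly on anything unexpected (e.g. unmerged entries).
            raise ValueError(f"unsupported git status line: {line!r}")
    return to_reindex, to_remove


sample = "M\tsrc/a.py\nD\tsrc/b.py\nR100\tsrc/old.py\tsrc/new.py\nA\tdocs/c.md\n"
to_reindex, to_remove = split_changes(sample)
```

Keeping the parser separate from the `git` invocation makes it easy to test without a repository.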
Design Philosophy
- MVP-first: Built in 12 phases, each extending the previous
- Schema-driven: Domain knowledge lives in YAML, not code
- Fail loudly: No silent failures – crashes are preferable to silent data corruption
- Local-only: No cloud dependencies for core functionality (LLM providers are optional)
- Extensible: New domains require only a YAML schema and an extractor
The Domain Generalization Story
Intelligence Engine started as a code-only tool (v0.1 through v0.20). The entire architecture was built around code concepts: functions, classes, methods, modules, CALLS, IMPORTS, EXTENDS.
At v0.21.0, the architecture was generalized:
- Hardcoded entity types became schema-defined entity types from YAML
- Single Entity table became domain-scoped tables (Entity_code, Entity_archaeology)
- Tree-sitter-only parsing became pluggable extractors (Tree-sitter or custom modules)
- Code-specific search fields became configurable search profiles per domain
- Code-specific health checks became domain-aware health checks with skip types and metric selection
The archaeology domain was chosen as the validation case because it is about as different from code as a domain can be: no AST, no Tree-sitter, and completely different entity semantics. Its successful implementation shows that the domain abstraction holds for at least one non-code domain.
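To make the generalization more concrete, here is a minimal sketch of how pluggable extractors and domain-scoped table names could fit together. Everything in it is hypothetical: the registry, the `register_extractor` decorator, the `entity_table` helper, and the `Artifact` entity type are illustrative names, not the engine's real API. Only the `Entity_<domain>` naming pattern comes from the table names mentioned above.

```python
from typing import Callable, Dict, Iterable

# Hypothetical contract: an extractor takes a file path and yields entity dicts.
Extractor = Callable[[str], Iterable[dict]]

_EXTRACTORS: Dict[str, Extractor] = {}


def entity_table(domain: str) -> str:
    """Return the domain-scoped table name, e.g. Entity_code."""
    return f"Entity_{domain}"


def register_extractor(domain: str):
    """Decorator that registers one extractor per domain."""
    def decorator(func: Extractor) -> Extractor:
        if domain in _EXTRACTORS:
            # Fail loudly instead of silently replacing an extractor.
            raise ValueError(f"extractor already registered for {domain!r}")
        _EXTRACTORS[domain] = func
        return func
    return decorator


def get_extractor(domain: str) -> Extractor:
    """Look up the extractor for a domain, raising if none exists."""
    try:
        return _EXTRACTORS[domain]
    except KeyError:
        raise LookupError(f"no extractor registered for {domain!r}") from None


@register_extractor("archaeology")
def extract_archaeology(path: str) -> Iterable[dict]:
    # Placeholder body: a real extractor would read and parse the file.
    yield {"type": "Artifact", "source": path,
           "table": entity_table("archaeology")}


entities = list(get_extractor("archaeology")("site_records.csv"))
```

In this sketch a new domain needs only a schema-derived name and one registered function, which mirrors the "YAML schema plus extractor" requirement described earlier.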
Technology Choices
| Decision | Choice | Reason |
|---|---|---|
| Graph DB | KuzuDB | Embedded, Cypher support, excellent Python bindings |
| Vector DB | LanceDB | Embedded, fast, incremental operations |
| Embeddings | all-MiniLM-L6-v2 | 384-dim, CPU-friendly, good quality/size ratio |
| Parser | Tree-sitter | Native Python bindings, multi-language, fast |
| MCP | FastMCP | Python-native, simple tool definition |
| Web backend | FastAPI | Async, fast, good typing support |
| Web frontend | React + Sigma.js | Component model + purpose-built graph visualization |
| Search fusion | RRF | Simple, robust, no training data needed |
See decisions.md for the full architectural decision log.
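For readers unfamiliar with RRF (reciprocal rank fusion), the sketch below shows the standard formula: each result receives 1 / (k + rank) from every ranked list it appears in, and the sums are sorted in descending order. The constant k = 60 is the value commonly used in the literature; whether the engine uses the same constant, and the example entity names, are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]],
                           k: int = 60) -> List[str]:
    """Fuse several ranked lists of IDs into one list, best first."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit contributes 1 / (k + 1).
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


# Hypothetical result lists from the three search modes.
keyword_hits = ["parse_file", "index_repo", "load_schema"]
semantic_hits = ["index_repo", "parse_file", "run_query"]
graph_hits = ["index_repo", "load_schema"]

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits, graph_hits])
```

Because only ranks are used, the keyword, semantic, and graph scores never need to be normalized against each other, which is why the method needs no training data.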