# Architecture – Intelligence Engine

## Overview
Intelligence Engine is a domain-agnostic knowledge graph platform. It ingests structured data (source code, archaeological records, or any domain defined by a YAML schema), builds a typed knowledge graph, and exposes it through hybrid search, a REST API, an MCP server, and a web UI.
The core abstraction is the domain schema – a single YAML file that defines everything the engine needs to handle a new knowledge domain without code changes.
## High-Level Pipeline

### Pipeline Flow

Domain Schema → Extractors → Knowledge Graph → Hybrid Search → API Layer

| Domain Schema | Extractors | Knowledge Graph | Hybrid Search | API Layer |
|---|---|---|---|---|
| YAML configs | Tree-sitter + Custom | KuzuDB per-domain | BM25 + Semantic + Graph | MCP · REST · Web UI |
### Component Details

| Layer | Component | Details |
|---|---|---|
| Domain Schema | YAML config | code.yaml, archaeology.yaml, extensible |
| Extractors | Tree-sitter (code) | Python, JS, TS/TSX, Java, Go, HTML, CSS |
| Extractors | Custom (non-code) | YAML/JSON readers |
| Knowledge Graph | KuzuDB | Entity_code, Rel_code_CALLS, per-domain tables |
| Embedding | MiniLM-L6-v2 | 384-dim vectors, LanceDB storage |
| Hybrid Search | 3-way RRF | BM25 (0.35) + Semantic (0.40) + Graph (0.25) |
| API | MCP Server | 15 tools for AI assistants |
| API | REST | 33 endpoints (FastAPI) |
| API | Web UI | React 18 + Sigma.js |
| AI | LLM providers | Claude, OpenAI, Gemini, Ollama |
| Change Detector | Incremental | git diff + SHA-256 hash fallback |
| Quality | Metrics | Complexity, docs coverage, coupling |
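The 3-way fusion can be sketched as weighted Reciprocal Rank Fusion. The 0.35/0.40/0.25 weights come from the table above; the `k = 60` smoothing constant and function name are illustrative assumptions, not the engine's actual implementation:

```python
def rrf_fuse(ranklists: dict[str, list[str]],
             weights: dict[str, float],
             k: int = 60) -> list[str]:
    """Weighted RRF: score(d) = sum over sources of w_s / (k + rank_s(d))."""
    scores: dict[str, float] = {}
    for source, docs in ranklists.items():
        w = weights[source]
        for rank, doc in enumerate(docs, start=1):
            scores[doc] = scores.get(doc, 0.0) + w / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

fused = rrf_fuse(
    {"bm25": ["a", "b", "c"], "semantic": ["b", "a", "d"], "graph": ["c", "b"]},
    weights={"bm25": 0.35, "semantic": 0.40, "graph": 0.25},
)
# → ["b", "a", "c", "d"]: "b" ranks high in all three lists, so it wins
```

Rank-based fusion like this sidesteps the problem of BM25, cosine-similarity, and graph-centrality scores living on incomparable scales.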
## Domain Schema System

Each domain is defined by a YAML file in `config/domains/`. A schema specifies:
```yaml
domain: archaeology
version: 1
display_name: "Archaeology"

identity_rules:
  uniqueness: per_project
  id_pattern: "{project}::{source_ref}::{name}"

entity_types:
  - name: find
    label: Find
    color: "#D4A574"
    category: artifact
  - name: site
    label: Site
    color: "#4A90D9"
    category: location
  # ...

relationship_types:
  - name: FOUND_AT
    source_types: [find]
    target_types: [site]
    cardinality: many_to_one
    edge_properties:
      context: string
      depth: double
  # ...

entity_properties:
  - name: find_category
    type: string
    required: false
    description: "Category of find (axe, jewelry, pottery, etc.)"
  # ...

search_profiles:
  default:
    bm25_fields: [name, entity_type, description, region]
    bm25_weights:
      name: 3
      description: 2
    embedding_fields: [name, entity_type, description]

display:
  default_label_field: name
  detail_fields: [name, entity_type, find_category, region, ...]
  snippet_field: description

health_checks:
  callable_types: []
  dependency_edge: null
  skip_types: [external]

extractors:
  type: custom  # or "tree-sitter"
  module: intelligence_engine.extractors.archaeology_extractor

embedding:
  model: "all-MiniLM-L6-v2"
```
When the engine encounters a domain, it:

- Loads the schema YAML
- Creates `Entity_<domain>` and `Rel_<domain>_<type>` tables in KuzuDB
- Routes indexing to the correct extractor (Tree-sitter or custom)
- Configures search profiles for the domain’s fields
- Sets up display and health check rules
## Component Responsibilities

| Component | Responsibility |
|---|---|
| `src/parser/` | Tree-sitter AST parsing, 8-language entity extraction |
| `src/extractors/` | Custom domain extractors (archaeology, etc.) |
| `src/domain/` | Domain schema loading, validation, registry |
| `src/graph/` | Knowledge graph (KuzuDB + NetworkX dual-backend), Cypher queries |
| `src/search/` | BM25, semantic (LanceDB), graph search, 3-way RRF fusion |
| `src/llm/` | LLM provider abstraction (Claude/OpenAI/Gemini/Ollama), entity summarizer |
| `src/mcp/` | MCP server with 15 tools |
| `src/web/` | FastAPI backend (33 endpoints) + React/Sigma.js frontend |
| `src/registry.py` | Multi-project registry, cross-project search |
| `src/change_detector.py` | Incremental indexing: git diff + SHA-256 hash fallback |
| `src/indexer.py` | Shared pipeline (CLI/MCP/Web), mode=auto/full/incremental |
| `src/quality.py` | Code quality metrics (complexity, doc coverage, coupling) |
| `src/storage.py` | Storage mode detection (per-project / shared), path resolution |
| `src/migrate.py` | Migration tool: per-project <-> shared, verification |
| `src/ai_overlay.py` | AI data preservation across re-indexing |
| `src/memory_aggregator.py` | Unified memory records from graph + external sources |
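The change-detection strategy (`git diff` when a repository is available, SHA-256 content hashing otherwise) can be sketched like this; the function name and the `*.py`-only scan are illustrative simplifications:

```python
import hashlib
import subprocess
from pathlib import Path

def changed_files(repo: Path, old_hashes: dict[str, str]) -> list[str]:
    """Prefer `git diff --name-only`; fall back to SHA-256 comparison."""
    try:
        out = subprocess.run(
            ["git", "-C", str(repo), "diff", "--name-only", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return [line for line in out.stdout.splitlines() if line]
    except (OSError, subprocess.CalledProcessError):
        # No git (or not a repo): hash file contents and diff against
        # the hashes recorded at the previous indexing run.
        changed = []
        for path in sorted(repo.rglob("*.py")):
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            rel = str(path.relative_to(repo))
            if old_hashes.get(rel) != digest:
                changed.append(rel)
        return changed
```

The hash fallback matters for sources that are not version-controlled at all, such as exported archaeology records.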
## Storage Architecture

### Per-Project Mode (default)

Each project gets isolated databases:

```
data/
  myproject/
    parse.json           # Extracted entities and relationships
    kuzu_db/             # KuzuDB graph database
    lancedb/             # LanceDB vector store
    index_history.json   # Performance tracking
```
### Shared Mode (multi-tenant)

All projects share a single database with project column filtering:

```
data/
  _shared/
    kuzu_db/             # All projects in one graph
    lancedb/             # All embeddings in one vector store
  myproject/
    parse.json           # Still per-project (source data)
    index_history.json
```
Shared mode enables:
- Cross-project Cypher queries (find dependencies across projects)
- Single-query semantic search across all projects
- Global graph analysis (community detection across the full graph)
Switch between modes using `python -m intelligence_engine migrate`.
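Mode detection can follow directly from the two layouts above: the presence of `data/_shared/` signals shared mode. This sketch uses those path conventions, but the function name is hypothetical, not `src/storage.py`'s actual interface:

```python
from pathlib import Path

def resolve_graph_dir(data_root: Path, project: str) -> Path:
    """Shared mode if data/_shared/ exists; otherwise per-project databases."""
    shared = data_root / "_shared" / "kuzu_db"
    if shared.parent.exists():
        return shared                       # one graph for all projects
    return data_root / project / "kuzu_db"  # isolated per-project graph
```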
## Domain-Scoped Tables

KuzuDB tables are scoped by domain to prevent schema conflicts:

```
Entity_code                # 15 entity types, code-specific properties
Rel_code_CALLS             # Code relationship tables
Rel_code_IMPORTS
Rel_code_EXTENDS
...

Entity_archaeology         # 6 entity types, archaeology-specific properties
Rel_archaeology_FOUND_AT
Rel_archaeology_DATED_TO
...
```
This allows multiple domains to coexist in the same database without column or type collisions.
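The naming convention falls out of the schema mechanically; a helper like this (the function itself is illustrative) could derive every table a domain needs from its relationship types:

```python
def table_names(domain: str, relationship_types: list[str]) -> list[str]:
    """Domain-scoped KuzuDB names: one node table plus one rel table per type."""
    return [f"Entity_{domain}"] + [f"Rel_{domain}_{r}" for r in relationship_types]

names = table_names("archaeology", ["FOUND_AT", "DATED_TO"])
# → ["Entity_archaeology", "Rel_archaeology_FOUND_AT", "Rel_archaeology_DATED_TO"]
```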
## Routing Key: (project, domain)

All operations in the engine are routed by the tuple `(project, domain)`:

- Indexing: Selects the correct extractor and target tables
- Search: Scopes results and applies domain-specific search profiles
- API: Every endpoint accepts `project` and `domain` parameters
- MCP: Tools route through the same key
- Web UI: Domain selector in the project dropdown

This design means a single project can contain entities from multiple domains (e.g., a project with both code and documentation entities).
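The routing key can be pictured as a registry keyed by that tuple; the dict structure and field names here are hypothetical, but they show how one project hosts several domains side by side:

```python
# Hypothetical registry: each (project, domain) pair maps to its extractor
# and its domain-scoped tables.
routes: dict[tuple[str, str], dict] = {
    ("myproject", "code"): {
        "extractor": "tree-sitter", "entity_table": "Entity_code"},
    ("myproject", "archaeology"): {
        "extractor": "custom", "entity_table": "Entity_archaeology"},
}

def route(project: str, domain: str) -> dict:
    """Look up the extractor and target tables for one (project, domain) pair."""
    return routes[(project, domain)]
```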
## Adding a New Domain

1. Define the schema – Create `config/domains/your-domain.yaml` with entity types, relationships, properties, search profiles, and display config
2. Write an extractor – Either use `tree-sitter` (for code-like domains) or create a custom extractor module that reads your data format and returns entity/relationship dicts
3. Index – Run the indexer with `--domain your-domain` and the engine creates all necessary tables
4. Use – Search, query, visualize, and analyze through the same interfaces as any other domain
No core engine code needs to change. The schema YAML drives everything.
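A custom extractor for step 2 might be as small as the sketch below. The input format, field names, and return shape are assumptions modeled on the archaeology schema example; the engine's real extractor contract may differ:

```python
import json
from pathlib import Path

def extract(source: Path) -> dict:
    """Read a JSON file of finds and emit entity/relationship dicts
    in the entity/relationship shape the schema describes (illustrative)."""
    records = json.loads(source.read_text())
    entities, relationships = [], []
    for rec in records:
        entities.append({"name": rec["name"], "entity_type": "find",
                         "find_category": rec.get("category")})
        if "site" in rec:
            # Each find located at a site yields a FOUND_AT edge
            entities.append({"name": rec["site"], "entity_type": "site"})
            relationships.append({"type": "FOUND_AT",
                                  "source": rec["name"], "target": rec["site"]})
    return {"entities": entities, "relationships": relationships}
```

Because the schema YAML already describes the tables, search profiles, and display rules, an extractor like this is the only code a new domain requires.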