Architecture – Intelligence Engine

Overview

Intelligence Engine is a domain-agnostic knowledge graph platform. It ingests structured data (source code, archaeological records, or any domain defined by a YAML schema), builds a typed knowledge graph, and exposes it through hybrid search, a REST API, an MCP server, and a web UI.

The core abstraction is the domain schema – a single YAML file that defines everything the engine needs to handle a new knowledge domain without code changes.

High-Level Pipeline

Pipeline Flow

Domain Schema	→	Extractors	→	Knowledge Graph	→	Hybrid Search	→	API Layer
YAML configs		Tree-sitter + Custom		KuzuDB per-domain		BM25 + Semantic + Graph		MCP · REST · Web UI

Component Details

Layer	Component	Details
Domain Schema	YAML config	`code.yaml`, `archaeology.yaml`, extensible
Extractors	Tree-sitter (code)	Python, JS, TS/TSX, Java, Go, HTML, CSS
	Custom (non-code)	YAML/JSON readers
Knowledge Graph	KuzuDB	`Entity_code`, `Rel_code_CALLS`, per-domain tables
Embedding	MiniLM-L6-v2	384-dim vectors, LanceDB storage
Hybrid Search	3-way RRF	BM25 (0.35) + Semantic (0.40) + Graph (0.25)
API	MCP Server	15 tools for AI assistants
	REST	33 endpoints (FastAPI)
	Web UI	React 18 + Sigma.js
AI	LLM providers	Claude, OpenAI, Gemini, Ollama
Change Detector	Incremental	git diff + SHA-256 hash fallback
Quality	Metrics	Complexity, docs coverage, coupling

Domain Schema System

Each domain is defined by a YAML file in config/domains/. A schema specifies:

domain: archaeology
version: 1
display_name: "Archaeology"

identity_rules:
  uniqueness: per_project
  id_pattern: "{project}::{source_ref}::{name}"

entity_types:
  - name: find
    label: Find
    color: "#D4A574"
    category: artifact
  - name: site
    label: Site
    color: "#4A90D9"
    category: location
  # ...

relationship_types:
  - name: FOUND_AT
    source_types: [find]
    target_types: [site]
    cardinality: many_to_one
    edge_properties:
      context: string
      depth: double
  # ...

entity_properties:
  - name: find_category
    type: string
    required: false
    description: "Category of find (axe, jewelry, pottery, etc.)"
  # ...

search_profiles:
  default:
    bm25_fields: [name, entity_type, description, region]
    bm25_weights:
      name: 3
      description: 2
    embedding_fields: [name, entity_type, description]

display:
  default_label_field: name
  detail_fields: [name, entity_type, find_category, region, ...]
  snippet_field: description

health_checks:
  callable_types: []
  dependency_edge: null
  skip_types: [external]

extractors:
  type: custom                                          # or "tree-sitter"
  module: intelligence_engine.extractors.archaeology_extractor

embedding:
  model: "all-MiniLM-L6-v2"

When the engine encounters a domain, it:

Loads the schema YAML
Creates Entity_<domain> and Rel_<domain>_<type> tables in KuzuDB
Routes indexing to the correct extractor (Tree-sitter or custom)
Configures search profiles for the domain’s fields
Sets up display and health check rules

Component Responsibilities

Component	Responsibility
`src/parser/`	Tree-sitter AST parsing, 8-language entity extraction
`src/extractors/`	Custom domain extractors (archaeology, etc.)
`src/domain/`	Domain schema loading, validation, registry
`src/graph/`	Knowledge graph (KuzuDB + NetworkX dual-backend), Cypher queries
`src/search/`	BM25, semantic (LanceDB), graph search, 3-way RRF fusion
`src/llm/`	LLM provider abstraction (Claude/OpenAI/Gemini/Ollama), entity summarizer
`src/mcp/`	MCP server with 15 tools
`src/web/`	FastAPI backend (33 endpoints) + React/Sigma.js frontend
`src/registry.py`	Multi-project registry, cross-project search
`src/change_detector.py`	Incremental indexing: git diff + SHA-256 hash fallback
`src/indexer.py`	Shared pipeline (CLI/MCP/Web), mode=auto/full/incremental
`src/quality.py`	Code quality metrics (complexity, doc coverage, coupling)
`src/storage.py`	Storage mode detection (per-project / shared), path resolution
`src/migrate.py`	Migration tool: per-project <-> shared, verification
`src/ai_overlay.py`	AI data preservation across re-indexing
`src/memory_aggregator.py`	Unified memory records from graph + external sources

Storage Architecture

Per-Project Mode (default)

Each project gets isolated databases:

data/
  myproject/
    parse.json          # Extracted entities and relationships
    kuzu_db/            # KuzuDB graph database
    lancedb/            # LanceDB vector store
    index_history.json  # Performance tracking

Shared Mode (multi-tenant)

All projects share a single database with project column filtering:

data/
  _shared/
    kuzu_db/            # All projects in one graph
    lancedb/            # All embeddings in one vector store
  myproject/
    parse.json          # Still per-project (source data)
    index_history.json

Shared mode enables:

Cross-project Cypher queries (find dependencies across projects)
Single-query semantic search across all projects
Global graph analysis (community detection across the full graph)

Switch between modes using python -m intelligence_engine migrate.

Domain-Scoped Tables

KuzuDB tables are scoped by domain to prevent schema conflicts:

Entity_code             # 15 entity types, code-specific properties
Rel_code_CALLS          # Code relationship tables
Rel_code_IMPORTS
Rel_code_EXTENDS
...

Entity_archaeology      # 6 entity types, archaeology-specific properties
Rel_archaeology_FOUND_AT
Rel_archaeology_DATED_TO
...

This allows multiple domains to coexist in the same database without column or type collisions.

Routing Key: (project, domain)

All operations in the engine are routed by the tuple (project, domain):

Indexing: Selects the correct extractor and target tables
Search: Scopes results and applies domain-specific search profiles
API: Every endpoint accepts project and domain parameters
MCP: Tools route through the same key
Web UI: Domain selector in the project dropdown

This design means a single project can contain entities from multiple domains (e.g., a project with both code and documentation entities).

Adding a New Domain

Define the schema – Create config/domains/your-domain.yaml with entity types, relationships, properties, search profiles, and display config
Write an extractor – Either use tree-sitter (for code-like domains) or create a custom extractor module that reads your data format and returns entity/relationship dicts
Index – Run the indexer with --domain your-domain and the engine creates all necessary tables
Use – Search, query, visualize, and analyze through the same interfaces as any other domain

No core engine code needs to change. The schema YAML drives everything.