
THE KNOWLEDGE SPINE

Semantic Memory for AI Agents.
Knowledge Graph. Spreading Activation. GPU Recall. Dream Consolidation.

System Metrics

Embedding Backend
ONNX
FastEmbed (intfloat/multilingual-e5-large) — 1024d, ~50ms per embedding
GPU Recall
~100ms
CUDA torch.matmul cosine similarity — 10K memories in 100ms vs 500ms CPU
Storage
SQLite
Source of truth — always available, no external DB required
Dream Engine
3 PHASES
NREM → REM → Insight — autonomous background consolidation


Core Architecture

FastEmbed (ONNX)

Primary embedding backend. intfloat/multilingual-e5-large — 1024 dimensions. ONNX runtime, no PyTorch dependency. ~50ms per embedding on CPU. Falls back through sentence-transformers → TF-IDF → hash.
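The last link in that fallback chain can be sketched as deterministic token hashing into a fixed-dimension vector. This is an illustrative sketch, not the project's actual implementation — the function name and bucketing scheme are assumptions:

```python
import hashlib
import math

def hash_embed(text: str, dim: int = 1024) -> list[float]:
    """Hypothetical last-resort fallback: hash each token into one of
    `dim` buckets, then L2-normalize the count vector so cosine
    similarity still behaves sensibly."""
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]
```

A hash embedding carries no semantics, but it is deterministic and dependency-free, which is what makes it a safe final fallback.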

The Brain — neural exosuit concept

GPU Recall Engine

CUDA-accelerated cosine similarity via torch.matmul. Loads all embeddings into GPU memory. Batch query in ~100ms for 10K memories. Auto-detects CUDA. Falls back to numpy if no GPU.
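The recall path described above — torch.matmul on CUDA when available, numpy otherwise — can be sketched as follows. Function and variable names are illustrative, not the project's API:

```python
import numpy as np

try:
    import torch
    _HAS_CUDA = torch.cuda.is_available()
except ImportError:
    torch = None
    _HAS_CUDA = False

def recall_top_k(query: np.ndarray, memory: np.ndarray, k: int = 5):
    """Top-k cosine-similarity recall over an (N, d) memory matrix.

    Normalizes rows once, then a single matrix-vector product yields
    all N similarities; uses CUDA torch.matmul when a GPU is present,
    else falls back to numpy on CPU."""
    qn = query / np.linalg.norm(query)
    mn = memory / np.linalg.norm(memory, axis=1, keepdims=True)
    if _HAS_CUDA:
        q_t = torch.from_numpy(qn).float().cuda()
        m_t = torch.from_numpy(mn).float().cuda()
        sims = (m_t @ q_t).cpu().numpy()
    else:
        sims = mn @ qn  # CPU numpy fallback
    top = np.argsort(-sims)[:k]
    return top, sims[top]
```

Because all embeddings stay resident in GPU memory, each query is one matmul — which is why 10K memories resolve in ~100ms rather than the ~500ms of a CPU loop.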

Knowledge Graph

Automatic connection creation via cosine similarity threshold. BFS spreading activation with decay factor. Edge types: semantic, bridge (from Dream REM), temporal. SQLite connections table with unique constraint.
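BFS spreading activation with decay, as described above, can be sketched like this — a minimal version assuming an adjacency-list graph; the names and the default decay/threshold values are hypothetical:

```python
from collections import deque

def spread_activation(graph: dict, seeds: dict,
                      decay: float = 0.5, threshold: float = 0.1) -> dict:
    """BFS spreading activation: each hop multiplies the source node's
    activation by `decay`; propagation stops once the spread value
    falls below `threshold`. `graph` maps node -> iterable of neighbors,
    `seeds` maps node -> initial activation."""
    activation = dict(seeds)
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        spread = activation[node] * decay
        if spread < threshold:
            continue
        for nb in graph.get(node, ()):
            # Only update (and re-propagate) if this path is stronger.
            if spread > activation.get(nb, 0.0):
                activation[nb] = spread
                queue.append(nb)
    return activation
```

The decay factor bounds the search: with decay 0.5 and threshold 0.1, activation dies out after a few hops, so recall pulls in near neighbors without flooding the whole graph.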

Dream Engine

Autonomous background consolidation inspired by biological sleep. NREM: replay & strengthen/weaken connections. REM: bridge discovery between isolated memories. Insight: community detection via BFS connected components.
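The Insight phase's community detection via BFS connected components can be sketched as follows — an illustrative standalone version, not the engine's actual code:

```python
from collections import deque

def connected_components(nodes, edges):
    """Group memories into communities by finding the connected
    components of the (undirected) knowledge graph with BFS."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:
            n = queue.popleft()
            comp.append(n)
            for nb in adj[n]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        components.append(comp)
    return components
```

Each component is a candidate "community" of related memories; singleton components are exactly the isolated memories that the REM phase then targets for bridge discovery.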

Data Flow

01
Encode
FastEmbed ONNX
content → 1024d vector
~50ms CPU
02
Store
SQLite INSERT
memory + embedding
create connections
03
Recall
GPU torch.matmul
or CPU numpy fallback
top-k similarity
04
Consolidate
Dream Engine
NREM → REM → Insight
autonomous cleanup
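The Store step above — SQLite as source of truth, embedding persisted with the memory, connections deduplicated by a unique constraint — can be sketched with a minimal schema. Table and column names here are illustrative assumptions, not the project's actual schema:

```python
import json
import sqlite3

# Hypothetical schema mirroring the described design: memories carry
# their embedding, and the connections table's UNIQUE constraint keeps
# edges deduplicated.
SCHEMA = """
CREATE TABLE IF NOT EXISTS memories (
    id        INTEGER PRIMARY KEY,
    content   TEXT NOT NULL,
    embedding TEXT NOT NULL          -- JSON-encoded 1024d vector
);
CREATE TABLE IF NOT EXISTS connections (
    src  INTEGER NOT NULL,
    dst  INTEGER NOT NULL,
    kind TEXT NOT NULL,              -- semantic | bridge | temporal
    UNIQUE (src, dst, kind)
);
"""

def store_memory(db: sqlite3.Connection, content: str,
                 embedding: list[float]) -> int:
    """Insert a memory with its embedding; returns the new row id."""
    cur = db.execute(
        "INSERT INTO memories (content, embedding) VALUES (?, ?)",
        (content, json.dumps(embedding)),
    )
    return cur.lastrowid

def connect_memories(db: sqlite3.Connection, src: int, dst: int,
                     kind: str = "semantic") -> None:
    """Idempotent edge insert; the UNIQUE constraint absorbs duplicates."""
    db.execute(
        "INSERT OR IGNORE INTO connections (src, dst, kind) VALUES (?, ?, ?)",
        (src, dst, kind),
    )
```

Keeping everything in one SQLite file is what makes the system "always available": the GPU index is a cache that can be rebuilt from this table at any time.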

Knowledge Graph Dashboard

Neural Memory Dashboard — 3D Force Graph

3D Force Graph — WebSocket live updates — Amber theme — Category filtering — Search