
THE KNOWLEDGE SPINE

Semantic Memory for AI Agents.
Knowledge Graph. Spreading Activation. GPU Recall. Dream Consolidation.

System Metrics

Embedding Backend
ONNX
FastEmbed (intfloat/multilingual-e5-large) — 1024d, ~50ms per embedding
GPU Recall
~100ms
CUDA torch.matmul cosine similarity — 10K memories in 100ms vs 500ms CPU
Storage
SQLite
Source of truth — always available, no external DB required
Dream Engine
3 PHASES
NREM → REM → Insight — autonomous background consolidation


Core Architecture

FastEmbed (ONNX)

Primary embedding backend. intfloat/multilingual-e5-large — 1024 dimensions. ONNX runtime, no PyTorch dependency. ~50ms per embedding on CPU. Falls back through sentence-transformers → TF-IDF → hash.
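The last link in that fallback chain can be sketched as deterministic token hashing into a fixed-dimension vector. This is an illustrative sketch, not the project's actual implementation — the function name and bucketing scheme are assumptions:

```python
import hashlib
import math

def hash_embed(text: str, dim: int = 1024) -> list[float]:
    """Hypothetical last-resort fallback: hash each token into one of
    `dim` buckets, then L2-normalize the count vector so cosine
    similarity still behaves sensibly."""
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]
```

A hash embedding carries no semantics, but it is deterministic and dependency-free, which is what makes it a safe final fallback.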

The Brain — neural exosuit concept

GPU Recall Engine

CUDA-accelerated cosine similarity via torch.matmul. Loads all embeddings into GPU memory. Batch query in ~100ms for 10K memories. Auto-detects CUDA. Falls back to numpy if no GPU.
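The recall path described above — torch.matmul on CUDA when available, numpy otherwise — can be sketched as follows. Function and variable names are illustrative, not the project's API:

```python
import numpy as np

try:
    import torch
    _HAS_CUDA = torch.cuda.is_available()
except ImportError:
    torch = None
    _HAS_CUDA = False

def recall_top_k(query: np.ndarray, memory: np.ndarray, k: int = 5):
    """Top-k cosine-similarity recall over an (N, d) memory matrix.

    Normalizes rows once, then a single matrix-vector product yields
    all N similarities; uses CUDA torch.matmul when a GPU is present,
    else falls back to numpy on CPU."""
    qn = query / np.linalg.norm(query)
    mn = memory / np.linalg.norm(memory, axis=1, keepdims=True)
    if _HAS_CUDA:
        q_t = torch.from_numpy(qn).float().cuda()
        m_t = torch.from_numpy(mn).float().cuda()
        sims = (m_t @ q_t).cpu().numpy()
    else:
        sims = mn @ qn  # CPU numpy fallback
    top = np.argsort(-sims)[:k]
    return top, sims[top]
```

Because all embeddings stay resident in GPU memory, each query is one matmul — which is why 10K memories resolve in ~100ms rather than the ~500ms of a CPU loop.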

Knowledge Graph

Automatic connection creation via cosine similarity threshold. BFS spreading activation with decay factor. Edge types: semantic, bridge (from Dream REM), temporal. SQLite connections table with unique constraint.
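BFS spreading activation with decay, as described above, can be sketched like this — a minimal version assuming an adjacency-list graph; the names and the default decay/threshold values are hypothetical:

```python
from collections import deque

def spread_activation(graph: dict, seeds: dict,
                      decay: float = 0.5, threshold: float = 0.1) -> dict:
    """BFS spreading activation: each hop multiplies the source node's
    activation by `decay`; propagation stops once the spread value
    falls below `threshold`. `graph` maps node -> iterable of neighbors,
    `seeds` maps node -> initial activation."""
    activation = dict(seeds)
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        spread = activation[node] * decay
        if spread < threshold:
            continue
        for nb in graph.get(node, ()):
            # Only update (and re-propagate) if this path is stronger.
            if spread > activation.get(nb, 0.0):
                activation[nb] = spread
                queue.append(nb)
    return activation
```

The decay factor bounds the search: with decay 0.5 and threshold 0.1, activation dies out after a few hops, so recall pulls in near neighbors without flooding the whole graph.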

Dream Engine

Autonomous background consolidation inspired by biological sleep. NREM: replay & strengthen/weaken connections. REM: bridge discovery between isolated memories. Insight: community detection via BFS connected components.
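The Insight phase's community detection via BFS connected components can be sketched as follows — an illustrative standalone version, not the engine's actual code:

```python
from collections import deque

def connected_components(nodes, edges):
    """Group memories into communities by finding the connected
    components of the (undirected) knowledge graph with BFS."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:
            n = queue.popleft()
            comp.append(n)
            for nb in adj[n]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        components.append(comp)
    return components
```

Each component is a candidate "community" of related memories; singleton components are exactly the isolated memories that the REM phase then targets for bridge discovery.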

Data Flow

01
Encode
FastEmbed ONNX
content → 1024d vector
~50ms CPU
02
Store
SQLite INSERT
memory + embedding
create connections
03
Recall
GPU torch.matmul
or CPU numpy fallback
top-k similarity
04
Consolidate
Dream Engine
NREM → REM → Insight
autonomous cleanup
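The Store step above — SQLite as source of truth, embedding persisted with the memory, connections deduplicated by a unique constraint — can be sketched with a minimal schema. Table and column names here are illustrative assumptions, not the project's actual schema:

```python
import json
import sqlite3

# Hypothetical schema mirroring the described design: memories carry
# their embedding, and the connections table's UNIQUE constraint keeps
# edges deduplicated.
SCHEMA = """
CREATE TABLE IF NOT EXISTS memories (
    id        INTEGER PRIMARY KEY,
    content   TEXT NOT NULL,
    embedding TEXT NOT NULL          -- JSON-encoded 1024d vector
);
CREATE TABLE IF NOT EXISTS connections (
    src  INTEGER NOT NULL,
    dst  INTEGER NOT NULL,
    kind TEXT NOT NULL,              -- semantic | bridge | temporal
    UNIQUE (src, dst, kind)
);
"""

def store_memory(db: sqlite3.Connection, content: str,
                 embedding: list[float]) -> int:
    """Insert a memory with its embedding; returns the new row id."""
    cur = db.execute(
        "INSERT INTO memories (content, embedding) VALUES (?, ?)",
        (content, json.dumps(embedding)),
    )
    return cur.lastrowid

def connect_memories(db: sqlite3.Connection, src: int, dst: int,
                     kind: str = "semantic") -> None:
    """Idempotent edge insert; the UNIQUE constraint absorbs duplicates."""
    db.execute(
        "INSERT OR IGNORE INTO connections (src, dst, kind) VALUES (?, ?, ?)",
        (src, dst, kind),
    )
```

Keeping everything in one SQLite file is what makes the system "always available": the GPU index is a cache that can be rebuilt from this table at any time.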

Knowledge Graph Dashboard

Neural Memory Dashboard — 3D Force Graph

3D Force Graph — WebSocket live updates — Amber theme — Category filtering — Search