# Neural Memory Adapter for Hermes Agent

Semantic memory system with knowledge graph, spreading activation, embedding-based recall, autonomous dream consolidation, and GPU-accelerated recall for the Hermes Agent.

> **[🌐 Live Website](https://itsxactly.github.io/neural-memory/)** — interactive demos, architecture, and dashboard.

![Neural Brain Hero](assets/neural_brain_hero.png)

---

## Features

- **Semantic Memory Storage**: Store memories with auto embedding (FastEmbed ONNX, 1024d)
- **Knowledge Graph**: Auto-connect related memories via cosine similarity
- **Spreading Activation**: Explore connected ideas (BFS graph traversal with decay)
- **Conflict Detection**: Detect and supersede conflicting memories
- **Dream Engine**: Autonomous consolidation (NREM/REM/Insight phases)
- **GPU Recall**: CUDA-accelerated cosine similarity (~100ms, torch.matmul)
- **SQLite-First**: Always works, no external DB needed
- **MSSQL Optional**: Shared DB for multi-agent setups

## Quick Start

```bash
cd ~/projects/neural-memory-adapter
bash install.sh          # auto-detect hermes-agent
bash install.sh /path    # explicit path
```

The installer handles everything:
1. Python deps (FastEmbed, torch, numpy)
2. CUDA detection for GPU recall
3. Plugin deployment to hermes-agent
4. Database init (SQLite at ~/.neural_memory/memory.db)
5. config.yaml setup

Restart hermes after install: `hermes gateway restart`

**Live Dashboard — Knowledge Graph**

[![Dashboard](assets/neural_memory.png)](https://raw.githubusercontent.com/itsXactlY/neural-memory/refs/heads/master/assets/neural_memory.png)

---

## Architecture

### Embedding Backends (auto-priority)

| Priority | Backend | Model | Speed | Requirements |
|----------|---------|-------|-------|--------------|
| 1st | FastEmbed | intfloat/multilingual-e5-large | ~50ms | `pip install fastembed` |
| 2nd | sentence-transformers | BAAI/bge-m3 1024d | ~200ms | GPU recommended |
| 3rd | tfidf | — | varies | numpy only |
| 4th | hash | — | instant | nothing |

FastEmbed uses ONNX runtime — no PyTorch conflict, works on CPU. Falls back automatically.

### GPU Recall Engine

```python
# gpu_recall.py — CUDA cosine similarity
# Loads all embeddings into GPU, does torch.matmul for batch similarity
# ~100ms for 10K memories vs ~500ms CPU

from gpu_recall import GPURecall
engine = GPURecall()
results = engine.recall(query_embedding, all_embeddings, top_k=10)
```

Auto-detects CUDA. Falls back to Python/numpy if no GPU.

### Data Flow

```mermaid
flowchart TD
    subgraph Store["neural_remember"]
        A[User content] --> B[FastEmbed encode]
        B --> C[1024d vector]
        C --> D[SQLite INSERT]
        C --> E[Cosine similarity search]
        E --> F[Create connections]
    end

    subgraph Recall["neural_recall"]
        G[User query] --> H[FastEmbed encode]
        H --> I{CUDA available?}
        I -->|Yes| J[GPU torch.matmul]
        I -->|No| K[CPU numpy dot]
        J --> L[Top-k results]
        K --> L
    end

    subgraph Think["neural_think"]
        M[Source memory] --> N[BFS on connections]
        N --> O[Apply decay factor]
        O --> P[Ranked activation]
    end

    subgraph Dream["neural_dream"]
        Q[Idle trigger] --> R[NREM replay]
        R --> S[REM bridge discovery]
        S --> T[Insight communities]
        T --> U[Consolidated graph]
    end
```

### Storage

- **SQLite (always)**: `~/.neural_memory/memory.db` — source of truth
- **Embeddings cache**: `~/.neural_memory/models/` (auto-downloaded, ~2.2 GB)
- **GPU cache**: `~/.neural_memory/gpu_cache/` (embeddings.npy + metadata.pkl)
- **Access logs**: `~/.neural_memory/access_logs/` (JSON Lines)
- **MSSQL (optional)**: 127.0.0.1/NeuralMemory — multi-agent mirror

### SQLite Schema

```sql
-- Core tables
memories (id, content, embedding, category, salience, ...)
connections (source_id, target_id, weight, edge_type)
connection_history (source_id, target_id, last_weight, last_updated)

-- Dream engine
dream_sessions (id, phase, started_at, completed_at, stats)
dream_insights (id, session_id, type, data)

-- Indexes
idx_memories_category ON memories(category)
idx_connections_source ON connections(source_id)
idx_connections_target ON connections(target_id)
```

## Configuration

All settings in `~/.hermes/config.yaml`:

```yaml
memory:
  provider: neural
  neural:
    db_path: ~/.neural_memory/memory.db
    embedding_backend: fastembed       # auto | fastembed | sentence-transformers | tfidf | hash
    prefetch_limit: 10
    search_limit: 10
    dream:
      enabled: true
      idle_threshold: 600              # seconds before dream cycle
      memory_threshold: 50             # dream after N new memories
      mssql:                           # optional — only if using MSSQL
        server: 127.0.0.1
        database: NeuralMemory
        username: SA
        password: 'your_password'
        driver: '{ODBC Driver 18 for SQL Server}'
```

## Tools

When active, these tools are available in Hermes:

| Tool | Description |
|------|-------------|
| `neural_remember` | Store a memory (with conflict detection) |
| `neural_recall` | Search memories by semantic similarity |
| `neural_think` | Spreading activation from a memory |
| `neural_graph` | View knowledge graph statistics |
| `neural_dream` | Force a dream cycle (all/nrem/rem/insight) |
| `neural_dream_stats` | Dream engine statistics |

## Dream Engine

Autonomous background memory consolidation (biological sleep inspired):

```mermaid
flowchart LR
    subgraph Trigger
        T1[Idle 600s] --> D
        T2[50 new memories] --> D
        T3[Manual / Cron] --> D
    end

    D[Dream Cycle] --> NREM
    D --> REM
    D --> INSIGHT

    subgraph NREM["Phase 1 — NREM"]
        direction TB
        N1[Replay 100 recent memories] --> N2[Spreading activation]
        N2 --> N3{Connection active?}
        N3 -->|Yes| N4[Strengthen +0.05]
        N3 -->|No| N5[Weaken -0.01]
        N3 -->|Dead <0.05| N6[Prune]
    end

    subgraph REM["Phase 2 — REM"]
        direction TB
        R1[Find 50 isolated memories] --> R2[Search similar unconnected]
        R2 --> R3[Create bridge connections]
        R3 --> R4[weight = similarity × 0.3]
    end

    subgraph INSIGHT["Phase 3 — Insight"]
        direction TB
        I1[BFS connected components] --> I2[Identify communities]
        I2 --> I3[Find bridge nodes]
        I3 --> I4[Store dream_insights]
    end
```

### Triggers

- Automatic: after 600s idle (configurable)
- Automatic: every 50 new memories (configurable)
- Manual: `neural_dream` tool
- Standalone: `python python/dream_worker.py --daemon`

## Testing

### Smoke Test (Quick)

```bash
cd ~/projects/neural-memory-adapter/python
python3 demo.py
```

### Full Test Suite

```bash
# Plugin test suite
cd ~/.hermes/hermes-agent/plugins/memory/neural
python3 test_suite.py

# Upside-Down Test Suite — edge cases, corruption, concurrency, SQL injection
cd ~/projects/neural-memory-adapter
python3 tests/test_upside_down.py
```

### Clean Smoke Test (Any Machine)

```bash
cd ~/projects/neural-memory-adapter
python3 -c "
import sys; sys.path.insert(0, 'python')
from neural_memory import NeuralMemory
nm = NeuralMemory(db_path='/tmp/test.db', embedding_backend='cpu', use_cpp=False)
mid = nm.remember('test memory', label='smoke')
results = nm.recall('test')
assert len(results) > 0, 'recall failed'
print(f'SMOKE TEST PASS: {len(results)} results')
"
```

### Verified: Clean VM — Debian 12 (2026-04-21)

Tested on a fresh Debian 12 QEMU/KVM VM — hermes-agent + neural memory only, no jack-in-a-box.

| Property | Value |
|----------|-------|
| VM | Debian 12, 4 GB RAM, KVM enabled |
| hermes-agent | git clone (itsXactlY fork) |
| neural-memory | git clone + FastEmbed ONNX |
| Embedding | intfloat/multilingual-e5-large (1024d) |
| C++ bridge | Not built (Python fallback) |

**All 12 integration tests passed:**

| # | Test | Result |
|---|------|--------|
| 1 | NeuralMemory standalone (remember/recall/graph) | PASS |
| 2 | Memory Provider (FastEmbed 1024d) | PASS |
| 3 | NeuralMemoryProvider.__init__ | PASS |
| 4 | is_available() | PASS |
| 5 | initialize(session_id) | PASS |
| 6 | get_tool_schemas() → 4 tools | PASS |
| 7 | system_prompt_block() (250 chars) | PASS |
| 8 | handle_tool_call — neural_remember | PASS |
| 9 | handle_tool_call — neural_recall | PASS |
| 10 | handle_tool_call — neural_graph | PASS |
| 11 | prefetch() | PASS |
| 12 | shutdown() | PASS |

### VM / Constrained Environment Notes

- **4 GB RAM minimum** — FastEmbed model download (~500 MB). 2 GB = OOM killed.
- **HashBackend** works as fallback on low-RAM systems (1024d, instant, no deps).
- **C++ bridge optional** — Python fallback covers all functionality.
- **FastEmbed >= 0.5.1** — earlier versions default to CLS embedding (deprecated).
- **`python3-venv` required** on Debian — `apt install python3.11-venv` if missing.
- **PEP 668 (Debian)** — `pip install` needs venv or `--break-system-packages`.
- **Cloud-init delay** — 60–90 s on first boot. Don't assume SSH is ready immediately.
- **prefetch() returns empty** on fresh DB — expected, no prior memories to pre-load.

## File Structure

```
neural-memory-adapter/
├── install.sh                    # Installer
├── hermes-plugin/                # Plugin (deployed to hermes-agent)
│   ├── __init__.py               # MemoryProvider + tools
│   ├── config.py                 # Config loader
│   ├── plugin.yaml               # Plugin metadata
│   ├── neural_memory.py          # Unified Memory class
│   ├── memory_client.py          # Main client (NeuralMemory, SQLiteStore)
│   ├── embed_provider.py         # Embedding backends (FastEmbed, st, tfidf, hash)
│   ├── gpu_recall.py             # CUDA cosine similarity engine
│   ├── dream_engine.py           # Dream engine (NREM/REM/Insight)
│   ├── dream_worker.py           # Standalone daemon
│   ├── access_logger.py          # Recall event logger
│   └── ...
├── python/                       # Python source (mirrors hermes-plugin)
│   └── ...
├── src/                          # C++ source (optional, legacy)
│   ├── memory/lstm.cpp           # LSTM predictor
│   ├── memory/knn.cpp            # kNN engine
│   └── memory/hopfield.cpp       # Hopfield network
└── README.md
```

## Production Lessons

### Embedding & Runtime

- **FastEmbed > sentence-transformers** — ONNX runtime, no PyTorch conflict, fast on CPU.
- **FastEmbed >= 0.5.1** — earlier versions default to CLS embedding. Pin version or set `add_custom_model`.
- **GPU recall > C++ Bridge** — C++ Hopfield had bias issues; GPU matmul is clean.
- **numpy before FastEmbed** — FastEmbed imports numpy at load time; install order matters.
- **Don't force PyTorch** — let FastEmbed handle CPU. torch only needed for GPU recall.

### Storage & Architecture

- **SQLite = Source of Truth** — MSSQL is optional. SQLite always works.
- **Auto-detect everything** — CUDA, backends, venv paths. Minimize config burden.
- **4 tool schemas** exposed by NeuralMemoryProvider: `neural_remember`, `neural_recall`, `neural_think`, `neural_graph`. (`neural_dream` / `neural_dream_stats` are standalone Memory class only.)

## License

See [LICENSE](LICENSE).