Embedded. Encrypted. Unapologetically hybrid.

MongrelDB

The mixed-breed database that refuses to pick one fight.

OLTP writes, columnar analytics, full-text search, vector ANN, learned-sparse retrieval, range indexes, searchable encryption, Arrow, Node, and multi-table ACID are forged into one embeddable file. MongrelDB is built for workloads that make purebred engines blink.

7 intersectable index families over one RowId space
6.7µs single-row durable write in the dev benchmark profile
O(1) COUNT(*) metadata path for instant cardinality
1 file: embedded, encrypted, queryable while encrypted

What makes it mongrel

A single engine wearing every armor type at once.

MongrelDB is not just a column store with a plugin, a vector store with SQL bolted on, or SQLite with a mascot. It is an embedded engine designed around hybrid access paths from the start.

Seven-index portfolio

HOT, Bitmap, PGM, FM, HNSW, PMA, and Sparse/SPLADE all resolve through one shared RowId space, so retrieval modes can intersect instead of living in separate kingdoms.

OLTP on columnar

A WAL group-commit path feeds a Bε-tree memtable and immutable PAX-columnar .sr runs, keeping point writes sharp while scans stay compressed.

Queryable encryption

Page-level AES-256-GCM protects storage while ENCRYPTED_INDEXABLE columns expose equality and range tokens for queries without full plaintext exposure.

Arrow + Node native

In-process deployment, DataFusion SQL, zero-copy Arrow-IPC paths, and a NAPI addon let JavaScript apps hold the blade without stalling the event loop.

Index arsenal

Choose a blade. They all cut to the same RowId.

The trick is not having many indexes. The trick is making them speak one row identity language, so semantic, lexical, range, equality, and mutable-run access paths can be combined.
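
One way to picture the shared row identity language: if every index hands back a sorted list of RowIds, combining predicates reduces to a merge-style intersection. The sketch below is illustrative only; `intersectSorted`, `intersectAll`, and the sample candidate lists are hypothetical names and data, not MongrelDB's API.

```javascript
// Hypothetical sketch: intersect sorted RowId lists produced by different indexes.
// BigInt is used because the page lists "BigInt RowIds" for the Node binding.
function intersectSorted(a, b) {
  const out = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) { out.push(a[i]); i++; j++; }
    else if (a[i] < b[j]) i++;
    else j++;
  }
  return out;
}

function intersectAll(lists) {
  // Start from the shortest list so intermediate results stay small.
  const byLength = [...lists].sort((x, y) => x.length - y.length);
  return byLength.reduce((acc, next) => intersectSorted(acc, next));
}

const annHits = [2n, 5n, 9n, 14n, 21n];      // e.g. from a vector index
const substringHits = [5n, 9n, 10n, 21n];    // e.g. from a substring index
const tenantHits = [1n, 5n, 21n, 30n];       // e.g. from an equality index
const survivors = intersectAll([annHits, substringHits, tenantHits]);
console.log(survivors); // [ 5n, 21n ]
```

Only the RowIds present in every candidate list survive, which is the property that lets otherwise unrelated index families cooperate.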

Primary key strike

HOT trie point lookup

Height-optimized trie access gives the point-lookup path a fast, compact weapon for primary keys and stable RowId resolution.

  • Purpose-built for PK lookup and exact targeting.
  • Feeds the same RowId set used by every other blade.
  • Ideal for OLTP-style get/update paths.

Equality phalanx

Roaring bitmap equality

Low-cardinality values compress into roaring bitmaps for fast equality filters and clean intersections with other candidate sets.

  • Made for tenant, status, type, flag, and other low-cardinality columns.
  • Turns equality filters into fast set operations.
  • Plays especially well with ANN and substring constraints.
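
To make the "equality filters as set operations" idea concrete, here is a minimal sketch that keeps one bitset per distinct value and ANDs two of them. It uses plain uncompressed bitsets rather than roaring containers, and `Bitset` and `buildEqualityIndex` are hypothetical names chosen for illustration.

```javascript
// Hypothetical sketch: one plain bitset per distinct column value.
// Real roaring bitmaps add container-level compression; this only shows the set math.
class Bitset {
  constructor(size) { this.words = new Uint32Array(Math.ceil(size / 32)); }
  set(i) { this.words[i >>> 5] |= 1 << (i & 31); }
  and(other) {
    const out = new Bitset(this.words.length * 32);
    for (let w = 0; w < this.words.length; w++) {
      out.words[w] = this.words[w] & other.words[w];
    }
    return out;
  }
  toRowIds() {
    const ids = [];
    for (let w = 0; w < this.words.length; w++) {
      for (let b = 0; b < 32; b++) {
        if (this.words[w] & (1 << b)) ids.push(w * 32 + b);
      }
    }
    return ids;
  }
}

// The array position doubles as the RowId in this toy example.
function buildEqualityIndex(values) {
  const index = new Map();
  values.forEach((v, rowId) => {
    if (!index.has(v)) index.set(v, new Bitset(values.length));
    index.get(v).set(rowId);
  });
  return index;
}

const tenant = buildEqualityIndex(["a", "b", "a", "a", "b"]);
const status = buildEqualityIndex(["open", "open", "done", "open", "done"]);
const matches = tenant.get("a").and(status.get("open")).toRowIds();
console.log(matches); // [ 0, 3 ]
```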

Learned range cut

PGM range index

A shrinking-cone, epsilon-bounded learned model predicts where range keys live, then resolves the qualifying rows back into RowIds.

  • Designed for date, numeric, and ordered key ranges.
  • Small model footprint with bounded lookup correction.
  • Combines with dense vector, sparse, and substring gates.
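
The epsilon-bounded idea can be sketched with a single linear segment: predict a position from the key, then correct the guess inside a window whose width is the measured maximum error. A real PGM index builds several piecewise segments; this one-segment version, along with the names `buildModel` and `lowerBound` and the sample data, is an assumption made for illustration.

```javascript
// Hypothetical sketch: one linear segment with a measured error bound (epsilon).
function buildModel(sortedKeys) {
  const n = sortedKeys.length;
  const first = sortedKeys[0];
  const last = sortedKeys[n - 1];
  const slope = last === first ? 0 : (n - 1) / (last - first);
  const predict = (key) => Math.round((key - first) * slope);
  let epsilon = 0;
  sortedKeys.forEach((k, pos) => {
    epsilon = Math.max(epsilon, Math.abs(predict(k) - pos));
  });
  return { predict, epsilon, n };
}

// First position whose key is >= target, searching only inside the epsilon window.
function lowerBound(sortedKeys, model, target) {
  const guess = model.predict(target);
  let lo = Math.max(0, Math.min(model.n, guess - model.epsilon));
  let hi = Math.max(0, Math.min(model.n, guess + model.epsilon + 1));
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (sortedKeys[mid] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

const keys = [10, 12, 15, 20, 21, 30, 42, 50];   // sorted column values
const rowIds = [7, 3, 9, 1, 4, 8, 2, 6];         // RowId stored at each sorted position
const model = buildModel(keys);
// Keys are integers, so the inclusive upper bound 30 becomes lowerBound(31).
const hitsInRange = rowIds.slice(lowerBound(keys, model, 15), lowerBound(keys, model, 31));
console.log(hitsInRange); // [ 9, 1, 4, 8 ]
```

The model itself is just a slope, an intercept, and an error bound, which is where the small footprint comes from.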

Substring ambush

FM-index containment

BWT plus wavelet-tree substring search means LIKE-style containment can become an index-backed filter instead of a helpless scan.

  • Finds substring containment without forcing a search-only engine.
  • Converts text hits into RowIds for hybrid set math.
  • Complements SPLADE-style sparse retrieval and HNSW ANN.
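
For readers unfamiliar with FM-index search, the sketch below shows the core backward-search loop over a Burrows-Wheeler transform. It is a naive construction (sorted suffixes, linear-scan rank instead of a wavelet tree), assumes plain ASCII text, and only counts matches; a full index would also map matching suffix positions back to RowIds. `buildFmIndex` and `countOccurrences` are hypothetical names.

```javascript
// Hypothetical sketch: naive FM-index construction and backward search (count only).
function buildFmIndex(text) {
  const s = text + "\u0000"; // sentinel that sorts before every other character
  const sa = Array.from(s, (_, i) => i)
    .sort((a, b) => (s.slice(a) < s.slice(b) ? -1 : 1));
  const bwt = sa.map((i) => s[(i + s.length - 1) % s.length]);

  // C.get(c) = number of characters in s that are strictly smaller than c.
  const counts = new Map();
  for (const ch of s) counts.set(ch, (counts.get(ch) || 0) + 1);
  const C = new Map();
  let total = 0;
  for (const ch of [...counts.keys()].sort()) {
    C.set(ch, total);
    total += counts.get(ch);
  }

  // occ(c, i) = occurrences of c in bwt[0..i). A wavelet tree would answer this quickly.
  const occ = (c, i) => {
    let n = 0;
    for (let k = 0; k < i; k++) if (bwt[k] === c) n++;
    return n;
  };
  return { C, occ, size: s.length };
}

function countOccurrences(index, pattern) {
  let lo = 0;
  let hi = index.size; // half-open range of suffixes that match so far
  for (let k = pattern.length - 1; k >= 0 && lo < hi; k--) {
    const c = pattern[k];
    if (!index.C.has(c)) return 0;
    lo = index.C.get(c) + index.occ(c, lo);
    hi = index.C.get(c) + index.occ(c, hi);
  }
  return Math.max(0, hi - lo);
}

const fm = buildFmIndex("banana");
console.log(countOccurrences(fm, "ana")); // 2
console.log(countOccurrences(fm, "nab")); // 0
```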

Dense-vector duel

HNSW approximate nearest neighbor

Semantic similarity search lives alongside OLTP, text, equality, and range filters instead of forcing a separate vector store.

  • ANN candidate sets resolve to RowIds like every other access path.
  • Supports dense-vector AI-native retrieval flows.
  • Intersect ANN with substring, bitmap, range, and sparse gates.

Mutable run discipline

PMA cache-oblivious runs

Packed memory arrays keep mutable sorted structures efficient without overfitting to a single cache size.

  • Designed for cache-oblivious ordered maintenance.
  • Helps bridge write-friendly ingestion and sorted-run reads.
  • Another blade in the shared RowId set arsenal.

Learned-sparse arrow

SPLADE-style sparse top-k

Learned sparse retrieval brings inverted-token scoring into the same engine, giving lexical AI retrieval a native home next to vectors.

  • Inverted token lists score top-k by sparse dot product.
  • Works with dense-vector and substring retrieval instead of replacing them.
  • Designed for AI-native access patterns in embedded apps.
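
Sparse top-k scoring is easy to sketch: each token keeps a posting list of (RowId, weight) pairs, and a query accumulates the dot product per row before taking the best k. The names `buildInvertedIndex` and `sparseTopK` and the integer weights below are hypothetical; a SPLADE-style model would supply learned floating-point weights.

```javascript
// Hypothetical sketch: inverted token lists scored by sparse dot product.
function buildInvertedIndex(docs) {
  const postings = new Map(); // token -> [{ rowId, weight }]
  for (const { rowId, weights } of docs) {
    for (const [token, weight] of Object.entries(weights)) {
      if (!postings.has(token)) postings.set(token, []);
      postings.get(token).push({ rowId, weight });
    }
  }
  return postings;
}

function sparseTopK(postings, queryWeights, k) {
  const scores = new Map(); // rowId -> accumulated dot product
  for (const [token, qWeight] of Object.entries(queryWeights)) {
    for (const { rowId, weight } of postings.get(token) || []) {
      scores.set(rowId, (scores.get(rowId) || 0) + qWeight * weight);
    }
  }
  return [...scores.entries()]
    .map(([rowId, score]) => ({ rowId, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

const postings = buildInvertedIndex([
  { rowId: 1, weights: { sword: 2, armor: 1 } },
  { rowId: 2, weights: { sword: 1, tea: 3 } },
  { rowId: 3, weights: { armor: 4 } },
]);
const topHits = sparseTopK(postings, { sword: 1, armor: 2 }, 2);
console.log(topHits.map((h) => h.rowId)); // [ 3, 1 ]
```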

Interactive query forge

Toggle the constraints. Watch the RowId core keep up.

The real wedge is the hybrid surface: semantic similarity, substring containment, equality, and range filters are not separate products. They are candidate sets that meet at the same RowId anvil.

HNSW: vector ANN
FM: substring
Bitmap: equality
PGM: range
Shared RowId: intersection core

ann_search(embedding, q, 50)
fm_contains(body, "samurai")
tenant = "visorcraft"
created_at BETWEEN a AND b

Architecture

A write path with claws. A read path with discipline.

The page design echoes the engine: lacquer, steel, and gold over a practical pipeline. The write path is log-structured; the read path resolves predicates to shared RowIds, decodes only what is needed, and keeps hot answers warm.

01

WAL group commit

Durability enters through an append-only write-ahead log designed to batch fsync pressure without losing OLTP edge.
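
Group commit can be sketched in a few lines: writers queue their records, and a single sync makes the whole batch durable before any of them is acknowledged. The example below simulates this with an in-memory log and a counter in place of a real fsync; `GroupCommitLog` and its fields are hypothetical and say nothing about MongrelDB's actual WAL format.

```javascript
// Hypothetical sketch: group commit with an in-memory log and a counted, simulated fsync.
class GroupCommitLog {
  constructor() {
    this.durable = [];          // records covered by a (simulated) fsync
    this.pending = [];          // records waiting for the next batch
    this.fsyncCount = 0;
    this.flushScheduled = false;
  }

  // Returns a promise that resolves once the record is durable.
  append(record) {
    return new Promise((resolve) => {
      this.pending.push({ record, resolve });
      if (!this.flushScheduled) {
        this.flushScheduled = true;
        setImmediate(() => this.flush());
      }
    });
  }

  flush() {
    const batch = this.pending;
    this.pending = [];
    this.flushScheduled = false;
    for (const { record } of batch) this.durable.push(record);
    this.fsyncCount++;          // one sync covers the whole batch
    for (const { resolve } of batch) resolve();
  }
}

const wal = new GroupCommitLog();
const done = Promise.all([wal.append("put a"), wal.append("put b"), wal.append("put c")]);
done.then(() => console.log(wal.fsyncCount, wal.durable.length)); // 1 3
```

Three writers pay for one sync between them, which is how batching relieves fsync pressure while every caller still waits for durability.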

02

Bε-tree memtable

A composite-key MVCC skip-list memtable absorbs updates before flushing into immutable runs.

03

PAX columnar .sr runs

Sorted-run pages keep scans, compression, and projection pushdown close to the metal.

04

Hybrid pushdown

Equality, range, substring, vector, and sparse matches intersect before decoding rows.

05

Arrow + DataFusion

SQL, Arrow IPC, and Node-native bindings bring embedded performance into modern app stacks.

Columnar, but not fragile.

Most columnar engines are bulk analytical swords. MongrelDB takes the weird route: durable single-row writes and updates without throwing away compressed scan behavior.

Durable write: 6.7µs
Durable update: 6.1µs
Bulk ingest: 26.2 Melem/s

Analytical, but not distant.

Memory-mapped runs, adaptive encodings, projection pushdown, page pruning, and a warm result cache keep read paths tight for embedded analytics.

Bitmap pushdown: 64.8 Melem/s
Range pushdown: 65.9 Melem/s
Warm cache hit: 0.1-0.3µs

Sealed scroll security

Encrypted storage that still knows how to answer.

The security story is not just "encrypt the file and pray." MongrelDB treats the WAL, sorted runs, caches, index checkpoints, metadata authentication, and queryable encrypted columns as first-class parts of the dojo.

Authenticated pages. Searchable tokens. No ceremony.

Page-level AES-256-GCM and a domain-separated key hierarchy guard the file, while encrypted-indexable columns can carry deterministic equality tokens and order-preserving range tokens.

AES-256-GCM, Argon2id, HKDF, HMAC tokens, OPE ranges

Search while sealed

Encrypted-indexable columns expose query tokens so equality and range predicates can resolve candidate rows without making the database decrypt everything up front.
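
A deterministic equality token can be sketched with a keyed hash: the same value under the same key always produces the same token, so an index can match tokens without seeing plaintext. The example uses HMAC-SHA256 from Node's built-in crypto module; `equalityToken`, the random `columnKey`, and the token-to-RowId map are assumptions for illustration, not MongrelDB's token format.

```javascript
const crypto = require("node:crypto");

// Hypothetical sketch: deterministic equality tokens via HMAC-SHA256.
function equalityToken(columnKey, value) {
  return crypto.createHmac("sha256", columnKey).update(String(value), "utf8").digest("hex");
}

const columnKey = crypto.randomBytes(32); // stand-in for a per-column token key

// Index side: map token -> RowIds; plaintext values never enter the index.
const tokenIndex = new Map();
["visorcraft", "acme", "visorcraft"].forEach((tenant, rowId) => {
  const token = equalityToken(columnKey, tenant);
  if (!tokenIndex.has(token)) tokenIndex.set(token, []);
  tokenIndex.get(token).push(rowId);
});

// Query side: tokenize the literal with the same key, then look it up.
const hits = tokenIndex.get(equalityToken(columnKey, "visorcraft")) || [];
console.log(hits); // [ 0, 2 ]
```

Determinism is what makes the lookup possible, and it also means equal values are visibly equal in the index, which is the usual trade-off of this style of searchable encryption.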

Tamper-evident runs

Cleartext run metadata stays authenticated, while encrypted page payloads are individually authenticated by AES-GCM tags. Corruption has to get past the seals.
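
The per-page seal can be illustrated with Node's built-in AES-256-GCM: the payload is encrypted, cleartext metadata is bound in as additional authenticated data, and any modification makes decryption fail. `sealPage`, `openPage`, and the `page:<n>` metadata string are hypothetical and are not taken from MongrelDB's on-disk layout.

```javascript
const crypto = require("node:crypto");

// Hypothetical sketch: seal one page with AES-256-GCM, binding cleartext metadata as AAD.
function sealPage(key, pageNo, plaintext) {
  const iv = crypto.randomBytes(12);            // fresh nonce for every page write
  const aad = Buffer.from(`page:${pageNo}`);    // authenticated but not encrypted
  const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
  cipher.setAAD(aad);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { iv, aad, ciphertext, tag: cipher.getAuthTag() };
}

function openPage(key, page) {
  const decipher = crypto.createDecipheriv("aes-256-gcm", key, page.iv);
  decipher.setAAD(page.aad);
  decipher.setAuthTag(page.tag);
  // final() throws if the ciphertext, AAD, or tag was modified.
  return Buffer.concat([decipher.update(page.ciphertext), decipher.final()]);
}

const key = crypto.randomBytes(32);
const page = sealPage(key, 7, Buffer.from("row payload"));
console.log(openPage(key, page).toString()); // row payload

page.ciphertext[0] ^= 0xff; // simulate on-disk corruption
let tampered = false;
try { openPage(key, page); } catch { tampered = true; }
console.log(tampered); // true
```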

Key hierarchy

  • Passphrase plus salt derives a table-level KEK through Argon2id and HKDF.
  • Per-run DEKs protect page payloads.
  • Separate domains protect WAL, result cache, index checkpoints, metadata MACs, and per-column tokens.
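
Domain separation is straightforward to sketch with HKDF: the same parent key yields unrelated subkeys when the `info` label differs. In the example below, scrypt stands in for Argon2id so that the sketch needs only Node's built-in crypto module; that swap, the label strings, and the function names are assumptions for illustration.

```javascript
const crypto = require("node:crypto");

// Hypothetical sketch. scrypt is a stand-in for Argon2id in this example only.
function deriveKek(passphrase, salt) {
  const stretched = crypto.scryptSync(passphrase, salt, 32);
  return Buffer.from(crypto.hkdfSync("sha256", stretched, salt, "kek", 32));
}

// Domain separation: one KEK, a different HKDF info label per purpose.
function deriveDomainKey(kek, salt, domain) {
  return Buffer.from(crypto.hkdfSync("sha256", kek, salt, domain, 32));
}

const salt = crypto.randomBytes(16);
const kek = deriveKek("correct horse battery staple", salt);
const walKey = deriveDomainKey(kek, salt, "wal");
const cacheKey = deriveDomainKey(kek, salt, "result-cache");
const tokenKey = deriveDomainKey(kek, salt, "column-token:tenant");

console.log(walKey.equals(cacheKey)); // false
console.log(walKey.equals(deriveDomainKey(kek, salt, "wal"))); // true
```

Because each purpose has its own key, exposing one of them (say, a column token key) does not hand over the keys that protect the WAL or the result cache.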

Adaptive footprint

Delta, Dictionary, Zstd, and passthrough encoding choices happen per column, while memory-mapped runs and an O(1) count path keep the embedded profile aggressive.
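
As a rough picture of per-column encoding choice, the sketch below implements delta and dictionary encoding and picks between them with a simple heuristic. The heuristic and every function name are assumptions for illustration; the page does not describe how MongrelDB actually chooses, and Zstd is left out because it is not part of this toy.

```javascript
// Hypothetical sketch: delta and dictionary encoding with a simple per-column chooser.
function deltaEncode(values) {
  return values.map((v, i) => (i === 0 ? v : v - values[i - 1]));
}

function deltaDecode(deltas) {
  const out = [];
  deltas.forEach((d, i) => out.push(i === 0 ? d : out[i - 1] + d));
  return out;
}

function dictionaryEncode(values) {
  const dictionary = [...new Set(values)];
  const codeOf = new Map(dictionary.map((v, code) => [v, code]));
  return { dictionary, codes: values.map((v) => codeOf.get(v)) };
}

function dictionaryDecode({ dictionary, codes }) {
  return codes.map((c) => dictionary[c]);
}

// Assumed heuristic: sorted integers -> delta; few distinct values -> dictionary.
function chooseEncoding(values) {
  const allInts = values.every((v) => Number.isInteger(v));
  const sorted = values.every((v, i) => i === 0 || values[i - 1] <= v);
  if (allInts && sorted) return "delta";
  if (new Set(values).size <= values.length / 2) return "dictionary";
  return "passthrough";
}

const createdAt = [1000, 1001, 1005, 1010];
const status = ["open", "done", "open", "open"];
console.log(chooseEncoding(createdAt), deltaEncode(createdAt)); // delta [ 1000, 1, 4, 5 ]
console.log(chooseEncoding(status), dictionaryEncode(status).codes); // dictionary [ 0, 1, 0, 0 ]
```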

Embedded developer experience

SQLite energy. Arrow bridge. Node-native stance.

MongrelDB is designed to sit inside the application, not behind another service hop. The API surface can speak SQL through DataFusion, typed native calls through NAPI, and Arrow batches when your pipeline needs columns fast.

In-process, but not small-minded.

Use it where an embedded engine belongs: local-first apps, edge analytics, AI retrieval, desktop tools, test harnesses, agents, and Node services that need one file with many access paths.

Single file, DataFusion SQL, NAPI addon, Promise async, BigInt RowIds, Arrow IPC

// One engine, one file, many access paths.
const db = openMongrel("./dojo.mdb");
// Assumes `query` (the embedding vector), `from`, and `to` are defined by the caller.

const hits = await db.hybridQuery({
  ann:      { column: "embedding", k: 50, vector: query },
  contains: { column: "body", text: "samurai" },
  eq:       { column: "tenant", value: "visorcraft" },
  range:    { column: "created_at", between: [from, to] },
  return:   ["rowid", "title", "score"]
});

console.log(hits.length);
Bring the shogun to your repo.

Clone MongrelDB, test the hybrid query surface, and wire the embedded engine into the workloads that are too operational for analytics engines and too analytical for OLTP stores.