Seven-index portfolio
HOT, Bitmap, PGM, FM, HNSW, PMA, and Sparse/SPLADE all resolve through one shared RowId space, so retrieval modes can intersect instead of living in separate kingdoms.
The mixed-breed database that refuses to pick one fight.
OLTP writes, columnar analytics, full-text search, vector ANN, learned-sparse retrieval, range indexes, searchable encryption, Arrow, Node, and multi-table ACID are forged into one embeddable file. MongrelDB is built for workloads that make purebred engines blink.
MongrelDB is not just a column store with a plugin, a vector store with SQL bolted on, or SQLite with a mascot. It is an embedded engine designed around hybrid access paths from the start.
A WAL group-commit path feeds a Bε-tree memtable and immutable PAX-columnar .sr runs, keeping point writes sharp while scans stay compressed.
Page-level AES-256-GCM protects storage while ENCRYPTED_INDEXABLE columns expose equality and range tokens for queries without full plaintext exposure.
In-process deployment, DataFusion SQL, zero-copy Arrow-IPC paths, and a NAPI addon let JavaScript apps hold the blade without stalling the event loop.
The trick is not having many indexes. The trick is making them speak one row identity language, so semantic, lexical, range, equality, and mutable-run access paths can be combined.
Height-optimized trie (HOT) access gives the point-lookup path a fast, compact weapon for primary keys and stable RowId resolution.
Bitmap indexes compress low-cardinality values into roaring bitmaps for fast equality filters and clean intersections with other candidate sets.
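As a rough illustration of why bitmap equality filters intersect so cheaply, here is a minimal sketch. It uses uncompressed BigInt bitsets as a stand-in for roaring bitmaps, and `buildBitmapIndex` and `toRowIds` are hypothetical helper names, not MongrelDB APIs.

```javascript
// Hypothetical sketch: uncompressed BigInt bitsets stand in for roaring bitmaps.
// One bitmap per distinct column value; bit i is set when RowId i holds that value.
function buildBitmapIndex(values) {
  const index = new Map();
  values.forEach((value, rowId) => {
    const current = index.get(value) ?? 0n;
    index.set(value, current | (1n << BigInt(rowId)));
  });
  return index;
}

// Expand a bitmap back into an ascending list of RowIds.
function toRowIds(bitmap) {
  const rowIds = [];
  for (let rowId = 0; bitmap > 0n; rowId++, bitmap >>= 1n) {
    if (bitmap & 1n) rowIds.push(rowId);
  }
  return rowIds;
}

const tenant = buildBitmapIndex(["a", "b", "a", "a", "b"]);
const status = buildBitmapIndex(["open", "open", "closed", "open", "closed"]);

// tenant = 'a' AND status = 'open' collapses to a single bitwise AND.
const both = tenant.get("a") & status.get("open");
console.log(toRowIds(both)); // [ 0, 3 ]
```

A roaring bitmap keeps the same AND semantics but stores the bits in compressed containers, so sparse and dense regions both stay small.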
The PGM index, a shrinking-cone, epsilon-bounded learned model, predicts where range keys live, then resolves the qualifying rows back into RowIds.
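The epsilon-bounded lookup can be sketched with a single linear segment. This is a deliberate simplification: it skips the shrinking-cone segmentation, and it assumes sorted, distinct numeric keys. The names `buildSegment`, `lowerBound`, and `rangeRowIds` are hypothetical.

```javascript
// Hypothetical sketch: a single linear segment, not a full piecewise PGM build.
// `keys` must be sorted ascending and distinct; rowIds[i] belongs to keys[i].
function buildSegment(keys, rowIds) {
  const first = keys[0];
  const last = keys[keys.length - 1];
  const slope = last === first ? 0 : (keys.length - 1) / (last - first);
  // epsilon = worst gap between predicted and true position over all keys.
  let epsilon = 0;
  keys.forEach((key, pos) => {
    const predicted = Math.round((key - first) * slope);
    epsilon = Math.max(epsilon, Math.abs(predicted - pos));
  });
  return { keys, rowIds, first, slope, epsilon };
}

const clamp = (x, lo, hi) => Math.min(hi, Math.max(lo, x));

// Position of the first key >= target, binary-searching only the epsilon window.
function lowerBound(segment, target) {
  const { keys, first, slope, epsilon } = segment;
  const predicted = Math.round((target - first) * slope);
  let lo = clamp(predicted - epsilon, 0, keys.length);
  let hi = clamp(predicted + epsilon + 1, 0, keys.length);
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (keys[mid] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Resolve low <= key <= high back into RowIds.
function rangeRowIds(segment, low, high) {
  const out = [];
  let pos = lowerBound(segment, low);
  while (pos < segment.keys.length && segment.keys[pos] <= high) {
    out.push(segment.rowIds[pos]);
    pos++;
  }
  return out;
}

const segment = buildSegment([10, 20, 30, 40, 55, 70, 90], [7, 2, 9, 4, 1, 8, 3]);
console.log(rangeRowIds(segment, 25, 60)); // [ 9, 4, 1 ]
```

The model only narrows the search: correctness comes from the final binary search inside the window, which stays small as long as epsilon does.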
The FM-index pairs BWT with a wavelet tree for substring search, so LIKE-style containment can become an index-backed filter instead of a helpless scan.
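To make the BWT-based substring search concrete, here is a small backward-search sketch. It is not how a production index would be built: suffixes are sorted naively and rank queries scan the BWT linearly where a wavelet tree would answer them quickly. `buildFmIndex`, `rank`, and `locate` are hypothetical names, and the text is assumed not to contain the `\u0000` sentinel.

```javascript
// Hypothetical sketch: naive suffix sorting and linear rank scans; a real
// FM-index would answer rank queries with a wavelet tree or sampled counts.
function buildFmIndex(text) {
  const s = text + "\u0000"; // unique sentinel, sorts before every other character
  const suffixes = Array.from({ length: s.length }, (_, i) => i);
  suffixes.sort((a, b) => (s.slice(a) < s.slice(b) ? -1 : 1));
  // BWT: the character that precedes each sorted suffix.
  const bwt = suffixes.map((i) => s[(i + s.length - 1) % s.length]);
  // C[ch] = number of characters in s strictly smaller than ch.
  const C = new Map();
  s.split("").sort().forEach((ch, i) => {
    if (!C.has(ch)) C.set(ch, i);
  });
  return { bwt, C, suffixes };
}

// Occurrences of ch in bwt[0..end).
function rank(bwt, ch, end) {
  let count = 0;
  for (let i = 0; i < end; i++) if (bwt[i] === ch) count++;
  return count;
}

// Backward search: narrow the suffix range one pattern character at a time,
// starting from the last character of the pattern.
function locate(index, pattern) {
  const { bwt, C, suffixes } = index;
  let lo = 0;
  let hi = bwt.length; // half-open range [lo, hi) over sorted suffixes
  for (let i = pattern.length - 1; i >= 0; i--) {
    const ch = pattern[i];
    if (!C.has(ch)) return [];
    lo = C.get(ch) + rank(bwt, ch, lo);
    hi = C.get(ch) + rank(bwt, ch, hi);
    if (lo >= hi) return [];
  }
  return suffixes.slice(lo, hi).sort((a, b) => a - b);
}

const fm = buildFmIndex("banana");
console.log(locate(fm, "ana")); // [ 1, 3 ]
```

The search cost depends on the pattern length rather than the text length, which is what lets a containment predicate act as a filter.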
HNSW-backed semantic similarity search lives alongside OLTP, text, equality, and range filters instead of forcing a separate vector store.
Packed memory arrays (PMA) keep mutable sorted structures efficient without overfitting to a single cache size.
Learned sparse retrieval (Sparse/SPLADE) brings inverted-token scoring into the same engine, giving lexical AI retrieval a native home next to vectors.
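Inverted-token scoring boils down to a sparse dot product over postings lists. The sketch below assumes hand-written integer term weights in place of real SPLADE model output. `buildInvertedIndex` and `sparseSearch` are hypothetical names.

```javascript
// Hypothetical sketch: hand-written term weights stand in for SPLADE output.
function buildInvertedIndex(docs) {
  const postings = new Map(); // term -> [{ rowId, weight }]
  for (const [rowId, terms] of docs) {
    for (const [term, weight] of Object.entries(terms)) {
      if (!postings.has(term)) postings.set(term, []);
      postings.get(term).push({ rowId, weight });
    }
  }
  return postings;
}

// Score = sum over shared terms of queryWeight * documentWeight; return top k.
function sparseSearch(postings, queryTerms, k) {
  const scores = new Map();
  for (const [term, queryWeight] of Object.entries(queryTerms)) {
    for (const { rowId, weight } of postings.get(term) ?? []) {
      scores.set(rowId, (scores.get(rowId) ?? 0) + queryWeight * weight);
    }
  }
  return [...scores.entries()]
    .map(([rowId, score]) => ({ rowId, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

const postings = buildInvertedIndex([
  [1, { samurai: 2, sword: 1 }],
  [2, { sword: 3 }],
  [3, { samurai: 1, tea: 4 }],
]);
const top = sparseSearch(postings, { samurai: 1, sword: 2 }, 2);
console.log(top); // RowId 2 scores 6, RowId 1 scores 4
```

Because results are keyed by RowId, the scored set can be intersected with candidates from any other index.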
The real wedge is the hybrid surface: semantic similarity, substring containment, equality, and range filters are not separate products. They are candidate sets that meet at the same RowId anvil.
The page design echoes the engine: lacquer, steel, and gold over a practical pipeline. The write path is log-structured; the read path resolves predicates to shared RowIds, decodes only what is needed, and keeps hot answers warm.
Durability enters through an append-only write-ahead log designed to batch fsync pressure without losing OLTP edge.
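One way to picture batching fsync pressure is a log that queues records and acknowledges them only after one shared sync. This is a sketch, not MongrelDB's WAL: `GroupCommitLog` is a hypothetical class, and the injected `syncBatch` callback stands in for writing the batch to the log file and calling fsync once.

```javascript
// Hypothetical sketch: `syncBatch` stands in for "write the batch, then fsync once".
class GroupCommitLog {
  constructor(syncBatch) {
    this.syncBatch = syncBatch;
    this.pending = [];
  }

  // Queue a record; onDurable fires only after the batch it rides in is synced.
  append(record, onDurable) {
    this.pending.push({ record, onDurable });
  }

  // Flush everything queued so far with a single sync call.
  commit() {
    if (this.pending.length === 0) return 0;
    const batch = this.pending;
    this.pending = [];
    this.syncBatch(batch.map((entry) => entry.record));
    batch.forEach((entry) => entry.onDurable());
    return batch.length;
  }
}

let syncCalls = 0;
const durable = [];
const wal = new GroupCommitLog((records) => {
  durable.push(...records);
  syncCalls++;
});

const acked = [];
["put a=1", "put b=2", "del c"].forEach((record) => {
  wal.append(record, () => acked.push(record));
});
const flushed = wal.commit();
console.log(flushed, syncCalls, acked.length); // 3 1 3
```

Three writers pay for one sync between them, and no writer is acknowledged before its record is durable.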
A composite-key MVCC skip-list memtable absorbs updates before flushing into immutable runs.
Sorted-run pages keep scans, compression, and projection pushdown close to the metal.
Equality, range, substring, vector, and sparse matches intersect before decoding rows.
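Intersecting before decoding can be sketched as a merge over sorted RowId lists, assuming each index returns its candidates as an ascending array of RowIds. `intersectSorted` and `intersectAll` are hypothetical helpers, and `intersectAll` expects at least one candidate set.

```javascript
// Intersect two ascending RowId arrays with a two-pointer merge.
function intersectSorted(a, b) {
  const out = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      out.push(a[i]);
      i++;
      j++;
    } else if (a[i] < b[j]) {
      i++;
    } else {
      j++;
    }
  }
  return out;
}

// Start from the smallest set so intermediate results stay small.
function intersectAll(candidateSets) {
  const ordered = [...candidateSets].sort((x, y) => x.length - y.length);
  return ordered.reduce(intersectSorted);
}

const survivors = intersectAll([
  [1, 3, 4, 7, 9],  // equality matches
  [3, 4, 5, 9, 12], // range matches
  [2, 3, 9, 10],    // substring matches
  [3, 8, 9],        // vector top-k, sorted by RowId
]);
console.log(survivors); // [ 3, 9 ]
```

Only the surviving RowIds need their rows decoded, which is where the saving comes from.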
SQL, Arrow IPC, and Node-native bindings bring embedded performance into modern app stacks.
Most columnar engines are bulk analytical swords. MongrelDB takes the weird route: durable single-row writes and updates without throwing away compressed scan behavior.
Memory-mapped runs, adaptive encodings, projection pushdown, page pruning, and a warm result cache keep read paths tight for embedded analytics.
The security story is not just "encrypt the file and pray." MongrelDB treats the WAL, sorted runs, caches, index checkpoints, metadata authentication, and queryable encrypted columns as first-class parts of the dojo.
Page-level AES-256-GCM and a domain-separated key hierarchy guard the file, while encrypted-indexable columns can carry deterministic equality tokens and order-preserving range tokens.
Encrypted-indexable columns expose query tokens so equality and range predicates can resolve candidate rows without making the database decrypt everything up front.
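A deterministic equality token can be illustrated with a keyed hash. The source does not say which construction MongrelDB uses, so HMAC-SHA-256 from Node's built-in `crypto` module is an assumption here, and `equalityToken`, `buildTokenIndex`, and `lookupEquals` are hypothetical names. Order-preserving range tokens are not shown.

```javascript
const { createHmac, randomBytes } = require("crypto");

// Hypothetical sketch: HMAC-SHA-256 as the deterministic token function.
// The same key and value always yield the same token, so equality is testable
// without storing or comparing plaintext.
function equalityToken(columnKey, value) {
  return createHmac("sha256", columnKey).update(String(value)).digest("hex");
}

// Index maps token -> RowIds; plaintext values never enter the index.
function buildTokenIndex(columnKey, values) {
  const index = new Map();
  values.forEach((value, rowId) => {
    const token = equalityToken(columnKey, value);
    if (!index.has(token)) index.set(token, []);
    index.get(token).push(rowId);
  });
  return index;
}

// Tokenize the predicate value and look it up; no row is decrypted.
function lookupEquals(index, columnKey, value) {
  return index.get(equalityToken(columnKey, value)) ?? [];
}

const columnKey = randomBytes(32); // per-column key; key derivation not shown
const tenantIndex = buildTokenIndex(columnKey, ["visorcraft", "acme", "visorcraft"]);
console.log(lookupEquals(tenantIndex, columnKey, "visorcraft")); // [ 0, 2 ]
```

Deterministic tokens reveal which rows share a value, so they fit columns where that leakage is acceptable.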
Cleartext run metadata stays authenticated, while encrypted page payloads are individually authenticated by AES-GCM tags. Corruption has to get past the seals.
Delta, Dictionary, Zstd, and passthrough encoding choices happen per column, while memory-mapped runs and an O(1) count path keep the embedded profile aggressive.
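Two of the named encodings are simple enough to show as round trips. The helpers below (`deltaEncode`, `deltaDecode`, `dictionaryEncode`, `dictionaryDecode`) are hypothetical, and how the engine picks an encoding per column is not covered.

```javascript
// Delta: store the first value, then the difference from the previous value.
function deltaEncode(values) {
  return values.map((value, i) => (i === 0 ? value : value - values[i - 1]));
}

function deltaDecode(deltas) {
  let running = 0;
  return deltas.map((delta) => (running += delta));
}

// Dictionary: store each distinct value once, then small integer codes per row.
function dictionaryEncode(values) {
  const dictionary = [...new Set(values)];
  const codeOf = new Map(dictionary.map((value, code) => [value, code]));
  return { dictionary, codes: values.map((value) => codeOf.get(value)) };
}

function dictionaryDecode({ dictionary, codes }) {
  return codes.map((code) => dictionary[code]);
}

const createdAt = [1000, 1003, 1004, 1010];
console.log(deltaEncode(createdAt)); // [ 1000, 3, 1, 6 ]

const region = ["jp", "us", "jp", "jp"];
console.log(dictionaryEncode(region).codes); // [ 0, 1, 0, 0 ]
```

Delta suits slowly growing values such as timestamps, while dictionary suits columns with few distinct values.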
MongrelDB is designed to sit inside the application, not behind another service hop. The API surface can speak SQL through DataFusion, typed native calls through NAPI, and Arrow batches when your pipeline needs columns fast.
Use it where an embedded engine belongs: local-first apps, edge analytics, AI retrieval, desktop tools, test harnesses, agents, and Node services that need one file with many access paths.
// One engine, one file, many access paths.
// Assumes `openMongrel` is in scope and that `query` (the embedding vector)
// and `from` / `to` (the range bounds) are defined by the caller.
const db = openMongrel("./dojo.mdb");
const hits = await db.hybridQuery({
  ann: { column: "embedding", k: 50, vector: query },
  contains: { column: "body", text: "samurai" },
  eq: { column: "tenant", value: "visorcraft" },
  range: { column: "created_at", between: [from, to] },
  return: ["rowid", "title", "score"],
});
console.log(hits.length);
Clone MongrelDB, test the hybrid query surface, and wire the embedded engine into the workloads that are too operational for analytics engines and too analytical for OLTP stores.