Embedded KV with sled

A filing cabinet has one job: hand you a folder when you say its label, and put a folder back where it belongs when you slide it in. A key-value store is the same cabinet in software. The label is a string of bytes. The folder is another string of bytes. The drawer is sorted, so two folders with neighboring labels sit next to each other, which means you can also pull a whole range of folders by walking the drawer in order. That is the entire shape of the data model. Many other databases, from SQL engines to document stores to time-series engines, are built on top of this cabinet or something close to it.

A key-value store is a sorted drawer of labeled folders — keys on the tabs, values inside.
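The data model is small enough to sketch with the standard library alone. The block below is a minimal illustration, not how sled is implemented: it assumes a BTreeMap with byte-vector keys and values as a stand-in for the drawer, and the names Drawer, sample_drawer, and labels_in_order are made up for this example.

```rust
use std::collections::BTreeMap;

/// A toy in-memory "drawer": byte keys mapped to byte values, kept in key order.
/// This stands in for the data model only; it is not sled's implementation.
type Drawer = BTreeMap<Vec<u8>, Vec<u8>>;

/// Build a small drawer, inserting the labels deliberately out of order.
fn sample_drawer() -> Drawer {
    let mut drawer = Drawer::new();
    for (label, folder) in [("pears", "3"), ("apples", "7"), ("plums", "1")] {
        drawer.insert(label.as_bytes().to_vec(), folder.as_bytes().to_vec());
    }
    drawer
}

/// List every label in the order the drawer stores them.
fn labels_in_order(drawer: &Drawer) -> Vec<String> {
    drawer
        .keys()
        .map(|k| String::from_utf8_lossy(k).into_owned())
        .collect()
}
```

Calling labels_in_order on the sample drawer yields apples, pears, plums even though the labels went in as pears, apples, plums. The ordering comes from the container, not from the caller.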

Google described the first famous version of this cabinet in a 2006 paper and called it Bigtable. Two of its authors, Sanjay Ghemawat and Jeff Dean, later wrote a smaller open-source library called LevelDB, modeled on Bigtable's on-disk storage layer, so the rest of the industry could use the same idea. Facebook forked LevelDB, tuned it for flash storage and busy multi-core servers, and shipped RocksDB. The lineage matters because sled, like many embedded KV stores since, borrows the same trick. Writes go into an append-only log on disk first, then a background task folds them into a sorted tree. The reader sees one cabinet. The writer never has to wait for the cabinet to be reorganized.

sled itself comes from a single engineer named Tyler Neely, who started building it around 2017 while working on Rust infrastructure at Comcast. He wanted RocksDB's reliability without dragging in a C++ dependency, and he wanted the API to feel like a Rust BTreeMap so a programmer who had never touched a database could use it. The result is a crate that opens a folder on disk, hands you back a handle, and lets you insert, get, remove, and iter against it the same way you would against a map in memory. The cabinet just happens to survive a crash.

Open the cabinet first. Every sled database lives inside one directory on disk — the drawer.

fn open_fresh() -> sled::Result<sled::Db> {
    let db = sled::open(DB_PATH)?;
    println!("opened fresh database");
    println!();
    Ok(db)
}

The hardcoded path under /tmp is wiped at the top of main so the demo prints the same bytes every time you run it. A real program would put the directory somewhere persistent and never delete it. sled::open creates the directory if it is missing, and it returns a Result<Db, sled::Error> because opening a database can still fail for honest reasons: the location isn't writable, the files are corrupt, or another process is holding the lock. The type forces the caller to deal with that failure instead of ignoring it. The returned Db is cheap to clone and safe to share across threads. One handle, many readers and writers.

Put four folders into the drawer. Three are todo items with the prefix todo:, and one is a stray note with the prefix note:. The prefix is not a sled feature; it is just part of the key. But because the drawer is sorted, all three todo: keys end up sitting next to each other, which is the whole reason this naming pattern works.

fn seed(db: &sled::Db) -> sled::Result<()> {
    let rows: [(&str, &str); 4] = [
        ("todo:001", "buy boots"),
        ("todo:002", "lift legs"),
        ("todo:003", "call mom"),
        ("note:042", "GTA loads slow"),
    ];
    println!("insert 4 keys:");
    for (k, v) in rows {
        db.insert(k, v)?;
        println!("  put {k} = {v:?}");
    }
    db.flush()?;
    println!();
    Ok(())
}

Each insert writes the new pair to an append-only log called the write-ahead log, then updates an in-memory index. The pair is durable the moment flush returns. Without the flush the pair is still safe against a normal program exit, but a sudden power loss could lose the last few writes. Each flush has to wait on the disk, though, so production code usually calls it once at the end of a logical batch of writes rather than after every insert. Both the key and the value are stored as raw bytes. sled never tries to interpret them. If you want to store a struct, you encode it first and decode it after get.
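Encoding a struct into bytes can be done by hand for small records. The sketch below assumes a hypothetical Todo type and a made-up byte layout; neither comes from the demo, which stores plain strings. Real programs often reach for a serialization crate instead, but the round trip has the same shape either way: encode before insert, decode after get.

```rust
/// Hypothetical record type for illustration; the demo itself stores plain strings.
#[derive(Debug, Clone, PartialEq)]
struct Todo {
    priority: u32,
    done: bool,
    text: String,
}

impl Todo {
    /// Layout (an assumption of this sketch): 4 bytes of big-endian priority,
    /// 1 byte for the done flag, then the text as UTF-8 to the end of the buffer.
    fn encode(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(5 + self.text.len());
        out.extend_from_slice(&self.priority.to_be_bytes());
        out.push(self.done as u8);
        out.extend_from_slice(self.text.as_bytes());
        out
    }

    /// Returns None if the buffer is too short, the flag byte is not 0 or 1,
    /// or the text is not valid UTF-8.
    fn decode(bytes: &[u8]) -> Option<Todo> {
        if bytes.len() < 5 {
            return None;
        }
        let priority = u32::from_be_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
        let done = match bytes[4] {
            0 => false,
            1 => true,
            _ => return None,
        };
        let text = std::str::from_utf8(&bytes[5..]).ok()?.to_owned();
        Some(Todo { priority, done, text })
    }
}
```

The value handed to insert would be todo.encode(), and the bytes that come back from get would go through Todo::decode. Returning None on malformed input keeps a corrupt or outdated record from crashing the reader.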

How sled stores writes: an append-only log up front, a sorted B-tree behind it, both living inside one directory on disk.
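The log-first write path can be modeled in a few lines. This is a toy under stated assumptions, not sled's actual storage engine: a Vec stands in for the log file, a BTreeMap stands in for the sorted tree, and ToyStore, LogRecord, and recover are names invented for this sketch.

```rust
use std::collections::BTreeMap;

/// One record in the toy log. A real log lives in a file; a Vec stands in here.
#[derive(Debug, Clone, PartialEq)]
struct LogRecord {
    key: Vec<u8>,
    value: Vec<u8>,
}

/// Toy store: every write is appended to the log first, then applied to the index.
#[derive(Default)]
struct ToyStore {
    log: Vec<LogRecord>,
    index: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl ToyStore {
    /// Append to the log, then update the sorted index.
    fn insert(&mut self, key: &[u8], value: &[u8]) {
        self.log.push(LogRecord { key: key.to_vec(), value: value.to_vec() });
        self.index.insert(key.to_vec(), value.to_vec());
    }

    /// Reads are served from the sorted index, never from the log.
    fn get(&self, key: &[u8]) -> Option<&Vec<u8>> {
        self.index.get(key)
    }

    /// Simulate a restart: rebuild the index by replaying the log in order.
    fn recover(log: Vec<LogRecord>) -> ToyStore {
        let mut index = BTreeMap::new();
        for record in &log {
            // Later records overwrite earlier ones for the same key.
            index.insert(record.key.clone(), record.value.clone());
        }
        ToyStore { log, index }
    }
}
```

The point of the exercise is recover: because the log records every write in order, the index can always be rebuilt from it, which is what lets a store of this shape survive a crash.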

Pull one folder out of the drawer.

fn fetch_one(db: &sled::Db) -> sled::Result<()> {
    let key = "todo:002";
    match db.get(key)? {
        Some(bytes) => {
            let text = std::str::from_utf8(&bytes).unwrap_or("<bad utf8>");
            println!("get {key} -> {text:?}");
        }
        None => println!("get {key} -> <missing>"),
    }
    println!();
    Ok(())
}

get returns an Option<IVec> wrapped in a Result. The outer Result is for disk-level failure. The inner Option is for "the key was not in the cabinet", which is not an error but a normal answer, and sled refuses to conflate the two. IVec is sled's own byte-buffer type. It dereferences to a plain byte slice, which is why &bytes can be passed straight to from_utf8, and larger values are reference-counted so that cloning one does not copy its bytes.
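The three possible outcomes of a lookup can be handled in one match. The helper below is a hypothetical sketch: describe_lookup is not part of the demo, Vec<u8> stands in for IVec, and the error type is left generic so the block does not depend on any storage crate.

```rust
use std::fmt::Display;

/// Turn the three possible outcomes of a lookup into a log line.
/// `describe_lookup` is a hypothetical helper, generic over the error type.
fn describe_lookup<E: Display>(key: &str, outcome: Result<Option<Vec<u8>>, E>) -> String {
    match outcome {
        // The store answered and the key exists.
        Ok(Some(bytes)) => format!("{key} -> {:?}", String::from_utf8_lossy(&bytes)),
        // The store answered and the key is absent: a normal result, not a failure.
        Ok(None) => format!("{key} -> <missing>"),
        // The store could not answer at all.
        Err(e) => format!("{key} -> lookup failed: {e}"),
    }
}
```

Keeping "absent" and "failed" in separate arms means a caller can retry or abort on a real failure while treating a missing key as ordinary control flow.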

Now walk a whole range. scan_prefix("todo:") starts at the first key with that prefix and walks forward until the prefix changes, which gives you every todo without ever loading the stray note.

fn scan_prefix(db: &sled::Db) -> sled::Result<()> {
    println!("scan prefix \"todo:\":");
    for row in db.scan_prefix("todo:") {
        let (k, v) = row?;
        let key = std::str::from_utf8(&k).unwrap_or("<bad utf8>");
        let val = std::str::from_utf8(&v).unwrap_or("<bad utf8>");
        println!("  {key} = {val:?}");
    }
    println!();
    Ok(())
}

This is the move that separates a real KV store from a hash map. A hash map can answer get in constant time but it cannot answer "give me every key that starts with todo:" without scanning all the keys. sled keeps the drawer sorted, so a prefix scan is one seek to the start and a forward walk until the prefix changes. The same trick powers paginated queries, secondary indexes, and time-bucketed reads — any time the answer to a question is "a contiguous slice of the drawer."
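The contrast between a sorted map and a hash map can be shown with the standard library. This is an illustration of the idea rather than sled's code: scan_prefix_sorted and scan_prefix_hashed are names made up for this sketch, and String keys are assumed for readability where sled would use raw bytes.

```rust
use std::collections::{BTreeMap, HashMap};

/// Prefix scan over a sorted map: seek to the first key >= prefix, then walk
/// forward until a key no longer starts with the prefix.
fn scan_prefix_sorted<'a>(
    map: &'a BTreeMap<String, String>,
    prefix: &'a str,
) -> Vec<(&'a str, &'a str)> {
    map.range(prefix.to_owned()..)
        .take_while(|(k, _)| k.starts_with(prefix))
        .map(|(k, v)| (k.as_str(), v.as_str()))
        .collect()
}

/// The same question asked of a hash map: every key must be inspected, and the
/// matches come back in no particular order, so they are sorted afterwards.
fn scan_prefix_hashed<'a>(
    map: &'a HashMap<String, String>,
    prefix: &str,
) -> Vec<(&'a str, &'a str)> {
    let mut hits: Vec<(&str, &str)> = map
        .iter()
        .filter(|(k, _)| k.starts_with(prefix))
        .map(|(k, v)| (k.as_str(), v.as_str()))
        .collect();
    hits.sort();
    hits
}
```

Both functions return the same rows, but the sorted version stops at the first key past the prefix, while the hashed version touches every entry and then pays for a sort.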

Delete a folder.

fn delete_one(db: &sled::Db) -> sled::Result<()> {
    let key = "todo:001";
    let old = db.remove(key)?;
    match old {
        Some(bytes) => {
            let text = std::str::from_utf8(&bytes).unwrap_or("<bad utf8>");
            println!("remove {key} (was {text:?})");
        }
        None => println!("remove {key} (was missing)"),
    }
    db.flush()?;
    println!();
    Ok(())
}

remove returns the old value if there was one, so the caller can log what was lost or refund a balance or whatever the situation calls for. Internally sled does not actually erase the bytes — it writes a tombstone record that says "this key no longer exists." A background compactor later folds the tombstones into the tree and reclaims the space. The reader never sees the tombstone. As far as get and scan_prefix are concerned, the key is gone the moment remove returns.
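The tombstone idea can be sketched without any storage engine. The block below is a toy under assumptions, not sled's internals: ToyTable, Entry, and compact are invented names, and a BTreeMap holding either a value or a tombstone stands in for the on-disk structure.

```rust
use std::collections::BTreeMap;

/// An entry either stores a value or marks the key as deleted.
#[derive(Debug, Clone, PartialEq)]
enum Entry {
    Put(Vec<u8>),
    Tombstone,
}

/// Toy table: deletes are recorded as tombstones until compaction runs.
#[derive(Default)]
struct ToyTable {
    entries: BTreeMap<Vec<u8>, Entry>,
}

impl ToyTable {
    fn insert(&mut self, key: &[u8], value: &[u8]) {
        self.entries.insert(key.to_vec(), Entry::Put(value.to_vec()));
    }

    /// Record a tombstone and hand back the previous value, if any.
    fn remove(&mut self, key: &[u8]) -> Option<Vec<u8>> {
        match self.entries.insert(key.to_vec(), Entry::Tombstone) {
            Some(Entry::Put(old)) => Some(old),
            _ => None,
        }
    }

    /// Readers treat a tombstone exactly like an absent key.
    fn get(&self, key: &[u8]) -> Option<&[u8]> {
        match self.entries.get(key) {
            Some(Entry::Put(value)) => Some(value.as_slice()),
            _ => None,
        }
    }

    /// Compaction drops tombstones and returns how many it reclaimed.
    fn compact(&mut self) -> usize {
        let before = self.entries.len();
        self.entries.retain(|_, entry| *entry != Entry::Tombstone);
        before - self.entries.len()
    }
}
```

After remove, get already answers None even though the entry still occupies a slot; only compact gives the space back. That split is what lets deletes stay cheap at write time.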

Finally, walk the whole drawer and list what's left.

fn final_keys(db: &sled::Db) -> sled::Result<()> {
    println!("final keys (sorted):");
    let mut keys: Vec<String> = Vec::new();
    // iter() walks the tree in ascending key order, so no extra sort is needed.
    for row in db.iter() {
        let (k, _v) = row?;
        keys.push(String::from_utf8_lossy(&k).into_owned());
    }
    for k in keys {
        println!("  {k}");
    }
    Ok(())
}

opened fresh database

insert 4 keys:
  put todo:001 = "buy boots"
  put todo:002 = "lift legs"
  put todo:003 = "call mom"
  put note:042 = "GTA loads slow"

get todo:002 -> "lift legs"

scan prefix "todo:":
  todo:001 = "buy boots"
  todo:002 = "lift legs"
  todo:003 = "call mom"

remove todo:001 (was "buy boots")

final keys (sorted):
  note:042
  todo:002
  todo:003

Read the output top to bottom. The four put lines appear in the order the inserts were issued, not in key order, because the loop prints each pair as it writes it; insert only acknowledges the write and says nothing about where the key landed in the drawer. The get pulls back the exact value that was put in. The prefix scan returns three rows in sorted order, which is the drawer doing what it is built for. The remove reports the old value "buy boots", which is the proof that the cabinet handed back what was actually in the folder before the delete. The final listing shows three keys: the deleted todo is gone, and the remaining keys are sorted because that is the drawer's natural order.
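One detail the sorted output quietly depends on is the zero-padding in keys like todo:001. The drawer compares keys byte by byte, not as numbers, so unpadded ids would sort in a surprising order. The sketch below shows the difference; todo_key, unpadded_key, and byte_order are hypothetical helpers, not part of the demo.

```rust
/// Build a key like "todo:007". The width of 3 mirrors the demo's keys;
/// `todo_key` itself is a hypothetical helper, not part of the demo.
fn todo_key(id: u32) -> String {
    format!("todo:{id:03}")
}

/// The same key without padding, to show what goes wrong.
fn unpadded_key(id: u32) -> String {
    format!("todo:{id}")
}

/// Sort keys the way a byte-ordered store would: lexicographically.
fn byte_order(mut keys: Vec<String>) -> Vec<String> {
    keys.sort();
    keys
}
```

With padding, ids 1, 2, and 10 sort as todo:001, todo:002, todo:010. Without it they sort as todo:1, todo:10, todo:2. A width of 3 only holds up to id 999, so a real schema would pick a width for the largest id it expects, or encode the id as fixed-width big-endian bytes.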

One question worth asking — why does this whole demo run inside one process with no server, no port, and no network? That is the meaning of "embedded." sled is a library, not a daemon. The cabinet lives inside your binary's memory and on the disk your binary can reach. There is no localhost:5432 to connect to and no separate process to start. The trade is that only one program at a time can open the directory, which is exactly what you want for a single-binary service or a desktop app but exactly the wrong fit for a system where ten different services need to share the same dataset.

Embedded vs server: sled lives inside your binary; a server database lives in its own process behind a socket.
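The one-opener-at-a-time rule can be imitated with a marker file. To be plain about the substitution: this sketch does not reproduce sled's locking. It swaps in a simpler scheme where the first opener creates a file named LOCK and later openers fail because the file already exists. DirLock and the LOCK file name are assumptions made for this example, and unlike an operating-system file lock, a marker file is left behind if the holder crashes.

```rust
use std::fs::{self, OpenOptions};
use std::io;
use std::path::{Path, PathBuf};

/// Holds an exclusive claim on a directory for as long as it is alive.
struct DirLock {
    lock_path: PathBuf,
}

impl DirLock {
    /// Claim the directory by creating `<dir>/LOCK`; `create_new` fails with
    /// `AlreadyExists` if another holder created the file first.
    fn acquire(dir: &Path) -> io::Result<DirLock> {
        let lock_path = dir.join("LOCK");
        OpenOptions::new().write(true).create_new(true).open(&lock_path)?;
        Ok(DirLock { lock_path })
    }
}

impl Drop for DirLock {
    /// Release the claim by deleting the marker file.
    fn drop(&mut self) {
        let _ = fs::remove_file(&self.lock_path);
    }
}
```

A second call to DirLock::acquire on the same directory returns an error while the first DirLock is alive, and succeeds again once it is dropped. The caller sees the refusal as an ordinary io::Error and decides whether to exit or retry.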

The thing this cabinet cannot do is let two processes use it at the same time. The moment a second binary tries to open the same directory, sled::open returns an error in that second process because the first one still holds the file lock. That is the bottleneck the next lesson solves by putting the storage behind a server you talk to over a socket.