Incremental and resumable captures

Re-run a capture to fetch only what is new, resume an interrupted run or a half-downloaded video, upgrade a catalog to a vault, force a clean recapture, and pin the stamp for reproducible output.

A kura archive is meant to be kept, not captured once and forgotten. Re-running is cheap because kura knows what it already holds.

Fetching only what is new

kura add (alias kura update) re-captures an existing target but fetches only what is newer than the newest video already on disk, then re-renders only the affected pages:

kura add @mkbhd

Under the hood, an add reads the resume cursor from state.json. When the last capture finished exhaustively, it seeds the uploads reader with the newest captured id, so it pages only the gap since your last capture and stops at the boundary. The capture summary shows the delta as (+N new). kura archive on an existing repo does the same incremental fetch; add is just the clearer name for a re-run. An interrupted backfill (one whose state.json is not yet marked complete) re-walks instead. The records already on disk keep the already-captured prefix cheap, so the run picks up where it stopped and finishes.
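The boundary behaviour can be pictured with a short sketch. It assumes the uploads listing arrives newest-first in pages and that the cursor holds the newest captured id. The names fetch_gap and pages and the ids are hypothetical, chosen for illustration, and do not describe kura's internals.

```python
from typing import Iterable, Iterator, List, Optional


def fetch_gap(pages: Iterable[List[str]], newest_captured: Optional[str]) -> Iterator[str]:
    """Yield ids newer than newest_captured from a newest-first paged listing.

    Hypothetical illustration: stops at the boundary id without reading
    further pages. With no cursor (None), it walks every page.
    """
    for page in pages:
        for video_id in page:
            if video_id == newest_captured:
                return  # boundary reached: everything older is already on disk
            yield video_id


# Three pages of a newest-first listing; "v5" was the newest id last time.
listing = [["v8", "v7", "v6"], ["v5", "v4", "v3"], ["v2", "v1"]]
new_ids = list(fetch_gap(listing, "v5"))
print(f"(+{len(new_ids)} new)", new_ids)
```

Because the function returns as soon as it meets the boundary id, a lazily fetched third page would never be requested in this example.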

For a playlist, the merge re-reads the list and adds only video ids not already stored. An already-held video is refreshed in place: its record is overwritten, so its counts update, and no files are duplicated.
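A minimal sketch of such a merge, assuming records are keyed by video id. The function merge_playlist and the record fields are hypothetical, not kura's storage format.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def merge_playlist(stored: Dict[str, Record], listing: List[Record]) -> Tuple[List[str], List[str]]:
    """Merge a re-read playlist into stored records, keyed by video id.

    Returns (added_ids, refreshed_ids). Overwriting by key means a video
    already held is updated in place and never stored twice.
    """
    added, refreshed = [], []
    for record in listing:
        video_id = str(record["id"])
        (refreshed if video_id in stored else added).append(video_id)
        stored[video_id] = record  # overwrite refreshes counts; insert adds new
    return added, refreshed


stored = {"a1": {"id": "a1", "views": 100}}
listing = [{"id": "a1", "views": 150}, {"id": "b2", "views": 7}]
added, refreshed = merge_playlist(stored, listing)
print(added, refreshed, stored["a1"]["views"])
```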

Upgrading a catalog to a vault

A meta catalog and a media vault differ only in the stream files. kura add --depth media over a meta repo fetches only the streams for videos whose records are already present, upgrading the catalog to a playable vault without re-fetching anything else:

kura add @mkbhd --depth media

Resuming an interrupted run

Every record is written to disk the instant it arrives, not buffered for the end. That means an interrupted run is never wasted:

  • Press Ctrl-C and kura stops, keeping every record it already wrote. It exits with code 130.
  • Hit a rate limit on a long run and kura exits with code 5, leaving the partial archive intact.

Either way, run the same command again (or kura add) and it continues from what is on disk, fetching only the rest. --resume is on by default; pass --resume=false to ignore the cursor and re-walk the spine from the top.
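The write-as-you-go pattern and the two exit codes can be sketched as follows. This is an illustration under assumptions: one JSON file per record, a hypothetical RateLimited exception, and 0 as the conventional success code. None of these are taken from kura's source.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Iterable

# 5 and 130 are the codes named in the text; 0 for success is an assumption.
EXIT_OK, EXIT_RATE_LIMITED, EXIT_INTERRUPTED = 0, 5, 130


class RateLimited(Exception):
    """Hypothetical signal that the upstream refused further requests."""


def capture(records: Iterable[Dict[str, str]], root: Path) -> int:
    """Write each record as it arrives and map interruptions to exit codes."""
    try:
        for record in records:
            path = root / f"{record['id']}.json"
            if path.exists():
                continue  # already on disk from an earlier run
            path.write_text(json.dumps(record, sort_keys=True), encoding="utf-8")
    except KeyboardInterrupt:
        return EXIT_INTERRUPTED
    except RateLimited:
        return EXIT_RATE_LIMITED
    return EXIT_OK


def flaky():
    yield {"id": "a1"}
    yield {"id": "b2"}
    raise RateLimited  # simulate a rate limit after two records


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    first_code = capture(flaky(), root)                 # stops early, keeps a1 and b2
    kept = sorted(p.name for p in root.iterdir())
    second_code = capture([{"id": "a1"}, {"id": "b2"}, {"id": "c3"}], root)
    total = len(list(root.iterdir()))                   # only c3 was newly written
```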

Stream files are crash-safe. A download lands in a .part sibling and is renamed into its final name only once every byte is present, so a run killed mid-download never leaves a truncated file that a later run would mistake for complete. The unit of resume for a stream is the whole file: an incomplete stream is fetched again, while every finished stream is kept and skipped. The resume cursor and the capture range live in state.json next to the manifest.
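The .part-then-rename pattern is a common way to get this guarantee. The sketch below shows the general technique in Python; save_stream is a hypothetical helper and the file names are made up, so treat it as an illustration, not as kura's implementation.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable


def save_stream(final_path: Path, chunks: Iterable[bytes]) -> bool:
    """Write chunks to '<name>.part', then rename into place once complete.

    Returns False if the final file already exists (finished streams are
    skipped), True if it was downloaded. Hypothetical helper, not kura's code.
    """
    if final_path.exists():
        return False  # a file under its final name is complete by construction
    part_path = final_path.with_name(final_path.name + ".part")
    with open(part_path, "wb") as part:  # "wb" truncates any stale partial file
        for chunk in chunks:
            part.write(chunk)
        part.flush()
        os.fsync(part.fileno())  # make sure the bytes are on disk before renaming
    os.replace(part_path, final_path)  # atomic when both paths share a filesystem
    return True


def interrupted():
    yield b"first half"
    raise KeyboardInterrupt  # simulate Ctrl-C mid-download


with tempfile.TemporaryDirectory() as root:
    target = Path(root) / "video.mp4"
    try:
        save_stream(target, interrupted())
    except KeyboardInterrupt:
        pass
    exists_after_interrupt = target.exists()  # only video.mp4.part is present
    downloaded = save_stream(target, [b"first half", b"second half"])
    skipped = not save_stream(target, [b"ignored"])  # the complete file is kept
    final_bytes = target.read_bytes()
```

Restarting the whole file keeps the logic simple: the final name only ever refers to a complete stream, so a later run needs no size or checksum comparison to decide what to skip.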

Forcing a clean recapture

To ignore the held state and recapture from scratch (for example, to pull in updated counts on videos you already have), use --force. Records and views are overwritten; cached media is still de-duplicated:

kura add @mkbhd --force

The response cache

kura keeps a shared on-disk cache of the InnerTube reads it makes (the metadata, channel, and listing calls, never the stream bytes or thumbnails). A repeated call within a freshness window is served from disk, so a re-render, a re-run, or dev iteration is fast and works offline. The cache is shared across every archive because it lives in your user cache directory ($XDG_CACHE_HOME/kura, or ~/.cache/kura), not inside a repository, and it is advisory: deleting it only costs a re-fetch.

kura archive @mkbhd --no-cache     # ignore the cache, always fetch fresh

The freshness window defaults to one hour. Set KURA_CACHE_TTL (a Go duration like 30m or 6h) to widen or narrow it, or pass --no-cache to bypass the cache entirely when you need the very latest counts. Stream files are never cached here; the local archive itself de-duplicates them.
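To make the freshness window concrete, here is a small Python sketch of a TTL check. parse_ttl handles only simple h/m/s forms of a Go-style duration, and both helper names and the strict "younger than the TTL" comparison are assumptions for illustration.

```python
import re
from datetime import datetime, timedelta, timezone

_UNITS = {"h": 3600.0, "m": 60.0, "s": 1.0}
_PART = re.compile(r"(\d+(?:\.\d+)?)([hms])")


def parse_ttl(text: str) -> timedelta:
    """Parse a simple Go-style duration such as '30m', '6h', or '1h30m'.

    Sketch only: covers h/m/s, not the sub-second units Go also accepts.
    """
    parts = _PART.findall(text)
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(seconds=sum(float(n) * _UNITS[u] for n, u in parts))


def is_fresh(stored_at: datetime, now: datetime, ttl: timedelta) -> bool:
    """A cached response is served only while it is younger than the TTL."""
    return now - stored_at < ttl


stored_at = datetime(2026, 6, 17, 12, 0, tzinfo=timezone.utc)
now = stored_at + timedelta(minutes=45)
fresh_default = is_fresh(stored_at, now, parse_ttl("1h"))   # inside a one-hour window
fresh_narrow = is_fresh(stored_at, now, parse_ttl("30m"))   # a 30m window has expired
print(fresh_default, fresh_narrow)
```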

Reproducible output

kura's output is deterministic by design: record paths and media filenames are pure functions of their content, the manifest's record-bearing fields are sorted by id, and transcript offsets come from the stored .vtt. The one wall-clock value is the capture stamp. Pin it with --date to make a run byte-for-byte reproducible:

kura archive @mkbhd --date 2026-06-17T00:00:00Z

The same --date is available on kura render to fix the footer stamp when you rebuild the views.
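Why pinning the stamp is enough can be shown with a toy manifest. The sketch assumes a JSON manifest with hypothetical captured_at and records fields, which is not kura's actual schema. With ids sorted and the stamp fixed, two runs that see the records in different orders serialize to identical bytes.

```python
import hashlib
import json
from typing import Dict, Iterable


def build_manifest(records: Iterable[Dict[str, str]], stamp: str) -> bytes:
    """Serialize records sorted by id with a pinned stamp, as stable bytes."""
    manifest = {
        "captured_at": stamp,  # the only wall-clock value; pinned by the caller
        "records": sorted(records, key=lambda r: r["id"]),
    }
    return json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")


stamp = "2026-06-17T00:00:00Z"
run_a = build_manifest([{"id": "b2", "title": "B"}, {"id": "a1", "title": "A"}], stamp)
run_b = build_manifest([{"id": "a1", "title": "A"}, {"id": "b2", "title": "B"}], stamp)
print(hashlib.sha256(run_a).hexdigest() == hashlib.sha256(run_b).hexdigest())
```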

Previewing without fetching

--dry-run prints the capture plan and the requests it would make without touching the network:

kura archive @mkbhd --dry-run

A self-contained, movable archive

The repository is fully self-contained. Records, media, views, CSS, and the manifest all live under one root directory, with every internal reference written as a relative path. Move the folder to another disk or machine, open index.html, and it still works with the network unplugged. To browse it over HTTP so that <video> range requests work, point kura serve at it:

kura serve $HOME/data/kura/youtube/@mkbhd
