Vectorbea Engineering

Experiment

Toy Demo: A Durable Runner Concept in Kotlin

A small, sanitized Kotlin sketch of the durable-execution ideas from this blog (checkpointing, step-level retry, and replay), built as a standalone learning exercise.

Susmit Banerjee·April 2, 2026
kotlin · durable-execution · experiment

This is a write-up of a small, standalone Kotlin experiment. It is not a piece of Vectorbea's actual codebase, and it is deliberately simplified to the point where it fits in a single file and can be understood in one sitting. The goal was to answer a question for myself: "if I had to explain durable execution to someone with nothing but a whiteboard and an afternoon, what's the smallest thing I could build that demonstrates the core idea?"

The shape of the toy

The demo models a "run" as a list of steps, a sequence number, and an append-only list of events. A DurableRunner interface exposes one operation: given a run ID, execute the next step that hasn't completed yet, append the resulting event, and, critically, persist that event before returning. "Persist," in the toy, means writing to an in-memory map guarded by a mutex, standing in for what would be a database write in a real system. The substitution is intentional: the interesting part isn't the storage technology, it's the shape of the guarantee.

interface EventStore {
    fun append(runId: String, event: RunEvent)
    fun eventsFor(runId: String): List<RunEvent>
}
 
sealed interface RunEvent {
    val sequence: Int
    data class StepStarted(override val sequence: Int, val stepId: String) : RunEvent
    data class StepCompleted(override val sequence: Int, val stepId: String, val result: String) : RunEvent
    data class StepFailed(override val sequence: Int, val stepId: String, val error: String) : RunEvent
}
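
To make the "persist before returning" guarantee concrete, here is one way the in-memory store and a runner could look. This is a sketch under stated assumptions, not the demo's actual code: InMemoryEventStore, ToyRunner, and runNext are hypothetical names, the steps are modeled as simple (ID, action) pairs, and sequence numbers are assumed to be dense and to start at 0.

```kotlin
import java.util.concurrent.locks.ReentrantLock
import kotlin.concurrent.withLock

// Types repeated from the listing above so this block compiles on its own.
interface EventStore {
    fun append(runId: String, event: RunEvent)
    fun eventsFor(runId: String): List<RunEvent>
}

sealed interface RunEvent {
    val sequence: Int
    data class StepStarted(override val sequence: Int, val stepId: String) : RunEvent
    data class StepCompleted(override val sequence: Int, val stepId: String, val result: String) : RunEvent
    data class StepFailed(override val sequence: Int, val stepId: String, val error: String) : RunEvent
}

// Hypothetical store: "persisting" is an append to a lock-guarded in-memory map.
class InMemoryEventStore : EventStore {
    private val lock = ReentrantLock()
    private val log = mutableMapOf<String, MutableList<RunEvent>>()

    override fun append(runId: String, event: RunEvent) {
        lock.withLock {
            val events = log.getOrPut(runId) { mutableListOf() }
            // Assumption: sequence numbers are dense and start at 0.
            require(event.sequence == events.size) {
                "expected sequence ${events.size}, got ${event.sequence}"
            }
            events.add(event)
        }
    }

    // Returns a copy, so callers never observe later appends.
    override fun eventsFor(runId: String): List<RunEvent> =
        lock.withLock { log[runId]?.toList() ?: emptyList() }
}

// Hypothetical runner: the post names a DurableRunner interface but not its method.
class ToyRunner(
    private val store: EventStore,
    private val steps: List<Pair<String, () -> String>>, // step ID paired with its action
) {
    // Advances the run by at most one step; returns false once every step has completed.
    fun runNext(runId: String): Boolean {
        val events = store.eventsFor(runId)
        // "Done" is derived from the log, not from a separate status field.
        val done = events.filterIsInstance<RunEvent.StepCompleted>().map { it.stepId }.toSet()
        val (stepId, action) = steps.firstOrNull { it.first !in done } ?: return false

        store.append(runId, RunEvent.StepStarted(events.size, stepId))
        val outcome: RunEvent = try {
            RunEvent.StepCompleted(events.size + 1, stepId, action())
        } catch (e: Exception) {
            RunEvent.StepFailed(events.size + 1, stepId, e.message ?: e.toString())
        }
        store.append(runId, outcome) // the event is stored before runNext returns
        return true
    }
}
```

A failed step is simply picked up again by the next runNext call, because it never appears in the completed set. The sketch is only safe for a single caller per run: two callers could read the same event list and both try to append the same sequence number, at which point the second append would fail the require check.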

Replay as the recovery mechanism

The part I most wanted to demonstrate, because it's the part that feels like magic until you've built a tiny version of it, is that "resume after a crash" and "compute current state" are the same operation. The toy includes a replay function that folds a run's event list into a RunState, and the runner's "what should I do next" logic calls exactly that function, every time, rather than consulting any separately maintained status field.

fun replay(events: List<RunEvent>): RunState =
    events.fold(RunState.initial()) { state, event ->
        when (event) {
            is RunEvent.StepStarted   -> state.markRunning(event.stepId)
            is RunEvent.StepCompleted -> state.markCompleted(event.stepId, event.result)
            is RunEvent.StepFailed    -> state.markFailed(event.stepId, event.error)
        }
    }

To "simulate a crash," the demo just throws away the in-memory RunState and calls replay again from the stored events. Watching the reconstructed state come out identical to the one that existed before the simulated crash is, I think, the clearest way to feel why event sourcing buys you crash recovery: it's not a separate feature, it's a direct consequence of "current state is a pure function of the event log."
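
The replay listing above relies on a RunState type that the post doesn't show. A minimal sketch of what it might look like follows, assuming an immutable data class whose mark* methods return updated copies. StepStatus, the three maps, and nextStep are hypothetical names introduced for illustration.

```kotlin
// Event type repeated from the earlier listing so this block compiles on its own.
sealed interface RunEvent {
    val sequence: Int
    data class StepStarted(override val sequence: Int, val stepId: String) : RunEvent
    data class StepCompleted(override val sequence: Int, val stepId: String, val result: String) : RunEvent
    data class StepFailed(override val sequence: Int, val stepId: String, val error: String) : RunEvent
}

enum class StepStatus { RUNNING, COMPLETED, FAILED }

// Hypothetical RunState: immutable, so every mark* call returns a new copy.
data class RunState(
    val statuses: Map<String, StepStatus> = emptyMap(),
    val results: Map<String, String> = emptyMap(),
    val errors: Map<String, String> = emptyMap(),
) {
    fun markRunning(stepId: String): RunState =
        copy(statuses = statuses + (stepId to StepStatus.RUNNING))

    fun markCompleted(stepId: String, result: String): RunState =
        copy(
            statuses = statuses + (stepId to StepStatus.COMPLETED),
            results = results + (stepId to result),
        )

    fun markFailed(stepId: String, error: String): RunState =
        copy(
            statuses = statuses + (stepId to StepStatus.FAILED),
            errors = errors + (stepId to error),
        )

    companion object {
        fun initial(): RunState = RunState()
    }
}

// Same fold as in the post: state is a pure function of the event list.
fun replay(events: List<RunEvent>): RunState =
    events.fold(RunState.initial()) { state, event ->
        when (event) {
            is RunEvent.StepStarted   -> state.markRunning(event.stepId)
            is RunEvent.StepCompleted -> state.markCompleted(event.stepId, event.result)
            is RunEvent.StepFailed    -> state.markFailed(event.stepId, event.error)
        }
    }

// "What should I do next": the first step that replay does not report as completed.
fun nextStep(steps: List<String>, events: List<RunEvent>): String? {
    val state = replay(events)
    return steps.firstOrNull { state.statuses[it] != StepStatus.COMPLETED }
}
```

Because RunState is a data class built from immutable maps, "the reconstructed state is identical" reduces to a plain equality check: replay the stored events, discard the result, replay again, and compare the two values with ==.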

Bounded retry with backoff

The toy also includes a minimal retry loop: attempt a step and, on failure, append a StepFailed event and schedule a retry with exponential backoff up to a configured maximum, after which the run is marked terminal. This is the smallest possible version of the ideas in the retries and idempotency article. It doesn't attempt idempotency keys or external side-effect deduplication, because doing that convincingly requires a real external system to deduplicate against, which would have made the demo much larger without making the core idea any clearer.
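
A bounded retry loop of this kind can be sketched in a few lines. Everything named here is an assumption made for illustration: RetryPolicy, its defaults, StepOutcome, and runWithRetry are hypothetical. The recordFailure callback stands in for appending a StepFailed event, and the sleep function is injectable so a test harness doesn't have to wait for real delays.

```kotlin
// Hypothetical policy; the names and default values are illustrative only.
data class RetryPolicy(
    val maxAttempts: Int = 4,
    val baseDelayMs: Long = 100,
    val maxDelayMs: Long = 5_000,
) {
    init {
        require(maxAttempts >= 1) { "maxAttempts must be at least 1" }
    }

    // Delay after failed attempt number `attempt` (1-based):
    // baseDelayMs * 2^(attempt - 1), capped at maxDelayMs.
    fun delayAfter(attempt: Int): Long {
        val shift = (attempt - 1).coerceIn(0, 30)
        return (baseDelayMs shl shift).coerceAtMost(maxDelayMs)
    }
}

sealed interface StepOutcome {
    data class Succeeded(val result: String, val attempts: Int) : StepOutcome
    // Terminal: the caller would mark the run as failed for good.
    data class Exhausted(val lastError: String, val attempts: Int) : StepOutcome
}

fun runWithRetry(
    policy: RetryPolicy,
    recordFailure: (attempt: Int, error: String) -> Unit, // e.g. append a StepFailed event
    sleep: (Long) -> Unit = { Thread.sleep(it) },
    step: () -> String,
): StepOutcome {
    var lastError = "unknown"
    for (attempt in 1..policy.maxAttempts) {
        try {
            return StepOutcome.Succeeded(step(), attempt)
        } catch (e: Exception) {
            lastError = e.message ?: e.toString()
            recordFailure(attempt, lastError)
            // No point in waiting after the final attempt.
            if (attempt < policy.maxAttempts) sleep(policy.delayAfter(attempt))
        }
    }
    return StepOutcome.Exhausted(lastError, policy.maxAttempts)
}
```

The sketch blocks the calling thread between attempts, which is fine for a toy. A runner that really "schedules" retries would more likely store the next-attempt time and return, but that is beyond what this example tries to show.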

What this demo deliberately doesn't do

It has no concurrency control beyond the mutex around the in-memory store, no real persistence, no checkpointing distinct from the event log itself, and no notion of multiple workers competing for the same run. All of those are real and important in a production system, and all of them would have obscured, rather than clarified, the one idea this toy exists to demonstrate: that durability comes from making "what happened" the single source of truth, and deriving everything else from it.

Where to find it

The full sketch, around 200 lines, including a small test harness that simulates failures and crashes, will be published as a standalone repository. We'll link it here once it's up: github.com/vectorbea/durable-runner-demo (placeholder, repository to be published).

If you build something similar yourself, I'd genuinely like to hear what you found confusing or surprising. That's usually the most useful signal for what to write about next.