Engineering durable AI workflows.
Notes from the team building Vectorbea: a durable, long-running visual agentic workflow builder. Execution guarantees, checkpoints, retries, approvals, BYOK, and worker orchestration, written up as we build them.
- 12:04:01.203 RUN_STARTED run_8f2e checkpoint=0
- 12:04:01.812 STEP_COMPLETED fetch_ticket → ok (412ms)
- 12:04:02.044 APPROVAL_REQUESTED gate=publish_release awaiting human
- 12:04:58.391 APPROVAL_GRANTED by=susmit reason='looks good'
- 12:04:58.402 STEP_RETRIED deploy_worker attempt=2/5
- 12:05:00.118 RUN_CHECKPOINTED checkpoint=4 durable=true
$ tail -f events.log
Featured
Why Long-Running AI Workflows Need Durable Execution
Async jobs and retry decorators get you most of the way to a working agent, and then they don't. Here's why we built Vectorbea around durable execution from day one.
Building Vectorbea
A running series on the design and engineering decisions behind Vectorbea's durable execution engine: from event history to approval gates to BYOK.
- Part 1: Why Long-Running AI Workflows Need Durable Execution
- Part 3: Designing Event History as a Primitive for AI Workflows
- Part 4: Retries, Resume, and Idempotency: The Unglamorous Core of Reliability
- Part 5: Human Approval Gates in Agentic Systems
- Part 6: BYOK Architecture for an AI SaaS: Benefits, Risks, and Boundaries
- Part 7: Worker Scaling with Redis Streams: Consumer Groups, PEL, and When to Reach for Kafka
- Part 8: Cost Budgets and Rate Limits for Agentic Workflows
- Part 9: Self-Correction Loops for Failed Workflows: Blind Retry Isn't Intelligence
- Part 10: Lessons from Building Vectorbea v1
Browse by category
Agentic Systems
Approval gates, self-correction, and human-in-the-loop design.
Execution Engine
Checkpoints, retries, replay, and the core run model.
Infrastructure
Workers, queues, Redis Streams, and scaling.
Lessons
Retrospectives from building and shipping Vectorbea.
Reliability
Idempotency, retries, resume, and failure handling.
Security
BYOK, secret boundaries, and trust models.
Recent articles
Lessons from Building Vectorbea v1
What we'd keep and what we'd change across UI, backend, security, observability, and positioning, after shipping the first version of Vectorbea's durable workflow engine.
Self-Correction Loops for Failed Workflows: Blind Retry Isn't Intelligence
The difference between retrying a failed step and helping a workflow understand why it failed: error classification, bounded self-correction, and where we draw the line and call a human.
Cost Budgets and Rate Limits for Agentic Workflows
How we estimate token costs before and during a run, enforce per-run and per-workspace budgets, apply rate limits, and build kill switches that actually stop a runaway workflow.
Worker Scaling with Redis Streams: Consumer Groups, PEL, and When to Reach for Kafka
How Vectorbea's worker fleet pulls work from Redis Streams: consumer groups, the pending entries list, retry and DLQ handling, and the honest answer to 'why not Kafka?'
BYOK Architecture for an AI SaaS: Benefits, Risks, and Boundaries
Why we let customers bring their own LLM provider keys, what it costs them and us, and the security boundaries we think any BYOK system needs, without the implementation specifics.
Human Approval Gates in Agentic Systems
Modeling 'wait for a human' as a first-class workflow step: waiting states, timeouts, escalation, and why the audit trail has to be airtight.
About this project
Vectorbea is a durable, long-running visual agentic workflow builder. It gives teams execution guarantees, retry and resume, checkpointing, human approval gates, BYOK LLM keys, worker orchestration, full event history, run timelines, cost budgets, and rate limits. Vectorbea Engineering is where we write publicly about the decisions, tradeoffs, and mistakes behind it. The product is private. These notes are not marketing copy; they're the kind of write-up we'd want to read if we were building something similar.
More about this blog and its author →