In February, I wrote about the small team I’d stood up instead of hiring humans: Ada and Dijkstra on the backend, Kay on the frontend, Hopper on infrastructure, and a Python “team leader” watching a GitHub project board, handing out work, arbitrating plan reviews, and collecting PRs. It was held together by Python modules, git worktrees, and a handful of careful conventions. It worked. I even had a multi-tenant, hosted service that others could use via an AWS-hosted virtual machine. I shipped more, and faster, than I had any right to.
Two months on, I’ve rewritten the whole thing.
The platform has a name — Spring Voyage — and I am about to open-source it. This post is the story of why I rewrote it, what v1 actually taught me, and why the second version took roughly twelve days instead of the six-to-eight months Claude Code estimated when I asked it.
Two things happened between February and April.
The first was that the rest of the ecosystem caught up. Claude Code, Codex, Gemini, and a growing list of others started shipping the coordination features that were the whole point of v1 — parallel sessions, plan-then-implement flows, even rudimentary multi-agent orchestration. The raw “I can now point N agents at N issues” capability stopped being something I had to build. That was fine. Good, even. My own scaffolding was always a means, not an end.
The second was more important, and it was always going to break v1 eventually. Everything about it was hardwired to software engineering on GitHub. Issues were work items. Labels were state. PRs were output. Webhooks were the event bus. The workflow — READY → IN_PROGRESS → PLAN_REVIEW → IMPLEMENTATION → PR_REVIEW → MERGED (and this is the simplified version of the implemented state machine) — was baked into the Python team leader. The team leader itself was baked into the idea that there was one human (me), one repo per agent, and one kind of work in the world.
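To make the hardwiring concrete, that workflow amounts to a fixed transition table. A minimal Python sketch, illustrative only and not v1's actual code (the back-edges for rejected plans and requested changes are my assumption):

```python
# Illustrative sketch of v1's hardwired workflow -- not the actual code.
# Each state maps to the states a work item may legally move to next.
# The "sent back" edges are assumptions, not confirmed v1 behavior.
TRANSITIONS = {
    "READY": {"IN_PROGRESS"},
    "IN_PROGRESS": {"PLAN_REVIEW"},
    "PLAN_REVIEW": {"IMPLEMENTATION", "IN_PROGRESS"},  # approved, or sent back
    "IMPLEMENTATION": {"PR_REVIEW"},
    "PR_REVIEW": {"MERGED", "IMPLEMENTATION"},         # merged, or changes requested
    "MERGED": set(),
}

def advance(state: str, next_state: str) -> str:
    """Move a work item to next_state, or raise if the transition is illegal."""
    if next_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {next_state}")
    return next_state
```

Baking a table like this into the team leader is exactly what made v1 GitHub-shaped: every new kind of work would have needed a new hardcoded table.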
None of that is wrong as a first draft. But the pattern I kept seeing in v1 wasn’t really about code. It was about a group of autonomous things with different expertise, doing different kinds of work, with a shared sense of what was happening and a human occasionally in the loop. That’s a domain-agnostic pattern. GitHub was scaffolding; the scaffolding had started to dictate the building.
So, at the beginning of April I spent 2-3 days writing down what v2 would have to be and then iterated on a plan with Claude Code.
The plan document got long. By the time it was coherent it was roughly 15,000 words — 2,500 lines of architecture spec covering actors, messaging, units, connectors, orchestration, initiative, cloning, workflows, observability, tenancy, packaging, the CLI, deployment, and a future-work section for a cognitive memory backbone. I asked Claude Code for an estimate. It came back with “6–8 months, realistically.”
I started on 2026-04-08. Today is 2026-04-20. The git log for the brand-new Spring Voyage repository says 329 PRs, 413 issues, 225K lines of code, and 120 files / 18.3K lines of documentation, across those twelve days. Most of what I wanted is implemented or the foundations are there. There is much more to come.
Some example features:
There are many edges/bugs that still need work — there always are — but the substance is there. I am working hard to harden the initial version before releasing it to the world. There is also going to be a hosted service… More on that soon.
I want to be honest about that twelve-day number, because it’s real and it’s also easy to misread. The twelve days were not agents running autonomously while I slept — that’s the Spring Voyage of someday, not the Spring Voyage of today. The twelve days were me, overseeing, engaged whenever I was online, driving a small team of Claude Code’s parallel agents — somewhere between one and ten at a time — from a laptop or, when I wasn’t at a desk, from my phone via Claude’s remote control. Once the plan was in place, Claude Code and I spent real time planning *execution*: breaking the work into phases, grouping issues that could move at the same time, sequencing them to maximize parallelism without agents stepping on each other, and only then opening the floodgates.
I was instructing, reviewing, clarifying, catching over-engineering, asking for PRs that had grown too big to be split up, and always thinking about architecture, less so about implementation. There were conflicts. There was drift. And there were bugs! The speed was a product of attention, not an absence of it. A lot of thought went into the instructions for how execution should happen, and those instructions evolved over time.
A few working conventions turned into a small rulebook I re-applied run after run: keep PRs small and reviewable; if you find a problem mid-flight, file a GitHub issue rather than fixing it inline, so nothing gets lost; push back on scope creep before it compounds; prefer capturing a question over guessing at the answer. These were some of my instructions to the agents. Perhaps the details of that rulebook and how I work with coding agents is another post.
What the twelve-day number hides, then, is that the thinking wasn’t in the twelve days. It was in the plan, in every hour of v1 that taught me which abstractions to trust, and in the lesson I wrote about in January — “knowing what to ask matters more than I expected.” That’s still the skill. The typing of code was parallelized and delegated. The judgement wasn’t.
The v1-to-v2 rewrite isn’t about which framework is trendier. It’s about specific things v1 got in my way about, which I wrote down and then designed against.
- `IOrchestrationStrategy` per unit, with five implementations shipped (rule-based, workflow, AI-orchestrated, AI+workflow hybrid, peer) and room for users to bring their own. The v1 Python team leader is now one strategy among five, not the whole architecture.
- `HumanActor`, unit-scoped permissions, and a clean path to multi-tenant isolation. The open-source core stays single-tenant-agnostic; the commercial extension layers tenancy on top via DI.
- Everything addressable gets a URI — `agent://engineering-team/ada`, `human://engineering-team/savasp`, `connector://engineering-team/github` — and a typed `Message` carries the content. Humans, CLI, portal, other agents: same primitive. Connectors act as the bridge between Spring Voyage’s messages and their own primitives. So yes… you can still write that GitHub comment and it will find its way to the right agent.
- `ActivityEvent`s flow through Rx.NET streams to a portal, to SSE subscribers, and — crucially — to other agents, so one agent can observe another’s work without being in its chain of command.
- `human://engineering-team/savasp` is a real address — an agent that decides it needs input can send a message to a person. Agents now reach out instead of only being reached for. A research agent notices a paper worth flagging; a backend agent notices that a migration should be re-run; a monitor agent notices that a connector’s been degraded for six hours. They don’t wait to be asked.

If I had to compress v2 into a paragraph: every agent is a Dapr virtual actor with a partitioned mailbox. Agents are grouped into units; units are agents too. Work moves as typed messages between addresses. Each unit picks a pluggable orchestration strategy — including “bring your own.” Connectors adapt external systems in.
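The address-plus-typed-message primitive can be pictured in a few lines. This is an illustrative Python sketch, not the platform's code (the real implementation is C#/.NET on Dapr virtual actors, and every name here is mine):

```python
# Illustrative sketch of the address + typed-message primitive.
# The real platform is C#/.NET on Dapr virtual actors; names are invented.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class Address:
    scheme: str   # "agent", "human", or "connector"
    unit: str
    name: str

    @classmethod
    def parse(cls, uri: str) -> "Address":
        scheme, rest = uri.split("://", 1)
        unit, name = rest.split("/", 1)
        return cls(scheme, unit, name)

    def __str__(self) -> str:
        return f"{self.scheme}://{self.unit}/{self.name}"

@dataclass
class Message:
    sender: Address
    recipient: Address
    kind: str     # typed payloads in the real system; a string tag here
    body: str
    sent_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

# Humans, agents, and connectors all use the same primitive:
msg = Message(
    sender=Address.parse("human://engineering-team/savasp"),
    recipient=Address.parse("agent://engineering-team/ada"),
    kind="task",
    body="Review the README and suggest improvements",
)
```

The point of the sketch is the uniformity: a human talking to an agent, an agent talking to a connector, and a connector relaying a GitHub comment all look identical at this layer.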
The platform speaks A2A (Google’s open agent-to-agent protocol) on the outside and in, which means a LangGraph graph, an ADK agent, or any custom process can participate as a unit member or drive a unit as an external orchestrator, without caring what language or framework the rest of the team uses. Agents can take initiative at four levels, governed by a cheap-LLM-screens-expensive-LLM cognition loop. Cost tracking is in the substrate.
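The cognition loop is, in outline, a two-stage gate: a cheap model screens observations, and only those that clear the bar reach the expensive model. A hedged Python sketch, where the function names, levels, and thresholds are all mine, not the platform's:

```python
# Sketch of a cheap-model-screens-expensive-model initiative gate.
# screen() and act() stand in for real LLM calls; levels and thresholds
# are invented for illustration.
from typing import Callable, Optional

def initiative_loop(
    observation: str,
    initiative_level: int,                 # 0..3, a per-agent setting
    screen: Callable[[str], float],        # cheap model: worth acting on? -> 0..1
    act: Callable[[str], str],             # expensive model: actually reason
) -> Optional[str]:
    # Higher initiative lowers the bar for engaging the expensive model.
    thresholds = {0: 1.1, 1: 0.9, 2: 0.6, 3: 0.3}  # level 0 never acts
    score = screen(observation)
    if score < thresholds[initiative_level]:
        return None          # filtered out cheaply; no expensive call made
    return act(observation)  # expensive model takes over
```

The economics are the design choice: most observations die at the cheap screen, so agents can watch a lot of activity without running up the expensive model's bill.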
The whole thing is language-agnostic where it matters (agent brains) and type-safe where it matters (the infrastructure). That was the shape that kept surviving every argument I had with the plan.
The architecture is the means; here is what falls out of it for whoever is actually using the platform. At least that’s the goal.
You can stand up a team of AI agents for whatever kind of work you can describe — software, research, product, operations, creative — without rebuilding anything underneath. You can connect that team to the systems where the work actually happens (GitHub, Slack, Linear, Notion, Figma, arxiv, a webhook, a cron) without writing platform code. You can pick how the team coordinates — rigid workflow, LLM judgement, hybrid, peer broadcast — or write your own strategy if none of those fits. You can use whichever LLM you trust (Claude, OpenAI, Ollama, Gemini, Codex) and mix providers across agents in the same unit. You can bring in agents built in other frameworks — Google ADK, LangGraph, anything that speaks A2A — and have them participate as first-class members alongside native ones.
You can talk to your team from the CLI, the portal, or by sending a message. They can talk back, and — for the first time in this lineage — they can reach out to you without being asked first. You can bring colleagues in with scoped permissions. You can watch every agent’s activity in real time and see what every action costs. When something needs doing differently, you don’t fork the platform; you change a unit’s configuration.
Your agents can gather experiences, learn, evolve. They can become more useful over time, more in tune with your work and goals and your organization’s.
Self-host it on a workstation or a single VPS. When the hosted service is up, hand the operations off and keep the same workflow.
None of the above is worth much if using it is hard. Here is roughly what creating a small team and talking to it looks like from the spring CLI — abbreviated from Spring Voyage’s getting started guide.
```shell
# Create the unit
spring unit create engineering-team

# Create two agents, add them to the unit
spring agent create ada \
  --role backend-engineer \
  --capabilities "csharp,postgresql" \
  --ai-backend claude --execution delegated --tool claude-code
spring agent create kay \
  --role frontend-engineer \
  --capabilities "typescript,react" \
  --ai-backend claude --execution delegated --tool claude-code
spring unit members add engineering-team ada
spring unit members add engineering-team kay

# Start the unit and send it the first message
spring unit start engineering-team
spring message send agent://engineering-team/ada \
  "Review the README and suggest improvements"

# Watch what the team does, in real time
spring activity stream --unit engineering-team
```
That’s the minimum. Declarative YAML, connectors, policies, cloning, initiative, cost budgets, and multi-human access are all layered on top — but the first experience is meant to fit on a single screen.
Everything above has a matching surface in the web portal. A few in-progress screenshots:
The interesting thing about how I used Claude Code to build v2 is that the way I worked — the phase breakdown, the parallelism shaping, the “small PRs, file an issue, don’t solve it inline” rulebook, the feature-creep pushback — is itself a form of expertise. It was tacit, it lived in my head, and I wrote it down for the agents run after run until it stopped living only in my head.
That’s the shape of a Spring Voyage unit. An orchestration strategy for phasing and grouping work. A set of skills for the agents. A set of policies that say “PRs must be small” and “capture problems as issues.” Agent instructions that carry the judgement about when to push back on scope. The conventions I was re-typing into a chat window are, one level up, a configuration — a software-engineering unit that another person could adopt, tune, and improve.
The real question v2 is there to answer is whether the same trick works for other domains. Can a research unit capture the way a good researcher decomposes a literature review? Can a product-management unit carry the judgement a senior PM uses when a roadmap starts to bend? Can an operations unit encode how a team actually runs an on-call rotation? And — the harder question — can agents within those units evolve, over months of actually doing the work, so that they become useful to a specific person or a specific organization, rather than generically capable at everything?
That’s the bet the platform is built around. It’s also the clean hand-off to the next part.
I haven’t forgotten the personal-intelligence project, the reason I formed CVOYA in the first place.
After the first rounds of UX feedback, I’ve been pivoting. Some of the ideas coming out of that have been the best I’ve had on it in a long time. I’ll write about them separately — this post is already long.
But the architectural joke of v2, if you squint, is that Spring Voyage has a specific shape of hole in it, exactly where a personal-intelligence system wants to plug in. The agent platform defines three interfaces — how to plug in a memory store, how to plug in a cognition provider, and how to track expertise — with default implementations that are unglamorous but functional (a Postgres key-value memory, straight LLM calls, static expertise profiles). An external observer can replace any of them.
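The shape of those three seams, sketched in Python purely for illustration (the real interfaces are C#, and these names and signatures are mine):

```python
# Illustrative sketch of the three plug-in seams -- memory, cognition,
# expertise -- with deliberately boring defaults. Names are invented.
from abc import ABC, abstractmethod
from typing import Dict, Optional, Tuple

class MemoryStore(ABC):
    @abstractmethod
    def remember(self, agent: str, key: str, value: str) -> None: ...
    @abstractmethod
    def recall(self, agent: str, key: str) -> Optional[str]: ...

class CognitionProvider(ABC):
    @abstractmethod
    def think(self, agent: str, prompt: str) -> str: ...

class ExpertiseTracker(ABC):
    @abstractmethod
    def profile(self, agent: str) -> Dict[str, float]: ...

# Defaults: unglamorous but functional.
class DictMemory(MemoryStore):            # stands in for the Postgres KV default
    def __init__(self) -> None:
        self._data: Dict[Tuple[str, str], str] = {}
    def remember(self, agent: str, key: str, value: str) -> None:
        self._data[(agent, key)] = value
    def recall(self, agent: str, key: str) -> Optional[str]:
        return self._data.get((agent, key))

class StaticExpertise(ExpertiseTracker):  # static profiles straight from config
    def __init__(self, profiles: Dict[str, Dict[str, float]]) -> None:
        self._profiles = profiles
    def profile(self, agent: str) -> Dict[str, float]:
        return self._profiles.get(agent, {})
```

A personal-intelligence system replaces these defaults rather than the platform: same seams, smarter implementations behind them.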
When that observer is a personal-intelligence system, something interesting happens. Agents stop being processes that forget. They accumulate experience across conversations, recognize patterns in the work they do, watch their expertise profile evolve from what they actually handle rather than what their YAML says, and — eventually — spawn sub-agents for sub-specializations they’ve grown into. The personal-intelligence infrastructure becomes the cognitive backbone for a whole team of AI collaborators, not just for one human.
I’m not claiming that works today. I’m claiming the platform is designed so that it can. The integration points are in the codebase right now.
That’s the through-line I care about most. Spring Voyage is CVOYA’s AI teaming platform; the personal-intelligence work is CVOYA’s expertise/knowledge layer for humans and agents alike. They were always going to meet in the middle. v2 is the version where the meeting point is explicit.
Spring Voyage is about to go public as an open-source project. It will be self-hostable — a single `docker compose up` on a virtual private server, or a Podman deployment if that’s your preference — and usable as a standalone platform for personal or small-team use. A hosted service will offer multi-tenancy, SSO, billing, and a small set of advanced features on top, for people who don’t want to operate it themselves.
I’ll write a proper launch post when that happens, with the “why open source,” the license rationale, and how to contribute. This post is the “how we got here” half.
The thing I care about in v2 isn’t speed. Don’t get me wrong: I like building fast, and I’m now building faster than at any point in my career, even compared to when I had small teams working under my direction. But what I care about is user value and system observability.
That’s the point. I don’t want AI agents that are impressive. I want AI agents that are legible and useful — legible, so that I, other humans, and other agents can see what’s happening and why; useful, so that the legibility earns its keep. Neither is worth much without the other. An inscrutable agent that ships work is a liability. A fully-observable agent that never does anything useful is a toy. v2 is a substrate that hopefully makes both possible at the same time.
We’ll see if it holds up. For now, I have a “team” to get back to.
—
Spring Voyage will be released as open source under BSL 1.1 (converts to Apache 2.0 on 2030-04-10). Stay tuned for follow-up posts.