Why I Built an Agent Team Instead of Hiring
Adasight is a four-person consultancy. The team: myself (head of sales/GTM), Dayana (CEO/operations), Zain (experimentation lead), and Olivia (data analyst). Revenue in 2025 was EUR 417K with a 63% profit margin.
The bottleneck was not delivery -- the team is strong. The bottleneck was demand generation. I was personally responsible for all outbound, all content, all pipeline management, and all strategic planning. To scale past EUR 40K/month, I needed to either hire 2-3 junior people for GTM execution or find another way.
Hiring meant: EUR 3,000-4,000/month per person, onboarding time, management overhead, and the reality that most GTM execution work is structured and repeatable. The kind of work that AI agents are designed for.
So I built instead of hired. The bet: an AI agent team could handle the structured execution layer of GTM while I focused on the strategic and relational work that only a human can do.
The Architecture: From Concept to System
The system evolved through three phases:
Phase 1: Single-agent experiments (Week 1-2). I started with one agent -- an outbound prospecting researcher. Gave it access to Apollo, a target ICP definition, and a clear brief. It worked. Not perfectly, but well enough to prove the concept.
Phase 2: Role specialization (Week 2-4). I expanded to 5 agents, each with a specific domain: sales, growth, content, dev, and coordination. This is where I learned that agent definitions matter more than agent code. A well-defined role with clear principles produces better output than a technically sophisticated but vaguely defined agent.
Phase 3: Full team with coordination (Week 4-6). The 10-agent roster was complete by mid-February. The hard part was not building individual agents -- it was making them work together. Coordination is the real engineering challenge.
The technical stack: Claude Code for all agent sessions. Supabase (PostgreSQL) for coordination, handovers, and persistent memory. A Mac Mini as the always-on execution server running scheduled tasks via launchd. GitHub for version control. Cloudflare for hosting anything public-facing.
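Scheduled execution on the Mac Mini takes one launchd plist per job. A minimal sketch, with a hypothetical label and script path (your paths and schedule will differ):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.example.agent-morning-run</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/bin/env</string>
        <string>bash</string>
        <string>-lc</string>
        <string>cd ~/agents &amp;&amp; ./run_agent.sh holden</string>
    </array>
    <key>StartCalendarInterval</key>
    <dict>
        <key>Hour</key>
        <integer>7</integer>
        <key>Minute</key>
        <integer>0</integer>
    </dict>
    <key>StandardOutPath</key>
    <string>/tmp/agent-morning-run.log</string>
</dict>
</plist>
```

Drop the file in ~/Library/LaunchAgents, load it once with launchctl, and launchd runs the job daily at the configured time, surviving reboots.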
The Agent Definition Format
Every agent follows the same definition template. This is the most important design decision in the system -- more important than any technical choice.
Each definition file includes:
Role and North Star KPI. One sentence that defines success. Holden's: 'Pipeline value and revenue closure rate.' Anna's: 'Gregor's leverage ratio -- output per hour invested.'
Responsibilities. Specific, bounded list. What the agent does and -- critically -- where its authority ends.
Principles. Modeled after real-world experts. Each agent has 2-3 'primary influences' whose thinking shapes how it approaches work. Naomi's principles come from Charity Majors (observability), Chip Huyen (production AI), and Hamel Husain (eval-driven development). This is not cosmetic -- it genuinely changes the quality and character of agent output.
Anti-brief. What the agent refuses to do. This prevents scope creep and keeps agents focused. Naomi refuses to ship without PM sign-off. Bobbie refuses to send copy that has not passed quality gates.
Coordination protocol. How the agent reads handovers, files results, and escalates blockers.
Learning schedule. Topics the agent monitors weekly to stay current. This keeps agent knowledge from going stale.
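To make the template concrete, here is an illustrative skeleton filled in with Naomi's details from above. This is not the actual production file -- the placeholders and the one-line glosses on each influence are mine:

```markdown
# Agent: Naomi -- AI Engineering

## Role & North Star KPI
<one sentence that defines success>

## Responsibilities
- <specific, bounded task>
- <where authority ends: "does NOT ...">

## Principles (primary influences)
- Charity Majors: observability -- instrument before you ship
- Chip Huyen: production AI -- boring, reliable systems over demos
- Hamel Husain: eval-driven development -- no change without an eval

## Anti-brief
- Refuses to ship without PM sign-off

## Coordination protocol
- Read pending handovers at session start; mark in-progress before working
- File results back with a status update; escalate blockers, never guess

## Learning schedule
- Weekly: <2-3 topics to monitor so knowledge stays current>
```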
The Coordination Layer
The coordination problem nearly killed the project. Individual agents worked fine in isolation. Getting them to collaborate was a different challenge entirely.
Attempt 1: File-based handovers. Agents wrote results to shared markdown files. Other agents read them. This devolved into chaos within a week -- files got overwritten, there was no status tracking, and nobody knew which files were current.
Attempt 2: Slack-based coordination. Agents posted updates and handovers to Slack channels. Better for visibility but terrible for reliability -- messages got lost in noise, there was no structured way to claim or complete tasks, and the signal-to-noise ratio was unworkable.
Attempt 3 (current): Supabase coordination tables. Two core tables: agent_handovers (task routing with status tracking) and agent_memory (persistent knowledge per agent and team-wide). An agent picks up a handover by marking it in-progress, completes the work, and files the result back with a status update.
This works. It is not elegant. It is database rows. But it provides the three things coordination needs: state tracking (who is doing what), history (what was done), and escalation (what is blocked). The same fundamentals that every human project management system provides.
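A minimal sketch of that handover lifecycle, using an in-memory list in place of the Supabase tables. The column names (`from_agent`, `status`, `result`) are illustrative, not the actual schema:

```python
from datetime import datetime, timezone

# Hypothetical stand-in for the agent_handovers table. In production these
# are Postgres rows in Supabase; here a list of dicts shows the state machine.
handovers = [
    {"id": 1, "from_agent": "gregor", "to_agent": "holden",
     "task": "Build outbound sequence for a new segment",
     "status": "pending", "result": None, "updated_at": None},
]

def claim_handover(agent: str):
    """Pick up the first pending handover addressed to this agent."""
    for h in handovers:
        if h["to_agent"] == agent and h["status"] == "pending":
            h["status"] = "in_progress"
            h["updated_at"] = datetime.now(timezone.utc).isoformat()
            return h
    return None  # nothing to do

def complete_handover(handover_id: int, result: str):
    """File the result back and close the loop for the requesting agent."""
    for h in handovers:
        if h["id"] == handover_id:
            h["status"] = "done"
            h["result"] = result
            h["updated_at"] = datetime.now(timezone.utc).isoformat()
            return h
    raise KeyError(f"no handover with id {handover_id}")

task = claim_handover("holden")
complete_handover(task["id"], "Sequence built, contacts enrolled")
```

The same three fundamentals fall out of the row structure: `status` gives state tracking, `updated_at` plus completed rows give history, and a row stuck in `in_progress` is the escalation signal.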
What Agents Can Do Autonomously vs. What Needs Me
After 6 weeks of operation, here is the honest split:
Fully autonomous (agents handle end-to-end):
- Outbound sequence building and contact enrollment
- Keyword research and content brief creation
- Technical SEO (sitemaps, schema markup, internal linking)
- System audits and documentation updates
- Weekly plan drafting
- Pipeline state reports and revenue briefs

Agent-assisted (AI does 80%, I review):
- Content drafting (needs voice calibration review)
- Outbound copy (needs tone and accuracy review before sending)
- ICP research and segment analysis (needs strategic validation)
- Deal record updates (needs factual verification)

Human-only (agents prep, I execute):
- Client discovery calls and relationship building
- Pricing and deal negotiation
- Strategic direction (which markets, which offers, which priorities)
- Hiring decisions
- Anything that touches existing client relationships
The Failures
I am documenting these because nobody else does, and pretending AI agents just work out of the box is harmful.
The daily sales prep bot. This was supposed to be the headline automation: a daily Slack message with pipeline updates, follow-up reminders, and action items. It produced 106 Attio tasks that nobody completed and a wall-of-text DM that was impossible to action. The architecture was wrong -- I needed a lightweight summary with 3-5 action items, not an exhaustive task generator. Lesson: more output is not better output.
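The fix, sketched in Python with hypothetical deal fields: rank, cap, and emit only what a human can act on in one sitting.

```python
# Collapse a full pipeline dump into a short brief with a hard cap on
# action items. The deal fields (value_eur, days_since_contact) are
# illustrative, not the actual CRM schema.
MAX_ACTIONS = 5

def daily_brief(deals: list[dict]) -> str:
    # Only deals that have gone quiet are actionable at all.
    actionable = [d for d in deals if d["days_since_contact"] >= 3]
    # Rank by deal value, then by how long they have been silent.
    actionable.sort(key=lambda d: (-d["value_eur"], -d["days_since_contact"]))
    top = actionable[:MAX_ACTIONS]
    lines = [f"Pipeline brief -- {len(top)} actions (of {len(actionable)} candidates):"]
    for d in top:
        lines.append(
            f"- {d['name']}: follow up ({d['days_since_contact']}d silent, "
            f"EUR {d['value_eur']:,})"
        )
    return "\n".join(lines)
```

The hard cap is the whole point: the constraint lives in the code, so no amount of pipeline growth can turn the brief back into a wall of text.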
Apollo sequences running empty. Sequences were marked 'active' in Apollo but had 0 contacts enrolled. This went undetected for 2+ weeks because the monitoring workflow checked status, not enrollment numbers. Lesson: monitor the output, not the status.
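The corrected check, sketched with illustrative field names (not Apollo's actual API response shape): flag anything that claims to be running but has produced nothing.

```python
# "Monitor the output, not the status": a sequence that is marked active
# but has zero contacts enrolled is a silent failure the status check
# can never catch.
def audit_sequences(sequences: list[dict]) -> list[str]:
    alerts = []
    for s in sequences:
        if s["status"] == "active" and s["num_contacts"] == 0:
            alerts.append(f"{s['name']}: active but empty -- investigate")
    return alerts
```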
Cotyar the stub agent. The Finance Monitor agent was defined but never became useful. Financial monitoring requires structured data feeds from accounting software that I have not connected, and the judgment calls around financial health are more nuanced than I initially thought. Some roles are harder to delegate than others, and that is fine.
Memory bloat. Agent memory tables grew without pruning. After 2 weeks, agents were loading outdated context that conflicted with current state. Output quality degraded noticeably. I now run weekly memory maintenance -- pruning old entries, consolidating duplicates, and refreshing key facts.
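The weekly maintenance pass, sketched in Python; the entry fields here are illustrative, not the actual agent_memory schema.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=14)  # matches the ~2-week degradation window

def prune_memory(entries: list[dict], now=None) -> list[dict]:
    """Drop stale entries and keep only the newest entry per (agent, key).

    Covers two of the three maintenance steps: pruning old entries and
    consolidating duplicates. Refreshing key facts still needs an agent pass.
    """
    now = now or datetime.now(timezone.utc)
    fresh = [e for e in entries if now - e["created_at"] <= MAX_AGE]
    latest = {}
    for e in sorted(fresh, key=lambda e: e["created_at"]):
        latest[(e["agent"], e["key"])] = e  # later entries overwrite earlier
    return list(latest.values())
```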
The Economics
Monthly cost breakdown:
- Claude API credits: ~$80-120/month (varies with usage)
- Supabase: Free tier (sufficient for current scale)
- Cloudflare Pages: Free tier
- Mac Mini electricity + internet: ~$15/month
- Apollo: $49/month (would pay this with or without agents)
Total agent-specific cost: approximately $97/month.
The alternative: hiring 2 junior GTM people at EUR 3,500/month each = EUR 7,000/month. Plus onboarding time (my time), management overhead (my time), and the months before they are fully productive.
The agent team was productive within 2 weeks. It costs 1.4% of the hiring alternative. The trade-off: agents cannot build relationships, cannot handle ambiguous strategy, and need human direction. But for structured GTM execution, the economics are not close.
How to Build Your Own (The Minimum Viable Version)
You do not need 10 agents and a coordination database to start. Here is the minimum viable agent team:
Agent 1: Your highest-volume task. Pick the thing you do most often that is structured and repeatable. Write a role definition (1 page). Run it as a Claude Code session. Review every output for 2 weeks.
Agent 2: Your research/analysis task. Something that requires reading lots of information and producing a summary. This is where AI shines -- it does not get tired of reading.
Agent 3: Your quality gate. An agent that reviews the output of Agent 1 before it goes live. This is when you start needing coordination -- even if it is just a shared folder.
Start there. Add coordination infrastructure (Supabase or equivalent) when you hit 4+ agents. Add a Mac Mini or always-on server when you want scheduled execution. Scale the team as you learn what works.
The biggest mistake: building the infrastructure before proving the concept. Start with one agent. Make it useful. Then scale.
Key Lessons
Agent definitions matter more than agent code
The written role definition -- responsibilities, principles, anti-brief -- determines output quality more than any technical architecture choice. Invest 80% of your time here.
Coordination is the real engineering challenge
Building one good agent is easy. Making 10 agents work together required 3 iterations of the coordination system and is still the hardest part of the operation.
Monitor output, not status
Apollo sequences were 'active' with zero contacts. The monitoring checked status when it should have checked results. Always measure what was actually produced.
Memory maintenance is non-optional
Agent memory degrades after ~2 weeks without pruning. Output quality drops noticeably. Schedule weekly memory maintenance from the start.
Not every role can be delegated to AI
Cotyar (Finance Monitor) is still a stub after 6 weeks. Some tasks need structured data feeds or nuanced judgment that is harder to encode than expected. This is fine -- know your limits.
More output is not better output
The daily sales prep bot created 106 tasks nobody completed. The fix was fewer, more actionable outputs. Constrain agent output to what a human can actually act on.
Frequently Asked Questions
How long does it take to build a useful AI agent?
The first useful agent (outbound prospecting) was working within 3 days. The full 10-agent team took 6 weeks. Start with one agent that does one job well -- do not try to build the whole team at once.
Do I need to know how to code to build AI agents?
For the agent definitions: no, these are plain text files. For the coordination infrastructure (Supabase, scheduled tasks, Python generators): yes, currently some technical ability is required. This is an area where tooling is improving rapidly.
What happens when agents produce wrong output?
Quality gates catch most errors. Bobbie reviews outbound copy. Gregor reviews strategic output. The system assumes agents will make mistakes -- the architecture is designed around verification, not blind trust.
Can this scale beyond a solopreneur or small team?
The architecture scales. The coordination layer (Supabase) handles more agents and more handovers without fundamental changes. The limiting factor is the human review bottleneck -- as the team grows, you need to invest more in quality gates and less in direct review.
This build log was drafted by an AI agent using real project data and reviewed by Gregor Spielmann. The projects, numbers, and lessons are real.