The Distinction That Changes Everything
This framework comes from Anthropic's Agent SDK principles, and I've battle-tested it across my own operations for three months.
Agents are like Claude Code. You talk to them in natural language. They take autonomous action. They make judgment calls. They decide what to do next based on ambiguous inputs. Good for: discovery call strategy, revenue path brainstorming, deal negotiation prep, ICP research, content strategy.
Workflows are like GitHub Actions. Defined inputs, defined outputs, deterministic steps. The same input produces the same output every time. No judgment required. Good for: daily sales prep, Slack ticket sweeps, post-call deal detection, SEO metric check-ins, CRM hygiene.
Both can be built on the same infrastructure. Both use language models. But they serve fundamentally different purposes, and confusing them is the most expensive mistake in AI automation.
Here's the table I use to route every new automation:
| Task | Should Be | Why |
|---|---|---|
| Daily sales prep | Workflow | Same steps every day, defined inputs and outputs |
| Slack ticket sweep | Workflow | Scan, classify, create ticket — deterministic |
| Post-call deal detection | Workflow | Check transcript, extract signals, create CRM record |
| SEO check-ins | Workflow | Pull metrics, format, post — no judgment needed |
| Discovery call strategy | Agent | Needs judgment, context-dependent |
| Revenue path brainstorming | Agent | Needs deep thinking across multiple data points |
| Deal negotiation prep | Agent | Needs judgment on positioning |
The Daily-Sales-Prep Disaster: A Case Study in Over-Engineering
Let me tell you about my most expensive routing mistake.
Early in building our AI operations, I created a "daily-sales-prep" system. It was supposed to review our pipeline every morning and post a summary to Slack. Simple enough, right?
I built it as an agent. A full Claude Code session that would wake up, read from our CRM (Attio), analyze the pipeline, identify priorities, draft action items, and post a rich summary to Slack.
Here's what happened:
The agent produced wall-of-text DMs. Each morning message was 800+ words of analysis, recommendations, contextual notes, and "things to think about." It was thorough, creative, and completely impossible to action at 7am while drinking coffee.
Worse, it created tasks. 106 Attio tasks over the course of several weeks. Tasks like "Follow up with prospect X about their Q3 planning" and "Research competitor Y's new pricing." All reasonable. All unactioned. Because nobody — including me — had the context or time to process 106 tasks created by an autonomous agent.
The quote from our internal sales playbook, written after I finally killed this system: "Don't use Attio for task management. Attio = CRM data only. Urgent alerts go to Slack. Execution goes to Claude Code sessions."
What It Should Have Been
A workflow. Defined steps:
- Pull pipeline data from Attio API
- Format into a table: deal name, stage, value, days since last activity, next step
- Flag anything with >7 days of inactivity
- Post the table to Slack. No analysis. No recommendations. No tasks.
That's it. Deterministic. Same input, same output format. Takes 2 seconds to scan at 7am. No 800-word essay. No orphaned tasks. And it could run on the cheapest model available because there's zero judgment involved.
The agent version cost 10x more in API credits and produced output that was worse for its intended purpose. Over-engineering disguised as capability.
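The four workflow steps above can be sketched in a few lines. This is a minimal illustration, not the actual implementation: the Attio API call is replaced with plain dicts, and the deal fields and staleness threshold are assumptions for the example.

```python
from datetime import date

STALE_DAYS = 7  # flag anything with more than a week of inactivity

def format_pipeline(deals: list[dict], today: date) -> str:
    """Deterministic daily summary: the same deals in always produce the same table out."""
    rows = ["Deal | Stage | Value | Idle | Next step"]
    for d in sorted(deals, key=lambda d: d["value"], reverse=True):
        idle = (today - d["last_activity"]).days
        flag = " (stale)" if idle > STALE_DAYS else ""
        rows.append(f"{d['name']} | {d['stage']} | ${d['value']:,} | {idle}d{flag} | {d['next_step']}")
    return "\n".join(rows)

# Illustrative data in place of a real Attio API response
deals = [
    {"name": "Acme", "stage": "Proposal", "value": 12000,
     "last_activity": date(2025, 1, 2), "next_step": "Send revised quote"},
    {"name": "Globex", "stage": "Discovery", "value": 8000,
     "last_activity": date(2025, 1, 9), "next_step": "Book demo"},
]
print(format_pipeline(deals, today=date(2025, 1, 12)))
```

The output is a table you can scan in seconds, and because no step involves judgment, the whole thing could run on the cheapest model available, or on no model at all.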
The Cost Difference Is Staggering
This isn't just about quality. The economics of workflows vs agents are dramatically different.
Agents run on expensive models. If you want genuine judgment, ambiguity handling, and autonomous decision-making, you need the best model available (currently Opus-class). These models are slower and cost significantly more per token.
Workflows can run on cheap models. When the task is deterministic — pull data, format it, apply rules, output result — you can use Haiku or equivalent. Fast, cheap, reliable. The model doesn't need to "think" because the workflow already defines what to do.
The math for my operations:
- Agent session (discovery call prep, strategic analysis): uses the full-capability model, takes 5-10 minutes of compute, and carries a meaningful per-session cost
- Workflow execution (daily pipeline summary, content publishing, SEO metrics): runs in seconds on a model that costs a fraction of that per session
Proper routing = more automation at lower cost.
Most of what I automated in the first two months was built as agents. Once I reclassified them, roughly 60% of our automation moved to the workflow bucket. That's 60% of our compute running on cheaper models with better reliability.
YC is now funding AI-native agencies because AI changes the economics — the margin structure looks like software, not services. But that's only true if you route correctly. Run everything as agents and your margins look like a consultancy with an expensive compute habit.
The Decision Framework: When to Use Which
Here's how I decide whether something should be an agent or a workflow. I ask three questions:
Question 1: Does the same input always require the same output?
If yes: workflow. The daily pipeline summary always needs the same format. A sitemap update always follows the same steps. A ticket classification always uses the same categories.
If no: agent. Discovery call prep varies based on the prospect's industry, company size, and what we already know about them. Content strategy depends on what's already published, what's trending, and what the competitive landscape looks like.
Question 2: Does this require judgment about ambiguous information?
If no: workflow. Checking whether an Apollo sequence has contacts enrolled is binary — it does or it doesn't. Formatting CRM data into a Slack message is mechanical. Generating a sitemap from a JSON file is deterministic.
If yes: agent. Deciding which deal to prioritize this week requires weighing multiple factors. Writing outbound copy that resonates with a specific ICP requires understanding context and nuance. Analyzing why a content piece isn't ranking requires investigative reasoning.
Question 3: Would I be comfortable with this running identically every time?
If yes: workflow. I want my daily sales data in the same format every single day. Consistency is the feature.
If no: agent. I want my weekly strategic recommendations to adapt based on what happened this week. Variability is the feature.
In practice, this means about 60-70% of business automation should be workflows. The industry treats it as 100% agents because "AI agent" is the buzzword. That's expensive and unreliable.
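The three questions reduce to a small routing function. This sketch is my rendering of the framework, not code from the article's operations; treating a split vote as "hybrid" is an assumption that matches the hybrid category used in the audit below.

```python
def route(same_output_for_same_input: bool,
          needs_judgment: bool,
          identical_every_time_ok: bool) -> str:
    """Apply the three routing questions; mixed answers mean the task should be split."""
    workflow_votes = sum([
        same_output_for_same_input,   # Q1: deterministic mapping?
        not needs_judgment,           # Q2: no ambiguity to resolve?
        identical_every_time_ok,      # Q3: consistency is the feature?
    ])
    if workflow_votes == 3:
        return "workflow"
    if workflow_votes == 0:
        return "agent"
    return "hybrid"  # split: deterministic parts -> workflow, judgment parts -> agent

print(route(True, False, True))    # daily pipeline summary
print(route(False, True, False))   # discovery call prep
```

Running it on the examples from the table routes the daily pipeline summary to "workflow" and discovery call prep to "agent", which matches the classifications above.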
Real Examples From My Operations
Let me walk through specific automations and how I reclassified them:
Pipeline Monitoring (Reclassified: Agent → Workflow)
Before: Holden (CRO agent) ran a full analysis session every morning — reading CRM data, thinking about priorities, generating recommendations.
After: A workflow pulls deal data from Attio, formats it into a standard table, flags staleness. Holden still does the weekly strategic analysis (that genuinely needs agent judgment), but the daily check is now a workflow.
Content Publishing (Already a Workflow)
Prax's publishing pipeline was correctly built as a workflow from the start: read JSON data, run Python generator, output HTML, push to Git, auto-deploy. No judgment required. This runs on the cheapest model possible and has never failed in a way that required debugging the AI — only debugging the templates.
Outbound Sequence Building (Correctly an Agent)
Amos building outbound sequences is genuinely agent work. He needs to understand the ICP, adapt messaging angles based on the target segment, choose value propositions, and craft emails that sound human. You can't deterministically generate a cold email that works — it requires judgment about the prospect's likely pain points and what will resonate. Agent work, correctly classified.
Post-Call Processing (Reclassified: Agent → Workflow + Agent)
The Fireflies-to-CRM pipeline was built as one big agent session: read transcript, analyze, classify, extract, update CRM. The problem: the classification step ("is this a sales call or an internal meeting?") needs a human check that breaks the autonomous flow.
The fix: split it. A workflow handles the deterministic parts (transcript arrives → extract participant names → check against CRM → format summary). An agent handles the judgment part (analyze conversation for deal signals, draft follow-up recommendations). The human approves the classification. Three components instead of one, each matched to the right tool.
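The split can be sketched as three composable pieces. Everything here is illustrative: the transcript format, the classification labels, and the callback names are assumptions standing in for the real Fireflies payload and agent calls.

```python
import re

def extract_participants(transcript: str) -> list[str]:
    """Workflow step: deterministically pull 'Name:' speaker prefixes from the transcript."""
    return sorted({m.group(1) for m in re.finditer(r"^(\w+):", transcript, re.M)})

def process_call(transcript, crm_contacts, classify, approve, analyze):
    """Workflow extraction -> human-approved classification -> agent analysis."""
    known = [p for p in extract_participants(transcript) if p in crm_contacts]  # workflow
    proposed = classify(transcript)          # cheap, rule-based or small-model step
    if not approve(proposed):                # human gate: classification is proposed, not auto-applied
        return {"status": "held", "participants": known}
    if proposed != "sales":
        return {"status": "filed", "participants": known}
    return {"status": "processed", "participants": known,
            "analysis": analyze(transcript)}  # judgment-heavy agent step, sales calls only

# Illustrative run with stubbed classify/approve/analyze callbacks
demo = "Alice: thanks for joining\nBob: our budget resets in Q3\n"
result = process_call(demo, {"Bob"},
                      classify=lambda t: "sales",
                      approve=lambda c: True,
                      analyze=lambda t: "budget-timing signal detected")
print(result["status"])
```

The point of the structure is that the expensive agent call only fires after the cheap deterministic steps and the human approval, instead of one model session doing everything end to end.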
How to Audit Your Own AI Automation
If you're already running AI automation, here's how to apply this framework:
Step 1: List Everything
Write down every automated or AI-assisted process in your operations. Include half-built things, things you tried and abandoned, and things running on autopilot that you haven't checked in weeks.
Step 2: Classify Each One
For each item, ask the three questions from the decision framework. Mark it as "agent" (genuinely needs judgment), "workflow" (deterministic, could be templated), or "hybrid" (has both components).
Step 3: Identify the Expensive Mistakes
Any item classified as "workflow" that's currently running as an agent session is costing you too much. These are your quick wins — reclassify them, simplify the implementation, and move them to cheaper models.
Step 4: Split the Hybrids
Items with both workflow and agent components should be split. The deterministic parts run as workflows. The judgment parts run as agents. The human handles the handoff points between them.
When I did this audit for my own operations, 60% of what I'd built as "agents" should have been workflows. The reclassification reduced costs and improved reliability — because workflows don't hallucinate, don't go off-script, and don't produce 800-word Slack messages when you need a 5-line table.
The industry is obsessed with building agents. The unsexy truth is that most business automation should be boring, deterministic, and cheap. Save the expensive, autonomous, judgment-heavy agent work for where it actually matters. That's how you get AI economics that look like software margins instead of consulting margins.
Frequently Asked Questions
What's the difference between an AI agent and an AI workflow?
An AI agent operates autonomously, makes judgment calls, and can take different actions based on ambiguous inputs — like a senior employee figuring out what to do. An AI workflow follows deterministic steps with defined inputs and outputs — like a checklist or a script. Both use language models, but agents need expensive, capable models while workflows can run on fast, cheap ones.
Can I use the same AI model for both agents and workflows?
Technically yes, but you shouldn't. Running a deterministic workflow on an expensive agent-class model wastes money and often produces worse results (the model 'overthinks' simple tasks). Use the cheapest model that reliably handles the defined steps for workflows, and reserve capable models for genuine agent work that requires judgment.
How do I know if my automation is over-engineered?
Three red flags: (1) the output varies significantly between runs even when the input is the same — you probably have an agent doing workflow work; (2) you're paying for a powerful model to do something that could be a template — cost signal; (3) the AI produces long, verbose output when you just need a table or a status update — it's 'thinking' when it should be 'formatting.' If any of these apply, you likely have an agent where you need a workflow.
What tools do you use for workflows vs agents?
Both run through Claude Code in my setup. The difference is in the prompt design and model selection, not the tooling. Workflows get strict, templated prompts with exact output format specifications. Agents get open-ended briefs with role definitions and judgment guidelines. For scheduling, both use launchd on a Mac Mini for autonomous execution. The infrastructure is the same — the routing logic is what differs.
This article was drafted by an AI agent and reviewed by Gregor Spielmann. The source material, frameworks, and experiences are real. The writing is AI-assisted. Learn how this site works.