Why We Dropped the Planner from Our Agent Architecture
Most computer-use agent architectures follow the same pattern: a high-level planner decides the strategy, a mid-level coordinator breaks it into subtasks, and a worker executes each step. Three layers, three sets of model calls, three opportunities for miscommunication.
This made sense when language models were weaker. The worker needed guidance because it could not reliably handle multi-step reasoning on its own. The planner existed to compensate for the worker's limitations.
What Changed With Modern Language Models
But models have gotten dramatically better at multi-step reasoning in the last 18 months. When we actually examined what our planner was producing, we found that the worker could have figured most of it out on its own. The planner was adding latency without adding intelligence.
So we tried something simple: remove the planner entirely. One model, one call per action. The model sees a screenshot, its conversation history, and the task description. It decides what to do next.
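A minimal sketch of that flat loop, in Python, might look like the following. It is an illustration under stated assumptions, not our production code: the names FlatAgent, Action, decide, capture, and execute are hypothetical, and decide stands in for the single model call that receives the task, the current screenshot, and the history so far.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    """One step chosen by the model (hypothetical shape)."""
    kind: str            # e.g. "click", "type", or "done"
    argument: str = ""   # e.g. a button label or text to type


@dataclass
class FlatAgent:
    """Single-model loop: one decide() call per action, no planner layer."""
    task: str
    # Stand-in for the model call: (task, screenshot, history) -> next action.
    decide: Callable[[str, bytes, List[Action]], Action]
    # Stand-in for grabbing the current screen as raw bytes.
    capture: Callable[[], bytes]
    # Stand-in for performing the action on the desktop.
    execute: Callable[[Action], None]
    history: List[Action] = field(default_factory=list)

    def run(self, max_steps: int = 20) -> List[Action]:
        for _ in range(max_steps):
            screenshot = self.capture()
            action = self.decide(self.task, screenshot, self.history)
            self.history.append(action)
            if action.kind == "done":
                break  # the model reports the task is finished
            self.execute(action)
        return self.history
```

The whole run lives in one history list, so there is a single record of what the model saw and chose at every step. The max_steps cap is an assumption added here so the sketch cannot loop forever.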
The Results: 3x Faster With a Simpler Architecture
The result surprised us. Tasks completed roughly three times faster. Not because the individual model calls were faster, but because we eliminated all the coordination overhead between layers: passing context between agents, re-encoding screenshots at each layer, synchronizing state.
More importantly, debugging got dramatically easier. With a hierarchy, when something goes wrong, you have to figure out which layer made the mistake. Was the planner's strategy wrong? Did the coordinator misinterpret the plan? Did the worker execute correctly but on the wrong subtask? With a flat architecture, there is one conversation to inspect. The reasoning chain is transparent.
The tradeoff is real: a hierarchical agent can theoretically handle more complex planning scenarios. But for the types of tasks that production desktop automation requires (navigating forms, filling fields, clicking through dialogs), the "planning" is straightforward enough that a single capable model handles it without help.
Where Planning Still Matters
For the small percentage of tasks that do need structured planning, we handle that at the workflow definition level on the platform, not inside the agent. The platform orchestrates the high-level steps. The agent handles each step autonomously.
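One way to picture that split is a workflow defined as plain data outside the agent, with each step handed to the agent in turn. The sketch below is hypothetical: WorkflowStep, run_workflow, and the run_agent callback are illustrative names rather than the platform's actual API, and stopping at the first failed step is an assumption about error handling.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class WorkflowStep:
    """One high-level step, defined at the workflow level (hypothetical)."""
    name: str
    instruction: str  # the task description handed to the agent


def run_workflow(
    steps: Sequence[WorkflowStep],
    run_agent: Callable[[str], bool],
) -> List[str]:
    """Run steps in order; return the names of the steps that completed.

    run_agent stands in for one autonomous agent run and returns True on
    success. The workflow stops at the first step that fails.
    """
    completed: List[str] = []
    for step in steps:
        if not run_agent(step.instruction):
            break
        completed.append(step.name)
    return completed
```

The ordering and structure sit in the step list, where they can be read and edited directly, while the agent only ever sees one self-contained instruction at a time.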
Sometimes the best engineering decision is removing something.