When AI Meets Complexity, Architecture Decides

Paul Monasterio
Apr 20, 2026

AI is improving fast. Today's reasoning models are dramatically more accurate than what existed even six months ago. They can parse complex documents, synthesize information from hundreds of sources, and produce outputs that would have seemed impossible a year ago.

And yet the majority of enterprise AI deployments still fail to deliver measurable results. 

The models aren't the problem. The gap is in what surrounds them.

Where capability meets complexity

Accuracy on a discrete task is not the same as reliability across a complex one. 

A model can extract data from a submission with near-perfect precision - and still fail when the task extends across multiple steps, conflicting inputs, ambiguous context, and decisions that compound over time.

This is the frontier that matters now. AI can increasingly get the right answer on discrete tasks. The question is whether the system around it is architected for the tasks where getting it right is harder to define:

  • When the task requires sustained reasoning across dozens of dependent steps
  • When individual outputs look correct but the cumulative result drifts
  • When there's no straightforward way to verify the end-to-end outcome

A 2026 enterprise AI study found that individual AI users report productivity gains of up to 5x - yet only 29% of organizations see significant ROI from those same tools.

The models didn't fail on the tasks they were given. The systems around them weren't designed for the complexity of how those tasks actually unfold in production.

The gap isn't technical. It's architectural. And it's one that other fields - where the stakes have always been high and the decisions have always been complex - closed a long time ago.

What other fields already know

Medicine, engineering, and organizational management all operate under a version of the same constraint: capable professionals making high-stakes decisions across complex, multi-step processes where conditions shift, information conflicts, and outcomes compound. 

They've spent decades building design frameworks for exactly this, and the principles are transferable. The people building AI applications just haven't borrowed them.

People management: Don’t send the intern to the investor meeting

Every organization runs on a basic insight: calibrate authority to a person’s demonstrated competence. 

Junior staff get supervised, senior staff get autonomy. When someone’s new, you structure their environment so they operate where they're effective and escalate where they're not. As they prove themselves, you expand their scope.

This is exactly how AI should work. But when an organization deploys a model with the same level of oversight across every task - whether the task falls within its proven capability or well outside it - it's the equivalent of letting a first-year analyst draft the earnings call script and present it to the board, without a mechanism for intervention.

The fix isn't a more capable agent. It's better management of the ones you have. Which tasks can be fully automated? Where does the agent need human oversight? Where should it not be operating at all? And importantly: how does that calibration evolve as the agents improve?

These aren't technical questions. They're organizational ones, and most enterprises aren't asking them.
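The answers live in policy, not in code, but they can be written down explicitly. Below is a minimal sketch of what that calibration could look like, assuming a hypothetical task registry and a simple accuracy-based promotion rule; the task names, thresholds, and oversight levels are illustrative, not a prescription.

```python
from dataclasses import dataclass
from enum import Enum


class Oversight(Enum):
    """How much human involvement a task type requires."""
    FULL_AUTOMATION = "ai_decides"
    HUMAN_OVERRIDE = "ai_decides_human_can_override"
    HUMAN_APPROVAL = "ai_recommends_human_decides"
    NOT_ALLOWED = "human_only"


@dataclass
class TaskPolicy:
    """Policy for one task type, revisited as the agent builds a track record."""
    name: str
    oversight: Oversight
    observed_accuracy: float = 0.0   # measured on human-reviewed outputs
    reviewed_samples: int = 0

    def record_review(self, correct: bool) -> None:
        """Fold one human-reviewed outcome into the running accuracy."""
        total = self.observed_accuracy * self.reviewed_samples + (1.0 if correct else 0.0)
        self.reviewed_samples += 1
        self.observed_accuracy = total / self.reviewed_samples

    def recalibrate(self, min_samples: int = 200,
                    promote_at: float = 0.98, demote_at: float = 0.90) -> None:
        """Expand or contract autonomy based on demonstrated competence."""
        if self.oversight is Oversight.NOT_ALLOWED or self.reviewed_samples < min_samples:
            return  # off-limits tasks stay off-limits; thin evidence changes nothing
        if self.observed_accuracy >= promote_at and self.oversight is Oversight.HUMAN_APPROVAL:
            self.oversight = Oversight.HUMAN_OVERRIDE
        elif self.observed_accuracy < demote_at and self.oversight is Oversight.HUMAN_OVERRIDE:
            self.oversight = Oversight.HUMAN_APPROVAL


# Illustrative registry: which tasks the agent may touch, and under how much supervision.
registry = {
    "extract_policy_dates":    TaskPolicy("extract_policy_dates", Oversight.HUMAN_OVERRIDE),
    "recommend_risk_appetite": TaskPolicy("recommend_risk_appetite", Oversight.HUMAN_APPROVAL),
    "bind_coverage":           TaskPolicy("bind_coverage", Oversight.NOT_ALLOWED),
}
```

The registry records the organizational decision; the recalibration rule is how scope expands - or contracts - as the agent proves itself on reviewed work.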

Engineering: No single decision should cascade unchecked

In mechanical and civil engineering, systems are designed around a straightforward principle: when something goes wrong in one component, it shouldn't cascade into everything connected to it. 

This is why planes have independent hydraulic systems, cars have crumple zones, and bridges carry load margins well beyond expected capacity. Engineers don’t expect failure on every flight, drive, or crossing, but they’re very aware that the cost of an unchecked cascade is catastrophic.

Models today are increasingly accurate on discrete tasks. But complex, multi-step processes are inherently vulnerable to compounding errors - and that's where this principle matters for AI.

When a model's output on step one feeds into step two, which informs step three, which shapes a final decision, even a small deviation early in the chain can produce a materially wrong outcome downstream. The longer the task horizon, the more this matters.

Engineering disciplines solve this with structural checkpoints - mechanisms that detect deviation and course-correct before it compounds.

When an AI system lacks those checkpoints, a wrong answer in step one shapes every decision that follows. The error doesn't announce itself; it just flows forward.
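Here is a minimal sketch of what such a checkpoint can look like, assuming hypothetical step functions standing in for model calls and hand-written validators; the specific bounds are illustrative.

```python
from typing import Any, Callable


class CheckpointError(RuntimeError):
    """Raised when a step's output deviates, before the error can compound downstream."""


def run_with_checkpoints(
    inputs: Any,
    steps: list[tuple[Callable[[Any], Any], Callable[[Any], bool]]],
) -> Any:
    """Run a multi-step chain, validating each output before it feeds the next step.

    Each element of `steps` pairs a step function with an independent validator.
    A failed validation halts the chain (or could route to a human) instead of
    letting the deviation flow silently into the next step.
    """
    value = inputs
    for step, validate in steps:
        value = step(value)
        if not validate(value):
            raise CheckpointError(f"{step.__name__} produced an out-of-bounds result: {value!r}")
    return value


# Illustrative chain: extract a value, adjust it, price off it - with sanity checks between.
def extract_insured_value(doc: str) -> float:
    return float(doc.split("=")[1])   # stand-in for a model extraction call

def apply_inflation_guard(value: float) -> float:
    return value * 1.03

def price(value: float) -> float:
    return value * 0.002

result = run_with_checkpoints(
    "TIV=1500000",
    [
        (extract_insured_value, lambda v: 10_000 < v < 1e9),   # plausibility bound
        (apply_inflation_guard, lambda v: v > 0),
        (price, lambda v: v < 1e7),
    ],
)
```

The validators don't need to be smart - they only need to be independent of the step they check, so a deviation is caught before it feeds everything downstream.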

Medicine: Slow down when the case doesn't match the textbook

Medical decision-making is defined by what happens when conditions deviate from what's expected: a patient presents with atypical symptoms, lab results conflict with the clinical picture, and the case doesn't match the textbook.

Clinical systems are designed for exactly these scenarios. They slow down, seek more evidence, bring in specialists, and escalate. The response isn't to proceed with false confidence or to halt entirely - it's to shift the approach based on how much uncertainty the situation carries.

This design philosophy extends even to routine treatment. Doctors know patients miss doses, so they build treatment plans around that reality: simplified regimens, extended-release formulations, delivery mechanisms that work even when real-world conditions don't match the clinical trial. 

The push to convert injectable GLP-1 drugs to pill form isn't about the chemistry changing. It's about removing the needle from the equation because patients skip injections, cold storage chains fail in developing countries, and the delivery environment is never as controlled as the one the drug was tested in.

The pattern across all of this is consistent: design for the reality of how complex processes actually unfold, not the idealized version. 

When an AI system encounters conflicting inputs, ambiguous data, or a scenario outside its training distribution, the question is whether the system has any mechanism to recognize that and adjust. Clinical systems do.
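A minimal sketch of that mechanism, assuming the system can produce a confidence score and flags for conflicting or out-of-distribution inputs; the thresholds are illustrative, not clinical.

```python
from enum import Enum


class Action(Enum):
    PROCEED = "proceed"
    GATHER_MORE_EVIDENCE = "gather_more_evidence"
    ESCALATE_TO_HUMAN = "escalate_to_human"


def triage(confidence: float, inputs_conflict: bool, in_distribution: bool) -> Action:
    """Shift the approach based on how much uncertainty the situation carries,
    rather than proceeding with false confidence or halting entirely."""
    if not in_distribution or inputs_conflict:
        return Action.ESCALATE_TO_HUMAN      # the case doesn't match the textbook
    if confidence < 0.85:
        return Action.GATHER_MORE_EVIDENCE   # slow down, seek more evidence
    return Action.PROCEED


# Routine case: proceed. Conflicting inputs: escalate, however confident the model sounds.
assert triage(0.97, inputs_conflict=False, in_distribution=True) is Action.PROCEED
assert triage(0.97, inputs_conflict=True, in_distribution=True) is Action.ESCALATE_TO_HUMAN
```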

The automation continuum

The common thread across all three fields is a shared principle: the level of oversight should match the complexity of the task. 

Think of it as a continuum:

  • AI decides → the machine acts, no human involved
  • AI decides, human can override → the machine acts by default, the human can intervene
  • AI recommends → the machine presents choices, the human decides
  • AI informs → the machine surfaces data, makes no recommendation
  • Human decides → no AI involvement

Where any given task sits on this continuum should depend on two things - the complexity of the decision and the cost of getting it wrong. 

A straightforward data extraction with a clear right answer is a different kind of task than a multi-step judgment call that shapes an entire portfolio. Both can involve AI. They shouldn't involve AI the same way.
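As a minimal sketch of that mapping - where the risk formula and thresholds are placeholders for what is ultimately an organizational decision, not a technical one:

```python
from enum import Enum


class Autonomy(Enum):
    """The continuum above, from full machine authority to none."""
    AI_DECIDES = 1
    AI_DECIDES_HUMAN_OVERRIDE = 2
    AI_RECOMMENDS = 3
    AI_INFORMS = 4
    HUMAN_DECIDES = 5


def place_on_continuum(complexity: float, cost_of_error: float) -> Autonomy:
    """Map a task's decision complexity and cost of getting it wrong (both 0-1)
    to a level of autonomy. Thresholds here are illustrative placeholders."""
    risk = max(complexity, cost_of_error)
    if risk < 0.2:
        return Autonomy.AI_DECIDES
    if risk < 0.4:
        return Autonomy.AI_DECIDES_HUMAN_OVERRIDE
    if risk < 0.6:
        return Autonomy.AI_RECOMMENDS
    if risk < 0.8:
        return Autonomy.AI_INFORMS
    return Autonomy.HUMAN_DECIDES


# A clear-cut extraction and a portfolio-shaping judgment call land in different places.
print(place_on_continuum(complexity=0.1, cost_of_error=0.1))   # Autonomy.AI_DECIDES
print(place_on_continuum(complexity=0.7, cost_of_error=0.9))   # Autonomy.HUMAN_DECIDES
```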

This is also why AI has taken hold faster in creative fields than in high-stakes analytical ones. 

Art tolerates approximation. There’s no "wrong" answer to "generate a picture of a sunset." But there’s a very wrong answer to "what is the total insured value of this property?" or "should we bind this risk?"

The nature of the task should dictate how much autonomy the system gets - and how much human judgment stays in the loop.

The question leaders should be asking

The capability question has been answered. Models today reason through problems, self-correct, and deliver accuracy that would have been unimaginable two years ago. The question that remains is what sits around them.

When an AI system encounters a long-horizon task - one where inputs conflict, conditions evolve, and individual decisions compound into portfolio-level outcomes - does it have the architecture to match? 

Can it calibrate its own authority based on the complexity of each decision? Does it know when to slow down, seek more information, or route to a human? Does it detect when conditions have shifted beyond its training distribution?

These aren't futuristic requirements. Medicine, engineering, and organizational management have been designing for exactly this - capable actors operating across complex, high-stakes processes - for decades. 

The principles are proven. They're just waiting to be applied.

The technology will keep improving, but capability was never the bottleneck. The bottleneck is the architecture between a powerful model and a consequential decision - and that's where the real work lives.
