Why Predicting Submission Value Starts with How Risks Relate

Matt Kielo
Senior Machine Learning Engineer
May 1, 2026

The most expensive thing an underwriter does is spend the day building a quote that was never going to bind.

It doesn't feel like waste in the moment. The submission looked reasonable, the broker relationship is strong, and the risk falls within appetite.

But somewhere in the market - invisible to the underwriter - a competitor had a structural advantage on that account, and the outcome was decided before the quote was built. 

That's the obvious waste. Less obvious: the submission that binds but turns out to be unprofitable, or the submission that could have bound profitably but was declined.

The real question isn't just whether you can win the business. It's whether quoting that submission is worth the time, effort, and risk. 

Will it generate positive lifetime value?

What guidelines can’t tell you

Triage decisions require quick assessments across multiple dimensions: does this submission fit our appetite and guidelines? If we quote it, will we actually win? And if we bind it, will it be profitable over time? 

Experienced underwriters develop an instinct for this. They learn which brokers convert, which segments they're competitive in, and which submissions carry signals that something is off. 

But that instinct lives in individual heads, is inconsistent across a team of twenty, and doesn't scale when submission volume outpaces headcount - which, for most carriers, it already has.

The organizational answer is structured guidelines. 

Leadership defines strategic priorities (which segments to grow, where to pull back) and portfolio managers translate those into codifiable logic. This works well when the criteria are explicit and known in advance.

But guidelines are static by nature. They encode what leadership believed was true when the guidelines were written, not what's true now. 

Market conditions shift. Segments that looked unpromising become competitive. Risks that pass every filter turn out to be systematically unprofitable. Guidelines can't adapt to what they can't see.

The deeper question is lifetime value - the overall worth of quoting a submission - but estimating it depends on competitive dynamics that are difficult to codify. 

How does your pricing compare to an unknown market? Does the broker have a stronger relationship with a competitor? Are you structurally advantaged in this segment, or quoting into a headwind?

These factors are latent, vary by submission, and shift with market conditions.

Addressing the triage problem properly means combining both: structured guidelines for appetite and fit, and AI that can estimate the lifetime value of quoting a submission from your carrier's own competitive and loss history.

This post focuses on the AI side: how we estimate that value, and why doing it well requires rethinking how risks are represented before training any prediction model.

Why obvious solutions fall short

Most attempts to estimate lifetime value fall into one of a few categories, and each one hits a ceiling for the same underlying reason.

The most straightforward approach is rules. Define what a good submission looks like, score against that definition, and prioritize accordingly. 

A more modern version uses an LLM to score each submission against a handcrafted rubric. 

Both are improvements over no systematic triage at all, but they share a fundamental limitation: they require you to know what drives value in advance. 

Rules encode what you already know. LLMs reason about a submission in isolation without learning from your carrier's history. Neither adapts.

The real value of prediction comes from surfacing patterns that aren't already obvious - which brokers convert in which segments, where your pricing is structurally competitive, and which risk characteristics correlate with profitable binding. 

That's the bar. The question is whether any existing approach can clear it.

A step closer is computing broker-level bind rates directly: what percentage of recent submissions from this broker, in this line of business, in this region, actually bound? 

This approach can surface real signal, but the insight is shallow: it reveals only the broker’s overall pattern, nothing about the specific submission in front of you. 

And as you slice the data more finely by geography, class, and account size, the buckets get sparse fast, with too few data points in each bucket to trust the answer.
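
To make the sparsity concrete, here's a minimal sketch of the bucket computation (the dataframe and column names are invented for illustration):

```python
import pandas as pd

# Hypothetical submission log; columns are illustrative only.
subs = pd.DataFrame({
    "broker": ["A", "A", "A", "B", "B", "C"],
    "line":   ["GL", "GL", "WC", "GL", "GL", "WC"],
    "region": ["SE", "SE", "SE", "NE", "NE", "SE"],
    "bound":  [1,    0,    1,    0,    1,    0],
})

# Bind rate per (broker, line, region) bucket, with bucket sizes.
rates = (
    subs.groupby(["broker", "line", "region"])["bound"]
        .agg(bind_rate="mean", n="size")
        .reset_index()
)

# The sparsity problem: at fine granularity, nearly every bucket
# is too small to trust.
print(rates[rates["n"] < 30])
```

The computation is trivial; the trouble is that every extra dimension of slicing divides the same fixed history into more, smaller buckets.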

The natural next step is a machine learning model that can handle more complexity. 

Gradient-boosted trees and similar approaches are standard for this kind of tabular prediction, and the concept is straightforward: take your historical book, filter to quoted submissions, and train a model to predict which ones will generate value.
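
A minimal sketch of that recipe, with random stand-in data in place of a real book (nothing here is a production pipeline):

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier

# Stand-in data: X is a submission feature matrix, y the bind outcome,
# and `quoted` marks the minority of submissions that ever got a quote.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 40))
y = rng.integers(0, 2, size=5000)
quoted = rng.random(5000) < 0.3

# The standard recipe: filter to quoted submissions, train, predict.
model = HistGradientBoostingClassifier().fit(X[quoted], y[quoted])
p_bind = model.predict_proba(X)[:, 1]   # estimated bind probability
```

Swapping in a different classifier doesn't change the structural problem: the model only ever learns from the quoted minority.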

But insurance submissions aren't standard tabular data. A restaurant doesn't have trucking miles. A contractor doesn't have bed counts. Different risks are fundamentally different things, and the fields that describe them barely overlap. 

When a model tries to learn interactions between industry, geography, broker, and dozens of other variables simultaneously, the possible combinations explode - and most have too few submissions to learn from. It's the same sparsity problem as the broker-level buckets at a larger scale. 

For underwriters, the resulting explanations are often difficult to act on. Account size might positively predict binding in one segment and negatively in another, with no intuitive reason why.

What every approach misses

Every approach above can only learn from submissions you actually quoted - which, at most carriers, is a fraction of total volume. 

The unquoted majority gets discarded. But those submissions still contain rich information about what risks look like, how they relate to each other, and what your submission flow actually consists of.

That's a problem, but it's not the deepest one. These approaches also conflate two distinct challenges: learning how to compare insurance submissions to each other, and using those comparisons to identify where you're competitive and where you're profitable.

Rules and LLMs skip the comparison framework entirely. Supervised models try to learn both simultaneously from a filtered, biased subset of your data.

What if you learned the comparison framework first?

Our approach

The right way to compare risks is specific to each carrier. Two carriers looking at the same 50 states might care about completely different factors. 

One carrier groups risk by regional weather patterns. Another groups it by litigation environment. A third might find that their competitiveness clusters around a combination of industry, account size, and broker tenure that no one designed on purpose - it just emerged from their historical book. 

This is why fixed taxonomies like class codes and regional groupings break down. The meaningful structure in a carrier's submission flow is shaped by who they are, what they write, and where they compete, not by an industry standard.

To build a comparison framework that reflects this, we separate two problems that most approaches collapse into one: learning how risks relate to each other, and predicting what happens when they're quoted. Our system addresses both through three layers.

The data layer builds the raw picture

Kalepa's data layer is our ingestion pipeline.

Each submission contains first-party documents: cover sheets, loss runs, ACORDs, and similar materials. We extract and normalize these into structured features. 

Entity resolution then links each submission to third-party sources - OSHA records, litigation history, business reviews, catastrophe exposure data, and others - enriching the picture well beyond what the submission documents contain.

The output is a broad feature set spanning categories, numeric fields, and freeform text, with hundreds to thousands of columns per submission. It's comprehensive, but it's also sparse and multi-modal - a restaurant and a trucking company share very few fields in common.
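
As an illustration of what that sparsity looks like (field names invented for the example), flatten two unrelated risks into one table and most cells come back empty:

```python
import pandas as pd

# Invented field names: two risks from the same submission feed.
restaurant = {"industry": "restaurant", "annual_sales": 1_800_000,
              "liquor_pct": 0.22, "seating_capacity": 120}
trucking = {"industry": "trucking", "annual_sales": 4_200_000,
            "power_units": 18, "trucking_miles": 2_500_000}

# Flattened into one table, most cells are NaN - the sparse,
# multi-modal structure the encoder has to handle.
print(pd.DataFrame([restaurant, trucking]))
```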

The risk encoder learns how risks relate

Rather than feeding this sparse feature set directly into a model that tries to predict binding, we first train the encoder to learn how submissions relate to each other independent of any outcome. 

That means training the encoder on all submissions - not just the ones that were quoted - using a self-supervised approach that learns by reconstructing the input data rather than predicting a label.

Individual submissions are sparse, filling only a fraction of the possible fields. But the encoder sees all submissions at once, and in aggregate, they provide dense signal, picking up cross-feature patterns that no single field could reveal alone.
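
Here's a minimal sketch of that idea: a small masked-reconstruction autoencoder in PyTorch, trained on stand-in data. The architecture, sizes, and masking rate are illustrative assumptions, not our production model:

```python
import torch
import torch.nn as nn

class RiskEncoder(nn.Module):
    """Sketch: learn risk structure by reconstructing masked fields."""
    def __init__(self, n_features: int, dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 256), nn.ReLU(),
            nn.Linear(256, dim),               # the learned risk coordinates
        )
        self.decoder = nn.Sequential(
            nn.Linear(dim, 256), nn.ReLU(),
            nn.Linear(256, n_features),        # reconstruct the masked inputs
        )

    def forward(self, x):
        return self.encoder(x)

# Self-supervised training on ALL submissions - no bind label anywhere.
n_features = 200
model = RiskEncoder(n_features)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(1024, n_features)              # stand-in for the data layer's output

for _ in range(100):
    mask = torch.rand_like(x) < 0.3            # hide 30% of fields at random
    recon = model.decoder(model.encoder(x.masked_fill(mask, 0.0)))
    loss = ((recon - x)[mask] ** 2).mean()     # score only the hidden cells
    opt.zero_grad(); loss.backward(); opt.step()

z = model(x)  # each row: one submission's coordinates on the risk map
```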

What the encoder produces is a compressed representation of each submission - a coordinate system where similar risks sit near each other and distance between points reflects meaningful similarity.

Think of it like a map of your carrier's risk landscape, not organized by location or class code, but by the actual structure of the risks you see. 

Two restaurants might sit near each other on the map not because they’re in the same region, but because they have similar loss profiles, similar operational characteristics, and similar exposure patterns - features that matter for underwriting even though no standard classification system groups them together.

This map isn't just a way to see which risks are similar. Take a hotel in Atlanta. Strip away the location-specific factors (regional pricing dynamics, local regulatory environment, nearby competition) and replace them with Miami's. 

The encoder places the result near other hotels in Miami because it learned what makes a hotel a hotel no matter where it sits.

That kind of arithmetic over risk attributes is what makes this representation powerful. It captures relationships that no handcrafted taxonomy can.
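
Continuing the sketch above, the counterfactual is just feature surgery followed by a re-encode. Which columns count as location-specific is a pure assumption here:

```python
import torch
from sklearn.neighbors import NearestNeighbors

# Reuses `model`, `x`, and `n_features` from the encoder sketch above.
atlanta_hotel = x[:1].clone()                  # stand-in for one submission
miami_context = torch.randn(1, n_features)     # stand-in for Miami's fields

location_cols = torch.arange(0, 20)            # assumed location-specific columns
counterfactual = atlanta_hotel.clone()
counterfactual[:, location_cols] = miami_context[:, location_cols]

with torch.no_grad():
    z_all = model(x).numpy()
    z_cf = model(counterfactual).numpy()

# Nearest neighbors on the risk map: which real risks does the
# counterfactual now sit beside? (In the real system: Miami hotels.)
index = NearestNeighbors(n_neighbors=5).fit(z_all)
_, nearest = index.kneighbors(z_cf)
print(nearest)
```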

The prediction model estimates the probability of binding

The prediction model is a lightweight classification layer trained only on quoted submissions to estimate the probability of binding.

Before this model runs, we add features that describe the circumstances of the quote rather than the risk itself - broker and producer performance, account size, timing relative to effective date, and speed of response. 

The risk encoder captures what the risk is. These additional features reflect the competitive context around a specific quoting decision - the factors that determine whether a well-matched risk actually converts. 

A risk might be a perfect fit for your appetite, but if you're the fourth carrier to quote and the broker has a long-standing relationship with a competitor, the probability of binding is fundamentally different than if you're first to respond on an account the broker is actively shopping.

Separating the two is what allows the encoder to learn from all submissions while the prediction model focuses on the competitive dynamics of the ones you actually quoted.
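
A minimal sketch of that separation, with random arrays standing in for the encoder's embeddings and for the quote-context features:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5000
z_all = rng.normal(size=(n, 64))      # stand-in for encoder embeddings
context = rng.normal(size=(n, 6))     # hypothetical: broker hit rate, timing, ...
bound = rng.integers(0, 2, size=n)    # outcome (only known where quoted)
quoted = rng.random(n) < 0.3

# Lightweight head: trained only where a quote - and an outcome - exists.
X_head = np.hstack([z_all, context])
head = LogisticRegression(max_iter=1000).fit(X_head[quoted], bound[quoted])

p_bind = head.predict_proba(X_head)[:, 1]  # bind probability per submission
```

The head stays simple on purpose: the heavy lifting of representing the risk already happened in the encoder.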

For new carriers without sufficient history, the data layer compensates by leaning more heavily on entity resolution and third-party enrichment when carrier-specific signal is thin. 

The system learns on two cadences: the risk encoder retrains slowly, because how risks relate to each other evolves gradually, while the prediction model retrains frequently, adapting as competitive dynamics shift with the market.

Binding probability is the foundation of the lifetime value estimate. Profitability modeling operates separately, drawing on the same underlying representation.

What the underwriter actually sees

We cluster the risk map into neighborhoods of similar risks and generate plain-language descriptions of each cluster. 

Within each neighborhood, we isolate the factors that drive lifetime value. Among similar restaurants in similar geographies, what's different about the ones that bind profitably?

The underwriter sees a prioritized queue where each submission carries a score and an explanation grounded in its risk neighborhood - not a black-box number, but a reason tied to risks they'd recognize as comparable.
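
Here's a sketch of the neighborhood step on stand-in data: cluster the embeddings, then ask within each cluster which context features most separate the submissions that bound from the ones that didn't:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
z_all = rng.normal(size=(5000, 64))   # stand-in for encoder embeddings
context = rng.normal(size=(5000, 6))  # hypothetical quote-context features
bound = rng.integers(0, 2, size=5000)

# Neighborhoods of similar risks on the map.
neighborhoods = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(z_all)

for k in range(8):
    in_k = neighborhoods == k
    pos, neg = in_k & (bound == 1), in_k & (bound == 0)
    if pos.sum() == 0 or neg.sum() == 0:
        continue
    # Crude driver analysis: mean gap in each context feature,
    # bound vs. lost, within this neighborhood only.
    diff = context[pos].mean(axis=0) - context[neg].mean(axis=0)
    top = int(np.argmax(np.abs(diff)))
    print(f"neighborhood {k}: context feature {top} most separates "
          f"bound vs. lost ({diff[top]:+.2f})")
```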

Why this architecture matters

Triage is where this representation earns its value first, but the architecture carries more weight than a single use case. 

Because the risk encoder learns how submissions relate independent of any specific outcome, it serves as shared infrastructure. 

Renewal prediction, loss ratio estimation, portfolio segmentation, and submission flow forecasting all operate on the same underlying representation. Each requires only a lightweight model on top - not a separate pipeline or training cycle.

Build the representation once, and every application you layer on top inherits what it learned.

Beyond triage

The triage problem - an underwriter spending a day on a quote that was never going to bind - is ultimately a symptom of a deeper structural gap. 

Carriers have always had data about what they write and where they compete. What they haven't had is a way to organize that data into a framework that reflects their own competitive landscape, learns from every submission they see, and translates into decisions an underwriter can act on before committing their time.

That's what a learned representation provides.

Not a score without context. Not a guideline that someone wrote three years ago. A continuously evolving understanding of your risk landscape, specific to your carrier, grounded in your data, and designed to make every decision more informed than the last.
