Turning Raw Filings into Market Intelligence

A Pipeline Built Over Two Decades

Group Market Share started AskGMS in 2001 to answer a question carriers already felt in their bones: Form 5500 filings are full of valuable market data, and the filing is the worst possible place to actually use it.

Which office produced this case? How does this employer's dental premium stack up against peers? Where is compensation drifting above benchmark for plans of this size and industry? The filing holds the ingredients for all three and answers none of them.

For 24 years, AskGMS built proprietary carrier data, benchmarking models, and market-share statistics on top of those filings. Benefeature, launched in 2020, carried that foundation into a full platform: searchable employer profiles, office-level broker attribution, per-product premium modeling, retirement integration, employer contacts, benefit ratings, all on one connected graph.

So the origin story is not a product roadmap. It is a transformation pipeline we engineered over more than two decades and still refine every month. This article walks that pipeline stage by stage, from a raw filing to a decision a team can act on.

Figure: The six-stage transformation pipeline, rebuilt end to end every month. Raw Form 5500 filings enter at the top and pass through:

Ingestion, Parsing & Normalization: one vocabulary across 23 product categories
Entity Resolution: many strings, one broker; the full universe re-resolved monthly
Attribution Modeling: producing offices, not filing hubs
Premium Estimation: models built and validated against 24 years of AskGMS data
Relationship Mapping: employer, broker, carrier, and retirement ties on one graph
Semantic Intelligence Cubes: the structured market view Atlas queries

Stage One: Ingestion, Parsing, and Normalization

Each month the Department of Labor releases new and amended Form 5500 filings. Ingestion pulls them all in: full ERISA filings, short forms for smaller plans, amendments. Parsing turns them into structured records.

Parsing is harder than reading XML or lifting fields off a PDF. Filers use different plan-administrator systems. Schedule A layouts differ in how they present products, carriers, and compensation. Some fields that look optional carry critical signal when populated; others that look populated hold values that fail a basic sanity check on contact.

The parsing layer extracts the entities (employer, plan, carrier, broker, product lines, compensation entries) and recasts the filing into an internal schema built for downstream processing rather than for compliance review. Normalization then maps every filer's dialect onto one internal vocabulary, so that "Dental PPO" and "DENTAL - PPO PLAN" classify identically every time, across 23 benefit product categories.
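A minimal sketch of how label normalization of this kind can work, assuming a keyword-based lookup. The category names, the keyword table, and the filler-word list are hypothetical and cover three categories rather than 23; this is an illustration, not the production classifier.

```python
import re
from typing import Optional, Set

# Hypothetical vocabulary: canonical category -> keywords that must all appear.
CATEGORY_KEYWORDS = {
    "dental_ppo": {"dental", "ppo"},
    "dental_hmo": {"dental", "hmo"},
    "vision": {"vision"},
}

# Hypothetical filler words that carry no classification signal.
NOISE_TOKENS = {"plan", "group", "coverage"}


def tokenize(label: str) -> Set[str]:
    """Lowercase the label, split on punctuation, and drop filler words."""
    tokens = re.findall(r"[a-z0-9]+", label.lower())
    return {t for t in tokens if t not in NOISE_TOKENS}


def classify(label: str) -> Optional[str]:
    """Return the most specific category whose keywords all appear in the label."""
    tokens = tokenize(label)
    matches = [c for c, kw in CATEGORY_KEYWORDS.items() if kw <= tokens]
    if not matches:
        return None  # unknown dialect: route to review instead of guessing
    return max(matches, key=lambda c: len(CATEGORY_KEYWORDS[c]))


print(classify("Dental PPO") == classify("DENTAL - PPO PLAN"))  # True
```

Returning `None` for an unrecognized label, rather than a best guess, keeps unclassified dialects visible so the vocabulary can be extended deliberately.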

What comes out of this stage is structured but not yet intelligent. It is parsed, normalized filing data. Everything that follows decides whether it becomes market intelligence.

Stage Two: Entity Resolution

As the previous article laid out, entity resolution decides when different strings point at the same real-world organization, office, or person, and when lookalikes are actually distinct.

At pipeline scale, that runs across:

Employers — EIN changes, restructurings, dba variants
Broker firms — legal names, trade names, abbreviations, punctuation
Broker offices — producing locations versus central filing hubs
Broker agents — individual producers linked across plans and years
Carriers — group and individual entities inside a carrier family

Resolution is not an incremental patch, either. Every monthly run re-resolves the entire universe from scratch: every broker string, every employer, every office, across every plan year we hold. Entity identities stay stable from one rebuild to the next, and every shift in an entity's membership is reviewed against the prior release instead of being discovered in production. A brokerage that acquires another firm shows up through our curated affiliation crosswalk of more than a thousand mergers, acquisitions, and parent-firm relationships, maintained as the market moves.
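One simple way to picture the broker-firm case is a canonical key plus a deterministic identifier, so the same firm receives the same ID on every rebuild. The sketch below assumes exact matching after normalization; the suffix list, the abbreviation table, and the `BRK-` ID format are hypothetical, and real resolution also has to handle fuzzy matches, offices, agents, and the affiliation crosswalk.

```python
import hashlib
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical tables; a production list would be far larger.
LEGAL_SUFFIXES = {"inc", "llc", "corp", "co", "ltd"}
ABBREVIATIONS = {"ins": "insurance", "svcs": "services", "assoc": "associates"}


def canonical_key(name: str) -> str:
    """Strip punctuation, expand abbreviations, and drop legal suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    tokens = [ABBREVIATIONS.get(t, t) for t in cleaned.split()]
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def stable_id(key: str) -> str:
    """Derive an ID from the key alone, so a full rebuild reproduces it."""
    return "BRK-" + hashlib.sha1(key.encode("utf-8")).hexdigest()[:10]


def resolve(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group raw filing strings under one stable entity ID each."""
    clusters = defaultdict(list)
    for name in names:
        clusters[stable_id(canonical_key(name))].append(name)
    return dict(clusters)
```

Because the ID depends only on the canonical key, two full rebuilds over the same strings yield the same clusters, which makes a release-to-release comparison of membership a plain dictionary diff.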

Skip this and the next stage has nothing trustworthy to attach a relationship to. This is the stage most filing databases never attempt, and the one that decides whether everything above it reflects how the market actually works.

Stage Three: Attribution Modeling

Attribution is where regulatory data turns into sales intelligence.

Schedule A names a broker of record. For large brokerages, that name often points at a central filing hub, an office that submits paperwork for plans sold by producers in other cities. Attribution modeling works out which office actually produced the business and which agent owns the relationship.

This is the core of the broker filing hub problem. Carriers need to know where business is written, not where it is filed. Distribution teams build territories around producing offices. Broker rankings by attributed premium reflect real production instead of lockbox volume.

The model combines resolution output with a multi-tier geographic ranking that separates the office that produced the business from the one that merely filed the paperwork. It favors the offices closest to where the employer actually operates and screens known filing hubs and lockbox addresses out of contention before ranking begins. The result is office-level attribution on every plan in the database, which only holds together because every earlier stage did its job.
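The screen-then-rank idea can be sketched in a few lines. The version below assumes two tiers (same state first, then everything else) with great-circle distance as the tiebreaker; the `Office` fields, the `is_filing_hub` flag, and the tier definitions are hypothetical stand-ins, not the actual ranking rules.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt
from typing import Iterable, Optional

EARTH_RADIUS_KM = 6371.0


@dataclass(frozen=True)
class Office:
    office_id: str
    state: str
    lat: float
    lon: float
    is_filing_hub: bool = False  # hypothetical flag from a curated hub list


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in kilometers."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def attribute_office(emp_state: str, emp_lat: float, emp_lon: float,
                     offices: Iterable[Office]) -> Optional[Office]:
    """Screen out filing hubs, then pick the best-ranked remaining office."""
    candidates = [o for o in offices if not o.is_filing_hub]
    if not candidates:
        return None  # nothing left to attribute to

    def rank(office: Office):
        tier = 0 if office.state == emp_state else 1  # assumed tiering
        return (tier, haversine_km(emp_lat, emp_lon, office.lat, office.lon))

    return min(candidates, key=rank)
```

Screening hubs before ranking matters: a hub that shares the employer's city would otherwise win on distance every time.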

Compensation adds a second attribution dimension. Schedule A discloses compensation at the plan level across 14 fee types: commission, service fee, supplemental commission, override, bonus, contingent compensation, marketing fee, general agent fee, TPA fee, technology fee, consulting fee, referral fee, non-monetary compensation, and other. Attribution ties those structures to resolved broker entities, and our broker compensation intelligence benchmarks every rate against its peers.
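To make the benchmarking step concrete, the sketch below enumerates the 14 fee types named above and ranks one plan's effective compensation rate against a peer set. The rate definition (total compensation divided by premium) and the percentile method are assumptions chosen for illustration, and the peer values are invented.

```python
from enum import Enum
from typing import Dict, Sequence


class FeeType(Enum):
    COMMISSION = "commission"
    SERVICE_FEE = "service fee"
    SUPPLEMENTAL_COMMISSION = "supplemental commission"
    OVERRIDE = "override"
    BONUS = "bonus"
    CONTINGENT_COMPENSATION = "contingent compensation"
    MARKETING_FEE = "marketing fee"
    GENERAL_AGENT_FEE = "general agent fee"
    TPA_FEE = "TPA fee"
    TECHNOLOGY_FEE = "technology fee"
    CONSULTING_FEE = "consulting fee"
    REFERRAL_FEE = "referral fee"
    NON_MONETARY = "non-monetary compensation"
    OTHER = "other"


def compensation_rate(fees: Dict[FeeType, float], premium: float) -> float:
    """Total disclosed compensation as a fraction of premium (assumed definition)."""
    if premium <= 0:
        raise ValueError("premium must be positive")
    return sum(fees.values()) / premium


def percentile_rank(value: float, peers: Sequence[float]) -> float:
    """Percentage of peer values at or below the given value."""
    if not peers:
        raise ValueError("peer set is empty")
    return 100.0 * sum(1 for p in peers if p <= value) / len(peers)


# Invented example: $50,000 of compensation on $1,000,000 of premium.
rate = compensation_rate({FeeType.COMMISSION: 40_000, FeeType.BONUS: 10_000}, 1_000_000)
```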

Stage Four: Premium Estimation

Form 5500 reports premium as one combined number. Intelligence needs it per product.

ERISA plans bundle health, life, dental, disability, and voluntary lines under a single figure. A carrier weighing dental competitiveness in a territory cannot act on a lump sum. A broker hunting cross-sell needs to see the lines.

Premium estimation allocates reported plan premium across 23 benefit product categories. The models compare each plan against peer cohorts of similar employers, matched on the dimensions that actually move benefit pricing, and every estimated line is flagged very low, low, normal, high, or very high against that peer group.
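The shape of that calculation, though not the models themselves, can be sketched as a proportional split followed by a five-level flag. Everything numeric here is a hypothetical placeholder: the peer shares, the ratio thresholds, and the use of a peer median are assumptions for illustration only.

```python
from typing import Dict, Iterable

# Hypothetical peer-cohort premium shares by product line.
PEER_SHARES = {"medical": 0.80, "dental": 0.08, "life": 0.05, "ltd": 0.05, "vision": 0.02}


def allocate_premium(total: float, peer_shares: Dict[str, float],
                     lines_offered: Iterable[str]) -> Dict[str, float]:
    """Split a combined premium across the offered lines, pro rata to peer shares."""
    weights = {line: peer_shares[line] for line in lines_offered}
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("no positive peer share for the offered lines")
    return {line: total * w / weight_sum for line, w in weights.items()}


def flag_against_peers(value: float, peer_median: float) -> str:
    """Map a value to one of five flags using assumed ratio thresholds."""
    if peer_median <= 0:
        raise ValueError("peer median must be positive")
    ratio = value / peer_median
    if ratio < 0.50:
        return "very low"
    if ratio < 0.80:
        return "low"
    if ratio <= 1.25:
        return "normal"
    if ratio <= 2.00:
        return "high"
    return "very high"
```

Renormalizing over only the lines a plan actually offers keeps the allocated lines summing to the reported total, which is the one figure the filing states directly.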

What makes those models defensible is how they were built. We designed, iterated, and verified them against 24 years of proprietary AskGMS data: carrier-reported sold cases and inforce patterns that told us, case after case, whether the allocations held up against reality. A model like this cannot be built with any degree of accuracy without strong verified data sitting beside it for comparison and tuning. That ground truth exists nowhere else, which is precisely why a filing database cannot reproduce our per-product premium layer.

Estimation depends on normalized classification from stage one, validated premium-to-participant relationships, and the hierarchy that makes peer-group selection meaningful. A filing database shows you the total. The pipeline shows you where the money sits, line by line.

Stage Five: Relationship Mapping

A single filing describes a single plan. A market is a web of relationships, and mapping is how the pipeline makes that web traversable.

It connects entities into graphs you can walk:

Which carriers sit on which employer plans, and how carrier mix shifts over time
Which broker offices serve which employers, and how those ties move at renewal
Which agents hold which accounts inside a firm's book
How retirement data on the same employer profile connects to group benefits, with our retirement and group benefits integration covering 838,500+ retirement plans for cross-sell signal detection
How benefit ratings and employee sentiment at the employer, broker, and carrier levels attach to the financial and relationship data

Relationship mapping is what makes a book-of-business view possible. It is what lets a carrier see wins and losses across a broker network. It is what links an employer's 401(k) loan rate to a likely voluntary-benefits gap. Filings hold these relationships implicitly. Mapping makes them explicit, durable, and searchable across the whole database.
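A small sketch of what "explicit and searchable" can mean in practice: typed edges stored in both directions, so a book of business is a single lookup. The `MarketGraph` class, the relation names, and the node tuples are hypothetical and chosen only to illustrate the traversal.

```python
from collections import defaultdict
from typing import Set, Tuple

Node = Tuple[str, str]  # (entity type, resolved entity name)


class MarketGraph:
    """Typed edges stored with their inverse so either side can be walked."""

    def __init__(self) -> None:
        self._edges = defaultdict(set)

    def link(self, source: Node, relation: str, target: Node) -> None:
        self._edges[(source, relation)].add(target)
        self._edges[(target, "inverse:" + relation)].add(source)

    def neighbors(self, node: Node, relation: str) -> Set[Node]:
        return set(self._edges.get((node, relation), set()))


def book_of_business(graph: MarketGraph, office: Node) -> Set[Node]:
    """Employers served by a broker office."""
    return graph.neighbors(office, "inverse:served_by")


def carriers_in_book(graph: MarketGraph, office: Node) -> Set[Node]:
    """Carriers that sit on any plan in the office's book."""
    carriers: Set[Node] = set()
    for employer in book_of_business(graph, office):
        carriers |= graph.neighbors(employer, "insured_by")
    return carriers
```

The second function is a two-hop walk (office to employers to carriers), the kind of question a single filing cannot answer because each filing describes only one plan.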

Stage Six: Semantic Intelligence Cubes

The final stage organizes everything into what we call semantic intelligence cubes: structured, multi-dimensional views of the market built for search, filter, benchmark, and export.

A cube is not a database table. It is an employer-centric (or broker-centric, or carrier-centric) rollup where every dimension connects to every other:

Employer cube — plan data, per-product premium flags, broker and carrier relationships, retirement KPIs, benefit ratings, matched buying-team contacts
Broker agent cube — attributed book, compensation patterns, employer relationships, aggregate ratings
Broker firm cube — office-level production rankings, firm-wide premium, compensation benchmarks
Carrier cube — attributed premium, product penetration, broker network composition

The same intelligence rolls cleanly up and down the hierarchy. Search an employer, see the brokers. Search a broker, see the employers. Filter by a premium flag and get a prospect list where every row carries the full weight of the pipeline behind it, not a raw filing extract.
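Rolling up and filtering over shared dimensions can be illustrated with a handful of rows. The row layout, field names, and values below are invented for the example; a real cube carries far more dimensions and is not necessarily stored as a list of dictionaries.

```python
from collections import defaultdict
from typing import Dict, List

# Invented rows: one per employer plan, carrying resolved and modeled fields.
ROWS = [
    {"employer": "Acme Manufacturing", "broker_firm": "Firm X", "office": "Dallas",
     "carrier": "Carrier A", "premium": 900_000.0, "dental_flag": "high"},
    {"employer": "Beta Logistics", "broker_firm": "Firm X", "office": "Houston",
     "carrier": "Carrier B", "premium": 400_000.0, "dental_flag": "normal"},
    {"employer": "Gamma Retail", "broker_firm": "Firm Y", "office": "Austin",
     "carrier": "Carrier A", "premium": 250_000.0, "dental_flag": "high"},
]


def rollup(rows: List[dict], dimension: str, measure: str = "premium") -> Dict[str, float]:
    """Sum a measure along any one dimension of the rows."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[measure]
    return dict(totals)


def filter_rows(rows: List[dict], **criteria) -> List[dict]:
    """Keep rows matching every given field=value pair."""
    return [r for r in rows if all(r.get(k) == v for k, v in criteria.items())]


by_firm = rollup(ROWS, "broker_firm")        # firm-level view
by_carrier = rollup(ROWS, "carrier")         # carrier-level view of the same rows
prospects = filter_rows(ROWS, dental_flag="high")
```

The same rows feed the firm view, the carrier view, and the prospect list, which is why a filtered row still carries its broker, carrier, and premium context.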

This is the layer users actually experience. Everything before it is engineering. This is the part that looks like a product.

The Filings-to-Decisions Ladder

We describe the whole pipeline with one picture, the Filings-to-Decisions ladder:

Rung	Stage	What you have
1	Raw filing	The regulatory submission as filed
2	Parsed record	Structured entities pulled from the filing
3	Resolved entities	Organizations, offices, and agents identified consistently
4	Attributed relationships	Producing offices, carrier links, compensation mapped
5	Modeled metrics	Per-product premium, benchmark flags, peer comparisons
6	Decision-ready intelligence	Searchable profiles, books of business, territory signals

Each rung stands on the one below it. Skip a stage and everything above it collapses. A platform that stops at rung two has a filing database. A platform that reaches rung six has an intelligence layer.

This ladder is the engineering story behind Benefeature. The rest of this series builds on it, starting with why most AI products that skip these rungs fall apart the moment they touch benefits data.

Key takeaway

Turning filings into intelligence is not one step. It is a six-stage pipeline: ingestion and parsing, resolution, attribution, estimation, mapping, and semantic structuring. We built it over 24 years and rebuild its output every month. The platform users search is the top of that pipeline. Understanding the pipeline is how you tell whether any product is delivering intelligence or just well-organized filings.