A Pipeline Built Over Two Decades
Group Market Share started AskGMS in 2001 to answer a question carriers already felt in their bones: Form 5500 filings are full of valuable market data, and the filing is the worst possible place to actually use it.
Which office produced this case? How does this employer's dental premium stack up against peers? Where is compensation drifting above benchmark for plans of this size and industry? The filing holds the ingredients for all three and answers none of them.
For 24 years, AskGMS built proprietary carrier data, benchmarking models, and market-share statistics on top of those filings. Benefeature, launched in 2020, carried that foundation into a full platform: searchable employer profiles, office-level broker attribution, per-product premium modeling, retirement integration, employer contacts, benefit ratings, all on one connected graph.
So the origin story is not a product roadmap. It is a transformation pipeline we engineered over more than two decades and still refine every month. This article walks that pipeline stage by stage, from a raw filing to a decision a team can act on.
The pipeline at a glance, starting from raw Form 5500 filings:
- Ingestion, Parsing & Normalization: one vocabulary across 23 product categories
- Entity Resolution: many strings, one broker; the full universe re-resolved monthly
- Attribution Modeling: producing offices, not filing hubs
- Premium Estimation: models built and validated against 24 years of AskGMS data
- Relationship Mapping: employer, broker, carrier, and retirement ties on one graph
- Semantic Intelligence Cubes: the structured market view Atlas queries
Stage One: Ingestion, Parsing, and Normalization
Each month the Department of Labor releases new and amended Form 5500 filings. Ingestion pulls them all in: full ERISA filings, short forms for smaller plans, amendments. Parsing turns them into structured records.
Parsing is harder than reading XML or lifting fields off a PDF. Filers use different plan-administrator systems. Schedule A layouts differ in how they present products, carriers, and compensation. Some fields that look optional carry critical signal when populated; others are populated with values that fail even a basic sanity check.
The parsing layer extracts the entities (employer, plan, carrier, broker, product lines, compensation entries) and recasts the filing into an internal schema built for downstream processing rather than for compliance review. Normalization then maps every filer's dialect onto one internal vocabulary, so that "Dental PPO" and "DENTAL - PPO PLAN" classify identically every time, across 23 benefit product categories.
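As a rough illustration of what mapping filer dialects onto one vocabulary can look like, here is a minimal sketch. The rule table, category names, and `normalize_product` function are assumptions made for this example, not the production schema, which covers 23 product categories.

```python
import re

# Hypothetical keyword rules, checked in order. The category names are
# illustrative only and do not reproduce the real 23-category vocabulary.
CATEGORY_RULES = [
    ("dental_ppo", ("dental", "ppo")),
    ("dental_hmo", ("dental", "hmo")),
    ("vision", ("vision",)),
]


def normalize_product(label: str) -> str:
    """Map a free-text product label to one internal category."""
    # Lowercase, then keep only alphanumeric tokens so that punctuation,
    # casing, and filler words like "PLAN" do not affect the match.
    tokens = set(re.findall(r"[a-z0-9]+", label.lower()))
    for category, required in CATEGORY_RULES:
        if all(word in tokens for word in required):
            return category
    return "unclassified"


print(normalize_product("Dental PPO"))         # dental_ppo
print(normalize_product("DENTAL - PPO PLAN"))  # dental_ppo
```

Matching on token sets rather than raw strings is what lets both spellings land in the same category; labels that match no rule fall through to an explicit `unclassified` bucket rather than being guessed.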
What comes out of this stage is structured but not yet intelligent. It is parsed, normalized filing data. Everything that follows decides whether it becomes market intelligence.
Stage Two: Entity Resolution
As the previous article laid out, entity resolution decides when different strings point at the same real-world organization, office, or person, and when lookalikes are actually distinct.
At pipeline scale, that runs across:
- Employers — EIN changes, restructurings, dba variants
- Broker firms — legal names, trade names, abbreviations, punctuation
- Broker offices — producing locations versus central filing hubs
- Broker agents — individual producers linked across plans and years
- Carriers — group and individual entities inside a carrier family
Resolution is not an incremental patch, either. Every monthly run re-resolves the entire universe from scratch: every broker string, every employer, every office, across every plan year we hold. Entity identities stay stable from one rebuild to the next, and every shift in an entity's membership is reviewed against the prior release instead of being discovered in production. A brokerage that acquires another firm shows up through our curated affiliation crosswalk of more than a thousand mergers, acquisitions, and parent-firm relationships, maintained as the market moves.
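A simplified sketch of three ideas from this stage: collapsing name variants to one key, rolling acquired firms up through an affiliation crosswalk, and diffing entity membership against the prior release. Every name, suffix list, and crosswalk entry below is hypothetical, and real resolution uses far more signal than string cleanup.

```python
import re

# Hypothetical legal suffixes to ignore when comparing firm names.
LEGAL_SUFFIXES = {"inc", "llc", "corp", "co", "ltd"}

# Hypothetical crosswalk entry: acquired firm key -> parent firm key.
AFFILIATION_CROSSWALK = {"acme benefits": "northwind group"}


def firm_key(raw: str) -> str:
    """Collapse case, punctuation, and legal suffixes into one comparison key."""
    tokens = re.findall(r"[a-z0-9]+", raw.lower())
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def resolve_firm(raw: str) -> str:
    """Resolve a raw broker string to its parent firm when the crosswalk lists one."""
    key = firm_key(raw)
    return AFFILIATION_CROSSWALK.get(key, key)


def membership_changes(prior: dict, current: dict) -> dict:
    """List entities whose member strings differ between two releases."""
    changes = {}
    for entity_id in sorted(prior.keys() | current.keys()):
        before = prior.get(entity_id, set())
        after = current.get(entity_id, set())
        if before != after:
            changes[entity_id] = {"added": after - before, "removed": before - after}
    return changes


print(resolve_firm("Acme Benefits, Inc."))  # northwind group
print(membership_changes(
    {"E1": {"acme benefits"}},
    {"E1": {"acme benefits", "acme benefit svcs"}},
))
```

The diff is the piece that supports review before release: a full rebuild can legitimately move strings between entities, and the comparison surfaces each move for inspection.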
Skip this and the next stage has nothing trustworthy to attach a relationship to. This is the stage most filing databases never attempt, and the one that decides whether everything above it reflects how the market actually works.
Stage Three: Attribution Modeling
Attribution is where regulatory data turns into sales intelligence.
Schedule A names a broker of record. For large brokerages, that name often points at a central filing hub, an office that submits paperwork for plans sold by producers in other cities. Attribution modeling works out which office actually produced the business and which agent owns the relationship.
This is the core of the broker filing hub problem. Carriers need to know where business is written, not where it is filed. Distribution teams build territories around producing offices. Broker rankings by attributed premium reflect real production instead of lockbox volume.
The model combines resolution output with a multi-tier geographic ranking that separates the office that produced the business from the one that merely filed the paperwork. It favors the offices closest to where the employer actually operates and screens known filing hubs and lockbox addresses out of contention before ranking begins. The result is office-level attribution on every plan in the database, which only holds together because every earlier stage did its job.
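The screen-then-rank idea can be sketched as follows. The `Office` fields, the four tiers, and the tie-breaking are assumptions chosen for illustration; the actual tier definitions and hub list are not described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Office:
    name: str
    zip_code: str
    city: str
    state: str
    is_filing_hub: bool = False  # hypothetical flag for known hubs and lockboxes


def proximity_tier(office: Office, zip_code: str, city: str, state: str) -> int:
    """Lower tier means closer to the employer (assumed tiers, for illustration)."""
    if office.zip_code == zip_code:
        return 0
    if office.state == state and office.city == city:
        return 1
    if office.state == state:
        return 2
    return 3


def attribute_office(offices: list, zip_code: str, city: str, state: str) -> Optional[Office]:
    """Drop filing hubs first, then pick the closest remaining office."""
    candidates = [o for o in offices if not o.is_filing_hub]
    if not candidates:
        return None
    return min(candidates, key=lambda o: proximity_tier(o, zip_code, city, state))


offices = [
    Office("Central Processing", "78701", "Austin", "TX", is_filing_hub=True),
    Office("Dallas", "75201", "Dallas", "TX"),
    Office("Austin", "78704", "Austin", "TX"),
]
print(attribute_office(offices, "78701", "Austin", "TX").name)  # Austin
```

Note that the hub shares the employer's ZIP code and would win on proximity alone; screening it out before ranking is what keeps it from being credited with the business.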
Compensation adds a second attribution dimension. Schedule A discloses compensation at the plan level across 14 fee types: commission, service fee, supplemental commission, override, bonus, contingent compensation, marketing fee, general agent fee, TPA fee, technology fee, consulting fee, referral fee, non-monetary compensation, and other. Attribution ties those structures to resolved broker entities, and our broker compensation intelligence benchmarks every rate against its peers.
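One simple way to think about benchmarking a rate against peers, shown here as a sketch: express compensation as a share of premium and compare it with the peer median. Both the rate definition and the use of a median are assumptions for this example, and the figures are made up.

```python
from statistics import median
from typing import Optional


def compensation_rate(compensation: float, premium: float) -> Optional[float]:
    """Compensation as a fraction of premium; None when premium is unusable."""
    if premium <= 0:
        return None
    return compensation / premium


def gap_to_peer_median(rate: float, peer_rates: list) -> float:
    """Positive values mean the plan pays above the peer median."""
    return rate - median(peer_rates)


# Made-up figures: 5,000 in commission on 100,000 of premium.
rate = compensation_rate(5_000, 100_000)
peers = [0.03, 0.04, 0.05]  # hypothetical peer rates for similar plans
gap = gap_to_peer_median(rate, peers)
print(f"rate={rate:.2%}, gap to peer median={gap:+.2%}")
```

A real peer set would be cut by plan size, industry, product, and fee type before any comparison is made.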
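Stage Four: Premium Estimation
Premium estimation turns attributed filings into modeled metrics: per-product premium, benchmark flags, and peer comparisons. The models behind those metrics were built and validated against 24 years of AskGMS data.
Stage Five: Relationship Mapping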
A single filing describes a single plan. A market is a web of relationships, and mapping is how the pipeline makes that web traversable.
It connects entities into graphs you can walk:
- Which carriers sit on which employer plans, and how carrier mix shifts over time
- Which broker offices serve which employers, and how those ties move at renewal
- Which agents hold which accounts inside a firm's book
- How retirement data on the same employer profile connects to group benefits, with 838,500+ retirement plans integrated for cross-sell signal detection through our retirement and group benefits integration
- How benefit ratings and employee sentiment at the employer, broker, and carrier levels attach to the financial and relationship data
Relationship mapping is what makes a book-of-business view possible. It is what lets a carrier see wins and losses across a broker network. It is what links an employer's 401(k) loan rate to a likely voluntary-benefits gap. Filings hold these relationships implicitly. Mapping makes them explicit, durable, and searchable across the whole database.
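To make "graphs you can walk" concrete, here is a small sketch using an undirected adjacency map. The node naming scheme (`kind:name`), the sample edges, and the helper functions are all hypothetical.

```python
from collections import defaultdict

# Hypothetical edges between typed nodes, written as "kind:name".
edges = [
    ("employer:Acme Co", "broker_office:Northwind Austin"),
    ("employer:Acme Co", "carrier:Example Dental"),
    ("employer:Beta LLC", "broker_office:Northwind Austin"),
    ("employer:Beta LLC", "carrier:Sample Life"),
]

# Store every edge in both directions so the graph can be walked from
# any entity type.
graph = defaultdict(set)
for a, b in edges:
    graph[a].add(b)
    graph[b].add(a)


def neighbors(node: str, kind: str) -> list:
    """Return the neighbors of `node` that have the given kind, sorted."""
    return sorted(n for n in graph.get(node, ()) if n.startswith(kind + ":"))


def carriers_in_book(office: str) -> list:
    """Walk office -> employers -> carriers to list carriers in an office's book."""
    carriers = set()
    for employer in neighbors(office, "employer"):
        carriers.update(neighbors(employer, "carrier"))
    return sorted(carriers)


print(neighbors("broker_office:Northwind Austin", "employer"))
print(carriers_in_book("broker_office:Northwind Austin"))
```

The two-hop walk in `carriers_in_book` is the kind of question a single filing cannot answer, since each filing only knows about its own plan.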
Stage Six: Semantic Intelligence Cubes
The final stage organizes everything into what we call semantic intelligence cubes: structured, multi-dimensional views of the market built for search, filter, benchmark, and export.
A cube is not a database table. It is an employer-centric (or broker-centric, or carrier-centric) rollup where every dimension connects to every other:
- Employer cube — plan data, per-product premium flags, broker and carrier relationships, retirement KPIs, benefit ratings, matched buying-team contacts
- Broker agent cube — attributed book, compensation patterns, employer relationships, aggregate ratings
- Broker firm cube — office-level production rankings, firm-wide premium, compensation benchmarks
- Carrier cube — attributed premium, product penetration, broker network composition
The same intelligence rolls cleanly up and down the hierarchy. Search an employer, see the brokers. Search a broker, see the employers. Filter by a premium flag and get a prospect list where every row carries the full weight of the pipeline behind it, not a raw filing extract.
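The roll-up behavior can be illustrated with a toy fact set in which each row carries every dimension, so the same rows answer employer, broker, and carrier questions. Field names, the `above_benchmark` flag, and all values are invented for this sketch.

```python
from collections import defaultdict

# Toy rows: one per attributed employer/product relationship.
rows = [
    {"employer": "Acme Co", "broker_firm": "Northwind", "carrier": "Example Dental",
     "premium": 120_000, "above_benchmark": True},
    {"employer": "Acme Co", "broker_firm": "Northwind", "carrier": "Sample Life",
     "premium": 30_000, "above_benchmark": False},
    {"employer": "Beta LLC", "broker_firm": "Southwind", "carrier": "Example Dental",
     "premium": 80_000, "above_benchmark": True},
]


def rollup(facts: list, dimension: str) -> dict:
    """Sum premium along any one dimension of the cube."""
    totals = defaultdict(int)
    for row in facts:
        totals[row[dimension]] += row["premium"]
    return dict(totals)


def prospects(facts: list, flag: str) -> list:
    """Employers with at least one row where the given flag is set."""
    return sorted({row["employer"] for row in facts if row[flag]})


print(rollup(rows, "broker_firm"))  # {'Northwind': 150000, 'Southwind': 80000}
print(rollup(rows, "carrier"))      # {'Example Dental': 200000, 'Sample Life': 30000}
print(prospects(rows, "above_benchmark"))  # ['Acme Co', 'Beta LLC']
```

Because every row already holds resolved and attributed entities, switching the view from broker to carrier is only a change of `dimension`, not a new query against raw filings.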
This is the layer users actually experience. Everything before it is engineering. This is the part that looks like a product.
The Filings-to-Decisions Ladder
We describe the whole pipeline with one picture, the Filings-to-Decisions ladder:
| Rung | Stage | What you have |
|---|---|---|
| 1 | Raw filing | The regulatory submission as filed |
| 2 | Parsed record | Structured entities pulled from the filing |
| 3 | Resolved entities | Organizations, offices, and agents identified consistently |
| 4 | Attributed relationships | Producing offices, carrier links, compensation mapped |
| 5 | Modeled metrics | Per-product premium, benchmark flags, peer comparisons |
| 6 | Decision-ready intelligence | Searchable profiles, books of business, territory signals |
Each rung stands on the one below it. Skip a stage and everything above it collapses. A platform that stops at rung two has a filing database. A platform that reaches rung six has an intelligence layer.
This ladder is the engineering story behind Benefeature. The rest of this series builds on it, starting with why most AI products that skip these rungs fall apart the moment they touch benefits data.
Key takeaway
Turning filings into intelligence is not one step. It is a pipeline: ingestion, parsing, resolution, attribution, estimation, mapping, and semantic structuring. We built it over 24 years and rebuild its output every month. The platform users search is the top of that pipeline. Understanding the pipeline is how you tell whether any product is delivering intelligence or just well-organized filings.
Related in this series
Next: Why Most AI Fails on Benefits Data
