Field Notes · May 2026

Why most healthcare and fintech chatbots fail compliance review

The demo works. The CEO is impressed. Then Risk and Compliance sit at the table — and the deal quietly dies. After three years building regulated platforms across HIPAA, PCI-DSS, and FDIC environments, I keep watching the same four patterns kill otherwise impressive AI chatbots. Here's what they are, and what to build instead.

I've spent the last three years operating at the intersection of AI customer experience and regulated environments. At Sentara Healthcare I shipped HIPAA-aligned member onboarding and case management. At KeyBank I owned the PCI-DSS card origination and fraud servicing flows. At LPL Financial I'm currently leading a $49B FDIC-insured Bank Sweep redesign anchored on CIP-verified identity. And in parallel, I founded TechGenie.ai — an AI member engagement platform that processes more than a million interactions a month for clients who genuinely care whether the system survives a security review.

Across those vantage points, I see the same chatbot vendor sales motion play out almost every quarter:

  1. Product team falls in love with the demo.
  2. Business case looks fantastic.
  3. Procurement loops in Risk, Compliance, InfoSec, and Legal.
  4. The vendor cannot answer one of the four questions below.
  5. The deal dies, quietly, in a security questionnaire.

If you are building, buying, or evaluating an AI chatbot for a regulated environment, here are the four patterns that kill them — and what to build instead.

01. The model sees data it has no business seeing

The most common failure is also the most invisible. The chatbot is wired directly to a customer database, a CRM, or a ticketing system. The LLM gets a query, the system fetches "relevant context," and the model returns an answer. Beautiful. Fast. Helpful.

And catastrophic, because the context retrieval step has no concept of who the user is, what they are allowed to see, or what counts as protected information. An agent helping User A might surface a snippet of User B's case history. A member asking about a copay might trigger retrieval that pulls in another member's diagnosis code. In healthcare this is a HIPAA breach. In financial services it is an SEC reportable event. In both cases, it kills the deal.

What to build instead

Identity has to come before retrieval, not after generation. In practice that means:

  1. Authenticate and resolve the user's identity before any context is fetched, not after an answer is generated.
  2. Scope every retrieval query to that identity: tenant, account, and permission level are part of the query itself, not a post-hoc filter.
  3. Treat the model's context window as a privileged surface: nothing enters it that the authenticated user could not already see through the front door.

Field Note

At TechGenie, this is why we anchor escalation and retrieval on tenant-scoped identity from the first token. The model's context window only ever contains data the authenticated user already had a right to see.
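For concreteness, here is a minimal sketch of retrieval scoped to the authenticated identity. The store client, the Principal fields, and the filter syntax are illustrative assumptions rather than any particular framework's API; the point is that the filter is derived from who is asking before anything reaches the model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    """Resolved identity of the person on the other end of the chat."""
    user_id: str
    tenant_id: str
    scopes: frozenset  # e.g. frozenset({"claims:read", "billing:read"})


def fetch_context(store, principal: Principal, query: str, k: int = 5) -> list[str]:
    """Retrieve only context the caller is already entitled to see."""
    if not principal.scopes:
        # Fail closed: unauthenticated or unscoped sessions get no record data.
        return []
    hits = store.search(
        query=query,
        top_k=k,
        # Hard, server-side filter derived from the authenticated identity,
        # not a redaction pass run after the model has already seen the data.
        filter={
            "tenant_id": principal.tenant_id,
            "owner_id": principal.user_id,
            "category": {"$in": sorted(principal.scopes)},
        },
    )
    return [hit.text for hit in hits]
```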

02. There is no defensible reason for the answer

A regulator does not care that your model is "really accurate." They care whether you can explain, after the fact, why your system told a specific person a specific thing on a specific date. Explainability is not a feature; it is an audit requirement.

The chatbots that fail here are the ones built as a single end-to-end model call: question goes in, answer comes out, nobody can reconstruct the decision. The vendor will hand you "we logged the conversation" as if that is the same thing. It is not.

What to build instead

Decompose the pipeline so the decision can be reconstructed: log which documents were retrieved, which policy checks ran, which model version generated the answer, and which sources the answer cites. A transcript tells you what was said; a decision record tells you why the system was allowed to say it.
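Here is a minimal sketch of what a per-answer decision record might hold. The field names are illustrative, not a standard; what matters is that the retrieval, the policy checks, and the model version are captured alongside the answer, not just the conversation text.

```python
import hashlib
import json
from datetime import datetime, timezone


def record_decision(user_id: str, question: str, retrieved_ids: list[str],
                    policy_checks: dict[str, bool], answer: str,
                    model_version: str) -> dict:
    """Persist the evidence behind one answer so it can be reconstructed later."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "question": question,
        "retrieved_ids": retrieved_ids,    # which sources the answer cites
        "policy_checks": policy_checks,    # e.g. {"identity_verified": True, "phi_filter_passed": True}
        "model_version": model_version,    # pin the exact model so the decision can be replayed
        "answer_sha256": hashlib.sha256(answer.encode("utf-8")).hexdigest(),
    }
    # In production this goes to an append-only sink (WORM bucket, audit topic);
    # stdout stands in for that here.
    print(json.dumps(record))
    return record
```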

03. Escalation is an afterthought, not a primitive

Every AI chatbot vendor includes "human handoff" on the slide. Very few have actually designed for it. In a regulated environment, the escalation layer is where most of your operational risk concentrates — and most of your cost savings.

I learned this the hard way building TechGenie's escalation system. The first version routed to a live agent the moment the bot was unsure. That is the obvious design, and it is wrong. Trigger-happy escalation is how AI products bleed money; unearned confidence is how they bleed trust.

The right model is a graded one. Low-stakes informational queries get answered by the bot directly, with full citation. Medium-stakes account questions go to the bot only when identity is verified and an audit trail is captured. High-stakes actions (moving money, changing beneficiaries, accessing PHI) never run autonomously. They require a verified human in the loop, regardless of how confident the model is.

What this looks like in practice

At TechGenie, the escalation layer decides routing based on three signals: intent risk (informational vs transactional), identity strength (anonymous vs CIP-verified), and model confidence (with abstention as a first-class action). That single architecture cut support costs by roughly $3M annually while keeping us audit-clean.
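A stripped-down sketch of that routing logic, with illustrative enum names and thresholds (the 0.80 cutoff is a placeholder, not TechGenie's actual number):

```python
from enum import Enum


class IntentRisk(Enum):
    INFORMATIONAL = 1   # plan details, hours, how-to
    ACCOUNT = 2         # balances, claim status, card controls
    TRANSACTIONAL = 3   # move money, change beneficiaries, access PHI


class IdentityStrength(Enum):
    ANONYMOUS = 1
    AUTHENTICATED = 2
    CIP_VERIFIED = 3


def route(intent: IntentRisk, identity: IdentityStrength, confidence: float) -> str:
    """Decide how a single turn is handled; abstention is an explicit outcome."""
    if intent is IntentRisk.TRANSACTIONAL:
        # High-stakes actions never run autonomously, regardless of confidence.
        return "human_in_loop"
    if intent is IntentRisk.ACCOUNT and identity is IdentityStrength.ANONYMOUS:
        # Account questions require a verified identity and an audit trail first.
        return "verify_identity_first"
    if confidence < 0.80:
        # Low confidence escalates rather than guessing.
        return "abstain_and_escalate"
    return "bot_answers_with_citation"
```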

04. The vendor cannot show their compliance posture as code

This is the one that most often kills deals at the final stage. The InfoSec questionnaire arrives. The vendor responds with a SOC2 report, a HIPAA "we are compliant" attestation, and a marketing PDF.

That is no longer enough. Modern compliance teams want to see:

  1. The IAM policies that enforce least-privilege access to customer data.
  2. The encryption configuration, at rest and in transit, as actually deployed rather than as described.
  3. A tenant-isolation test suite proving one customer's data cannot leak into another's context.
  4. Audit logging that can reconstruct who saw what, and when.

If you are a vendor, building this before your first regulated deal costs roughly a tenth of what retrofitting it afterward will. If you are a buyer, ask for these as artifacts, not assertions. A vendor who has built for compliance can show you the IAM policy, the encryption configuration, the tenant-isolation test suite. A vendor who has merely marketed compliance will send you a PDF.
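As one example of an artifact rather than an assertion, here is a sketch of a tenant-isolation test. The fixtures and endpoints are hypothetical stand-ins for whatever the vendor's stack actually exposes; the point is that isolation is proven by a failing build, not promised in a questionnaire response.

```python
import pytest


@pytest.mark.parametrize("resource", ["conversations", "documents", "audit_logs"])
def test_cross_tenant_reads_are_rejected(api_client_tenant_a, tenant_b_resource_ids, resource):
    """Tenant A's credentials must never return tenant B's records."""
    for resource_id in tenant_b_resource_ids[resource]:
        response = api_client_tenant_a.get(f"/v1/{resource}/{resource_id}")
        # A leak here is a failed build today, not a finding in next year's audit.
        assert response.status_code in (403, 404), (
            f"cross-tenant read of {resource}/{resource_id} was not rejected"
        )
```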

05. The pattern underneath all four

Step back from the specifics and the underlying failure is the same: treating compliance as a layer you add on top of an AI product, instead of a constraint that shapes the architecture from day one.

Demoware chatbots can ignore identity, skip explainability, hand-wave escalation, and gesture at compliance. Production chatbots in regulated environments cannot. The companies that will win this category over the next three years are the ones building from the constraint inward — not the ones trying to wrap a chat interface around an LLM and hope the auditors do not notice.

That is the harder path. It is also the only one that survives the security questionnaire.

If you are evaluating an AI chatbot for a regulated environment and want a second opinion, I write about this stuff regularly. Reach out — I am always interested in the messy real-world version of these problems.

Building something in this space?

I lead product for fintech and AI platforms in regulated environments. Always happy to compare notes.