AI agents can absolutely run PHI-touching workflows — patient intake, documentation, front-desk triage — if compliance is an architecture decision made on day one, not a review stage bolted on at the end. The teams that get stuck aren't blocked by HIPAA; they're blocked by having built a demo first and asked the compliance question second.
The foundation is contractual: a BAA at every hop PHI touches. That includes the model provider. The major vendors all now offer BAA-covered, zero-retention API tiers; the consumer apps are not covered, and using them on PHI is how organizations end up in breach reporting. It also includes your cloud, your vector store if retrieval touches PHI, and your implementation partner. We sign BAAs for healthcare engagements as table stakes.
Architecture follows the minimum-necessary principle. The agent sees only the fields the workflow requires: an intake agent needs demographics and insurance details, not the full chart. Where the task allows, PHI is masked or tokenized before it reaches any general-purpose model. Retention is zero at the model layer, and processing stays inside your BAA-covered infrastructure. On-prem or VPC-isolated inference is the right call for the most sensitive flows, and it's an option we design for explicitly.
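To make the minimum-necessary idea concrete, here is a minimal Python sketch of an allow-list filter with keyed tokenization. The field names, the `INTAKE_FIELDS` and `TOKENIZED_FIELDS` sets, and both function names are hypothetical, chosen only for illustration. Keyed hashing is one possible pseudonymization approach, not a claim about what any particular deployment uses, and it does not by itself amount to HIPAA de-identification.

```python
import hashlib
import hmac

# Hypothetical allow-list: the only fields an intake workflow may see.
INTAKE_FIELDS = {"name", "date_of_birth", "insurance_id", "insurance_plan"}
# Hypothetical subset of those fields to tokenize before any model call.
TOKENIZED_FIELDS = {"name", "insurance_id"}


def tokenize(value: str, key: bytes) -> str:
    """Replace a value with a keyed, deterministic token (HMAC-SHA256)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"tok_{digest[:16]}"


def minimum_necessary(record: dict, key: bytes) -> dict:
    """Drop fields outside the allow-list and tokenize the sensitive ones."""
    view = {}
    for field in sorted(INTAKE_FIELDS):
        if field not in record:
            continue
        value = record[field]
        if field in TOKENIZED_FIELDS:
            view[field] = tokenize(str(value), key)
        else:
            view[field] = value
    return view


# Usage: the chart-level field never reaches the agent's view.
example_record = {
    "name": "Jane Doe",
    "date_of_birth": "1980-01-01",
    "insurance_id": "ABC123",
    "insurance_plan": "PPO",
    "diagnosis_history": "not needed for intake",
}
agent_view = minimum_necessary(example_record, key=b"demo-key-not-for-production")
```

Because the token is deterministic for a given key, the same patient maps to the same token across calls, so staff systems inside the BAA-covered boundary can re-link results while the model only ever sees the token. In practice the key would live in a secrets manager rather than in source.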
Then comes the part reviewers actually probe: auditability. Every agent action is logged immutably: what data it accessed, what it produced, who reviewed it, and when. Anything consequential gets a human-in-the-loop checkpoint: an intake summary is drafted by the agent and confirmed by staff; a documentation note is proposed, not auto-filed. This maps directly onto the observability layer of the production agent stack, because compliance and operability turn out to want the same instrumentation.
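One way to approximate the audit trail described above is a hash-chained, append-only log, sketched below in Python. The `AuditLog` class and its field names are assumptions made for illustration. Note that hash chaining makes tampering detectable rather than impossible; true immutability would also need write-once storage or an equivalent control underneath.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


class AuditLog:
    """Append-only log where each entry commits to the previous entry's hash."""

    def __init__(self):
        self._entries = []

    @staticmethod
    def _digest(body: dict) -> str:
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def append(self, actor, action, data_accessed, output_ref, reviewer=None):
        """Record who did what, on which fields, and who reviewed it."""
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS_HASH
        body = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "actor": actor,
            "action": action,
            "data_accessed": sorted(data_accessed),
            "output_ref": output_ref,
            "reviewer": reviewer,
            "prev_hash": prev_hash,
        }
        entry = dict(body, hash=self._digest(body))
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash; any edited or reordered entry breaks the chain."""
        prev_hash = GENESIS_HASH
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev_hash"] != prev_hash or self._digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True


# Usage: the agent drafts, a staff member confirms, and both steps are logged.
log = AuditLog()
log.append("intake-agent", "draft_summary", ["name", "insurance_id"], "summary-001")
log.append("staff:jlee", "confirm_summary", [], "summary-001", reviewer="staff:jlee")
```

The human-in-the-loop checkpoint shows up as its own entry, so the record of who confirmed a draft is covered by the same chain as the agent's action.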
Evals carry extra weight in clinical-adjacent settings, because the failure mode isn't a wrong answer; it's a *confidently* wrong answer entering a record. That calls for test sets built from de-identified historical cases, accuracy thresholds that gate deploys, and drift monitoring after every vendor model change. This is the operating discipline our Managed AI Operations retainer runs monthly, and in healthcare it doubles as your evidence file.
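A deploy gate of the kind described here can be sketched in a few lines of Python. Everything named below (`EvalCase`, `gate_deploy`, the `max_drop` tolerance, the example numbers) is hypothetical, and exact-match scoring is a simplifying assumption; real clinical-adjacent evals would likely need task-specific graders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalCase:
    case_id: str    # identifier of a de-identified historical case
    expected: str   # reference answer
    predicted: str  # what the agent produced


def accuracy(cases) -> float:
    """Fraction of cases where the prediction matches the reference exactly."""
    if not cases:
        raise ValueError("eval set is empty")
    correct = sum(1 for case in cases if case.predicted == case.expected)
    return correct / len(cases)


def gate_deploy(cases, threshold, baseline=None, max_drop=0.02):
    """Return (passed, score, reasons).

    Fails when accuracy is below the threshold, or when it has dropped more
    than max_drop from a previously recorded baseline (a simple drift check
    to rerun after a vendor model change).
    """
    score = accuracy(cases)
    reasons = []
    if score < threshold:
        reasons.append(f"accuracy {score:.3f} below threshold {threshold:.3f}")
    if baseline is not None and baseline - score > max_drop:
        reasons.append(f"accuracy dropped {baseline - score:.3f} from baseline {baseline:.3f}")
    return (not reasons, score, reasons)


# Usage: 9 of 10 cases match, which fails a 0.95 threshold.
cases = [EvalCase(f"case-{i}", "A", "A") for i in range(9)]
cases.append(EvalCase("case-9", "A", "B"))
passed, score, reasons = gate_deploy(cases, threshold=0.95)
```

Storing the returned score and reasons alongside each release is one way the same run can serve as both a deploy gate and an evidence record.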
Where to start: pick the workflow with high volume and bounded risk, such as intake and front-desk load, not diagnosis. In one deployment, a regional provider network cut intake and admin time by 41% with exactly this shape. The two-week Sprint scopes the workflow, the data boundaries, and the BAA chain before any build begins. See the AI for Healthcare page for the full compliance posture, or start with the readiness assessment to see where your foundations stand.


