
How to run Claude in compliance-heavy environments

Running Claude on regulated data is a solved problem in 2026 if you pick the right deployment surface and match it with the right contractual paper. Three architecture patterns cover HIPAA, SOC 2, GDPR, FINRA, FedRAMP, and ITAR. Most compliance objections are fixable, and the real leak paths are almost never the model.

A healthcare prospect asked me last week whether Claude could ever touch their EHR data without getting them sued.

The short answer is yes. The long answer is that most companies I talk to about this aren’t stuck because Claude can’t be compliant. They’re stuck because their legal team is still reading objections written in 2023. Things have moved.

Here’s what’s changed. In 2026 there are three architecture patterns that cover maybe ninety-five percent of the real compliance objections you’ll face, across HIPAA, SOC 2 Type II, GDPR, FINRA, FedRAMP Moderate and High, ITAR, and HITRUST. Pick the right one, sign the right paper, and you’re done. And the most interesting thing? The real leak paths in production LLM deployments almost never involve the model. They involve your observability pipeline, your debug logs, and a prompt cache whose compliance boundary nobody bothered to confirm in writing.

The short version

Three deployment patterns cover regulated Claude usage across almost every compliance regime that matters in 2026. Pick based on what you actually need, not what your legal team read two years ago.

  • Pattern A - Anthropic Enterprise with Zero Data Retention, when you want a direct relationship and the latest features
  • Pattern B - Cloud-hosted Claude (Bedrock, Vertex, Azure Foundry) under the cloud provider's BAA or DPA, when you want the cloud to be your compliance perimeter
  • Pattern C - De-identify before Claude, when you want to shrink the regulated data surface regardless of where the model runs

What compliance-heavy actually means

Every compliance regime that’s ever made me rewrite a deployment asks the same four questions. Where does the regulated data live? Who processes it? Does it get used to train external models? And can you prove all of the above?

HIPAA cares about PHI boundaries and BAAs. SOC 2 Type II cares about whether your controls work in practice, not just on paper. GDPR cares about Chapter V transfer mechanisms and Art. 28 processor agreements. FINRA Notice 24-09 and the 2026 Regulatory Oversight Report care about supervising AI outputs the same way you’d supervise a broker. PCI DSS v4.0 cares about cardholder data wherever it goes. FedRAMP cares about whether the whole stack is authorized for the impact level the agency needs. ITAR cares about US-persons access and export control. HITRUST CSF v11 rolls up most of these into one certifiable control set. ISO 27001 is the underlying ISMS. ISO 42001 is the new AI management system standard that’s starting to show up in enterprise procurement checklists in 2026.

Here’s the thing, and it took me embarrassingly long to internalize it. None of these regulators cares which LLM you use. Not one. They care about the data boundary around it. This is the key shift. Once you stop treating the LLM as the compliance problem and start treating it as a downstream service inside a boundary you already know how to audit, the whole thing gets easier.

That’s what the three patterns are about.

The three deployment patterns that work

Pattern A: Anthropic Enterprise plus Zero Data Retention

You sign a BAA or DPA directly with Anthropic, turn on Zero Data Retention, and call the Claude API like it’s any other processor in your stack. This is the cleanest option when you want the latest models, you trust Anthropic as a direct vendor relationship, and your compliance ask is “give me a contract and tell me prompts aren’t used for training.”

Anthropic announced SOC 2 Type II and ISO 27001 certifications for the Claude API in January 2026, and they were one of the first frontier labs to certify against ISO 42001 in January 2025. HIPAA BAAs are available on sales-assisted Enterprise plans, and any BAA signed after December 2, 2025 covers API plus Enterprise in one agreement. (Anthropic BAA page has the current scope.)

The gotcha is the exclusion list, and it’s not small. Under ZDR the following features are out of scope: Batch API, Files API, Skills API, code execution, programmatic tool calling, and the MCP connector. Under HIPAA the list is wider, adding Web Fetch, Computer Use, Advisor, Context Management (compaction), and Tool Search. Pure Messages API is covered, which is most of what anyone needs. Everything fancy on top requires a case-by-case review. (Zero Data Retention docs list it line by line.)

The other gap is EU residency. As of April 2026 the direct Anthropic API offers US and global inference only. There’s no EU-only routing. If your lawyers want data to stay inside the EU compliance perimeter, this pattern won’t do it today. You go to Pattern B for that.

Pattern B: Cloud-hosted Claude under the cloud provider’s paper

Claude runs inside Amazon Bedrock, Google Vertex AI, or Microsoft Azure AI Foundry. You sign the cloud’s BAA or DPA, and the cloud is your compliance perimeter. Anthropic doesn’t see your prompts or completions. Each model provider has an escrow account on the cloud side with no outbound access to customer traffic.

This is the pattern I recommend first for almost every regulated deployment I see. Bedrock was added to the AWS HIPAA Eligible Services Reference in an update on February 10, 2026, so you need no separate Anthropic BAA if PHI stays inside Bedrock. The Claude catalog on Bedrock as of April 2026 includes Claude Opus 4.6, Opus 4.5, Sonnet 4.6, Sonnet 4.5, and Haiku 4.5, in US, EU, and APAC regions depending on the inference profile. Sonnet 4.6 supports 1M-token context, which matters when you’re feeding it full claim packets or long EHR extracts.
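
Mechanically, the Bedrock call path is a short piece of boto3 against the Converse API. A minimal sketch, with the model ID left as a placeholder (check your account’s inference profiles for the exact identifier); the request builder is split out so the payload shape can be sanity-checked without AWS credentials.

```python
def build_converse_request(prompt: str, model_id: str, max_tokens: int = 1024) -> dict:
    """Assemble a single-turn Converse API request body."""
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": max_tokens},
    }

def ask_claude(prompt: str, model_id: str, region: str = "us-east-1") -> str:
    """Call Claude through Bedrock so traffic stays inside the AWS perimeter."""
    import boto3  # imported lazily so the builder above is testable without AWS

    client = boto3.client("bedrock-runtime", region_name=region)
    resp = client.converse(**build_converse_request(prompt, model_id))
    return resp["output"]["message"]["content"][0]["text"]
```

Pair this with a PrivateLink endpoint (covered below) and the client never touches the public internet.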

For EU residency, Vertex AI is the cleanest answer today. Ten EU regions host Claude, and a europe-westN regional endpoint or the eu multi-region endpoint guarantees EU-only processing under Google’s DPA. Bedrock EU inference profiles (Frankfurt, Paris, Ireland) are the strong AWS alternative. (Vertex AI data residency docs spell out the options.) Azure AI Foundry went GA on Claude in January 2026 and Opus 4.7 landed on Foundry in April 2026, but HIPAA BAA coverage for Claude-in-Foundry specifically is still “under review” as of this writing, so don’t put PHI through Foundry without confirming the latest with your Microsoft account team.
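
One way to make the EU-only guarantee hard to violate by accident is to refuse to construct a client against a non-EU endpoint at all. A sketch using the AnthropicVertex client from the anthropic SDK (needs the vertex extra installed); the endpoint set here is partial and assumed, so verify it against Google’s current Claude region availability.

```python
# Partial, assumed list of EU endpoints hosting Claude; verify against
# Google's current availability docs before relying on it.
EU_ENDPOINTS = {"eu", "europe-west1", "europe-west3", "europe-west4"}

def assert_eu_endpoint(region: str) -> str:
    """Fail fast if a caller tries to route EU personal data outside the EU."""
    if region not in EU_ENDPOINTS:
        raise ValueError(f"{region!r} is not an EU endpoint; residency guarantee lost")
    return region

def eu_claude_client(project_id: str, region: str = "europe-west3"):
    from anthropic import AnthropicVertex  # requires `pip install anthropic[vertex]`

    return AnthropicVertex(project_id=project_id, region=assert_eu_endpoint(region))
```

The point of the guard is organizational, not cryptographic: a misconfigured region becomes a loud exception in staging instead of a silent transfer violation in production.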

For government and defense, this is where Pattern B forks. AWS GovCloud Bedrock hit FedRAMP High and DoD IL4/IL5 in May 2025 with Claude 3.5 Sonnet v1 and Claude 3 Haiku, and Claude Sonnet 4.5 was added on November 10, 2025. IL6 workloads run through the Palantir partnership on AWS or through Amazon Bedrock in the AWS Top Secret cloud. ITAR data stays in GovCloud, and as of April 2026 Sonnet 4.5 is the latest Claude available for ITAR workloads. You trade newest models for highest assurance, which is the correct trade at this impact level.

What this pattern gets you mechanically: PrivateLink (AWS) or Private Service Connect (Google) so traffic never hits the public internet. Customer-managed keys for encryption. CloudTrail or Cloud Audit Logs for who-called-what. Model invocation logging to an encrypted S3 bucket, which is what you’ll need to show your auditor when they ask for a full prompt-and-response trail. None of this is new. It’s the same security architecture you’d apply to any managed service.
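
Turning on that invocation logging is a single account-level boto3 call. A sketch, with the bucket name as a placeholder; the config builder is separate so its shape can be checked without an AWS account, and the bucket itself should already be encrypted and BAA-covered.

```python
def build_logging_config(bucket: str, prefix: str = "bedrock-invocations/") -> dict:
    """Invocation logging config: full prompt/response text to an S3 bucket."""
    return {
        "textDataDeliveryEnabled": True,
        "imageDataDeliveryEnabled": False,
        "embeddingDataDeliveryEnabled": False,
        "s3Config": {"bucketName": bucket, "keyPrefix": prefix},
    }

def enable_invocation_logging(bucket: str, region: str = "us-east-1") -> None:
    import boto3  # lazy import keeps the builder testable offline

    boto3.client("bedrock", region_name=region).put_model_invocation_logging_configuration(
        loggingConfig=build_logging_config(bucket)
    )
```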

Pattern C: De-identify before Claude ever sees the data

The third pattern isn’t a replacement for A or B. It’s a complement, and it’s the one that tends to get undervalued. You run a PHI or PII scrub before the prompt leaves your compliance boundary. Claude sees only de-identified text. You re-identify locally on the response path if you need the real values back.

Amazon Comprehend Medical is the default tool for healthcare data. It’s HIPAA-eligible, it maps to most of the 18 HIPAA Safe Harbor identifiers, and it’s the first-pass filter in most of the Bedrock reference architectures you’ll read. Microsoft Presidio is the open-source option, more biased toward consumer-PII categories but now shipping with a MedicalNERRecognizer that handles clinical context. For PCI workloads, you tokenize PANs with something like AWS Payment Cryptography or Basis Theory and let Claude see only the tokens.
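
The Comprehend Medical pre-pass reduces to: detect PHI spans, then splice them out right to left so earlier offsets stay valid. A sketch; detect_phi is the real API call, while the bracketed entity-type placeholder format is my own choice, not a standard.

```python
def redact_spans(text: str, entities: list) -> str:
    """Replace each detected span with its entity type, working right to left
    so that earlier BeginOffset/EndOffset values stay valid after each splice."""
    for e in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        text = text[: e["BeginOffset"]] + f"[{e['Type']}]" + text[e["EndOffset"] :]
    return text

def deidentify(text: str, region: str = "us-east-1") -> str:
    """First-pass PHI scrub via Comprehend Medical before the prompt leaves
    the compliance boundary."""
    import boto3  # lazy import keeps redact_spans testable offline

    resp = boto3.client("comprehendmedical", region_name=region).detect_phi(Text=text)
    return redact_spans(text, resp["Entities"])
```

Keep the original-to-placeholder mapping if you need to re-identify on the response path.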

Here’s the honest caveat. De-identification is probabilistic, not absolute. F1 scores in the 0.90 to 0.95 range mean five to ten percent of PHI still gets through. That’s an unacceptable miss rate if you’re relying on de-identification as your only control. Use it with a signed BAA, not instead of one. The architecture that actually works in production is A or B as the outer boundary, with C minimizing the data surface inside it.

How the patterns map to the regimes you actually face

The useful output of picking a pattern is deciding what you deploy on Monday. Here’s the pairing that comes up in almost every advisory conversation I’ve had with mid-size companies:

For US healthcare with PHI, when you need the latest models: Pattern B on Bedrock with an AWS BAA. No separate Anthropic BAA. PrivateLink endpoint. Invocation logging to an encrypted bucket. Comprehend Medical de-id pre-pass where feasible. This is the well-trodden path, and it’s how you get the HIPAA compliance story your compliance officer will actually sign off on.

For EU personal data with residency requirements: Pattern B on Vertex AI with europe-west3 or the eu multi-region, under Google’s DPA. Or Bedrock EU inference profile if you’re AWS-native. Anthropic direct is off the table until EU inference ships.

For broker-dealers and RIAs under FINRA and SEC rules: Pattern B plus S3 Object Lock in Compliance Mode for the 17a-4 retention requirement. The FINRA 2026 Regulatory Oversight Report expects logs of prompts, outputs, model versions, and human oversight. S3 Object Lock has been assessed against 17a-4(f). Route every Claude call through a logging shim that writes to a WORM bucket with a six-year retention. This is the same pattern I’d use for any regulated communication channel. The specifics are in the financial services compliance piece.
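
The logging shim itself is small. A sketch under two assumptions: the bucket was created with Object Lock enabled (that can’t be retrofitted), and six years of retention is approximated as a day count; confirm the exact retention calendar with counsel.

```python
import json
from datetime import datetime, timedelta, timezone

# ~6 years including leap days; confirm the exact 17a-4 retention with counsel.
RETENTION = timedelta(days=365 * 6 + 2)

def build_audit_record(prompt: str, response: str, model_id: str, reviewer: str) -> str:
    """Serialize the fields FINRA's 2026 report expects in the supervision trail."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_id": model_id,
        "prompt": prompt,
        "response": response,
        "human_reviewer": reviewer,
    })

def write_worm_record(bucket: str, key: str, record: str) -> None:
    """Write one immutable audit record; COMPLIANCE mode means nobody,
    including root, can delete it before the retain-until date."""
    import boto3  # lazy import keeps the builders testable offline

    boto3.client("s3").put_object(
        Bucket=bucket,  # must have been created with Object Lock enabled
        Key=key,
        Body=record.encode(),
        ObjectLockMode="COMPLIANCE",
        ObjectLockRetainUntilDate=datetime.now(timezone.utc) + RETENTION,
    )
```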

For federal civilian CUI: Claude for Government (FedRAMP High authorized) or Bedrock GovCloud. Both work. Pick based on which cloud footprint you already have.

For DoD IL4 and IL5, and for ITAR: Bedrock GovCloud only. As noted, the model list is smaller than commercial, and you accept that trade for the authorization.

For DoD IL6 and classified workloads: the Palantir IL6 environment, or Claude Gov via the AWS Top Secret cloud.

For PCI workloads: tokenize before the prompt leaves the CDE. Any pattern (A, B, or both plus C) works as long as the LLM never sees an actual PAN. Your QSA will have views; ask them before you build.
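
A toy sketch of the tokenize-before-prompt step: regex candidates filtered by a Luhn check, then swapped for opaque tokens. The in-memory vault is a stand-in for a real tokenization service like the ones named above; it exists here only so the scrubbing logic is self-contained and testable, never ship it.

```python
import re
import secrets

PAN_RE = re.compile(r"\b\d{13,19}\b")

def luhn_ok(candidate: str) -> bool:
    """Luhn checksum filters out most non-PAN digit runs (order IDs, etc.)."""
    total = 0
    for i, ch in enumerate(reversed(candidate)):
        d = int(ch)
        if i % 2 == 1:
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0

class ToyVault:
    """In-memory stand-in for a real tokenization service. NOT for production:
    a real vault keeps the PAN-to-token map inside the CDE, encrypted."""
    def __init__(self):
        self._tokens = {}

    def tokenize(self, pan: str) -> str:
        if pan not in self._tokens:
            self._tokens[pan] = "tok_" + secrets.token_hex(8)
        return self._tokens[pan]

def scrub_pans(text: str, vault: ToyVault) -> str:
    """Replace Luhn-valid PANs with tokens before the prompt leaves the CDE."""
    return PAN_RE.sub(
        lambda m: vault.tokenize(m.group()) if luhn_ok(m.group()) else m.group(),
        text,
    )
```

The Luhn filter matters: without it, every 13-to-19-digit order number gets tokenized too, and your support team stops being able to read their own prompts.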

For less-regulated enterprise use (internal tooling, employee productivity, non-PHI research): Pattern A with ZDR is the cleanest answer. Direct Anthropic relationship, latest models, no cloud middleware.

If GDPR is the main concern, there’s also a nice adjacent read on privacy by design for AI systems that goes deeper on differential privacy and federated patterns where they fit.

The three leak paths that aren’t the model

Here’s the contrarian part, and it’s the thing that’ll save you a post-mortem.

The published attack surface against LLMs (training-data extraction, model inversion, adversarial jailbreaks) is not where regulated data is actually getting out in 2026. I’ve seen versions of this argument in every vendor pitch deck. It’s mostly noise. Real incidents, from Samsung in 2023 through the DeepSeek database exposure in January 2025 and the Microsoft Office Copilot oversharing bug disclosed in February 2026, come from three much more boring places.

Your observability pipeline. Sentry, Datadog, and CloudWatch are built to capture as much detail as possible. By default they all capture prompt payloads. Sentry breadcrumbs pick up console.log output, so a developer who logs a prompt for local debugging ships it to Sentry. Sentry’s own tracker has an open issue where its LLM monitoring was capturing prompts despite sendDefaultPii: false. Datadog’s APM traces store LLM inputs and outputs as span attributes; its Sensitive Data Scanner is opt-in and rule-based, and new entity types slip through until someone writes a rule. CloudWatch Data Protection Policies work, but only if you configure them before you need them. The common failure mode across all three: scrubbing is configured for the PII categories the team remembered, not the ones the LLM actually sees. Medical jargon and custom identifiers pass through.
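
The defensive move is one scrubbing function applied at every telemetry egress point, keyed on field names, rather than a different rule set per tool. A minimal sketch; the key list is illustrative and must be extended with whatever field names your stack actually uses for LLM I/O, and the function can be wired in as, for example, a Sentry before_send hook or a log-processor step.

```python
REDACTED = "[REDACTED]"
# Illustrative key list; extend with every field your stack uses for LLM I/O.
SENSITIVE_KEYS = {"prompt", "messages", "completion", "response", "input", "output"}

def scrub(obj):
    """Recursively redact sensitive fields from any telemetry payload
    (event dict, span attributes, structured log record) before it ships."""
    if isinstance(obj, dict):
        return {
            k: REDACTED if k.lower() in SENSITIVE_KEYS else scrub(v)
            for k, v in obj.items()
        }
    if isinstance(obj, list):
        return [scrub(v) for v in obj]
    return obj

# Example wiring (Sentry): sentry_sdk.init(before_send=lambda event, hint: scrub(event))
```

Note this is name-based, so it inherits the exact weakness described above: a prompt copied into a field called `detail` sails through. It’s a floor, not a ceiling.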

Your debug logs, support tickets, and human-review queues. The simplest pattern is also the most common. A developer writes logger.info("prompt: " + user_prompt) to debug a failing request. The logger ships to CloudWatch, CloudWatch exports to a third-party SIEM, and the SIEM is operated by a vendor outside your BAA boundary. Nobody notices for six months. The variants are worse. Support tickets that attach full failing request bodies. Exception stacktraces that serialize request frames. “Replay this request” tooling that copies raw prompts into internal Slack channels. Human-review queues that pull flagged prompts for safety review by contractors who’ve never heard of HIPAA. A prompt can enter a review queue because a safety classifier flagged it for any reason, not because it contains PHI. The classifier doesn’t know it’s PHI. But it’s now out of scope.

The prompt cache’s compliance boundary. Anthropic cut the default prompt cache TTL from one hour to five minutes on March 6, 2026, and moved isolation from organization-level to workspace-level on February 5, 2026. Both changes matter. But the public prompt caching docs still don’t spell out where cached blocks physically live, whether they fall inside a HIPAA BAA by default, or what encryption at rest applies. For non-ZDR customers, data is retained for the cache duration, full stop. And published research from 2025 (Gu et al., “Auditing Prompt Caching in Language Model APIs”) demonstrated that caches can leak cross-user information through timing side channels when caches are shared globally. Workspace isolation helps. It doesn’t completely close the door. For healthcare and other PHI-sensitive deployments, this is a boundary worth confirming in writing with your vendor before production. Don’t assume.

None of these three leak paths are Claude-specific. They apply to every LLM integration, OpenAI, Gemini, Mistral, Llama, your in-house model, every single one. The private deployment patterns in this post shrink your compliance boundary. They don’t eliminate these leak paths. That’s still your job, inside the boundary, on your team’s time. AI security threats at enterprise scale covers the broader version of this argument if you want it.

A pre-flight checklist for production

Before any regulated data hits your Claude deployment, these five things need to be true. Not aspirational. Actually true.

The BAA, DPA, or equivalent processor agreement is signed and executed. Not “sent to legal.” Signed. If it’s HIPAA, signed before any PHI touches Claude. Not after.

Data never leaves your compliance boundary in the clear. TLS in transit, KMS or CMEK at rest, and a VPC endpoint or private service connection for the LLM call itself. Public internet egress is off.

Your observability and logging pipeline has been audited for prompt content. Every span attribute, every log line, every breadcrumb. Anything that can capture a prompt has been configured to scrub it or excluded from the sensitive code path entirely. Then you test it, by submitting a prompt that contains a known PHI-shaped string and checking that the string doesn’t appear downstream.
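
That test can be as simple as a canary string: submit a prompt containing a synthetic PHI-shaped marker through the real code path, then grep every downstream sink for it. A sketch of the checking half, assuming you can export each sink’s recent contents as text; the canary value itself is arbitrary.

```python
CANARY = "CANARY-PHI-19fe2c"  # synthetic PHI-shaped marker; regenerate per test run

def leaked_sinks(canary: str, sink_dumps: dict) -> list:
    """Given {sink_name: exported_text}, return every sink the canary reached."""
    return sorted(name for name, blob in sink_dumps.items() if canary in blob)

# Usage: send a prompt containing CANARY through production code paths, wait for
# telemetry to flush, export recent CloudWatch/Sentry/Datadog/SIEM data, then
# require leaked_sinks(CANARY, dumps) == [] before signing off the pipeline.
```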

Invocation logs go to compliance-grade storage. For HIPAA that’s an encrypted, BAA-covered bucket with versioning. For FINRA and SEC that’s S3 Object Lock in Compliance Mode with a six-year retention. Your auditor will ask. Being able to produce a prompt-and-response trail on demand is the difference between a clean review and a finding.

And every code path that touches regulated prompts has a deliberate allowlist of downstream sinks. No “log everything for support.” No “send failing requests to an internal Slack.” No “attach the full payload to the Jira ticket.” The principle, and I’ve seen this violated so many times, is that any code path that can receive a regulated prompt should treat that prompt as contaminated for the lifetime of the request. Every sink it can reach has to be on the allowlist.
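
One way to enforce “contaminated for the lifetime of the request” in code is to wrap regulated prompts in a type that only allowlisted sinks can unwrap, and that redacts itself everywhere else. A sketch; the sink names are placeholders for whatever your allowlist actually contains.

```python
class RegulatedPrompt:
    """Holds a regulated prompt; only allowlisted sinks may read the raw text.
    Accidental logging via str()/repr() emits a redaction marker instead."""

    ALLOWED_SINKS = frozenset({"bedrock", "worm_audit_log"})  # placeholder names

    def __init__(self, text: str):
        self._text = text

    def reveal(self, sink: str) -> str:
        if sink not in self.ALLOWED_SINKS:
            raise PermissionError(f"sink {sink!r} is not allowlisted for regulated prompts")
        return self._text

    def __repr__(self) -> str:
        return "<RegulatedPrompt [redacted]>"

    __str__ = __repr__
```

It won’t stop a determined developer, but it turns the default `logger.info(prompt)` mistake into a harmless redaction marker instead of a six-month leak.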

Your legal team isn’t wrong to ask hard questions about Claude. They’re often wrong about which ones matter. The model isn’t the risk. The path the data takes to and from it is.

About the Author

Amit Kothari is an experienced consultant, advisor, coach, and educator specializing in AI and operations for executives and their companies. With 25+ years of experience and as the founder of Tallyfy (raised $3.6m), he helps mid-size companies identify, plan, and implement practical AI solutions that actually work. Originally British and now based in St. Louis, MO, Amit combines deep technical expertise with real-world business understanding.

Disclaimer: The content in this article represents personal opinions based on extensive research and practical experience. While every effort has been made to ensure accuracy through data analysis and source verification, this should not be considered professional advice. Always consult with qualified professionals for decisions specific to your situation.

All consulting services delivered via Blue Sheen - bluesheen.com