Data & More

White Paper · North America

What organizations should — and should not put into an AI LLM/SLM

A practical guide to personal, confidential, secret, and regulated data in the age of generative AI — and why the line you draw today decides your exposure tomorrow.

Published

June 2026

Focus

PII · Credentials · Confidential & Regulated Data

Jurisdiction

Canada & United States

Audience

Executives, Legal, IT & Compliance

1The core tension

Executive summary

Generative AI tools have moved from novelty to default workflow in barely two years. Employees draft contracts, summarize customer records, debug code, and analyze spreadsheets inside chat windows — often without a single line of policy governing what they are allowed to paste in.

That gap is the risk. The same property that makes an AI assistant useful — it accepts and reasons over whatever text you give it — means a careless paste can move your most sensitive data outside your control, outside your jurisdiction, and in some cases outside the protections of the law. The technology is not the hazard. The data you feed it, and the tier you feed it on, is.

This paper sets out a defensible position for any North American organization: a clear taxonomy of what is and is not safe to process, the legal reasoning behind those lines, and a governance checklist you can adopt this quarter. It is deliberately written about AI tools in general, with one named example where a court has now ruled directly on the question.

“The technology is not the hazard. The data you feed it — and the tier you feed it on — is.”

2Know what you are holding

A risk taxonomy of sensitive data

Before drawing any line, an organization has to name the categories. Not all data carries the same exposure, and treating it uniformly leads either to paralysis or to negligence. The categories below cover the great majority of what flows through a workplace AI tool.

Personal identifiable information (PII)

Names, addresses, government IDs, dates of birth, biometric and location data. Regulated under PIPEDA in Canada and a growing patchwork of US state laws.

Regulated special-category data

Health records (HIPAA/PHIPA), financial account data, children’s data. Carries statutory breach-notification and consent obligations.

Company confidential

Strategy, M&A material, unreleased financials, source code, pricing, internal communications. Loss erodes competitive position and may breach disclosure rules.

Secrets & credentials

Passwords, API keys, access tokens, encryption keys, certificates. Pasting one is functionally equivalent to publishing it.

Third-party & contractual data

Customer data, partner data, and anything you hold under NDA or a data-processing agreement. You may be contractually barred from disclosing it to any sub-processor.

Privileged material

Communications and work product tied to legal advice. As Heppner now shows, this protection can be lost the moment it enters a consumer AI tool.

Authentication & access data

Multi-factor codes, security-question answers, password-reset links, and account-recovery information. A single exposed item can unlock systems far beyond the AI tool itself.

Source code & infrastructure config

Proprietary code, internal endpoints, network topology, and deployment settings. Beyond intellectual-property loss, this hands a would-be attacker a map of your environment.

3Capture the upside safely

What organizations should — and should not — do

The goal is not to ban AI. Organizations that ban it outright simply push usage into unmanaged personal accounts — the “shadow AI” problem — which is worse. The goal is to channel usage toward low-risk, high-value patterns and away from the handful of actions that create real liability.

Should use AI for

Drafting and editing on non-sensitive content — marketing copy, internal docs, public-facing material.
Working over data that is already public or properly de-identified / anonymized.
Coding help with credentials stripped and secrets replaced by placeholders.
Summarizing or restructuring documents inside a contracted enterprise tier with a signed DPA.
Brainstorming, research, and learning — where no confidential input is required.
Formatting and tidying meeting notes — once names and identifying details have been removed.

Should never put in

Passwords, API keys, tokens, certificates — in any tool, on any tier, ever.
Raw PII or regulated records (health, financial, biometric) into a consumer-grade account.
Privileged legal material into a public tool — it can forfeit protection (see §6).
Third-party data held under NDA / DPA that bars onward disclosure.
Trade secrets and unreleased financials — once disclosed, “secret” status can be lost.
Employee & HR records — salaries, reviews, health, and disciplinary files into a consumer-grade account.

Danger — credentials are not “just text”

A password or API key pasted into a prompt may be stored, logged, processed by sub-systems, and — on a consumer tier with training enabled — retained for years. Treat any secret entered into an AI tool as compromised and in need of rotation, exactly as you would a secret pasted into a public webpage.

Danger — de-identification is harder than it looks

Removing a name is not anonymization. Combinations of seemingly innocuous fields — postal code, birth date, employer — routinely re-identify individuals. If re-identification is plausible, the data is still personal data and the rules still apply.

Danger — shadow AI

When no sanctioned tool exists, employees use personal accounts on their phones. Those accounts often default to data retention and, on some consumer tiers, to training. The organization inherits the exposure with none of the visibility.

Danger — copy-paste muscle memory

The most common real-world leak is mundane: someone pastes an entire document to “just summarize it,” not noticing it carries embedded customer PII, a salary figure, or a client name. The intent is harmless; the disclosure is total. Train people to read what they paste before they paste it.

Danger — auto-save and cross-device sync

Conversations are typically saved and synced across every device signed into the account. Data pasted on a managed work laptop can reappear on an employee’s personal phone — outside your controls, and on a device you cannot wipe.

Danger — browser extensions & third-party integrations

“AI assistant” browser extensions and plug-ins may transmit the contents of the pages you view to servers governed by terms you never reviewed. An unvetted integration can quietly route confidential data to an unknown processor.

4The single most important distinction

The tier that changes everything

Most AI providers run two fundamentally different products under one brand: a consumer tier (free and personal-paid plans) and a commercial tier (business, enterprise, and API access governed by commercial terms). The interface looks the same. The data terms are not.

On many consumer tiers, conversations may be retained for extended periods and used to improve future models. The specifics — whether training is opt-in, opt-out, or on by default, and how long data is kept — vary by provider and have changed materially over the past year, so the only safe practice is to read the current terms for the exact plan in use rather than rely on a vendor’s reputation.

Commercial and enterprise tiers typically exclude business data from training by default, offer shorter retention, and — for qualifying customers — support Zero Data Retention arrangements and contractual instruments such as a Data Processing Addendum or a Business Associate Agreement for regulated health data. The protections that make AI defensible for confidential work generally live only on these tiers, and only when the corresponding agreement is actually signed and configured.

The practical rule

If the work touches PII, regulated, confidential, or privileged data, it belongs on a contracted enterprise tier with a signed data agreement — never on a personal or free account. The tier, not the tool, is what determines whether your data is protected.

5When the risk became case law

Case study: United States v. Heppner

For years, the danger of pasting sensitive material into a consumer AI tool was theoretical — a matter of policy and prudence. In early 2026, a federal court made it concrete.

S.D.N.Y. · Judge Jed S. Rakoff

A criminal defendant’s conversations with a consumer AI tool were ruled not protected by attorney-client privilege.

United States v. Heppner — bench ruling Feb 10, 2026; written opinion Feb 17, 2026

Defendant Bradley Heppner, facing securities and wire-fraud charges, used the consumer version of a generative AI tool to prepare materials outlining his defense strategy. Roughly thirty-one such documents were seized. He asserted privilege over them. Judge Rakoff — addressing what the court called a question of first impression nationwide — rejected the claim.

Not a confidential communication. The AI platform is a third party, and an AI tool is not an attorney — so the exchanges were not protected attorney-client communications.

No reasonable expectation of confidentiality. The court pointed to the consumer privacy policy, which described collecting user inputs and outputs and reserved the right to disclose data to third parties, including government regulators.

Work-product doctrine did not apply. The material was generated by the client and the tool, not at the direction of counsel.

Commentators across the legal community treated the ruling as highly fact-specific — but as a clear warning: putting potentially privileged information into a public AI tool can waive the very protection it was meant to preserve.

Danger — protection can be waived on contact

Privilege and trade-secret status are not properties of a document; they are properties of how the document is handled. Disclosing material to a third-party tool can destroy the protection irreversibly, and no after-the-fact policy can restore it.

Danger — the privacy policy is the contract

The court relied on the tool’s own published terms. Those terms — data collection, retention, and the right to disclose to authorities — are what a court will read back to you. Read them before, not after.

Danger — AI content is discoverable evidence

AI conversations are stored records. They can be seized, subpoenaed, and produced in litigation. Anything an employee types may later be read aloud in a courtroom or a regulator’s office.

Danger — a compromised account is a single, searchable archive

If an attacker phishes a password, steals a session, or gets to an unlocked device, they inherit the entire conversation history tied to that account — potentially years of prompts and outputs on a long-retention consumer tier. Every credential, customer record, or confidential document an employee ever pasted in is now sitting in one place, ready to read. The AI client does not become a doorway into unrelated systems, but it does become a concentrated trove: the more sensitive data went in, the more a single compromise exposes. Multi-factor authentication, short retention, and a strict no-secrets rule are what limit the blast radius.

Danger — prompt injection

When an AI tool is asked to read a document, email, or webpage, a malicious instruction hidden inside that content can hijack what the tool does next — for example, coaxing it to reveal earlier parts of the conversation or take an unintended action. As tools gain the ability to browse, open files, and use integrations, untrusted content should be treated as potentially hostile, not merely as data.

Danger — training-data leakage

If confidential inputs are used to train a model — as can happen on consumer tiers with training enabled — fragments of that data can in principle resurface in another user’s output. The risk is rare and hard to trigger deliberately, but it is real, and it is one more reason confidential work belongs only on tiers that exclude your data from training.

6How it actually goes wrong

Anatomy of a bad day

Most breaches are not dramatic. They are a chain of small, reasonable-looking steps. Here is how the risks in this paper combine into a single incident.

9:14 a.m. — A sales associate exports a customer list to answer a quick question and pastes it into a personal AI account on their phone: “Which of these accounts are up for renewal?” The list contains names, emails, and contract values for several hundred customers. The intent is harmless. The data is now stored on a consumer tier with multi-year retention.

That evening — The same account syncs to the associate’s laptop and tablet. The customer list now exists in three places the organization does not control and cannot wipe.

Three weeks later — The associate falls for a phishing email and their password is stolen. Because the account is personal, there is no single sign-on and no enforced multi-factor authentication to stop the attacker.

Within minutes — The attacker opens the conversation history and finds the customer list — plus everything else the associate ever pasted in over months of use. A single compromise has become a searchable archive of confidential data.

The aftermath — The organization remains the accountable party for that personal data. Depending on the jurisdiction and the data involved, a breach-notification duty may now be triggered, along with regulatory exposure and the cost of notifying every affected customer.

The lesson

No single step here was malicious or even unusual. The damage came from the absence of guardrails — a sanctioned tier, a classification rule, SSO and MFA, short retention, and a trained workforce. Each control in §9 breaks this chain at a different link.

7Canada & the United States

The North American legal lens

North America has no single AI privacy statute. Instead, organizations operate under a layered patchwork — federal, provincial, and state — and an AI tool does not suspend any of it.

Canada. The Personal Information Protection and Electronic Documents Act (PIPEDA) governs personal information in commercial activity, with substantially similar provincial regimes in Alberta, British Columbia, and Quebec, and sector rules such as PHIPA for health data. Consent, purpose limitation, and accountability for transfers to third-party processors all apply when personal data is sent to an AI provider.

United States. There is no comprehensive federal privacy law. Instead, a growing roster of state statutes — led by California’s CCPA/CPRA and joined by Virginia, Colorado, Connecticut, Utah, Texas, and many more — create overlapping obligations, alongside sector laws like HIPAA for health and GLBA for financial data. Data & More tracks this expanding body of North American privacy law across both countries.

Danger — cross-border transfer

Sending personal data to an AI provider often means transferring it across borders to wherever the provider processes it. That transfer is itself a regulated act and may require specific contractual safeguards. “The data never left my laptop” is not true the moment it enters the prompt.

Danger — you remain the accountable party

Using a third-party tool does not transfer your legal accountability for the personal data you control. If the processing was unlawful, the obligation — and the breach-notification duty — lands on your organization, not the vendor.

8Adopt this quarter

A governance checklist

A workable AI data policy does not require a six-month committee. A handful of controls cover the great majority of exposure.

Sanction a tool on a commercial tier. Give employees an approved enterprise account with a signed DPA so they are not driven to personal accounts.

Publish a clear data-classification rule. State plainly which of the data categories may never be entered, and where.

Ban secrets outright. No passwords, keys, or tokens in any prompt, on any tier — and rotate any that slip through.

Verify the data terms. Confirm training is disabled and retention is acceptable for the exact plan you use; re-check when terms change.

Sign the right agreements. DPA for personal data; BAA for health data; ZDR where the sensitivity demands it.

Audit for shadow AI. Find unmanaged personal-account usage and bring it onto the sanctioned tool.

Enable SSO and MFA. Put the sanctioned tool behind single sign-on and multi-factor authentication so a stolen password alone cannot open the conversation archive.

Set and enforce a retention policy. Choose the shortest retention the work allows, and review access logs periodically for anomalies.

Name an accountable owner. Assign one person responsibility for AI governance — not “everyone,” which means no one.

10.

Train people on the line. Most leaks are well-intentioned. A fifteen-minute briefing on the data categories prevents most of them.

11.

Treat AI logs as records. Assume conversations are discoverable, and govern their retention as you would any other business record.

9At a glance

Decision matrix

A summary view of what belongs where. When in doubt, treat the data as one level more sensitive than it first appears.

Data type	Consumer tier	Commercial / enterprise + DPA	Notes
Public / de-identified	Caution	OK	Confirm de-identification is genuine and irreversible.
Internal non-sensitive	Caution	OK	Prefer the sanctioned tool to avoid shadow AI.
PII	Never	OK*	*With DPA and lawful basis under PIPEDA / state law.
Regulated (health / financial)	Never	Conditional	Requires BAA / sector safeguards; ZDR advisable.
Company confidential / trade secret	Never	Conditional	Disclosure can forfeit “secret” status.
Privileged legal material	Never	Conditional	See Heppner; involve counsel before use.
Secrets / credentials	Never	Never	No exceptions, on any tier.

Draw the line before someone else draws it for you

Data & More helps North American organizations find, classify, and govern their sensitive data — so you know exactly what should never reach an AI prompt, and can prove it.

Talk to our team

Disclaimer. This white paper is provided for general information and is not legal advice. Regulations, court decisions, and AI providers’ data terms change frequently; organizations should verify current terms and consult qualified legal counsel before relying on any statement here. Examples naming specific tools describe publicly reported facts and are illustrative of risks that apply to generative AI tools generally.