Executive summary
Generative AI tools have moved from novelty to default workflow in barely two years. Employees draft contracts, summarize customer records, debug code, and analyze spreadsheets inside chat windows — often without a single line of policy governing what they are allowed to paste in.
That gap is the risk. The same property that makes an AI assistant useful — it accepts and reasons over whatever text you give it — means a careless paste can move your most sensitive data outside your control, outside your jurisdiction, and in some cases outside the protections of the law. The technology is not the hazard. The data you feed it, and the tier you feed it on, is.
This paper sets out a defensible position for any North American organization: a clear taxonomy of what is and is not safe to process, the legal reasoning behind those lines, and a governance checklist you can adopt this quarter. It is deliberately written about AI tools in general, with one named example where a court has now ruled directly on the question.
A risk taxonomy of sensitive data
Before drawing any line, an organization has to name the categories. Not all data carries the same exposure, and treating it uniformly leads either to paralysis or to negligence. The categories below cover the great majority of what flows through a workplace AI tool.
Personally identifiable information (PII)
Names, addresses, government IDs, dates of birth, biometric and location data. Regulated under PIPEDA in Canada and a growing patchwork of US state laws.
Regulated special-category data
Health records (HIPAA/PHIPA), financial account data, children’s data. Carries statutory breach-notification and consent obligations.
Company confidential
Strategy, M&A material, unreleased financials, source code, pricing, internal communications. Loss erodes competitive position and may breach disclosure rules.
Secrets & credentials
Passwords, API keys, access tokens, encryption keys, certificates. Pasting one is functionally equivalent to publishing it.
Third-party & contractual data
Customer data, partner data, and anything you hold under NDA or a data-processing agreement. You may be contractually barred from disclosing it to any sub-processor.
Privileged material
Communications and work product tied to legal advice. As Heppner now shows, this protection can be lost the moment it enters a consumer AI tool.
Authentication & access data
Multi-factor codes, security-question answers, password-reset links, and account-recovery information. A single exposed item can unlock systems far beyond the AI tool itself.
Source code & infrastructure config
Proprietary code, internal endpoints, network topology, and deployment settings. Beyond intellectual-property loss, this hands a would-be attacker a map of your environment.
What organizations should — and should not — do
The goal is not to ban AI. Organizations that ban it outright simply push usage into unmanaged personal accounts — the “shadow AI” problem — which is worse. The goal is to channel usage toward low-risk, high-value patterns and away from the handful of actions that create real liability.
Should use AI for
- Drafting and editing on non-sensitive content — marketing copy, internal docs, public-facing material.
- Working over data that is already public or properly de-identified / anonymized.
- Coding help with credentials stripped and secrets replaced by placeholders.
- Summarizing or restructuring documents inside a contracted enterprise tier with a signed DPA.
- Brainstorming, research, and learning — where no confidential input is required.
- Formatting and tidying meeting notes — once names and identifying details have been removed.
Should never put in
- Passwords, API keys, tokens, certificates — in any tool, on any tier, ever.
- Raw PII or regulated records (health, financial, biometric) into a consumer-grade account.
- Privileged legal material into a public tool — it can forfeit protection (see §6).
- Third-party data held under NDA / DPA that bars onward disclosure.
- Trade secrets and unreleased financials — once disclosed, “secret” status can be lost.
- Employee & HR records (salaries, reviews, health, and disciplinary files) into a consumer-grade account.
A password or API key pasted into a prompt may be stored, logged, processed by sub-systems, and — on a consumer tier with training enabled — retained for years. Treat any secret entered into an AI tool as compromised and in need of rotation, exactly as you would a secret pasted into a public webpage.
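The "placeholders, not secrets" rule can be partly automated with a pre-paste filter. The following is a minimal sketch, not a complete scanner: the function name `redact_secrets`, the pattern set, and the placeholder format are all assumptions made for illustration, and the three patterns cover only a few obvious shapes of secret. A clean result does not prove the text is safe to paste.

```python
import re

# Illustrative patterns only (assumptions for this sketch); production secret
# scanners ship far larger, vendor-specific rule sets.
SECRET_PATTERNS = {
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    "AWS_ACCESS_KEY_ID": re.compile(r"\b(?P<secret>AKIA[0-9A-Z]{16})\b"),
    # PEM private key blocks, including the key material between the markers.
    "PRIVATE_KEY_BLOCK": re.compile(
        r"(?P<secret>-----BEGIN [A-Z ]*PRIVATE KEY-----.*?"
        r"-----END [A-Z ]*PRIVATE KEY-----)",
        re.DOTALL,
    ),
    # Assignments such as password = "..." or API_KEY: ...; only the value
    # after the separator is captured and replaced.
    "ASSIGNED_SECRET": re.compile(
        r"(?i)(?:password|passwd|secret|api[_-]?key|token)\b"
        r"\s*[:=]\s*['\"]?(?P<secret>[^\s'\"]+)"
    ),
}


def redact_secrets(text):
    """Return (redacted_text, findings) with matched secrets replaced."""
    findings = []
    for label, pattern in SECRET_PATTERNS.items():

        def _replace(match, label=label):
            findings.append(label)
            start, end = match.span("secret")
            offset = match.start()
            whole = match.group(0)
            # Keep the surrounding text (for example the variable name) and
            # swap only the secret value for a placeholder.
            return (
                whole[: start - offset]
                + f"<REDACTED:{label}>"
                + whole[end - offset:]
            )

        text = pattern.sub(_replace, text)
    return text, findings
```

Any non-empty `findings` list should also be read as a signal that the original secret may already need rotation if it was pasted anywhere else.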
Removing a name is not anonymization. Combinations of seemingly innocuous fields — postal code, birth date, employer — routinely re-identify individuals. If re-identification is plausible, the data is still personal data and the rules still apply.
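One rough way to test whether a "de-identified" extract is still re-identifiable is to count how many records share each combination of quasi-identifiers, the idea behind k-anonymity. The sketch below assumes records are plain dictionaries; the field names (`postal`, `birth_year`, `employer`) and the function name are hypothetical. A smallest group size of 1 means at least one person is unique on those fields alone.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the same
    values for every quasi-identifier (the "k" in k-anonymity).

    Returns 0 for an empty dataset.
    """
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return min(groups.values()) if groups else 0


# Names already removed, yet the remaining fields still single someone out.
records = [
    {"postal": "T2P", "birth_year": 1984, "employer": "Acme"},
    {"postal": "T2P", "birth_year": 1984, "employer": "Acme"},
    {"postal": "T2P", "birth_year": 1991, "employer": "Globex"},
]

k_all_fields = smallest_group_size(records, ["postal", "birth_year", "employer"])
k_postal_only = smallest_group_size(records, ["postal"])
```

Here `k_all_fields` is 1 (the third record is unique) while `k_postal_only` is 3. A high k is not a guarantee of anonymity, since it ignores outside data an attacker may hold, but a low k is a clear warning that the extract is still personal data.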
When no sanctioned tool exists, employees use personal accounts on their phones. Those accounts often default to data retention and, on some consumer tiers, to training. The organization inherits the exposure with none of the visibility.
The most common real-world leak is mundane: someone pastes an entire document to “just summarize it,” not noticing it carries embedded customer PII, a salary figure, or a client name. The intent is harmless; the disclosure is total. Train people to read what they paste before they paste it.
Conversations are typically saved and synced across every device signed into the account. Data pasted on a managed work laptop can reappear on an employee’s personal phone — outside your controls, and on a device you cannot wipe.
“AI assistant” browser extensions and plug-ins may transmit the contents of the pages you view to servers governed by terms you never reviewed. An unvetted integration can quietly route confidential data to an unknown processor.
The tier that changes everything
Most AI providers run two fundamentally different products under one brand: a consumer tier (free and personal-paid plans) and a commercial tier (business, enterprise, and API access governed by commercial terms). The interface looks the same. The data terms are not.
On many consumer tiers, conversations may be retained for extended periods and used to improve future models. The specifics — whether training is opt-in, opt-out, or on by default, and how long data is kept — vary by provider and have changed materially over the past year, so the only safe practice is to read the current terms for the exact plan in use rather than rely on a vendor’s reputation.
Commercial and enterprise tiers typically exclude business data from training by default, offer shorter retention, and — for qualifying customers — support Zero Data Retention arrangements and contractual instruments such as a Data Processing Addendum or a Business Associate Agreement for regulated health data. The protections that make AI defensible for confidential work generally live only on these tiers, and only when the corresponding agreement is actually signed and configured.
If the work touches PII, regulated, confidential, or privileged data, it belongs on a contracted enterprise tier with a signed data agreement — never on a personal or free account. The tier, not the tool, is what determines whether your data is protected.
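The tier rule is simple enough to express as a gate that a proxy or internal tool could apply before a prompt leaves the organization. This is a sketch under stated assumptions: the category labels, the `Tier` enum, and the function `may_submit` are illustrative names, not part of any vendor's API, and classifying the data correctly in the first place is the harder problem that this code does not solve.

```python
from enum import Enum


class Tier(Enum):
    CONSUMER = "consumer"      # free and personal-paid plans
    ENTERPRISE = "enterprise"  # business, enterprise, or API on commercial terms


# Hypothetical category labels for this sketch.
NEVER_ALLOWED = {"secret", "credential"}
NEEDS_CONTRACT = {"pii", "regulated", "confidential", "privileged"}


def may_submit(categories, tier, signed_agreement):
    """Decide whether data carrying the given categories may be sent.

    - Secrets and credentials are refused on every tier.
    - PII, regulated, confidential, or privileged data requires a contracted
      enterprise tier AND a signed data agreement.
    - Anything else (public or non-sensitive content) is allowed.
    """
    found = set(categories)
    if found & NEVER_ALLOWED:
        return False
    if found & NEEDS_CONTRACT:
        return tier is Tier.ENTERPRISE and signed_agreement
    return True
```

For example, `may_submit({"pii"}, Tier.ENTERPRISE, signed_agreement=False)` returns `False`: an enterprise plan without the signed agreement does not satisfy the rule.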
Case study: United States v. Heppner
For years, the danger of pasting sensitive material into a consumer AI tool was theoretical — a matter of policy and prudence. In early 2026, a federal court made it concrete.
Privilege and trade-secret status are not properties of a document; they are properties of how the document is handled. Disclosing material to a third-party tool can destroy the protection irreversibly, and no after-the-fact policy can restore it.
The court relied on the tool’s own published terms. Those terms — data collection, retention, and the right to disclose to authorities — are what a court will read back to you. Read them before, not after.
AI conversations are stored records. They can be seized, subpoenaed, and produced in litigation. Anything an employee types may later be read aloud in a courtroom or a regulator’s office.
If an attacker phishes a password, steals a session, or gets to an unlocked device, they inherit the entire conversation history tied to that account — potentially years of prompts and outputs on a long-retention consumer tier. Every credential, customer record, or confidential document an employee ever pasted in is now sitting in one place, ready to read. The AI client does not become a doorway into unrelated systems, but it does become a concentrated trove: the more sensitive data went in, the more a single compromise exposes. Multi-factor authentication, short retention, and a strict no-secrets rule are what limit the blast radius.
When an AI tool is asked to read a document, email, or webpage, a malicious instruction hidden inside that content can hijack what the tool does next — for example, coaxing it to reveal earlier parts of the conversation or take an unintended action. As tools gain the ability to browse, open files, and use integrations, untrusted content should be treated as potentially hostile, not merely as data.
If confidential inputs are used to train a model, as can happen on consumer tiers with training enabled, fragments of that data can in principle resurface in another user’s output. Such leakage is rare and hard to trigger deliberately, but it is real, and it is one more reason confidential work belongs only on tiers that exclude your data from training.
Anatomy of a bad day
Most breaches are not dramatic. They are a chain of small, reasonable-looking steps. Here is how the risks in this paper combine into a single incident.
9:14 a.m. — A sales associate exports a customer list to answer a quick question and pastes it into a personal AI account on their phone: “Which of these accounts are up for renewal?” The list contains names, emails, and contract values for several hundred customers. The intent is harmless. The data is now stored on a consumer tier with multi-year retention.
That evening — The same account syncs to the associate’s laptop and tablet. The customer list now exists in three places the organization does not control and cannot wipe.
Three weeks later — The associate falls for a phishing email and their password is stolen. Because the account is personal, there is no single sign-on and no enforced multi-factor authentication to stop the attacker.
Within minutes — The attacker opens the conversation history and finds the customer list — plus everything else the associate ever pasted in over months of use. A single compromise has become a searchable archive of confidential data.
The aftermath — The organization remains the accountable party for that personal data. Depending on the jurisdiction and the data involved, a breach-notification duty may now be triggered, along with regulatory exposure and the cost of notifying every affected customer.
No single step here was malicious or even unusual. The damage came from the absence of guardrails — a sanctioned tier, a classification rule, SSO and MFA, short retention, and a trained workforce. Each control in §9 breaks this chain at a different link.
The North American legal lens
North America has no single AI privacy statute. Instead, organizations operate under a layered patchwork — federal, provincial, and state — and an AI tool does not suspend any of it.
Canada. The Personal Information Protection and Electronic Documents Act (PIPEDA) governs personal information in commercial activity, with substantially similar provincial regimes in Alberta, British Columbia, and Quebec, and sector rules such as Ontario’s PHIPA for health data. Consent, purpose limitation, and accountability for transfers to third-party processors all apply when personal data is sent to an AI provider.
United States. There is no comprehensive federal privacy law. Instead, a growing roster of state statutes, led by California’s CCPA/CPRA and joined by Virginia, Colorado, Connecticut, Utah, Texas, and many more, creates overlapping obligations, alongside sector laws like HIPAA for health and GLBA for financial data. Data & More tracks this expanding body of North American privacy law across both countries.
Sending personal data to an AI provider often means transferring it across borders to wherever the provider processes it. That transfer is itself a regulated act and may require specific contractual safeguards. “The data never left my laptop” stops being true the moment the data enters a prompt.
Using a third-party tool does not transfer your legal accountability for the personal data you control. If the processing was unlawful, the obligation — and the breach-notification duty — lands on your organization, not the vendor.
A governance checklist
A workable AI data policy does not require a six-month committee. A handful of controls cover the great majority of exposure.
Decision matrix
A summary view of what belongs where. When in doubt, treat the data as one level more sensitive than it first appears.
Draw the line before someone else draws it for you
Data & More helps North American organizations find, classify, and govern their sensitive data — so you know exactly what should never reach an AI prompt, and can prove it.
© 2026 Data & More. North America — Calgary, Alberta.