Real-World Purview Benchmark: We Tested Microsoft Purview on 89,521 Files
Nobody believes a small Danish company when we say Purview classification does not work for real-world compliance with data privacy legislation. So we did something about it: we created a synthetic benchmark dataset that anyone can download and test themselves.
The Dataset
We analyzed over 5 petabytes of real-world data and generated a synthetic dataset that mirrors real unstructured data, without using any actual personal data. The distribution of file types is based on statistical analysis and reflects the typical mix found in enterprise environments.

- 89,521 files total
- 73 GB total size
- 863 KB average file size
- 100% manually validated ground truth
The dataset is available on Hugging Face for anyone to download and reproduce: huggingface.co/datasets/kbillesk/synthetic-privacy
Privacy and Security Data in the Benchmark
The benchmark dataset contains 25 categories of privacy and security data. The two categories used for direct Purview comparison are Payment Card (5,286 files) and Passport (53 files).

Why Classification Matters
Everything in Microsoft Purview depends on correct classification. Sensitivity labels, auto-labeling, Data Loss Prevention (DLP), and retention management all rely on the ability to classify and tag data correctly. If classification fails, every downstream feature fails with it.
Benchmark Results: Passport Classification
We used Purview’s out-of-the-box Passport classification (set to medium confidence) and compared it against the known ground truth in our dataset. The data was stored in SharePoint.

- 99.8% misclassified by Purview’s classification. Purview relies on simple regular expressions, which produce both false positives and false negatives; passports stored as images or scans go undetected.
- 98.1% not labeled by Purview’s auto-labeling. Only 1 of the 53 passport files was correctly labeled with a sensitivity label, because Purview only supports a limited number of file types.
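The 98.1% figure follows directly from the counts in this article: 53 passport files in the ground truth, of which 1 received a sensitivity label. A minimal sketch of that arithmetic, with a function name chosen here purely for illustration:

```python
def unlabeled_rate(total_files: int, labeled_files: int) -> float:
    """Share of ground-truth files that received no sensitivity label."""
    if total_files <= 0:
        raise ValueError("total_files must be positive")
    return (total_files - labeled_files) / total_files


# Passport category: 53 files in the ground truth, 1 correctly labeled.
print(f"{unlabeled_rate(53, 1):.1%}")  # 98.1%
```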
Benchmark Results: Payment Card Classification
For payment card data, the results were somewhat better but still alarming:

- 91% misclassified by Purview. The regex-based classification produces far too many false positives, incorrectly flagging data as payment cards.
- 57% not labeled by auto-labeling. Only 43% of payment card data was correctly labeled, because Purview only supports a limited number of file formats.
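To illustrate in general terms why pattern-only matching tends to over-match, here is a small sketch. The regular expression is a hypothetical one written for this example and is not Purview's actual definition. The Luhn checksum is shown only as one common extra signal that a digit pattern alone does not provide.

```python
import re

# Hypothetical pattern for illustration only: 16 digits, optionally
# separated by spaces or hyphens. NOT Purview's actual definition.
CARD_LIKE = re.compile(r"\b(?:\d[ -]?){15}\d\b")


def find_candidates(text: str) -> list[str]:
    """Return every substring that merely looks like a 16-digit card number."""
    return [m.group(0) for m in CARD_LIKE.finditer(text)]


def luhn_valid(number: str) -> bool:
    """Check the Luhn checksum used by payment card numbers."""
    digits = [int(c) for c in number if c.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


text = "Order ref 1234567890123456, test card 4111 1111 1111 1111"
for candidate in find_candidates(text):
    print(candidate, luhn_valid(candidate))
```

Both strings match the pattern, but only the well-known test number 4111 1111 1111 1111 passes the checksum. The order reference is the kind of false positive a digit pattern alone cannot rule out.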
What Purview Is Missing for Data Privacy Compliance
Purview has no comprehensive data privacy compliance classification. It offers only a few imprecise samples in a limited number of languages. To achieve actual compliance, organizations would need to custom-develop classification for 28+ languages across dozens of missing privacy categories.

Missing national certificates: Birth, citizenship, marriage, divorce, registered partnership, ancestry, name change, proof of citizenship, marital status, place of residence, religion, and more.
Missing privacy data types: Health information (medicine, diagnosis, illness), political orientation, sexual orientation, ethnic origin, trade union affiliation, religious orientation, personal tax information, salary information, employment contracts, recruitment data, CVs, written warnings, work absence, criminal records, written consents, travel information, photo geolocation, and more.
The Labeling Dilemma
Purview offers two labeling options, and both have fundamental limitations:

Manual labeling: Only works on new files, but 99.99% of data already exists. Only 0.77% of data gets manually tagged, because users dislike it and do not do it voluntarily. It is limited to 4–5 label options, which is insufficient for real compliance requirements.
Auto-labeling: Does not support emails at rest or attachments, and covers only ~5% of all unstructured data in real-world environments. Classification has a 50–99% error rate. Additional per-page costs apply for PDF and image OCR on top of the E5 license.
Try It Yourself
Download the dataset from Hugging Face, load it into your own SharePoint or Exchange environment, and run Purview classification against it. Compare the results to the manually validated ground truth. We are confident you will see the same results.
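Once you have both lists, the comparison can be scored with a few lines of code. The sketch below assumes you have already reduced the ground truth and the Purview results to two sets of file paths for a single category. How you export those lists is up to you and is not shown here. It uses standard precision and recall, which may not be defined exactly as the percentages quoted in this article.

```python
def score(ground_truth: set[str], predicted: set[str]) -> dict[str, float]:
    """Compare predicted file paths against ground truth for one category."""
    true_pos = len(ground_truth & predicted)   # correctly flagged files
    false_pos = len(predicted - ground_truth)  # flagged, but not in ground truth
    false_neg = len(ground_truth - predicted)  # in ground truth, but missed

    flagged = true_pos + false_pos
    actual = true_pos + false_neg
    return {
        "true_positives": true_pos,
        "false_positives": false_pos,
        "false_negatives": false_neg,
        # Share of flagged files that are truly in the category.
        "precision": true_pos / flagged if flagged else 0.0,
        # Share of files in the category that were flagged.
        "recall": true_pos / actual if actual else 0.0,
    }


# Hypothetical file names, for illustration only.
truth = {"a.pdf", "b.docx", "c.jpg"}
flagged_by_tool = {"a.pdf", "x.txt", "y.csv", "z.log"}
print(score(truth, flagged_by_tool))
```

Run it once per category, for example Payment Card and Passport, and once each for classification results and auto-labeling results.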
Want to see how Data & More classifies the same data?
Get in touch for a demo and we’ll walk you through the comparison first-hand.