Used to scan 7BN+ unstructured data items
1BN Insight report
Executive Battlecard · v2
Microsoft Purview
& Data & More
Why Purview needs a data foundation — and how Data & More delivers what Microsoft account reps promise but Purview cannot provide on its own.
43% · Purview credit card accuracy (out-of-box)
1.9% · Purview passport detection rate
100% · D&M passport detection rate
89K+ · Files in public benchmark dataset
The message for your executive audience
Microsoft Purview is a security policy enforcement tool — not a data intelligence platform. Without knowing what sensitive data you have and where it lives, Purview cannot protect it. We’ve proven this on a publicly available 89,521-file benchmark — and we invite you to run it yourself. Data & More builds the data foundation that makes your Purview investment actually deliver.
Data Foundation
Purview Enabler
Benchmarked
Where Purview Falls Short · The Reality
What Microsoft account reps promise — and what Purview actually delivers out of the box.
Accurate Classification Requires Months of Customer Work
Out of the box, Purview relies on regex and keyword matching, which produces a 5:1 noise ratio on payment cards and 563:1 on passports. Even that matching only runs on file types Purview can read: legacy formats like .doc, .xls, and .ppt, plain .csv files, and .msg email archives are not natively supported and are silently skipped. Trainable classifiers use machine learning but shift the entire burden to the customer: you must source 50–500 positive and 150–1,500 negative samples per pattern, run weeks of training cycles, and review results manually. Once a classifier is published it cannot be retrained. If it underperforms, you delete it and start over from scratch.
Out of 53 passport files in the benchmark, Purview auto-labeled exactly 1 correctly. Data & More found all of them — with no training burden on the customer.
Focused on M365 — Limited Beyond It
Purview is purpose-built for M365 and Azure. File servers, unstructured cloud storage, and data sources outside the Microsoft ecosystem receive limited or no coverage. Auto-labeling in SharePoint and OneDrive is largely restricted to modern Office formats — .docx, .pptx, .xlsx — with little to no support for .csv, .msg, .tmp, or list item attachments. Unstructured data outside that narrow set is exactly where classification risk is highest and Purview’s reach runs out.
DLP Policy on Inaccurate Labels = False Security
Data Loss Prevention policies in Purview are only as strong as the classification underneath them. The benchmark demonstrates that out-of-box Purview misses the majority of sensitive files while simultaneously flooding security teams with false alerts. Files in archive containers like .zip and .rar are not inspected at all — their contents are invisible to DLP policy entirely. The result is both coverage gaps and alert fatigue at the same time.
Every Change Requires a Full Re-Scan
Any time a classification pattern changes or your organization wants to look for something new, Purview requires a full re-scan of your entire data estate. There is no incremental or targeted update. In large environments this means weeks of scanning time and repeated cost — every single time.
Add a new data type to your policy? That’s another full pass of every file in your environment — weeks of scan time and, if OCR is involved, cost per document again.
Exchange at Rest Is Not Covered
Purview can enforce DLP on outbound email and apply labels in transit, but it does not classify or label Exchange mailbox data at rest. .msg and .eml files exported or archived outside Exchange are also unsupported for auto-labeling. Years of email containing sensitive information, sitting in mailboxes or archive folders, remain unclassified, untagged, and outside the scope of your data protection policies.
The True Cost Compounds Every Time You Re-Scan
Base Purview licenses are just the starting point. OCR and trainable classifiers both require license uplift. Worse, OCR is metered — you pay per document. Combined with the requirement to re-scan every time a pattern changes, organizations end up paying OCR costs repeatedly for the same documents that haven’t changed. The cost of getting to accuracy is open-ended.
Purview’s metered OCR model means every re-scan triggered by a policy update generates a new bill — even for files that were already processed.
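The compounding effect can be sketched with a few lines of arithmetic. This is a minimal illustration, assuming that every pattern change triggers one full re-scan and that each scan bills every OCR-eligible document again, as described above. The class name, document count, and per-document price are hypothetical placeholders, not Microsoft list prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrCostModel:
    """Hypothetical metered-OCR model: each scan bills every OCR-eligible document."""

    ocr_documents: int          # documents that need OCR (assumed figure)
    price_per_document: float   # placeholder rate, NOT a Microsoft list price

    def cost_per_scan(self) -> float:
        return self.ocr_documents * self.price_per_document

    def cumulative_cost(self, pattern_changes: int) -> float:
        # One initial scan plus one full re-scan per pattern change.
        scans = 1 + pattern_changes
        return scans * self.cost_per_scan()


# Illustrative inputs only; substitute your own estate size and contracted rate.
model = OcrCostModel(ocr_documents=1_000_000, price_per_document=0.01)
for changes in (0, 1, 4):
    print(f"{changes} pattern changes -> cumulative OCR cost {model.cumulative_cost(changes):,.2f}")
```

Under these assumptions the OCR bill grows linearly with the number of pattern changes, even when no document has changed.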
Objection Handling · In the Room
Responses to what the Microsoft account team and internal stakeholders will say.
“We already have Purview — it handles our data security.”
Our benchmark shows Purview missed 57% of payment card files and 98% of passport files on publicly available test data — with default settings. And that’s only what it could see. Files in legacy formats like .doc and .xls, or inside .zip archives, weren’t even scanned. The question isn’t whether you have Purview — it’s whether Purview can see the data it’s supposed to protect.
“Microsoft is building more into Purview every quarter.”
We agree — and we’re not competing with it. Purview is a policy engine. It needs accurate, complete data classification to enforce against. We provide that layer. The better Purview gets, the more value we add by feeding it accurate input.
“Our Microsoft rep says Purview covers all our data security needs.”
Ask them to run their own test against our public benchmark dataset. The data is on Hugging Face — free to download. We’ve validated every file manually. The numbers speak for themselves.
“We don’t want another vendor layered on top of Microsoft.”
We feed classifications directly into Purview sensitivity labels. We’re not a parallel system — we’re the intelligence layer that makes your existing Microsoft investment perform. Think of us as the quality control step before Purview’s policies run.
“Can we tune Purview to improve accuracy?”
Yes — with significant effort. But tuning only improves accuracy on file types Purview already supports. It will never classify .doc, .csv, .msg, or archive contents — those gaps are architectural, not configuration issues. Tuning Purview is a project; deploying D&M to feed it is a foundation. One makes the other sustainable.
Capability Comparison · Side by Side
How the two platforms compare across data identification and classification capabilities.
Capability | Microsoft Purview | Data & More
Classification Method | Regex and keywords out-of-box; trainable classifiers require weeks of customer-led training per pattern and cannot be retrained once published | Multi-factor: content + context + relationships + format + age
Image & Scan Detection | Very limited; OCR requires paid license uplift and is charged per document per scan | Supported: scanned PDFs, image files, mixed-format files; no per-document charge
Data Discovery Scope | Focused on M365 and Azure; limited outside (file servers, unstructured cloud storage) | Unstructured data across full estate: on-prem file shares, cloud storage, email, and more
Classifier Training Overhead | Customer responsible: 50–500 positive + 150–1,500 negative samples per pattern; weeks to train; cannot retrain once published | No training burden; classification is ready to deploy out of the box
File Type Coverage | Silently skips unsupported types: .doc, .xls, .ppt, .csv, .msg, .eml, .zip/.rar contents, and more | Classifies legacy Office, plain text, email archives, containers, and mixed-format files
Cost Model | Base license + uplift for OCR and trainable classifiers; OCR is metered (pay per document, per scan) | Predictable: no per-document OCR charges or re-scan fees
False Positive Rate | Up to 563:1 noise ratio (benchmark-verified) | High precision; accurate labeling reduces alert noise
Purview Integration | Native (it is the platform) | Classifications feed directly into Purview labels
Data Remediation | Identifies risk; does not remediate data or stale files | Identify, quarantine, archive, delete, or re-label
Primary Purpose | Policy enforcement & access control | Data foundation: classification accuracy & completeness
How D&M Makes Purview Work · Additive
We don’t replace Purview. We give it the data accuracy it needs to protect what matters.
1. Discover the Full Data Estate
Map everything (file servers, databases, M365, cloud) so you have a complete inventory before policies go live, not just what Purview already knows about.
2. Classify with Multi-Factor Precision
Combine content type, context, data relationships, age, and format simultaneously. The benchmark proves the difference: 98.9% vs. 43% on payment cards; 100% vs. 1.9% on passports.
3. Feed Purview with Accurate Labels
Our classifications sync directly into Purview sensitivity labels. Purview enforces policy on verified, accurate data instead of regex guesses.
4. Reduce DLP Noise Dramatically
Accurate input data means fewer false positives. Security teams stop drowning in alerts and start responding to real risk.
5. Deliver Granularity Purview Cannot
Every organization needs to understand how much sensitive data it has, where exactly it lives, and who has access. D&M provides that granular inventory by location, file type, sensitivity level, age, and owner. Purview applies labels; it does not give you the data map underneath.
6. Cover the File Types Purview Doesn’t Support
Purview has a defined list of supported file types, and files outside it are silently skipped. Common gaps include legacy Office formats (.doc, .xls, .ppt), plain text and .csv files, archive containers (.zip, .rar; contents not inspected), and .msg and .eml email files. Auto-labeling is largely restricted to modern Office files. D&M classifies across all of these, so sensitive data doesn’t get a free pass because of its file extension.
7. Keep Labels Current as Data Grows
Continuous scanning keeps Purview’s labels accurate as new data enters the estate, so governance doesn’t decay over time.
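To size the file-type gap in your own environment, a quick inventory by extension is a reasonable first check. The sketch below walks a directory tree and reports what share of files carry an extension this battlecard lists as skipped or not inspected. The extension set is taken from the text above and is not an official Microsoft support matrix; the function names are illustrative.

```python
from collections import Counter
from pathlib import Path

# Extensions this battlecard lists as skipped or not inspected by Purview.
# Taken from the text above; not an official Microsoft support matrix.
FLAGGED_EXTENSIONS = {".doc", ".xls", ".ppt", ".csv", ".msg", ".eml", ".zip", ".rar"}


def tally_extensions(root: str) -> Counter:
    """Count files under root by lower-cased extension."""
    return Counter(p.suffix.lower() for p in Path(root).rglob("*") if p.is_file())


def flagged_share(counts: Counter) -> float:
    """Fraction of files whose extension is in FLAGGED_EXTENSIONS."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    flagged = sum(n for ext, n in counts.items() if ext in FLAGGED_EXTENSIONS)
    return flagged / total
```

Point `tally_extensions` at a file share or an exported sample, then pass the result to `flagged_share` to see how much of the estate falls into the listed extensions. The result counts files, not sensitive content, so treat it as a scoping estimate only.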
Architecture Flow
D&M Scan (full estate) → D&M Labels (multi-factor) → Purview (policy engine) → DLP Works (real protection)
Payment Card Detection
5,286 actual payment card files · stored in SharePoint · out-of-box Purview settings
Purview · Correctly Labeled: 43%
Purview · Not Labeled (missed): 57%
Purview · False Positive Alerts: 26,413 files flagged
Purview flagged 26,413 files as containing payment card data. Actual payment card files: 5,286. Noise ratio: ~5:1.
Data & More · Correctly Classified: 98.9%
57% · Purview miss rate
98.9% · D&M accuracy
Passport Detection
53 actual passport files · stored in SharePoint · out-of-box Purview settings
Purview · Correctly Auto-Labeled: 1.9%
Purview correctly labeled 1 out of 53 passport files automatically. Reason: limited filetype support and regex-only detection.
Purview · Not Labeled (missed): 98.1%
Purview · False Positive Alerts: 29,845 files flagged
Purview flagged 29,845 files as containing passport data. Actual passport files: 53. Noise ratio: ~563:1.
Data & More · Correctly Classified: 100%
98.1% · Purview miss rate
100% · D&M accuracy
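The headline figures in the two detection sections follow from simple arithmetic on the counts quoted there. The sketch below reproduces them, assuming the noise ratio is defined as files flagged divided by actual sensitive files, which matches the numbers shown. The helper names are illustrative.

```python
def rate(part: int, whole: int) -> float:
    """Percentage of whole represented by part, rounded to one decimal."""
    return round(100 * part / whole, 1)


def noise_ratio(flagged: int, actual: int) -> float:
    """Files flagged per actual sensitive file, as this document computes it."""
    return flagged / actual


# Counts quoted in the payment card and passport sections above.
card_actual, card_flagged = 5_286, 26_413
passport_actual, passport_flagged, passport_labeled = 53, 29_845, 1

print(f"Payment card noise ratio: ~{noise_ratio(card_flagged, card_actual):.0f}:1")
print(f"Passport noise ratio: ~{noise_ratio(passport_flagged, passport_actual):.0f}:1")
print(f"Passport detection rate: {rate(passport_labeled, passport_actual)}%")
print(f"Passport miss rate: {rate(passport_actual - passport_labeled, passport_actual)}%")
```

Running it yields ~5:1, ~563:1, 1.9%, and 98.1%, which agree with the figures above. The same two helpers can be reused on the counts from your own Purview run against the benchmark.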
Run the benchmark yourself — the dataset is public
The 89,521-file synthetic benchmark dataset (73 GB) is available on Hugging Face for free download. It was built from statistical analysis of 5+ PB of real-world data and reflects actual unstructured data distribution — without using any real personal information. Download it, run Purview against it, and compare the results to what we show here.
huggingface.co/datasets/kbillesk/synthetic-privacy
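One way to fetch the dataset is the `huggingface_hub` Python package and its `snapshot_download` function. This is a sketch, assuming the package is installed and the repository is publicly readable; the function name `download_benchmark` and the optional glob patterns are illustrative, and the internal folder layout of the dataset is not described here.

```python
from typing import List, Optional

REPO_ID = "kbillesk/synthetic-privacy"  # from the URL above


def download_benchmark(target_dir: str, patterns: Optional[List[str]] = None) -> str:
    """Download the dataset, or the subset matching glob patterns, and return its local path."""
    # Imported lazily so this module loads even where huggingface_hub is not installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",
        local_dir=target_dir,
        allow_patterns=patterns,  # None downloads everything (about 73 GB per the text)
    )
```

Calling `download_benchmark("./benchmark")` pulls the full dataset, so check free disk space first. Passing a list of glob patterns limits the download to matching files for a smaller trial run.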
89,521 · Total files
73 GB · Dataset size
100% · Manually validated
The executive ask

Don’t let Purview protect data it doesn’t know exists.

The benchmark shows it plainly: out-of-box Purview missed 98% of passport files and generated a 563:1 false alert ratio. Every day you run Purview without an accurate data foundation, sensitive data goes unclassified and your security posture is built on incomplete information. Data & More closes that gap — with numbers you can verify yourself.

4–6wk · Time to first complete data map
5PB+ · Real-world data analyzed to date
89K · Benchmark files free to download
info@dataandmore.com