Enhancing Purview

Purview payment card detection rate (out-of-box): 43%

Purview passport detection rate: 1.9%

D&M passport detection rate: 100%

Files in public benchmark dataset: 89K+

The message for your executive audience

Microsoft Purview is a security policy enforcement tool, not a data intelligence platform. If you don't know what sensitive data you have and where it lives, Purview cannot protect it. We've shown this on a publicly available 89,521-file benchmark, and we invite you to run it yourself. Data & More builds the data foundation that makes your Purview investment actually deliver.

Data Foundation

Purview Enabler

Benchmarked

The Reality: Where Purview Falls Short

What Microsoft account reps promise — and what Purview actually delivers out of the box.

Accurate Classification Requires Months of Customer Work

Out of the box, Purview relies on regex and keyword matching, producing a 5:1 noise ratio on payment cards and 563:1 on passports. And it only works on file types Purview can read: legacy formats like .doc, .xls, and .ppt, plain .csv files, and .msg email files are not natively supported and are silently skipped. Trainable classifiers use machine learning but shift the entire burden to the customer: you must source 50–500 positive and 150–1,500 negative samples per pattern, run weeks of training cycles, and review results manually. Once a classifier is published, it cannot be retrained; if it underperforms, you delete it and start over from scratch.

Out of 53 passport files in the benchmark, Purview auto-labeled exactly 1 correctly. Data & More found all of them — with no training burden on the customer.

Focused on M365 — Limited Beyond It

Purview is purpose-built for M365 and Azure. File servers, unstructured cloud storage, and data sources outside the Microsoft ecosystem receive limited or no coverage. Auto-labeling in SharePoint and OneDrive is largely restricted to modern Office formats (.docx, .pptx, .xlsx), with little to no support for .csv, .msg, .tmp, or list item attachments. Unstructured data outside that narrow set is exactly where classification risk is highest, and exactly where Purview's reach runs out.

DLP Policy on Inaccurate Labels = False Security

Data Loss Prevention policies in Purview are only as strong as the classification underneath them. The benchmark demonstrates that out-of-box Purview misses the majority of sensitive files while simultaneously flooding security teams with false alerts. Files inside archive containers like .zip and .rar are not inspected, so their contents are invisible to DLP policy. The result is coverage gaps and alert fatigue at the same time.

Every Change Requires a Full Re-Scan

Any time a classification pattern changes or your organization wants to look for something new, Purview requires a full re-scan of your entire data estate. There is no incremental or targeted update. In large environments this means weeks of scanning time and repeated cost — every single time.

Add a new data type to your policy? That's another full pass of every file in your environment: weeks of scan time and, if OCR is involved, per-document charges all over again.

Exchange at Rest Is Not Covered

Purview can enforce DLP on outbound email and apply labels in transit, but it does not classify or label Exchange mailbox data at rest. Auto-labeling also does not support .msg and .eml files that have been exported or archived outside Exchange. Years of email containing sensitive information, sitting in mailboxes or archive folders, remain unclassified, untagged, and outside the scope of your data protection policies.

The True Cost Compounds Every Time You Re-Scan

Base Purview licenses are just the starting point. OCR and trainable classifiers both require license uplift. Worse, OCR is metered — you pay per document. Combined with the requirement to re-scan every time a pattern changes, organizations end up paying OCR costs repeatedly for the same documents that haven’t changed. The cost of getting to accuracy is open-ended.

Purview’s metered OCR model means every re-scan triggered by a policy update generates a new bill — even for files that were already processed.

In the Room: Objection Handling

Responses to what the Microsoft account team and internal stakeholders will say.

“We already have Purview — it handles our data security.”

Our benchmark shows Purview missed 57% of payment card files and 98% of passport files on publicly available test data — with default settings. And that’s only what it could see. Files in legacy formats like .doc and .xls, or inside .zip archives, weren’t even scanned. The question isn’t whether you have Purview — it’s whether Purview can see the data it’s supposed to protect.

“Microsoft is building more into Purview every quarter.”

We agree — and we’re not competing with it. Purview is a policy engine. It needs accurate, complete data classification to enforce against. We provide that layer. The better Purview gets, the more value we add by feeding it accurate input.

“Our Microsoft rep says Purview covers all our data security needs.”

Ask them to run their own test against our public benchmark dataset. The data is on Hugging Face — free to download. We’ve validated every file manually. The numbers speak for themselves.

“We don’t want another vendor layered on top of Microsoft.”

We feed classifications directly into Purview sensitivity labels. We’re not a parallel system — we’re the intelligence layer that makes your existing Microsoft investment perform. Think of us as the quality control step before Purview’s policies run.

“Can we tune Purview to improve accuracy?”

Yes — with significant effort. But tuning only improves accuracy on file types Purview already supports. It will never classify .doc, .csv, .msg, or archive contents — those gaps are architectural, not configuration issues. Tuning Purview is a project; deploying D&M to feed it is a foundation. One makes the other sustainable.

Side by Side: Capability Comparison

How the two platforms compare across data identification and classification capabilities.

Capability	Microsoft Purview	Data & More
Classification Method	Regex and keywords out-of-box; trainable classifiers require weeks of customer-led training per pattern and cannot be retrained once published	Multi-factor: content + context + relationships + format + age
Image & Scan Detection	Very limited — OCR requires paid license uplift and is charged per document per scan	Supported — scanned PDFs, image files, mixed-format files; no per-document charge
Data Discovery Scope	Focused on M365 and Azure; limited outside (file servers, unstructured cloud storage)	Unstructured data across full estate: on-prem file shares, cloud storage, email, and more
Classifier Training Overhead	Customer responsible: 50–500 positive + 150–1,500 negative samples per pattern; weeks to train; cannot retrain once published	No training burden — classification is ready to deploy out of the box
File Type Coverage	Silently skips unsupported types: .doc, .xls, .ppt, .csv, .msg, .eml, .zip/.rar contents, and more	Classifies legacy Office, plain text, email archives, containers, and mixed-format files
Cost Model	Base license + uplift for OCR and trainable classifiers; OCR is metered (pay per document, per scan)	Predictable — no per-document OCR charges or re-scan fees
False Positive Rate	Up to 563:1 noise ratio (benchmark-verified)	High precision — accurate labeling reduces alert noise
Purview Integration	Native — it is the platform	Classifications feed directly into Purview labels
Data Remediation	Identifies risk; does not remediate data or stale files	Identify, quarantine, archive, delete, or re-label
Primary Purpose	Policy enforcement & access control	Data foundation — classification accuracy & completeness

Additive: How D&M Makes Purview Work

We don’t replace Purview. We give it the data accuracy it needs to protect what matters.

Discover the Full Data Estate

Map everything (file servers, databases, M365, and cloud) for a complete inventory before policies go live, not just the data Purview already knows about.

Classify with Multi-Factor Precision

Combine content type, context, data relationships, age, and format simultaneously. The benchmark proves the difference: 98.9% (D&M) vs. 43% (Purview) on payment cards; 100% vs. 1.9% on passports.

Feed Purview with Accurate Labels

Our classifications sync directly into Purview sensitivity labels. Purview enforces policy on verified, accurate data — not regex guesses.

Reduce DLP Noise Dramatically

Accurate input data means fewer false positives. Security teams stop drowning in alerts and start responding to real risk.

Deliver Granularity Purview Cannot

Every organization needs to understand how much sensitive data they have, where exactly it lives, and who has access. D&M provides that granular inventory — by location, file type, sensitivity level, age, and owner. Purview applies labels; it does not give you the data map underneath.

Cover the File Types Purview Doesn’t Support

Purview has a defined list of supported file types, and files outside it are silently skipped. Common gaps include legacy Office formats (.doc, .xls, .ppt), plain text and .csv files, archive containers (.zip, .rar; contents are not inspected), and .msg and .eml email files. Auto-labeling, meanwhile, is largely restricted to modern Office formats. D&M classifies across all of these: sensitive data doesn't get a free pass because of its file extension.

Keep Labels Current as Data Grows

Continuous scanning ensures Purview’s labels stay accurate as new data enters the estate — governance doesn’t decay over time.

Architecture Flow

D&M Scan (full estate) → D&M Labels (multi-factor) → Purview (policy engine) → DLP Works (real protection)

Benchmark results

Payment Card Detection

5,286 actual payment card files · stored in SharePoint · out-of-box Purview settings

Purview correctly labeled: 43%

Purview not labeled (missed): 57%

Purview alerts: 26,413 files flagged

Purview flagged 26,413 files as containing payment card data. Actual payment card files: 5,286. Noise ratio: ~5:1.

Data & More correctly classified: 98.9%

Purview miss rate: 57%

D&M accuracy: 98.9%

Passport Detection

53 actual passport files · stored in SharePoint · out-of-box Purview settings

Purview correctly auto-labeled: 1.9%

Purview correctly labeled 1 out of 53 passport files automatically. The causes: limited file type support and regex-only detection.

Purview not labeled (missed): 98.1%

Purview alerts: 29,845 files flagged

Purview flagged 29,845 files as containing passport data. Actual passport files: 53. Noise ratio: ~563:1.

Data & More correctly classified: 100%

Purview miss rate: 98.1%

D&M accuracy: 100%
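The headline figures reduce to simple ratios over the counts reported above, so anyone re-running the benchmark can check them the same way. A minimal Python sketch, assuming "noise ratio" means total files flagged divided by actual files of that type (the helper names `noise_ratio` and `rate` are illustrative, not part of any Purview or D&M tool):

```python
# Recompute the benchmark ratios from the counts reported in this document.
# Assumption: noise ratio = files flagged / actual files of that type.

def noise_ratio(flagged: int, actual: int) -> float:
    """Files flagged per file that truly contains the data type."""
    return flagged / actual

def rate(hits: int, total: int) -> float:
    """Percentage of actual files that were correctly labeled."""
    return 100.0 * hits / total

# Passports: 1 of 53 auto-labeled correctly; 29,845 files flagged.
passport_detect = rate(1, 53)
passport_miss = 100.0 - passport_detect
passport_noise = noise_ratio(29_845, 53)

# Payment cards: 26,413 files flagged; 5,286 actual payment card files.
card_noise = noise_ratio(26_413, 5_286)

print(f"Passport detection rate: {passport_detect:.1f}%")  # 1.9%
print(f"Passport miss rate: {passport_miss:.1f}%")         # 98.1%
print(f"Passport noise ratio: ~{passport_noise:.0f}:1")    # ~563:1
print(f"Payment card noise ratio: ~{card_noise:.0f}:1")    # ~5:1
```

Swapping in your own counts from a Purview run against the dataset gives directly comparable numbers.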

Run the benchmark yourself — the dataset is public

The 89,521-file synthetic benchmark dataset (73 GB) is available on Hugging Face for free download. It was built from statistical analysis of 5+ PB of real-world data and mirrors the distribution of real unstructured data without using any real personal information. Download it, run Purview against it, and compare the results to what we show here.

huggingface.co/datasets/kbillesk/synthetic-privacy

Total files: 89,521

Dataset size: 73 GB

Manually validated: 100%

The executive ask

Don’t let Purview protect data it doesn’t know exists.

The benchmark shows it plainly: out-of-box Purview missed 98% of passport files and generated a 563:1 noise ratio. Every day you run Purview without an accurate data foundation, sensitive data goes unclassified and your security posture is built on incomplete information. Data & More closes that gap, with numbers you can verify yourself.

Time to first complete data map: 4–6 weeks

Real-world data analyzed to date: 5+ PB

Benchmark files free to download: 89K

info@dataandmore.com