SC-401 - Document Fingerprinting
Introduction to Document Fingerprinting
- Document fingerprinting is a feature of Microsoft Purview.
- It creates a sensitive information type from a standard form you provide, which simplifies protecting sensitive information held in the organization's standard forms.
- A previous video on custom sensitive info types covers relevant background and still applies, because the underlying principles are largely unchanged.
- Watching that earlier video first is recommended for additional context.
Document Fingerprinting Overview
- Definition: A Microsoft Purview feature that generates a unique fingerprint of a document based on its text and structure.
- Purpose: To identify and protect standardized forms (templates) throughout an organization, particularly those that have fixed layouts containing sensitive data.
- Content Types: Best suited to semi-structured or unstructured data, such as:
- Tax documents
- Contracts
- Invoices
- Process:
- Upload a sample document and create a fingerprint.
- The system then detects similar documents even when the filled-in content varies (e.g., names, dates).
- Example Use Case: Prevent unauthorized sharing of sensitive company documents such as NDA templates or tax forms.
- Custom Sensitive Information Types:
- Ideal for structured data patterns not covered by built-in types (e.g., employee IDs, invoices).
- Use regular expressions and keyword proximity to detect sensitive content.
- Example: detecting a customer ID that follows the format "custo123456".
- Document Fingerprinting:
- Focuses on protecting exact documents or templates with a defined structure, rather than patterns.
- Ideal for preventing data leakage across the organization using specific document templates.
- Detection relies upon document structure and content rather than just keywords.
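The contrast above between pattern-based and template-based detection can be illustrated with a minimal Python sketch. It is not how Purview implements either feature. The regex assumes the sample ID "custo123456" means the prefix "custo" plus six digits, the keywords and the 300-character window are invented for illustration, and word-shingle overlap is only a simple stand-in for fingerprinting. All function names are hypothetical.

```python
import re

# Assumption: "custo123456" is read as the literal prefix "custo" plus six digits.
CUSTOMER_ID = re.compile(r"\bcusto\d{6}\b", re.IGNORECASE)
KEYWORDS = ("customer id", "customer number")  # assumed supporting keywords
PROXIMITY = 300  # assumed character window around each regex hit


def pattern_match(text: str) -> bool:
    """Custom-type style: a regex hit backed by a nearby keyword."""
    for hit in CUSTOMER_ID.finditer(text):
        start = max(0, hit.start() - PROXIMITY)
        window = text[start:hit.end() + PROXIMITY].lower()
        if any(keyword in window for keyword in KEYWORDS):
            return True
    return False


def shingles(text: str, size: int = 3) -> set:
    """Split text into overlapping word sequences (a stand-in fingerprint)."""
    words = re.findall(r"[a-z]+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def template_match(template: str, document: str, threshold: float = 0.5) -> bool:
    """Fingerprint style: share of the template's wording found in the document."""
    template_shingles = shingles(template)
    if not template_shingles:
        return False
    shared = template_shingles & shingles(document)
    return len(shared) / len(template_shingles) >= threshold
```

In this sketch, `template_match` keeps matching a form after names and dates are filled in, because most of the template wording survives, while `pattern_match` only fires where an ID-shaped value appears near a keyword.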
Implementation Process in the New Purview Portal
- Access Steps:
- Navigate to purview.microsoft.com with the necessary permissions (ideally as a global admin).
- Go to Solutions > Information Protection > Classifiers > Sensitive Information Types.
- Create a fingerprint-based sensitive information type by uploading the desired document.
- Fingerprint Specifications:
- The uploaded document must contain at least 4,000 characters.
- Review the settings and finish the setup after the upload.
Previous Examples and Current Utilization
- An example fingerprint for CVs (curricula vitae) was created previously and still exists in the system.
- Users can modify existing sensitive information types and check for their presence in the portal.
- Document matches can be tracked through the matched items view in the portal, which lists matching documents such as those uploaded to SharePoint.
Dynamic Link with DLP Policies
- Document fingerprinting is integrated with Data Loss Prevention (DLP) policies.
- A policy example includes conditions set to notify users and send alerts to administrators upon detection:
- Applied across various platforms (e.g., Exchange email, SharePoint, Teams).
- Conditions evaluate content against defined fingerprint matching rules for sensitivity.
- Notifications can include policy tips for users or incident reports for admins.
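The policy behavior described above can be modeled with a small Python sketch. This is an illustrative in-memory model only, not the Purview policy schema or any Microsoft API. The `DlpRule` class, the `evaluate` function, the action labels, and the rule and type names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DlpRule:
    """Hypothetical model of a DLP rule; not the Purview schema."""
    name: str
    locations: set             # workloads the policy is scoped to
    sensitive_info_type: str   # fingerprint-based type the condition looks for
    notify_user: bool = True   # show a policy tip to the user
    alert_admins: bool = True  # send an incident report to administrators


def evaluate(rule: DlpRule, location: str, detected_types: set) -> list:
    """Return the actions the rule would trigger for a single item."""
    if location not in rule.locations:
        return []  # item lives outside the policy's scope
    if rule.sensitive_info_type not in detected_types:
        return []  # condition not met: no fingerprint match
    actions = []
    if rule.notify_user:
        actions.append("policy_tip")
    if rule.alert_admins:
        actions.append("incident_report")
    return actions


# Example rule with assumed names, scoped to the platforms listed above.
rule = DlpRule(
    name="Protect CV template",
    locations={"Exchange", "SharePoint", "Teams"},
    sensitive_info_type="CV fingerprint",
)
```

Calling `evaluate(rule, "SharePoint", {"CV fingerprint"})` yields both the user-facing and admin-facing actions, while an item outside the scoped locations or without a fingerprint match yields none.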
User Feedback and Policy Tip Behavior
- The user experience may vary; for example, a warning triangle may appear on files that potentially contain sensitive information.
- Ongoing adjustments may be required in DLP policies to optimize notification clarity for users.
Key Takeaways and Future Directions
- The importance of distinguishing between document fingerprinting and custom sensitive info types in exam contexts.
- Upcoming discussions to include exact data match classifiers and trainable classifiers.
- Encouragement to explore PowerShell usage for testing and editing document fingerprints and DLP setup.
- Recognition of the critical need to read exam questions carefully as options vary subtly.
Conclusion
- The tutorial emphasizes how document fingerprinting compares with other classification tools, such as custom sensitive information types.
- Viewers are encouraged to learn about features such as exact data match classifiers, which will be covered in future content.
- Request for viewer feedback and continued engagement through subscriptions and memberships, hinting at further lessons and resources about Microsoft Purview.