SC-401 - Document Fingerprinting

Introduction to Document Fingerprinting

  • Document fingerprinting is a feature of Microsoft Purview.
    • It creates a sensitive information type from a standard form that you supply, which simplifies protecting sensitive information in the organization's standard forms.
  • A previous video on custom sensitive information types covers relevant background; it still applies because the underlying principles are largely unchanged.
  • Watching that video first is recommended for additional context.

Document Fingerprinting Overview

  • Definition: A Microsoft Purview feature that generates a unique fingerprint of a document based on its text and structure.
  • Purpose: To identify and protect standardized forms (templates) throughout an organization, particularly those that have fixed layouts containing sensitive data.
  • Content Types: Best suited to semi-structured or unstructured data, such as:
    • Tax documents
    • Contracts
    • Invoices
  • Process:
    • Upload a sample document and create a fingerprint.
    • Purview then detects documents based on the same form even when the filled-in content (e.g., names, dates) differs.
  • Example Use Case: Prevent unauthorized sharing of sensitive company documents such as NDA templates or tax forms.
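
The following Python sketch is a conceptual illustration of the process above only; it is not how Purview computes fingerprints. It assumes a fingerprint can be approximated by the set of three-word sequences (shingles) in the blank form, and scores a document by the fraction of those shingles it contains. The function names and the sample NDA texts are hypothetical.

```python
import re

def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (runs of k consecutive words) in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def containment(template: str, document: str, k: int = 3) -> float:
    """Fraction of the template's shingles that also occur in the document."""
    fingerprint = shingles(template, k)
    if not fingerprint:
        return 0.0
    return len(fingerprint & shingles(document, k)) / len(fingerprint)

# Hypothetical sample texts: a blank NDA form, a filled-in copy, and unrelated text.
BODY = ("The receiving party shall hold all confidential information in strict "
        "confidence and shall not disclose it to any third party without prior "
        "written consent.")
TEMPLATE = ("MUTUAL NON-DISCLOSURE AGREEMENT\n"
            "Disclosing party:\nReceiving party:\nEffective date:\n" + BODY)
FILLED = ("MUTUAL NON-DISCLOSURE AGREEMENT\n"
          "Disclosing party: Contoso Ltd\nReceiving party: Alice Smith\n"
          "Effective date: 1 March 2025\n" + BODY)
UNRELATED = ("Quarterly sales report. Revenue grew in the third quarter and "
             "the team will hold a review meeting.")

if __name__ == "__main__":
    print(f"{containment(TEMPLATE, FILLED):.2f}")     # 0.81
    print(f"{containment(TEMPLATE, UNRELATED):.2f}")  # 0.00
```

Filling in a field only breaks the few shingles that straddle it (26 of the template's 32 shingles survive here), so the filled NDA still scores high while the unrelated text shares none. That is the intuition behind detecting documents based on a form even when names and dates change.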

Comparison: Document Fingerprinting vs Custom Sensitive Information Types

  • Custom Sensitive Information Types:
    • Ideal for structured data patterns not covered by built-in types (e.g., employee IDs, invoice numbers).
    • Use regular expressions, together with supporting keywords within a set proximity, to detect sensitive content.
    • Example: detecting a customer ID in the format "custo123456".
  • Document Fingerprinting:
    • Focuses on protecting exact documents or templates with a defined structure, rather than patterns.
    • Ideal for preventing leakage of documents based on specific templates anywhere in the organization.
    • Detection relies on document structure and content rather than on keywords alone.
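
For contrast, the pattern-based approach of a custom sensitive information type can be sketched in Python as a regular expression plus supporting keywords that must appear nearby. This is a minimal sketch, not a Purview configuration: it assumes the customer ID is "custo" followed by exactly six digits, and the keyword list and 300-character proximity window are illustrative assumptions.

```python
import re

# Assumed format: "custo" followed by exactly six digits, e.g. custo123456.
CUSTOMER_ID = re.compile(r"\bcusto\d{6}\b", re.IGNORECASE)
# Hypothetical supporting keywords that raise confidence when found nearby.
KEYWORDS = re.compile(r"\b(customer|account|client)\b", re.IGNORECASE)
PROXIMITY = 300  # assumed window, in characters on either side of a match

def classify(text: str) -> list[tuple[str, str]]:
    """Return (match, confidence) pairs for each customer ID found in text."""
    results = []
    for m in CUSTOMER_ID.finditer(text):
        start = max(0, m.start() - PROXIMITY)
        window = text[start:m.end() + PROXIMITY]
        # A keyword near the pattern match yields high confidence; a bare match, low.
        confidence = "high" if KEYWORDS.search(window) else "low"
        results.append((m.group(), confidence))
    return results

if __name__ == "__main__":
    print(classify("Customer reference: custo123456"))  # [('custo123456', 'high')]
    print(classify("Ticket custo654321 closed"))        # [('custo654321', 'low')]
```

Splitting results into high and low confidence loosely mirrors how a custom sensitive information type can define patterns at different confidence levels. Note that detection here depends only on the pattern and nearby keywords, not on the layout of the surrounding document.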

Implementation Process in the New Purview Portal

  • Access Steps:
    1. Sign in to purview.microsoft.com with the necessary permissions (ideally as a Global Administrator).
    2. Go to Solutions > Information Protection > Classifiers > Sensitive Information Types.
    3. Create a fingerprint-based sensitive information type by uploading the desired document.
  • Fingerprint Specifications:
    • The uploaded document must contain at least 4,000 characters.
    • After uploading, review the settings and finish the setup.
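
Before uploading, a quick pre-check can confirm that candidate templates meet the 4,000-character minimum. This Python sketch assumes the templates have already been exported to plain-text (.txt) files and that only leading and trailing whitespace is excluded from the count; how Purview itself counts characters is an assumption here. The helper names are hypothetical.

```python
from pathlib import Path

MIN_CHARS = 4_000  # minimum from these notes; the counting rule below is assumed

def char_count(text: str) -> int:
    """Count characters, ignoring leading/trailing whitespace (an assumption)."""
    return len(text.strip())

def check_candidates(folder: Path) -> dict[str, bool]:
    """Map each .txt extract in folder to whether it meets the minimum length."""
    return {
        path.name: char_count(path.read_text(encoding="utf-8")) >= MIN_CHARS
        for path in sorted(folder.glob("*.txt"))
    }
```

For example, `check_candidates(Path("candidate_templates"))` (a hypothetical folder) returns a dictionary from file name to True or False, so short forms can be padded or replaced before the upload step.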

Existing Examples and Current Use

  • A fingerprint-based sensitive information type for CVs (curricula vitae), created earlier, is still present in the system.
  • Existing sensitive information types can be edited, and the portal shows which ones are present.
  • Matches can be reviewed through the matched items view in the portal, which lists documents containing the fingerprint, such as files uploaded to SharePoint.

Integration with DLP Policies

  • Document fingerprinting integrates with Data Loss Prevention (DLP) policies.
  • In the example policy, a match notifies users and sends alerts to administrators:
    • The policy applies across several locations (e.g., Exchange email, SharePoint, Teams).
    • Its conditions evaluate content against the fingerprint-based sensitive information type.
    • Notifications can include policy tips for users or incident reports for admins.
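
The condition/action structure of such a policy can be modeled in Python to make it explicit. This is a hypothetical data model, not the Purview API; in practice the policy is built in the portal's DLP policy wizard. The class names, the "CV fingerprint" SIT name, the location labels, and the admin address are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class DlpRule:
    """Hypothetical rule: one SIT condition plus the actions it triggers."""
    sensitive_info_type: str   # e.g. the name of a fingerprint-based SIT
    min_count: int = 1         # instances required for the condition to match
    notify_user: bool = True   # show a policy tip to the user
    incident_report_to: list[str] = field(default_factory=list)  # admin recipients

@dataclass
class DlpPolicy:
    """Hypothetical policy: the locations it covers and the rules it applies."""
    name: str
    locations: set[str]
    rules: list[DlpRule]

    def evaluate(self, location: str, detected: dict[str, int]) -> list[str]:
        """Return the actions triggered for content at `location`.

        `detected` maps SIT names to the number of instances found in the content.
        """
        if location not in self.locations:
            return []  # the policy does not cover this location
        actions: list[str] = []
        for rule in self.rules:
            if detected.get(rule.sensitive_info_type, 0) >= rule.min_count:
                if rule.notify_user:
                    actions.append(f"policy tip: {rule.sensitive_info_type}")
                for admin in rule.incident_report_to:
                    actions.append(f"incident report -> {admin}")
        return actions

# Hypothetical example: protect the CV fingerprint in Exchange, SharePoint, and Teams.
EXAMPLE_POLICY = DlpPolicy(
    name="Protect CV template",
    locations={"Exchange", "SharePoint", "Teams"},
    rules=[DlpRule("CV fingerprint", incident_report_to=["admin@contoso.com"])],
)

if __name__ == "__main__":
    print(EXAMPLE_POLICY.evaluate("SharePoint", {"CV fingerprint": 1}))
    print(EXAMPLE_POLICY.evaluate("OneDrive", {"CV fingerprint": 1}))  # []
```

Policy tip text and alert settings are left out; the point is that a single fingerprint-based condition can drive both a user-facing policy tip and an admin-facing incident report, and only in the locations the policy covers.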

User Feedback and Policy Tip Behavior

  • What users see can vary; for example, a warning triangle on a file indicates that it may contain sensitive information.
  • DLP policies may need ongoing tuning so that notifications are clear to users.

Key Takeaways and Future Directions

  • Know how document fingerprinting differs from custom sensitive information types; exam questions test this distinction.
  • Upcoming lessons cover exact data match (EDM) classifiers and trainable classifiers.
  • Viewers are encouraged to explore PowerShell for testing and editing document fingerprints and for DLP setup.
  • Read exam questions carefully, because the answer options can differ only subtly.

Conclusion

  • The tutorial covers document fingerprinting with an emphasis on how it compares with other classification tools.
  • Viewers are encouraged to study related features, such as exact data match classifiers, which future content will cover.
  • The presenter asks for feedback and invites viewers to subscribe or become members, pointing to further lessons and resources on Microsoft Purview.