SC-401 - Trainable Classifiers

Introduction to Trainable Classifiers

  • Definition: Trainable classifiers are Microsoft Purview tools that use machine learning to recognize sensitive content, learning a category from examples much as a person would.
  • Purpose: They help identify and manage sensitive information based on examples provided by users, improving detection accuracy in varied scenarios.

Importance of Trainable Classifiers to SC-401

  • Relevance: Understanding trainable classifiers is essential for the Microsoft SC-401 exam.
  • Impact: They go beyond predefined sensitive information types, so detection can be tailored to an organization's own data.

Creating a Trainable Classifier

  • Step-by-step process to create a trainable classifier:
    • Access Microsoft Purview Compliance Portal:
      • Navigate to: Solutions > Information Protection > Classifiers > Trainable Classifiers.
    • View Built-in Classifiers:
      • The page lists ready-to-use trainable classifiers for DLP policies, auto-labeling, and insider risk management.
    • Creating a Custom Classifier:
      • Example: Custom classifier for detecting employee exit risks.
        • Purpose: Identify documents or communications indicating potential employee turnover (e.g., resignation letters, job interviews).
        • Supports: HR and security teams in managing insider risks effectively.
    • Classifier Development Process:
      • Create Trainable Classifier:
        • Click "Create Trainable Classifier."
        • Enter a meaningful name and description.
        • Click "Next."
      • Select Sample Content:
        • Positive samples: choose SharePoint content with 50-500 files that are examples of the category.
        • Click "Next."
        • Negative samples: choose SharePoint sites with examples that do not match the category.
        • Click "Next."
      • Review and Create:
        • Review settings and click "Create Trainable Classifier."
        • Click "Done."
    • Post-creation:
      • Newly created classifiers appear under the training section for review.
      • Training duration: the classifier needs a few hours to learn from the samples provided.
      • When training completes, review statistics such as accuracy, precision, and recall.
      • Recommendations: Add more samples to refine the classifier, and consider periodic retraining to improve its performance.
      • Use Content Explorer to validate the classifier's results.
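
The accuracy, precision, and recall figures mentioned above follow the standard machine-learning definitions. The sketch below is not Purview code; it only illustrates how those three numbers relate to one another. The ConfusionCounts class and the sample counts are hypothetical, made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical tally from manually reviewing a classifier's predictions."""
    true_positives: int   # flagged, and really in the category
    false_positives: int  # flagged, but not in the category
    true_negatives: int   # not flagged, and not in the category
    false_negatives: int  # not flagged, but really in the category


def precision(c: ConfusionCounts) -> float:
    """Of everything the classifier flagged, the share that was correct."""
    flagged = c.true_positives + c.false_positives
    return c.true_positives / flagged if flagged else 0.0


def recall(c: ConfusionCounts) -> float:
    """Of everything that belongs to the category, the share that was found."""
    relevant = c.true_positives + c.false_negatives
    return c.true_positives / relevant if relevant else 0.0


def accuracy(c: ConfusionCounts) -> float:
    """Share of all reviewed items that were labeled correctly."""
    total = (c.true_positives + c.false_positives
             + c.true_negatives + c.false_negatives)
    correct = c.true_positives + c.true_negatives
    return correct / total if total else 0.0


# Made-up review of 200 items, for illustration only.
counts = ConfusionCounts(true_positives=45, false_positives=5,
                         true_negatives=140, false_negatives=10)
print(f"precision={precision(counts):.1%} "
      f"recall={recall(counts):.1%} "
      f"accuracy={accuracy(counts):.1%}")
# prints: precision=90.0% recall=81.8% accuracy=92.5%
```

Low precision suggests the classifier is flagging too much unrelated content, while low recall suggests it is missing items; adding more samples and retraining, as recommended above, is the usual response to either.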

Implementing Trainable Classifiers

  • Applications:
    • Trainable classifiers can be integrated into several Microsoft Purview solutions, including:
      • DLP (Data Loss Prevention) policies.
      • Auto-labeling processes.
      • Insider Risk Management initiatives.
  • DLP Policies Setup:
    • Navigate to: Solutions > DLP.
    • Create a new DLP policy and configure rules:
      • Name the rule (e.g., "Employee Exit Risk Classifier").
      • Under conditions, add trainable classifiers (select custom or built-in classifiers).

Comparisons with Other Detection Methods

  • Contextual Usage of Classifiers:
    • Trainable Classifiers should be used when:
      • Detecting themes or concepts in unstructured text (e.g., emails/documents).
      • Data is not easily definable by keywords or strict patterns.
      • Sample documents exemplify the category (e.g., resignation letters).
    • Document Fingerprinting:
      • Use when specific documents/templates need protection.
      • Ideal for detecting exact or near-exact copies of a known document structure (e.g., NDAs, invoices).
    • Exact Data Match (EDM):
      • Suitable for identifying structured sensitive data from databases or CSV files.
      • Use when the dataset includes structured identifiers such as names or account numbers.
    • Custom Sensitive Information Types (SITs):
      • Choose when you need specific patterns and validation checks not covered by built-in types.
      • Offers granular control over detection logic (e.g., internal employee ID formats).
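
To make the contrast with trainable classifiers concrete, here is a minimal sketch of the kind of pattern-based logic a custom SIT expresses: a regular expression as the primary match, with a supporting keyword nearby raising confidence. This is plain Python, not Purview's detection engine. The "EMP-" plus six digits format, the keyword list, and the function name are all hypothetical; the 300-character window is likewise an assumption chosen for illustration.

```python
import re

# Hypothetical internal employee ID format: "EMP-" followed by exactly six digits.
EMPLOYEE_ID = re.compile(r"\bEMP-\d{6}\b")

# Hypothetical supporting keywords that make a match more believable.
SUPPORTING_KEYWORDS = ("employee id", "staff number", "badge")

# Assumed proximity window, in characters, on each side of the match.
PROXIMITY = 300


def find_employee_ids(text: str) -> list[tuple[str, str]]:
    """Return (matched_id, confidence) pairs found in the text.

    Confidence is "high" when a supporting keyword appears within
    PROXIMITY characters of the match, otherwise "low".
    """
    results = []
    for match in EMPLOYEE_ID.finditer(text):
        start = max(0, match.start() - PROXIMITY)
        end = match.end() + PROXIMITY
        window = text[start:end].lower()
        has_support = any(keyword in window for keyword in SUPPORTING_KEYWORDS)
        results.append((match.group(), "high" if has_support else "low"))
    return results


print(find_employee_ids("Employee ID: EMP-123456 was issued today."))
# prints: [('EMP-123456', 'high')]
print(find_employee_ids("Ticket ref EMP-654321"))
# prints: [('EMP-654321', 'low')]
```

A rule like this works only because the identifier has a strict, describable shape. A resignation letter has no such shape, which is why the notes above point to trainable classifiers for that kind of content.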

Decision Guide for the SC-401 Exam

  • When to use each detection method, keeping in mind scenarios and requirements:
    • Use Trainable Classifiers when contextual understanding is necessary.
    • Use Document Fingerprinting for protection of standard templates.
    • Use EDM to detect structured sensitive data where high accuracy is needed.
    • Use Custom SITs for specialized pattern detection beyond built-in capabilities.
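
As a study aid, the guide above can be written as a small decision function. This is only a sketch of the notes' reasoning: the function name, the three yes/no questions, and the order in which they are checked are assumptions, and real scenarios may combine several methods.

```python
from enum import Enum


class DetectionMethod(Enum):
    TRAINABLE_CLASSIFIER = "Trainable classifier"
    DOCUMENT_FINGERPRINTING = "Document fingerprinting"
    EXACT_DATA_MATCH = "Exact Data Match (EDM)"
    CUSTOM_SIT = "Custom sensitive information type"


def choose_detection_method(*,
                            structured_source: bool = False,
                            known_template: bool = False,
                            definable_pattern: bool = False) -> DetectionMethod:
    """Pick a detection method from three yes/no questions about the data.

    structured_source: the sensitive values live in a database or CSV export.
    known_template:    a specific document or template must be protected.
    definable_pattern: the data follows a pattern a rule can describe.
    The check order below is an assumption made for this sketch.
    """
    if structured_source:
        return DetectionMethod.EXACT_DATA_MATCH
    if known_template:
        return DetectionMethod.DOCUMENT_FINGERPRINTING
    if definable_pattern:
        return DetectionMethod.CUSTOM_SIT
    # Nothing rule-based fits: the content needs contextual understanding.
    return DetectionMethod.TRAINABLE_CLASSIFIER


# Resignation letters: no database, no fixed template, no strict pattern.
print(choose_detection_method().value)
# prints: Trainable classifier

# Internal employee ID format that a regular expression can describe.
print(choose_detection_method(definable_pattern=True).value)
# prints: Custom sensitive information type
```

The trainable classifier is the fallback here because the notes reserve it for content that keywords, patterns, templates, and exact values cannot describe.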

Conclusion and Further Learning

  • Trainable classifiers extend data protection in Microsoft Purview beyond fixed patterns by learning categories of content from examples, which makes detection adaptable to each organization's data.
  • For deeper insights and practical examples, refer to the linked resources or previous videos covering trainable classifiers in detail.