SC-401 - Trainable Classifiers

Introduction to Trainable Classifiers

Definition: Trainable classifiers are tools in Microsoft Purview that use machine learning to recognize sensitive content by learning from examples, much as a person would, rather than by matching fixed patterns.
Purpose: They help identify and manage sensitive information based on examples provided by users, improving detection accuracy in varied scenarios.

Importance of Trainable Classifiers to SC-401

Relevance: Understanding trainable classifiers is essential for the Microsoft SC-401 exam.
Impact: They go beyond predefined sensitive information types, so detection can be tailored to an organization's own data.

Creating a Trainable Classifier

Step-by-step process to create a trainable classifier:
- Access the Microsoft Purview Compliance Portal:
  - Navigate to: Solutions > Information Protection > Classifiers > Trainable Classifiers.
- View built-in classifiers:
  - The page lists ready-to-use trainable classifiers for DLP policies, auto-labeling, and insider risk management.
- Create a custom classifier:
  - Example: Custom classifier for detecting employee exit risks.
    - Purpose: Identify documents or communications indicating potential employee turnover (e.g., resignation letters, job interviews).
    - Supports: HR and security teams in managing insider risks effectively.
- Classifier development process:
  - Create Trainable Classifier:
    - Click on "Create Trainable Classifier."
    - Enter a meaningful name and description.
    - Click "Next."
  - Select Sample Content:
    - Positive samples: choose a SharePoint location with 50-500 files that exemplify the category.
    - Click "Next."
    - Negative samples: choose SharePoint sites with non-matching examples.
    - Click "Next."
  - Review and Create:
    - Review settings and click "Create Trainable Classifier."
    - Click "Done."
- Post-creation:
  - Newly created classifiers appear under the training section for review.
  - Training duration: the classifier needs a few hours to learn from the samples provided.
  - Upon completion, review statistics such as accuracy, precision, and recall.
  - Recommendations: Add more samples for refinement and consider periodic retraining to improve classifier performance.
  - Use Content Explorer to validate that the classifier matches the content you expect.
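
To make the post-training statistics concrete, here is a minimal sketch of how accuracy, precision, and recall are conventionally computed from review counts. The `ConfusionCounts` class and the sample numbers are hypothetical and for illustration only; this is the standard textbook definition, not a description of how Purview calculates its figures internally.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Hypothetical tallies from reviewing classifier predictions."""
    true_positives: int   # matching items the classifier flagged
    false_positives: int  # non-matching items the classifier flagged
    false_negatives: int  # matching items the classifier missed
    true_negatives: int   # non-matching items the classifier left alone


def accuracy(c: ConfusionCounts) -> float:
    """Share of all reviewed items that were classified correctly."""
    total = (c.true_positives + c.false_positives
             + c.false_negatives + c.true_negatives)
    return (c.true_positives + c.true_negatives) / total if total else 0.0


def precision(c: ConfusionCounts) -> float:
    """Of the items flagged as matches, the share that really match."""
    flagged = c.true_positives + c.false_positives
    return c.true_positives / flagged if flagged else 0.0


def recall(c: ConfusionCounts) -> float:
    """Of the items that really match, the share that was flagged."""
    actual = c.true_positives + c.false_negatives
    return c.true_positives / actual if actual else 0.0


# Made-up review of 200 items for an "employee exit risk" classifier.
counts = ConfusionCounts(true_positives=80, false_positives=10,
                         false_negatives=20, true_negatives=90)
print(f"accuracy={accuracy(counts):.2f} "
      f"precision={precision(counts):.2f} recall={recall(counts):.2f}")
```

Low precision suggests the classifier flags too much unrelated content, while low recall suggests it misses content it should catch; either can point to the need for more or better samples.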

Implementing Trainable Classifiers

Applications:
- Trainable classifiers can be integrated into various Microsoft Purview solutions, including:
  - DLP (Data Loss Prevention) policies.
  - Auto-labeling processes.
  - Insider Risk Management initiatives.

DLP Policies Setup:
- Navigate to: Solutions > DLP.
- Create a new DLP policy and configure rules:
  - Name the rule (e.g., "Employee Exit Risk Classifier").
  - Under conditions, add trainable classifiers (select custom or built-in classifiers).

Comparisons with Other Detection Methods

Contextual Usage of Classifiers:
- Trainable Classifiers should be used when:
  - Detecting themes or concepts in unstructured text (e.g., emails/documents).
  - Data is not easily definable by keywords or strict patterns.
  - Sample documents exemplify the category (e.g., resignation letters).
- Document Fingerprinting:
  - Use when specific documents/templates need protection.
  - Ideal for detecting exact or near-exact copies of a known document structure (e.g., NDAs, invoices).
- Exact Data Match (EDM):
  - Suitable for identifying structured sensitive data from databases or CSV files.
  - Use when the dataset includes structured identifiers such as names or account numbers.
- Custom Sensitive Information Types (SITs):
  - Chosen for specific patterns and validation checks not covered by built-in types.
  - Offers granular control over detection logic (e.g., internal employee ID formats).
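
To show the kind of pattern-based logic a custom SIT expresses, here is a rough Python sketch that combines a primary pattern with nearby supporting keywords. The ID format `EMP-` plus six digits, the keyword list, and the 300-character window are all assumptions made up for this example; this illustrates the idea only and is not Purview's detection engine or rule syntax.

```python
import re

# Hypothetical internal employee ID format: "EMP-" followed by six digits.
EMPLOYEE_ID = re.compile(r"\bEMP-\d{6}\b")

# Hypothetical supporting keywords that raise confidence when found nearby.
KEYWORDS = ("employee id", "staff number")

# Assumed number of characters to search on each side of a match.
PROXIMITY = 300


def find_employee_ids(text: str) -> list[tuple[str, str]]:
    """Return (matched ID, confidence) pairs found in the text."""
    results = []
    for match in EMPLOYEE_ID.finditer(text):
        start = max(0, match.start() - PROXIMITY)
        end = min(len(text), match.end() + PROXIMITY)
        window = text[start:end].lower()
        # A keyword close to the pattern counts as supporting evidence.
        confidence = "high" if any(k in window for k in KEYWORDS) else "low"
        results.append((match.group(), confidence))
    return results


print(find_employee_ids("Employee ID: EMP-123456 was deactivated."))
print(find_employee_ids("Order ref EMP-654321 shipped."))
```

Pattern matching like this works well for IDs with a fixed shape, which is exactly the case where a trainable classifier is unnecessary.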

Decision Guide for the SC-401 Exam

Choose a detection method based on the scenario and its requirements:
- Use Trainable Classifiers when contextual understanding is necessary.
- Use Document Fingerprinting to protect standard templates.
- Use EDM to detect structured sensitive data when high accuracy is needed.
- Use Custom SITs for specialized pattern detection beyond built-in capabilities.
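
As a study aid, the guide above can be sketched as a small decision function. The function name, its parameters, and the order in which the checks run are assumptions chosen for illustration; real scenarios may combine several methods, so treat this as a memory aid and not an official rule.

```python
from enum import Enum


class DetectionMethod(Enum):
    TRAINABLE_CLASSIFIER = "Trainable classifier"
    DOCUMENT_FINGERPRINTING = "Document fingerprinting"
    EXACT_DATA_MATCH = "Exact Data Match (EDM)"
    CUSTOM_SIT = "Custom sensitive information type"


def choose_method(*, known_template: bool = False,
                  structured_dataset: bool = False,
                  definable_pattern: bool = False) -> DetectionMethod:
    """Map scenario traits to a detection method (simplified)."""
    if known_template:
        # A standard form or template must be protected.
        return DetectionMethod.DOCUMENT_FINGERPRINTING
    if structured_dataset:
        # Exact values from a database or CSV export must be matched.
        return DetectionMethod.EXACT_DATA_MATCH
    if definable_pattern:
        # The data follows a format that built-in types do not cover.
        return DetectionMethod.CUSTOM_SIT
    # Nothing pattern-like applies, so contextual understanding is needed.
    return DetectionMethod.TRAINABLE_CLASSIFIER


print(choose_method(known_template=True).value)
print(choose_method().value)
```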

Conclusion and Further Learning

Trainable classifiers extend data protection in Microsoft Purview beyond fixed patterns: they learn from sample content and can be retrained as that content changes.
For deeper insights and practical examples, refer to the linked resources or previous videos covering trainable classifiers in detail.