SC-401 - Trainable Classifiers
Introduction to Trainable Classifiers
- Definition: Trainable classifiers are Microsoft Purview tools that use machine learning to recognize sensitive content by learning from examples, much as a person would, rather than by matching fixed patterns.
- Purpose: They identify and help manage sensitive information based on sample content that users provide, which improves detection where keywords and patterns fall short.
Importance of Trainable Classifiers to SC-401
- Relevance: Understanding trainable classifiers is essential for the Microsoft SC-401 exam.
- Impact: They go beyond predefined sensitive information types, so detection can be tailored to an organization's own data.
Creating a Trainable Classifier
- Step-by-step process to create a trainable classifier:
- Open the Microsoft Purview portal:
- Navigate to: Solutions > Information Protection > Classifiers > Trainable Classifiers.
- View Built-in Classifiers:
- The page lists ready-to-use (built-in) trainable classifiers that can be used in DLP policies, auto-labeling, and insider risk management.
- Creating a Custom Classifier:
- Example: Custom classifier for detecting employee exit risks.
- Purpose: Identify documents or communications that indicate potential employee turnover (e.g., resignation letters, messages about job interviews).
- Supports: HR and security teams in managing insider risks effectively.
- Classifier Development Process:
- Create Trainable Classifier:
- Click on "Create Trainable Classifier."
- Enter a meaningful name and description.
- Click "Next."
- Select Sample Content:
- Positive samples: select a SharePoint location containing 50 to 500 files that exemplify the category.
- Click "Next."
- Negative samples: select SharePoint sites containing examples that do not match the category (Microsoft's guidance calls for 150 to 1,500 files).
- Click "Next."
- Review and Create:
- Review settings and click "Create Trainable Classifier."
- Click "Done."
- Post-creation:
- Newly created classifiers will appear under the training section for review.
- Training duration: The classifier needs a few hours to learn from the samples provided.
- Upon completion, review statistics such as accuracy, precision, and recall.
- Recommendations: Add more samples to refine the classifier and consider periodic retraining to improve its performance.
- Use Content Explorer to check which items the classifier matches and validate its effectiveness.
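The accuracy, precision, and recall figures mentioned above can be worked through by hand. The following is a minimal Python sketch, not Purview's own calculation: the function name `classifier_metrics` and the sample counts are assumptions chosen for illustration, and the formulas are the standard definitions of these metrics.

```python
def classifier_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compute standard metrics from confusion-matrix counts.

    tp: items correctly flagged as matching the category
    fp: items flagged that do not actually match
    fn: matching items the classifier missed
    tn: non-matching items correctly left unflagged
    """
    total = tp + fp + fn + tn
    return {
        # Of everything flagged, how much was right?
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Of everything that should have been flagged, how much was found?
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # Of all items reviewed, how many were classified correctly?
        "accuracy": (tp + tn) / total if total else 0.0,
    }


# Illustrative counts only; these are not results from a real classifier.
example = classifier_metrics(tp=45, fp=5, fn=15, tn=135)
print(example)
```

With these illustrative counts, precision is 0.9 (45 of 50 flagged items were correct), recall is 0.75 (45 of 60 true matches were found), and accuracy is 0.9 (180 of 200 items classified correctly).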
Implementing Trainable Classifiers
- Applications:
- Trainable classifiers can be used as conditions in several Microsoft Purview solutions, including:
- DLP (Data Loss Prevention) policies.
- Auto-labeling.
- Insider Risk Management.
- DLP Policies Setup:
- Navigate to: Solutions > DLP.
- Create a new DLP policy and configure rules:
- Name the rule (e.g., "Employee Exit Risk Classifier").
- Under conditions, add trainable classifiers (select custom or built-in classifiers).
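To make the rule logic concrete, here is a minimal Python sketch of how a rule with trainable classifier conditions could be reasoned about. It is a hypothetical model only: `DlpRule`, its fields, and the classifier name "Employee Exit Risk" are invented for illustration and are not Purview objects, PowerShell parameters, or a policy schema. It also assumes the rule fires when any one of the listed classifiers matches.

```python
from dataclasses import dataclass, field


@dataclass
class DlpRule:
    """Hypothetical stand-in for a DLP rule; not a Purview object or schema."""

    name: str
    # Names of the trainable classifiers added under the rule's conditions.
    classifiers: set[str] = field(default_factory=set)

    def matches(self, detected: set[str]) -> bool:
        # Assumed "any of these" logic: one matching classifier triggers the rule.
        return bool(self.classifiers & detected)


rule = DlpRule(
    name="Employee Exit Risk Classifier",
    classifiers={"Employee Exit Risk"},  # hypothetical custom classifier name
)

# 'detected' stands for the classifiers that matched a given document.
print(rule.matches({"Employee Exit Risk"}))  # True
print(rule.matches(set()))                   # False
```

The real configuration is done in the portal as described above; the sketch only shows that a classifier condition is a membership check against what the classifier detected in the content.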
Comparisons with Other Detection Methods
- Contextual Usage of Classifiers:
- Trainable Classifiers should be utilized when:
- Detecting themes or concepts in unstructured text (e.g., emails/documents).
- Data is not easily definable by keywords or strict patterns.
- Sample documents exemplify the category (e.g., resignation letters).
- Document Fingerprinting:
- Use when specific documents/templates need protection.
- Ideal for detecting exact or near-exact copies of a known document structure (e.g., NDAs, invoices).
- Exact Data Match (EDM):
- Use when the exact sensitive values to detect come from a structured source such as a database or CSV file.
- Suited to datasets that include structured identifiers such as names or account numbers.
- Custom Sensitive Information Types (SITs):
- Use when data follows a specific pattern or needs validation checks that built-in types do not cover.
- Offers granular control over detection logic (e.g., internal employee ID formats).
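As a contrast with trainable classifiers, pattern-based detection of the kind a custom SIT performs can be sketched in a few lines of Python. This is an illustration of the idea, not how Purview implements SITs: the ID format (`EMP-` plus six digits), the keyword list, the 300-character window, and the function name `find_employee_ids` are all assumptions.

```python
import re

# Hypothetical internal employee ID format: "EMP-" followed by exactly six digits.
EMPLOYEE_ID = re.compile(r"\bEMP-\d{6}\b")
# Hypothetical supporting keywords that make a match more credible.
KEYWORDS = ("employee id", "staff number", "personnel")
# Assumed number of characters searched on each side of a match.
PROXIMITY = 300


def find_employee_ids(text: str) -> list[tuple[str, str]]:
    """Return (matched_id, confidence) pairs found in the text."""
    results = []
    for match in EMPLOYEE_ID.finditer(text):
        start = max(0, match.start() - PROXIMITY)
        end = min(len(text), match.end() + PROXIMITY)
        window = text[start:end].lower()
        # A nearby keyword raises confidence; the pattern alone stays low.
        has_keyword = any(keyword in window for keyword in KEYWORDS)
        results.append((match.group(), "high" if has_keyword else "low"))
    return results


print(find_employee_ids("Employee ID: EMP-123456 was issued."))
```

A pattern like this works because the ID has a fixed shape. A resignation letter has no such shape, which is why that scenario calls for a trainable classifier instead.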
Decision Guide for SC-401 Exam
- Choose a detection method based on the scenario and its requirements:
- Use Trainable Classifiers when contextual understanding is necessary.
- Use Document Fingerprinting to protect standard templates.
- Use EDM when structured sensitive data must be matched against known values with high accuracy.
- Use Custom SITs for specialized pattern detection beyond built-in capabilities.
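The guide above can be condensed into a small decision function for revision purposes. This is a study aid sketched in Python under stated assumptions: the `Scenario` fields, the function name `recommend_method`, and the order in which the checks run are all illustrative choices, not official Microsoft guidance.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """Hypothetical description of a detection requirement."""

    exact_values_from_table: bool = False  # e.g., rows from a database or CSV file
    known_template: bool = False           # e.g., a standard NDA or invoice form
    fixed_pattern: bool = False            # e.g., an internal employee ID format
    needs_context: bool = False            # e.g., recognizing a resignation letter


def recommend_method(scenario: Scenario) -> str:
    # The order of checks is an assumption: most specific evidence first.
    if scenario.exact_values_from_table:
        return "Exact Data Match (EDM)"
    if scenario.known_template:
        return "Document Fingerprinting"
    if scenario.fixed_pattern:
        return "Custom Sensitive Information Type (SIT)"
    if scenario.needs_context:
        return "Trainable Classifier"
    return "No clear match; review the requirements"


print(recommend_method(Scenario(needs_context=True)))  # Trainable Classifier
```

Exam scenarios usually contain one telling phrase (a CSV of account numbers, a standard form, an ID format, or a theme in free text), and mapping that phrase to one of the four flags is the core of the decision.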
Conclusion and Further Learning
- Trainable classifiers extend data protection in Microsoft Purview beyond pattern matching: because they learn from examples, detection can adapt to an organization's own content.
- For deeper insights and practical examples, refer to the linked resources or previous videos covering trainable classifiers in detail.