Data_Engineering_Part4
Page 1: Spark Job Definition Inline Monitoring
The Spark job definition Inline Monitoring feature allows users to:
View Spark job submission and run status in real-time.
Inspect past runs and configurations of Spark job definitions.
Navigate to the Spark application detail page for further details.
Pipeline Spark Activity Inline Monitoring
Deep links have been integrated into the Notebook and Spark job activities within the Pipeline.
Users can:
View execution details of Spark applications.
Access snapshots from the respective Notebook and Spark job definitions.
Retrieve Spark logs for troubleshooting.
If Spark activities fail, inline error messages will be displayed.
Next Steps for Users
Use the Apache Spark advisor for real-time advice on notebooks.
Browse recent Spark application runs in the Fabric monitoring hub.
Monitor Spark jobs from notebooks and track capacity consumption.
Utilize the extended Apache Spark history server for debugging and diagnosing applications.
Page 2: Apache Spark Run Series Analysis
Overview
The Apache Spark run series feature is available for Spark version 3.4 and above.
It categorizes Spark applications based on:
Recurring pipeline activities.
Manual notebook runs.
Spark job runs from the same notebook or job definition.
Key Features of Run Series Analysis
Autotune Analysis:
Compare autotune outcomes and performance metrics across runs (see the config sketch after this list).
Run Series Comparison:
Evaluate run durations against past performance and data input/output volumes.
Outlier Detection:
Identify and analyze outliers in performance data (see the illustrative sketch after this list).
Detailed Run Instance View:
Provide granular details for individual runs to pinpoint performance bottlenecks.
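For there to be autotune outcomes to compare across a run series, autotune must be enabled for the session. A minimal sketch, assuming a Fabric notebook where a spark session is predefined; the property name spark.ms.autotune.enabled is my recollection of the Fabric setting and should be verified against the documentation for your runtime version.

```python
# Minimal sketch: enable autotune for the current session so subsequent
# runs produce autotune outcomes for the run series view to compare.
# ASSUMPTION: the property name below reflects the Fabric docs as I
# recall them; verify it for your runtime before relying on it.
spark.conf.set("spark.ms.autotune.enabled", "true")

# Confirm the setting took effect for this session.
print(spark.conf.get("spark.ms.autotune.enabled"))
```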
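The anomaly detection itself runs inside the product, and its algorithm isn't described here. Purely to illustrate the idea, the sketch below flags run durations more than two standard deviations above the mean; the duration values are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical run durations in seconds for one run series; real values
# come from the run series view in the monitoring hub.
durations = [312, 298, 305, 301, 884, 296, 310]

avg = mean(durations)
sd = stdev(durations)

# Flag runs more than two standard deviations above the mean. This is a
# simple illustration, not the product's actual anomaly detection.
outliers = [(i, d) for i, d in enumerate(durations) if d > avg + 2 * sd]
print(f"average={avg:.1f}s, threshold={avg + 2 * sd:.1f}s")
print("outlier run instances:", outliers)
```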
Usage Recommendations
Employ this feature for performance tuning, particularly when:
You're analyzing production job health.
You're optimizing long-running jobs.
Page 3: Examples of Run Series Analysis
Visual Representation
Each run instance is depicted as a vertical bar in the graph, indicating its duration.
Red bars signal anomalies detected in specific run instances.
Detailed Run Instance Information
Users can:
Zoom in/out for specific time windows.
Access metrics such as:
Duration trends.
Average durations and expected performance.
Page 4: Related Content
Use the Apache Spark advisor for real-time advice within notebooks.
Navigate to the monitoring hub to view recent Spark application runs.
Page 5: Apache Spark Advisor
Functionality
This advisor analyzes commands and provides real-time advice for Notebook runs.
Offers built-in patterns to help avoid common mistakes, focusing on:
Code optimization.
Error analysis to find the root causes of failures.
Built-in Advice Examples
Advice to cache data before calling randomSplit, to avoid inconsistent results (see the sketches below).
Warnings about naming conflicts between views and tables.
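A minimal PySpark sketch of the caching pattern: randomSplit over an uncached DataFrame with a non-deterministic lineage can yield overlapping or missing rows across the splits, so caching and materializing first stabilizes the result. The DataFrame here is a stand-in for real data.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("advisor-patterns").getOrCreate()

df = spark.range(0, 10_000)  # stand-in for the data being split

# Cache and materialize before splitting; otherwise each split may
# re-evaluate the lineage and, if it is non-deterministic, the splits
# can overlap or drop rows.
df.cache()
df.count()

train, test = df.randomSplit([0.8, 0.2], seed=42)
print(train.count(), test.count())
```

And a sketch of the naming conflict the advisor warns about: an unqualified name in Spark SQL resolves to a temporary view before a table of the same name, so the view silently shadows the table. The name sales below is hypothetical.

```python
# A temp view named like an existing table shadows that table for
# unqualified references in SQL ("sales" is a hypothetical name).
spark.range(5).createOrReplaceTempView("sales")
spark.sql("SELECT COUNT(*) FROM sales").show()  # resolves to the temp view
```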
Page 6: Error Messages and Recommendations
Key Error Messages
Guidance on handling unexpected queries or hints.
Suggestions for enabling Spark configurations that improve performance and reduce errors (a generic example follows).
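Which configuration to enable depends on the specific advisor message; as a hedged, generic illustration, the pattern is to set the suggested property on the session and re-run the command. Adaptive query execution (spark.sql.adaptive.enabled) appears below only as a well-known Spark setting, not necessarily one the advisor recommends, and a notebook session with spark predefined is assumed.

```python
# Generic pattern for applying a configuration the advisor suggests:
# set the property on the current session, then re-run the slow or
# failing command. AQE is shown as a familiar example; substitute the
# property named in the advisor's message.
spark.conf.set("spark.sql.adaptive.enabled", "true")
print(spark.conf.get("spark.sql.adaptive.enabled"))
```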
Page 7: The User Experience with Spark Advisor
Real-time Feedback
The Spark advisor displays advice as users execute commands, allowing immediate insights into potential issues.
Categories of assistance include Info, Warning, and Error messages, which can be viewed directly in notebook cells.
Page 8: Error Handling
Spark Advisor Settings:
Users can opt to show/hide specific diagnostics.
How diagnostic messages are handled across user sessions can be customized.
Page 9: Feedback and Community Interaction
Feedback options are available so users can help improve the guidance offered in Spark environments.
Page 10: Monitoring Hub Overview
Functionality of the Monitoring Hub
Centralized portal to view ongoing Apache Spark application activities triggered from various sources.
Search and filter applications based on various criteria, including submitter, status, and item type.
Actions available
Cancel in-progress applications.
View detailed execution metrics for Spark applications.
Page 11: Usability Improvements
Customization Options
Users can sort and filter applications in the Monitoring Hub based on multiple parameters.
Page 12: Overview of Spark Job Definitions
Users should navigate through the Monitoring Hub to access recent runs of their Spark job definitions and applications.
Page 13-14: Upstream View for Pipelines
For jobs scheduled in pipelines, both the pipeline activity and its upstream activities can be viewed, giving a fuller picture of the workflow.
Page 15: Monitoring Spark Applications
An overview of submission statuses and Spark application management techniques is provided for users managing workload scheduling.
... (Continued page-wise outlining following the same structure)