What Spark properties can be configured at the Environment level?
spark.executor.memory, spark.executor.cores, spark.driver.memory, spark.sql.shuffle.partitions, spark.sql.adaptive.enabled, and other Spark config keys.
How do you set spark.executor.memory in Fabric?
In the Environment's Spark Properties section, add the property key spark.executor.memory with the desired value (e.g., 8g). Alternatively, override it for a single session with %%configure in a notebook.
What is the %%configure magic command?
A notebook magic command that sets Spark session configuration before the session starts. It overrides Environment-level Spark properties for that session only.
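As an illustrative sketch, a session override might look like the following. The JSON body follows the Synapse-style %%configure schema; the exact fields supported (and the values shown) depend on the runtime, so treat this as an example shape rather than a definitive reference:

```
%%configure -f
{
    "driverMemory": "8g",
    "executorMemory": "8g",
    "executorCores": 4,
    "conf": {
        "spark.sql.shuffle.partitions": "400",
        "spark.sql.adaptive.enabled": "true"
    }
}
```

Run it in the first cell, before the Spark session starts; the -f flag forces the session to restart with the new configuration if one is already running.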
What is the default number of shuffle partitions in Spark?
200 (spark.sql.shuffle.partitions). This can be tuned based on data volume; fewer for small data, more for large data.
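One common rule of thumb is to size partitions toward roughly 128 MB of shuffle data each. The helper below is a hypothetical heuristic for picking a starting value, not a Spark API:

```python
import math

TARGET_PARTITION_BYTES = 128 * 1024 * 1024  # aim for ~128 MB per shuffle partition

def suggested_shuffle_partitions(shuffle_bytes: int, min_partitions: int = 1) -> int:
    """Rough starting point for spark.sql.shuffle.partitions (illustrative heuristic)."""
    return max(min_partitions, math.ceil(shuffle_bytes / TARGET_PARTITION_BYTES))

# ~50 GB of shuffle data suggests ~400 partitions rather than the default 200
print(suggested_shuffle_partitions(50 * 1024**3))  # 400
```

With AQE's partition coalescing enabled, a deliberately high value like this is safe, since small partitions get merged at runtime anyway.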
How does spark.executor.cores affect Spark performance?
It determines how many tasks each executor runs in parallel. More cores per executor means more parallelism but requires more memory per executor.
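The trade-off can be made concrete with simple arithmetic (hypothetical helpers, not a Spark API): total task slots grow with cores, while the memory share of each concurrent task shrinks:

```python
def cluster_parallelism(num_executors: int, executor_cores: int) -> int:
    """Total tasks the cluster can run at once."""
    return num_executors * executor_cores

def memory_per_task_gb(executor_memory_gb: float, executor_cores: int) -> float:
    """Approximate memory available to each concurrent task."""
    return executor_memory_gb / executor_cores

# 10 executors with 4 cores each: 40 concurrent tasks,
# but each task shares an 8 GB executor with 3 others (~2 GB each)
print(cluster_parallelism(10, 4))  # 40
print(memory_per_task_gb(8, 4))    # 2.0
```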
What is Adaptive Query Execution (AQE) in Spark?
A Spark feature (spark.sql.adaptive.enabled) that dynamically optimizes query plans at runtime, including coalescing shuffle partitions and handling data skew.
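A minimal PySpark sketch enabling AQE and its common sub-features is shown below. These spark.sql.adaptive.* keys exist in open-source Spark 3.x (where AQE is on by default); whether a given Fabric runtime already sets them should be checked against its documentation:

```python
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")  # merge small shuffle partitions
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")            # split skewed join partitions
```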
How do Environment Spark properties interact with workspace-level settings?
Environment-level settings override workspace defaults. The precedence is: session (%%configure) > Environment > Workspace defaults.
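The precedence rule behaves like a layered dictionary merge in which later (higher-precedence) layers win. This is a minimal sketch of the idea, not Fabric's actual resolution code:

```python
def effective_spark_conf(workspace: dict, environment: dict, session: dict) -> dict:
    """Merge config layers; later layers take precedence."""
    merged: dict = {}
    for layer in (workspace, environment, session):
        merged.update(layer)
    return merged

workspace = {"spark.sql.shuffle.partitions": "200", "spark.executor.memory": "4g"}
environment = {"spark.executor.memory": "8g"}
session = {"spark.sql.shuffle.partitions": "400"}  # e.g. set via %%configure

print(effective_spark_conf(workspace, environment, session))
# {'spark.sql.shuffle.partitions': '400', 'spark.executor.memory': '8g'}
```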
What is spark.driver.memory and why does it matter?
The memory allocated to the Spark driver process, which coordinates job execution. Too little can cause OutOfMemory errors when collecting large results to the driver or broadcasting large datasets.
Can you set Spark properties dynamically based on the workload?
Yes, using %%configure in notebooks or by parameterizing pipeline notebook activities to pass different Spark config values.
What does spark.sql.files.maxPartitionBytes control?
The maximum size of a partition when reading files. Default is 128MB. Tuning it affects the number of tasks created when reading large files.
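The effect on task count can be estimated with simple arithmetic (illustrative only; Spark's actual splitting also accounts for spark.sql.files.openCostInBytes and per-file boundaries):

```python
import math

def approx_read_tasks(total_file_bytes: int,
                      max_partition_bytes: int = 128 * 1024 * 1024) -> int:
    """Rough number of read tasks Spark creates for splittable files."""
    return math.ceil(total_file_bytes / max_partition_bytes)

# A 10 GB dataset at the 128 MB default -> ~80 tasks;
# doubling maxPartitionBytes to 256 MB halves that to ~40
print(approx_read_tasks(10 * 1024**3))                     # 80
print(approx_read_tasks(10 * 1024**3, 256 * 1024 * 1024))  # 40
```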