Execution Plans for Common T-SQL Statements

Stored Procedures

  • Stored procedures can contain one or many individual T-SQL statements; each optimised statement may appear as a separate execution plan.

  • For multi-statement procedures, SSMS shows one plan per optimised statement;

    • DDL such as Create Table and control statements such as SET NO COUNT do not generate execution plans because they require no optimisation.

    • In the sample procedure (Sales.TaxRateByState) five T-SQL statements were present, yet only three produced plans.

  • Practical guidance:

    • The more statements/loops a procedure holds, the more plans may be generated; hundreds of plans can overload SSMS.
      • Use Estimated Plans, Filtered Extended Events, or SET STATISTICS XML toggling to minimise overhead.

    • Estimated vs. actual relative cost is reported at the top of each plan as Query cost (relative to the batch); treat percentages cautiously and always compare estimated vs. actual row counts.

Example procedure – Sales.TaxRateByState

  • Goal: return sales‐order subtotals/taxes for states whose tax rate < 7.57.5 within a supplied country.

  • Steps executed inside the procedure:

    1. CREATE TABLE #TaxRateByState – temp table for selected tax rates.

    2. INSERT rows from Sales.SalesTaxRatePerson.StateProvince, filtered by @CountryRegionCode.

    3. DELETE rows in temp table where TaxRate < 7.5.

    4. SELECT final result set joining sales orders, territories, states and the temp table.

  • Data-type note: parameter declared as NVARCHAR(3) because the underlying column is NVARCHAR(3); CHAR(3) would be more efficient but would trigger implicit conversions.

Execution-plan overview (Figure 7-1)
  • Query 1: Populate temp table3%3\% of batch cost.

  • Query 2: Delete low tax-rate rows1%1\% of batch cost.

  • Query 3: Final SELECT96%96\% of batch cost and focus of tuning.

Parameter sniffing insight (Figure 7-2 / 7-3)
  • Each compiled statement stores

    • Parameter Compiled Value – the value used during compilation (e.g. NUSN'US').

    • Parameter Runtime Value – the value used during the current execution.

  • Steps at runtime:

    1. Batch is compiled; @CountryRegionCode for the outer EXEC is NUSN'US'.

    2. Engine checks plan cache; if procedure plan missing it compiles and sniffs the parameter.

    3. Subsequent executions may reuse the plan even when runtime values differ.

    • Risk: if the compiled value’s selectivity is atypical, the reused plan may yield poor performance (see Chapter 8 on selectivity).

Detailed operators in Query 3 (Figure 7-4)
  • Temp table scan (5 rows each executing separately) → outer input of Nested Loops.

  • Inner input: Index Seek on StateProvince → returns matching state rows.

  • Output feeds another Nested Loops; inner input: Key Lookup on StateProvince (fetching missing TerritoryID).

  • Result becomes Build input to a Hash Match (join on TerritoryID).
    • Build side = 5 rows; Probe side = Clustered Index Seek on SalesOrderHeader (31 465 rows).
    • Hash table built in memory, matching returns 23 752 rows.

  • Takeaway: procedure plans are regular execution plans; identify the costly section and apply the usual optimisation techniques (indexes, rewrites, statistics, etc.).

Subqueries

  • A subquery is a SELECT embedded inside another SELECT/INSERT/UPDATE/DELETE.
    • Can appear in WHERE, SELECT, FROM, etc.
    • May be correlated (references outer query columns) or non-correlated.

  • Pros: expressive, sometimes clearer than complex joins.

  • Cons: may execute repeatedly (one execution per outer row) and prevent set-based optimisations.

Correlated subquery example (Listing 7-3)

SELECT p.Name, p.ProductNumber, ph.ListPrice
FROM Production.Product            AS p
INNER JOIN Production.ProductListPriceHistory AS ph
       ON p.ProductID = ph.ProductID
      AND ph.StartDate = (
             SELECT TOP (1) ph2.StartDate
             FROM Production.ProductListPriceHistory AS ph2
             WHERE ph2.ProductID = p.ProductID
             ORDER BY ph2.StartDate DESC);
  • Objective: for each product return only the most recent price.

  • Subquery is correlated through ph2.ProductID = p.ProductID.
    • Executes once per outer row to fetch latest StartDate.

Execution-plan outline (Figure 7-5)
  • Two Clustered Index Scans:
    Production.Product (outer input).
    Production.ProductListPriceHistory (inner input).

  • Inputs feed a Merge Join on ProductID.
    • Ordered property for both scans = True (prerequisite for merge join).
    • The optimizer could not transform the query into a more efficient form (e.g. window functions, lateral joins) in this example, so full scans remain.

Performance considerations & alternatives
  • Correlated subqueries on “versioned” history tables are common but may be sub-optimal.
    • Each product may require walking the entire history.

  • Potential rewrites:

    • Use CROSS APPLY or OUTER APPLY with TOP (1) ... ORDER BY to allow index seeks.

    • Employ window functions: ROW_NUMBER() partitioned by ProductID ordered by StartDate DESC inside a CTE/derived table, then filter on ROW_NUMBER = 1.

    • Add/support indexes on (ProductID, StartDate DESC) to make the seek pattern efficient.

Real-world relevance & best practices

  • Many production procedures grow organically; temporary tables, subqueries, and non-ideal datatypes are common.

  • Knowing how SQL Server generates execution plans lets you
    • Diagnose hotspots quickly without rewriting entire modules.
    • Decide whether a small fix (index, hint) or a full rewrite (set-based logic, elimination of temp objects) yields the best ROI.

  • Ethical/practical implication: touching legacy code requires balancing performance gains with stability and maintainability; sometimes “good enough” tuning is preferable to risky refactors.

Quick reference – numbers & statistics mentioned

  • Temp table rows after delete: 55

  • SalesOrderHeader rows probed: 3146531\,465

  • Rows returned after hash join: 2375223\,752

  • Relative batch costs: Query 1 3%3\%, Query 2 1%1\%, Query 3 96%96\%

Ordered Retrieval & Merge Join Basics

• Two Clustered Index Scans are marked Ordered because of the clustered index on (ProductID) – the engine therefore uses the Ordered retrieval method.
Merge Join exploits this ordering:
• Reads two ordered inputs simultaneously, locating matching rows on the join column.
• In the current example it emits 395395 joined rows (matching list-price rows).
• Atypical distribution noted: Products table contains 504504 rows, but only 395395 have list-price history – ideally every product would have ≥1 price entry.

Nested Loops Join – Row-by-Row Inner Execution

• Those 395395 Merge-Join rows feed the outer input of a Nested Loops (NL) join.
Outer References: values from ProductID and StartDate are pushed to the inner input.
• Inner input = Clustered Index Seek on ProductListPriceHistory(ProductID, StartDate)
• A TOP (1)…ORDER BY StartDate DESC guarantees only the most-recent price per product.
• A Filter then accepts/rejects the single candidate row according to StartDate equality.
• Costly behaviour:
• Filter (and its child operators) executes 395395 times.
• NL only needs to know whether a row exists, so the Filter returns an empty row shell – no columns, just row-presence metadata.
Startup Expression = False ➔ children execute on every outer row; had it been Startup Expression Predicate, children would run only when the predicate is met.
• Hypothetical illustration of the scan penalty:
• Suppose 200200 products × 1515 prices ⇒ 30003000 rows post-Merge-Join.
• NL would perform 30003000 inner seeks, each reading the same 200200 rows repeatedly ⇒ very high logical reads.

Tuning Idea 1 – Replace TOP(1)…ORDER BY with MAX()

• Query rewrite: SELECT MAX(ph2.StartDate)…
• Eliminates the TOP + NL pattern.
• Plan changes to two Merge Joins and shows lower I/O.

Derived Tables & APPLY – Fundamentals

• A derived table is a sub-query in the FROM clause, named with an alias; logically a “virtual table”.
• Before SQL Server 2005 all derived tables had to be independent; 2005 introduced APPLY to allow correlation:
CROSS APPLY – like an inner join, discards rows when right side returns nothing.
OUTER APPLY – like a left outer join, returns NULLs for missing right-side rows.
• Logical definition: right-side sub-query is evaluated once per left-side row (though the optimizer may implement differently).
• MSDN reference: http://bit.ly/1FFmldl

Tuning Idea 2 – CROSS APPLY Version (Listing 7-4)

SELECT p.Name,
       p.ProductNumber,
       ph.ListPrice
FROM   Production.Product p
       CROSS APPLY (
           SELECT TOP (1) ph2.ProductID,
                           ph2.ListPrice
           FROM   Production.ProductListPriceHistory ph2
           WHERE  ph2.ProductID = p.ProductID
           ORDER  BY ph2.StartDate DESC
       ) AS ph;

• Plan (Figure 7-9):

• Outer input: Clustered Index Scan on Products (produces 504504 rows).
• Inner input: Seek + TOP (1) on ProductListPriceHistory executed 504504 times.

Comparing I/O – Subquery (Listing 7-3) vs APPLY (Listing 7-4)

• Captured with Extended Events (Listing 2-6) – execution-plan capture disabled to avoid observer effect.
• Logical reads:
• Subquery in the JOIN: 811811 reads.
• CROSS APPLY version: 10241024 reads.
• Reason: NL inner executes 395395 times for the subquery vs 504504 times for APPLY.
• Note: This sample’s data distribution is unusual (many products lack prices). In a typical distribution (each product has many price rows) APPLY is expected to win.

Subquery Plan Changes (Figure 7-11)

Filter removed.
TOP operator now appears on the outer (parent) branch instead of the subquery branch.
• Three Clustered Index Seeks + two Nested Loops:
• Right-most Seek (alias ph2) supplies the subquery’s most recent StartDate.
• Second Seek joins on both ProductID + StartDate pushed as Outer References.
• Each NL inner now executes once (outer input only 1 row) – huge I/O reduction.

APPLY Plan Changes (Figure 7-12)

• Plan identical, except outer Clustered Index Scan → Seek due to the predicate.

New I/O Metrics (Figure 7-13)

• APPLY query: 44 logical reads.
• Subquery query: 66 logical reads.
• APPLY consistently faster in repeated runs; advantage grows with larger data sets.

Key Take-aways on Plan Choice

• Always benchmark with representative data; small anomalies in distribution can sway the optimizer’s plan selection.
• Re-evaluate queries periodically as data grows or skews – plan & performance can change.
• Extended Events give low-overhead runtime metrics; avoid plan capture when measuring pure duration.

Common Table Expressions (CTEs) – Overview

  • Introduced in SQL Server 2005.

  • Behave like derived tables but

    • Exist only for the scope of a single statement.

    • Can be self-referential and reused multiple times in that statement.

    • Cannot be correlated (even with APPLY).

Recursive CTE Anatomy

• Composed of at least two queries combined with UNION ALL:

  1. Anchor member – the base result set. (can be executed on its own to produce a result)

  2. Recursive member – references the CTE itself, generating additional rows.
    • Logical execution: anchor ➔ feed rows to recursive member ➔ recursive member feeds itself until no more rows.

Example – dbo.uspGetEmployeeManagers

• Purpose: list a given employee and all managerial ancestors.

• Optimizer usually produces a Recursive Union operator followed by Nested Loops or Index Seeks for each iteration.

  • Ex: EXEC dbo.uspGetEmployeeManagers @BusinessEntityID = 9

Practical / Ethical / Operational Implications

Operational: High logical reads = buffer-pool pressure, longer execution time, risk of blocking; plan tuning mitigates these.
Ethical: Running ill-performing queries on production systems wastes shared resources, potentially impacting SLAs and user experience.
Practical: Choose query constructs (TOP vs MAX, subquery vs APPLY, CTE vs temp table) based on data size, distribution, and required use of intermediate results.

Operator Walk-Through (Logical Left-to-Right)

  • Index Spool (Lazy) – NodeID 44; first operator logically encountered.

    • Stores rows from Concatenation; acts as start of recursion.

    • Streams: requests 1 row from child, stores, immediately passes to parent.

  • Concatenation – Implements UNION ALLUNION\ ALL.

    • Top input = Anchor member.

    • Bottom input = Recursive member.

    • Processes top until exhausted, then switches permanently to bottom.

Anchor Member Sub-Plan (Top Input)

  • 2 Clustered Index Seeks

    • Tables: HumanResources.EmployeeHumanResources.Employee and Person.PersonPerson.Person.

  • Nested Loops join combines the two seeks.

  • Returns 1 row (employee ID=9ID = 9).

  • 2 Compute Scalar operators

    • Initialize RecursionLevel=0RecursionLevel = 0 and derived column RecursionLevelRecursionLevel.

Recursive Member Sub-Plan (Bottom Input)

  • Table Spool – NodeID 1414; Primary Node ID = 44.

    • Property With Stack = True ⇒ behaves as stack (LIFO) – critical for recursion logic.

    • Reads a copy of each row stored by Index Spool, removes it after read.

  • Assert

    • Enforces MAXRECURSION (25)MAXRECURSION\ (25); aborts execution if level exceeded.

  • Compute Scalar next to Table Spool

    • Calculates new recursion level: RecursionLevel=RecursionLevel+1RecursionLevel = RecursionLevel + 1.

  • Outer input (from Table Spool) joins via Nested Loops to Employee and Person tables.

    • Join predicate uses GetAncestor(1)GetAncestor(1) on OrganizationNode.

  • Clustered Index Scan – Employee

    • Estimated executions 44; estimated rows/exec 2902904×290=11604 \times 290 = 1160 rows.

    • Actual rows =1160= 1160 (estimate matched).

  • Filter

    • Performs comparison OrganizationNode=GetAncestor()OrganizationNode = GetAncestor(…) after full table scan.

    • Returns 1 manager row for first 3 iterations; returns 0 on CEO row ⇒ recursion ends.

Row Flow Summary
  • Concatenation outputs 44 rows (1 anchor + 3 recursive).

  • Final extra join to grab manager names drops CEO row (no manager) ⇒ 3 rows returned to client.

Conceptual Insights & Best Practices

  • Index Spool + Table Spool (stack) is SQL Server’s internal mechanism for recursive CTE evaluation.

  • Assert operator is the physical manifestation of MAXRECURSIONMAXRECURSION.

  • Reading plans left-to-right (logical ordering) is essential for understanding recursion flow.

Views in Execution Plans

  • A view = stored query; appears table-like but optimizer never “sees” it as a physical object.

    • During binding (Chapter 1), algebrizer expands the view into its base query.

  • Developers often misuse views by joining view-to-view or nesting views ⇒ optimizer overwhelmed.

    • Recognized as a code smell; avoid!

Example – Standard View (Sales.vIndividualCustomer)

  • Query 1 – SELECT * FROM Sales.vIndividualCustomer WHERE businessEntityId = 8743 (Listing 7-18)

    • Forces retrieval of every column; optimizer must access 8 tables / 7 joins defined in the view.

    • Plan in Figure 7-19: many operators, complex shape.

  • Query 2 – Column list (Listing 7-9)

    • Requests only BusinessEntityID, Title, LastName, FirstName

    • Simplification phase removes unnecessary tables/columns.

    • Generates much smaller plan (Figure 7-21).

    • Simplification not perfect – e.g., EmailAddress table still present even though not strictly needed.

SELECT ic.BusinessEntityID,
 ic.Title,
 ic.LastName,
 ic.FirstName
FROM Sales.vIndividualCustomer AS ic
WHERE BusinessEntityID = 8743;
Key Takeaways for Standard Views
  • Selecting fewer columns can dramatically change the plan due to simplification.

  • Views improve developer productivity but do not reduce optimizer work; complexity is merely hidden.

Indexed (Materialized) Views

  • Definition: View + Clustered Index ⇒ physically stores data like a table.

  • Pros

    • Query performance boost: joins & lookups pre-computed, direct access via clustered index.

    • Particularly beneficial for aggregations & complex joins.

  • Cons / Costs

    • Creation cost is high (one-time) – should be scheduled during low-load periods.

    • Maintenance overhead on DML (INSERT/UPDATE/DELETE) of base tables.

    • Example: If a base table receives 100100 inserts/min, each insert propagates into indexed view.

    • Evaluate trade-off: static base tables → low overhead; high-churn tables → potential performance hit.

Cross-Chapter Connections

  • Spool behavior discussed in Chapter 5.

  • Concatenation operator details covered in Chapter 4.

  • Binding & algebrizer expansion of views explained in Chapter 1.

Practical & Ethical Implications

  • Misusing views or over-nesting can mask inefficiencies, leading to production slowdowns – ethical obligation to write transparent, maintainable code.

  • Indexed views must be designed with full understanding of workload to avoid unnecessary resource consumption.

Indexed Views: Rationale & Advantages

  • Purpose of indexed views
    • Pre-aggregate or pre-join data once, at index-creation time, so subsequent queries can read pre-materialized results.
    • Eliminates repeated GROUP BY or aggregation logic on every execution.
    • Delivers substantial I/O savings because the data is already summarized.

  • Key benefit sequence
    • Create view WITH “SCHEMABINDING”
    • This clause binds the view to the schema of the underlying base tables, preventing the base tables from being dropped or modified in a way that would affect the view definition. It's a mandatory requirement for creating clustered indexes on views.
    • Create unique clustered index
    • This step materializes the view, meaning the results of the view's query are stored physically on disk, similar to a regular table. A unique clustered index ensures data integrity and provides an efficient physical storage structure.
    • View becomes materialized
    • Once the unique clustered index is created, the data defined by the view is computed and stored. This materialization is the core power of indexed views, turning a logical view into a physical data structure.
    • Future SELECTs read only the index
    • When a query matches the definition of the indexed view, the SQL Server query optimizer can choose to read directly from the pre-computed clustered index instead of accessing and joining the underlying base tables, leading to significant performance gains.

Example: AdventureWorks2014 – vStateProvinceCountryRegion

  • Complete DDL sequence (Listing 7-10)

DROP VIEW IF EXISTS Person.vStateProvinceCountryRegion;

CREATE OR ALTER VIEW Person.vStateProvinceCountryRegion
  WITH SCHEMABINDING
  AS
  SELECT sp.StateProvinceID, sp.StateProvinceCode, sp.IsOnlyStateProvinceFlag, sp.Name AS StateProvinceName,
         sp.TerritoryID, cr.CountryRegionCode, cr.Name AS CountryRegionName
  FROM Person.StateProvince AS sp
  INNER JOIN Person.CountryRegion AS cr ON sp.CountryRegionCode = cr.CountryRegionCode;

CREATE UNIQUE CLUSTERED INDEX IX_vStateProvinceCountryRegion
ON Person.vStateProvinceCountryRegion (StateProvinceID ASC, CountryRegionCode ASC);

– Columns: StateProvinceID, StateProvinceCode, IsOnlyStateProvinceFlag, StateProvinceName, TerritoryID, CountryRegionCode, CountryRegionName

  • Execution-plan observations (Figure 7-22)
    • DDL generates a plan because SQL Server must execute the view-definition query to populate the clustered index.
    • Operators: Clustered Index Insert fed by Nested Loops join of the two base tables.

Querying an Indexed View

SELECT StateProvinceCode, IsOnlyStateProvinceFlag, CountryRegionName
FROM Person.vStateProvinceCountryRegion;

• In EnterpriseEnterprise / DeveloperDeveloper Editions, optimizer matches directly to the indexed view.
• Resulting plan (Figure 7-23): single Clustered Index Scan on IXvStateProvinceCountryRegionIX_vStateProvinceCountryRegion (no join operators visible).
• In StandardStandard pre-2016 SP1 or ExpressExpress, require WITH (NOEXPAND)WITH\ (NOEXPAND) hint to force same behavior.

SELECT StateProvinceCode, IsOnlyStateProvinceFlag, CountryRegionName
FROM Person.vStateProvinceCountryRegion WITH (NOEXPAND);

Indexed-View Index Matching Outside the View

  • Query (Listing 7-12) hits base tables, not the view

  SELECT sp.StateProvinceCode, sp.IsOnlyStateProvinceFlag, cr.Name AS CountryRegionName
  FROM Person.StateProvince AS sp
  INNER JOIN Person.CountryRegion AS cr ON sp.CountryRegionCode = cr.CountryRegionCode;

• Optimizer may still choose the indexed view’s index if it provides lowest cost.
• Plan still shows single Clustered Index Scan on the view’s clustered index (again edition-dependent).

Limitations & Automatic Matching Failures

  • As query complexity grows, automatic indexed-view matching may fail.
    • Example (Listing 7-13): join Person.AddressPerson.Address with the view and filter by AddressID=22701AddressID=22701.

  SELECT pa.Color, spcr.StateProvinceName, spcr.CountryRegionName
  FROM Person.Address AS pa
  INNER JOIN Person.StateProvince AS sp ON pa.StateProvinceID = sp.StateProvinceID
  INNER JOIN Person.vStateProvinceCountryRegion AS spcr ON sp.StateProvinceID = spcr.StateProvinceID
  WHERE pa.AddressID = 22701;

• Plan (Figure 7-24) shows “view expansion”: optimizer substitutes base tables from view definition, then performs join.

  • Reasons
    • Algebrizer always expands views.
    • Optimizer estimates direct table access cheaper than using materialized index.

  • Work-around
    • Use NOEXPANDNOEXPAND hint to force use of indexed view (detailed in Chapter 10).

  SELECT pa.Color, spcr.StateProvinceName, spcr.CountryRegionName
  FROM Person.Address AS pa
  INNER JOIN Person.StateProvince AS sp ON pa.StateProvinceID = sp.StateProvinceID
  INNER JOIN Person.vStateProvinceCountryRegion WITH (NOEXPAND) AS spcr ON sp.StateProvinceID = spcr.StateProvinceID
  WHERE pa.AddressID = 22701;

Functions Overview

  • Two user-defined function (UDF) categories
    • Scalar UDFs – return single value.
    • Table-valued UDFs – return table expression.

  • Execution-plan handling can be deceptive; much work hidden behind a Compute Scalar.

Scalar UDF Example – dbo.ufnGetStock

  • Definition (Listing 7-14)

  CREATE FUNCTION dbo.ufnGetStock (@ProductID INT)
  RETURNS INT
  AS
  BEGIN
      DECLARE @ret INT;
      SELECT @ret = SUM(Quantity)
      FROM Production.ProductInventory
      WHERE ProductID = @ProductID AND LocationID = '6';

      IF @ret IS NULL
          SET @ret = 0;

      RETURN @ret;
  END;

• Parameters: @ProductID int@ProductID\ int
• Logic
SELECT @ret=SUM(Quantity) FROM Production.ProductInventory WHERE ProductID=@ProductID AND LocationID=6;SELECT\ @ret=SUM(Quantity)\ FROM\ Production.ProductInventory\ WHERE\ ProductID=@ProductID\ AND\ LocationID='6';
– If @ret@ret IS NULLIS\ NULL then set @ret=0@ret=0.
RETURN @ret;RETURN\ @ret;

  • Caller query (Listing 7-15)

  SELECT p.Name, dbo.ufnGetStock(p.ProductID) AS StockLevel
  FROM Production.Product AS p
  WHERE p.Color = 'Black';

SELECT p.Name, dbo.ufnGetStock(p.ProductID) AS StockLevel FROM Production.Product p WHERE p.Color=Black;SELECT\ p.Name,\ dbo.ufnGetStock(p.ProductID)\ AS\ StockLevel\ FROM\ Production.Product\ p\ WHERE\ p.Color='Black';
• Returns 9393 rows.

Actual-plan View (Figure 7-25)

  • Operators
    • Clustered Index Scan on Production.ProductProduction.Product ➔ Predicate Color=BlackColor='Black'.
    • Compute Scalar calls scalar UDF once per row (hidden).

  • Why only two operators?
    • Actual plan in SSMS collapses scalar-UDF details; you must inspect Defined Values within Compute Scalar to see the function invocation.

Estimated-plan View (Figure 7-27)

  • Reveals two separate plans
    • Main query plan (as above).
    • Nested plan for the scalar UDF.

  • Scalar UDF internal plan breakdown (Figure 7-30 reference)
    • Root operator: UDFUDF (T-SQL operator) with overall estimated cost.
    • Sub-branch 1 – SELECT
    – Clustered Index Seek on Production.ProductInventoryProduction.ProductInventory filtering by ProductIDProductID and LocationID=6LocationID=6.
    – Stream Aggregate to SUM(Quantity)SUM(Quantity).
    – Compute Scalar to assign @ret@ret.
    • Sub-branch 2 – CONDCOND (NULL check)
    – If @ret IS NULL@ret\ IS\ NULLASSIGNASSIGN sets @ret=0@ret=0.
    • Sub-branch 3 – RETURNRETURN operator outputs @ret@ret.

Performance Implications of Scalar UDFs

  • Hidden cost
    • UDF executes once per outer row ➔ 9393 executions in example.
    • Each execution repeats the internal plan (Seek ➔ Aggregate ➔ etc.).

  • Misleading STATISTICS IO
    • Reported logical reads for main query: 1515.
    • Actual cumulative reads when including UDF executions: 211211.
    • Rows touched: 365365 total across all UDF calls vs. 9393 reported.

  • Takeaway
    • Inline the logic (e.g., join to ProductInventoryProductInventory directly) to avoid repetitive execution and reduce I/O.

  SELECT p.Name, ISNULL(SUM(inv.Quantity), 0) AS StockLevel
  FROM Production.Product AS p
  LEFT JOIN Production.ProductInventory AS inv
      ON p.ProductID = inv.ProductID AND inv.LocationID = '6'
  WHERE p.Color = 'Black'
  GROUP BY p.Name;

Key Takeaways & Best Practices

  • Indexed views
    • Great for pre-computed aggregations or joins; ensure appropriate edition or use NOEXPANDNOEXPAND.
    • Always evaluate whether optimizer automatically chooses the index; inspect plans.

  • Scalar UDFs
    • Can hide significant work; prefer inline TVFs or rewrite as joins/expressions when performance matters.
    • Use estimated plans or Extended Events to expose hidden cost and validate true I/O.

  • General
    • Always analyze both actual and estimated plans for DDL, views, and UDFs to uncover non-obvious operators and costs.
    • Remember that STATISTICS IO/Time on the outer query may omit nested UDF activity; use XEvents or DMVs for full picture.

Overview of Table-Valued Functions (TVFs)

  • Two distinct varieties, each with different behavioral & optimization characteristics:

    • Inline Table-Valued Functions (iTVFs)
      • Also called parameterized views because they behave like a view whose definition can accept parameters.
      • Contain exactly one SELECT statement returned by the RETURN() clause.
      • No body (BEGIN…END) and no intermediate table variable.

    • Multi-Statement Table-Valued Functions (mTVFs)
      • Contain a declared table variable returned to the caller.
      • Allow multiple INSERT, UPDATE, DELETE, control-flow, etc.
      • Behave similarly to scalar UDFs with respect to plan opacity and cardinality estimation.

  • Both expose a table result, but appear very differently in execution plans.

Inline TVF (iTVF)

  • Example rewritten from Listing 7-14 (Listing 7-16):

  CREATE FUNCTION dbo.GetStock (@ProductID INT)
  RETURNS TABLE
  AS
  RETURN (
      SELECT SUM(pi.Quantity) AS QuantitySum
      FROM Production.ProductInventory AS pi
      WHERE pi.ProductID = @ProductID
        AND pi.LocationID = '6'
  );
  • Query that consumes the function (Listing 7-17):

  SELECT p.Name, gs.QuantitySum
  FROM Production.Product AS p
  CROSS APPLY dbo.GetStock(p.ProductID) AS gs
  WHERE p.Color = 'Black';
  • Key attributes:

    • Appears to optimizer as a single expanded query; no hidden operators.

    • Allows all cost estimates & statistics to flow naturally.

    • Can participate in parallel plans, inlining, predicate push-down, etc.

Execution Plan for iTVF (Figure 7-28)

  • No aggregation operator present even though the SELECT contains SUM(pi.Quantity):

    • Optimizer notices that predicate LocationID = 6 combined with a unique index on (ProductID, LocationID) guarantees ≤ 1 row / product.

    • Therefore SUM(Quantity)\text{SUM}(Quantity) is the same as the raw Quantity value → aggregation eliminated.

  • Left-to-right operator flow:

    • Merge Join (Right Outer) between ProductInventory & Product.

    • Clustered Index Scan (ProductInventory) with predicate push-down on LocationID = 6.

    • Compute Scalar converts SMALLINT QuantityINT because aggregation would normally promote the type; with no explicit aggregate the conversion is done here.

    • Clustered Index Scan (Product) provides product rows filtered later by p.Color = 'Black'.

  • Plan characteristics:

    • Estimated & Actual plans are identical except for runtime row counts.

    • Full cost visibility; cardinality is accurate.

Multi-Statement TVF (mTVF)

  • Rewritten example (Listing 7-18):

  CREATE FUNCTION dbo.GetStock2 (@ProductID INT)
  RETURNS @GetStock TABLE (QuantitySum INT NULL)
  AS
  BEGIN
      INSERT @GetStock (QuantitySum)
      SELECT SUM(pi.Quantity)
      FROM Production.ProductInventory AS pi
      WHERE pi.ProductID = @ProductID
        AND pi.LocationID = '6';
      RETURN;
  END
  • Consuming query identical to Listing 7-17 except call dbo.GetStock2.

  • Fundamental differences:

    • Internally uses a table variable (@GetStock).

    • Generates its own sub-plan that is hidden behind a single Table Valued Function operator in the outer plan.

    • Disallows cost-based rewrites, parallelism, & predicate push-down for the body.

Execution Plan for mTVF (Figure 7-29)

  • Outer plan now shows:

    • Nested Loops join where the inner side is a single Table Valued Function operator.

    • Other operators (e.g., scan on Product) remain visible.

  • Properties of the Table Valued Function operator (Figure 7-30):

    • Estimated Number of Rows = 100100 (hard-coded default for table variables in SQL Server 2014+; pre-2014 it was 11).

    • Estimated number of executions = 9393 (one per outer row).

    • Total estimated rows =100×93=9300= 100 \times 93 = 9300 vs. actual =1×93=93= 1 \times 93 = 93.

    • Large mismatch drives sub-optimal join choice & memory grants.

  • Hidden inner plan (Figure 7-31):

    • Resembles scalar UDF pattern:
      Table Insert → inserts into table variable.
      Clustered Index Scan on ProductInventory.
      Hash/Aggregate or Compute Scalar etc.

    • This work is executed once per outer row → significant cumulative cost.

Cardinality Estimation & Hard-Coded Defaults

  • For any table variable (used by mTVF), cardinality estimator lacks statistics → uses fixed default.

    • Pre-2014 default=1\text{Pre-2014 default} = 1 row.

    • SQL Server 2014+ default=100\text{SQL Server 2014+ default} = 100 rows.

  • Because the default is divorced from reality, join ordering and memory grants can be severely skewed.

Performance Implications & Hidden Costs

  • iTVF:
    • Fully exposes logic; optimizer can eliminate aggregates, push predicates, choose efficient joins.
    • No hidden batch iterations.

  • mTVF:
    • Appears as black-box to outer query; internal cost not considered.
    • Executed row-by-row if placed on inner side of loops join → "triangular" execution pattern (N outer × M inner).
    • Example I/O jump: iTVF = 4444 reads vs. mTVF = 11411141 reads (≈ 26× increase).

  • Similarity to scalar UDFs:
    • Both hide computations; both default to row-by-row execution; both hinder parallelism.

Comparison Summary

  • Scalar UDF
    • Returns single value; hidden internal plan; executes per row.

  • Inline TVF
    • Returns tabular result; inlined into outer query; full cost visibility.

  • Multi-Statement TVF
    • Returns tabular result via table variable; hidden plan; suffers from default cardinality =100=100.

Practical Guidelines / Best Practices

  • Prefer inline TVFs whenever possible for reusable, parameterized table logic.

  • Avoid multi-statement TVFs unless absolutely necessary; consider:
    • Rewriting as inline TVF or view + parameters.
    • Converting to stored procedure with OPTION (RECOMPILE) if row-by-row semantics unavoidable.

  • If you must use mTVF:
    • Capture execution stats & monitor for row mis-estimation.
    • Use OPTION (USE HINT('DISABLE_TFP')) in newer versions to force scalar/multi-statement function inlining if supported.
    • In SQL Server 2019+, enable Scalar UDF/Multi-Statement TVF Inlining feature for automatic plan expansion.

  • Always inspect:
    Estimated vs. Actual rows on the Table Valued Function operator.
    • Nested Loops inner vs. Hash/Batch modes to catch row-by-row hotspots.

Key Takeaways

  • Execution-plan visibility dictates optimizer effectiveness.
    • iTVF → transparent; mTVF/scalar UDF → opaque.

  • Hard-coded cardinality defaults (1 or 100) can mislead the optimizer, causing excessive I/O & CPU.

  • Reading complex plans follows the same core principles:
    • Start at leftmost/rightmost iterator (depending on display order).
    • Check properties, row flow, join types.

  • Larger plans are not fundamentally different—just require more systematic operator-by-operator analysis.