Recursive CTEs in SQL Server

One of the more interesting features of Common Table Expressions (CTEs) is their ability to refer to themselves. This ability allows the CTE to perform something called Recursion, and in this post, we'll look at how to build Recursive CTEs in T-SQL and situations where you might use Recursive CTEs.

For this post, I will assume that you're already familiar with Common Table Expressions and are comfortable using them in a query. If you're not but want to learn more, I highly recommend Itzik Ben-Gan's exploration of CTEs.

What is Recursion?

In programming, Recursion is a technique used to solve problems by breaking them into smaller and smaller sub-tasks of the same type. Recursion is typically understood as a function that calls itself until a condition is met.

Recursion works well on certain problems, particularly involving hierarchical data (think tree-style structures) or where a problem naturally suits being broken into sub-problems.

Recursive CTEs

SQL Server implements Recursion through Recursive CTEs, a unique type of CTE that references itself.

Structure and layout

A Recursive CTE has three components:

  • Anchor Query: Think of this as the starting point for the query. The anchor provides the initial result set.
  • Recursive Query: Following the anchor query, the Recursive query runs iteratively until a condition is met. The result set from each iteration serves as input into the next.
  • Break Condition (optional): Also called the Termination Condition. The predicate or condition that stops any further iteration.

Recursive CTEs have the following layout:

WITH recursive_cte(list_of_col_names) AS (
    
    anchor_query

    UNION ALL

    recursive_query

    WHERE
    
    break_condition
)
SELECT
    columns
FROM
    recursive_cte

Writing a Recursive CTE

Now that you know more about Recursion and Recursive CTE requirements let's write a Recursive CTE for hierarchical data.

Data

We'll work off the data below, stored in a temp table so you can run it without a physical table.

DROP TABLE IF EXISTS #cats
CREATE TABLE #cats (
    cat_id INT
    ,cat VARCHAR(20)
    ,ancestor_id INT
)

INSERT INTO #cats(cat_id, cat, ancestor_id)
VALUES
(1,'Felidae', NULL)
,(2,'Pantherinae', 1)
,(3,'Felinae', 1)
,(4,'Panthera', 2)
,(5,'Acinonyx', 3)
,(6,'Acinonyxa', 3)
,(7,'Leopard', 4)
,(8,'Lion', 4)
,(9,'Jaguar', 4)
,(10,'Snow leopard', 4)
,(11,'Tiger', 4)
,(12,'Cheetah', 5)
,(13,'Cougar', 6);

There are a couple of features of this data to pay attention to:

  • The data represents the relationships between an organism and its common ancestors (known as a Cladogram). Each row is a family tree member, and the rows relate to each other.
  • The data has a hierarchy, so there will be a row(s) that represents the top of the hierarchy. These rows do not have an ancestor.

If visualized, the data might look something like this.

Big Cats Family Tree

Let's create a Recursive CTE to traverse and query this data.

Anchor Query

The first step in a Recursive CTE is creating the Anchor Query. In the Anchor Query, we select the first level of the hierarchy, i.e., the rows that don't have an ancestor.

You can think of the Anchor Query as the starting point for our iterative query.

WITH 
    recursive_cte AS 
    (
        SELECT
            #cats.cat_id
            ,#cats.cat
            ,#cats.ancestor_id
            ,CAST(NULL AS VARCHAR(20)) as ancestor
            ,0 as level
            ,CAST(#cats.cat as VARCHAR(MAX)) as lineage
        FROM
            #cats
        WHERE            ancestor_id IS NULL    )

Recursive query

Next comes the Recursive Query, where the CTE will reference itself.

WITH 
    recursive_cte AS 
    (
        SELECT
            #cats.cat_id
            ,#cats.cat
            ,#cats.ancestor_id
            ,CAST(NULL AS VARCHAR(20)) as ancestor
            ,0 as level
            ,CAST(#cats.cat as VARCHAR(MAX)) as lineage
        FROM
            #cats
        WHERE
            ancestor_id IS NULL
        
        UNION ALL

        SELECT
            #cats.cat_id
            ,#cats.cat
            ,#cats.ancestor_id
            ,recursive_cte.cat
            ,level + 1
            ,recursive_cte.lineage + ' > ' + #cats.cat
        FROM
            #cats
                INNER JOIN recursive_cte                    ON recursive_cte.cat_id = #cats.ancestor_id    )

It's important to note the INNER JOIN, which references recursive_cte. The join steps the query into the next level of the hierarchy because the Recursive Query returns only rows that match the ancestor_id from the previous Anchor Query.

And lastly, as with a regular CTE, we query the CTE with a SELECT.

WITH 
    recursive_cte AS 
    (
        SELECT
            #cats.cat_id
            ,#cats.cat
            ,#cats.ancestor_id
            ,CAST(NULL AS VARCHAR(20)) as ancestor
            ,0 as level
            ,CAST(#cats.cat as VARCHAR(MAX)) as lineage
        FROM
            #cats
        WHERE
            ancestor_id IS NULL
        
        UNION ALL

        SELECT
            #cats.cat_id
            ,#cats.cat
            ,#cats.ancestor_id
            ,recursive_cte.cat
            ,level + 1
            ,recursive_cte.lineage + ' > ' + #cats.cat
        FROM
            #cats
                INNER JOIN recursive_cte
                    ON recursive_cte.cat_id = #cats.ancestor_id
    )
SELECT
    cat_id
    ,cat
    ,ancestor_id
    ,ancestor
    ,recursive_cte.level
    ,recursive_cte.lineage
FROM
    recursive_cte

What about a Break Condition?

As mentioned above, the Break Condition is the 3rd and optional component of a Recursive CTE. The Break Condition tells the query when to stop the Recursive iteration.

In the case of hierarchical data, we don't need a Break Condition, as SQL Server will continue down the hierarchy until it reaches the end.

How it works

There are a couple of key ideas to understanding how SQL Server processes the Recursive CTE:

  • The result set gets built recursively, i.e., step by step.
  • At each recursion, only the result set of the previous step is available to the Recursive Query, not the accumulating result set.

Let's look at how the result set gets built up as the query steps through the data.

Initial query

The query begins with the Anchor, which returns all rows without an ancestor, i.e., the top of the hierarchy.

-- Anchor Query
SELECT
    #cats.cat_id
    ,#cats.cat
    ,#cats.ancestor_id
    ,CAST(NULL AS VARCHAR(20)) as ancestor
    ,0 as level
    ,CAST(#cats.cat as VARCHAR(MAX)) as lineage
FROM
    #cats
WHERE
    ancestor_id IS NULL

At this iteration, the result set of recursive_cte looks like this.

| cat_id | cat     | ancestor_id | ancestor | level | lineage |
|--------|---------|-------------|----------|-------|---------|
| 1      | Felidae | NULL        | NULL     | 0     | Felidae |

Recursion 1

After this, the Recursion starts, and the query engine repeatedly runs the Recursive Query.

recursive_cte is joined to the original data. The effect of this is a result set of descendants of rows from the previous step.

-- Recursive query
UNION ALL

SELECT
    #cats.cat_id
    ,#cats.cat
    ,#cats.ancestor_id
    ,recursive_cte.cat
    ,level + 1
    ,recursive_cte.lineage + ' > ' + #cats.cat
FROM
    #cats
        INNER JOIN recursive_cte            ON recursive_cte.cat_id = #cats.ancestor_id

The result set of recursive_cte now looks like this.

| cat_id | cat         | ancestor_id | ancestor | level | lineage               |
|--------|-------------|-------------|----------|-------|-----------------------|
| 2      | Pantherinae | 1           | Felidae  | 1     | Felidae > Pantherinae |
| 3      | Felinae     | 1           | Felidae  | 1     | Felidae > Felinae     |

Recursion 2

The Recursive Query repeats and the result set of the previous step is joined back to the original data, which takes us into the next level of the hierarchy.

The result set of recursive_cte now looks like this.

| cat_id | cat       | ancestor_id | ancestor    | level | lineage                          |
|--------|-----------|-------------|-------------|-------|----------------------------------|
| 5      | Acinonyx  | 3           | Felinae     | 2     | Felidae > Felinae > Acinonyx     |
| 6      | Acinonyxa | 3           | Felinae     | 2     | Felidae > Felinae > Acinonyxa    |
| 4      | Panthera  | 2           | Pantherinae | 2     | Felidae > Pantherinae > Panthera |

The last Recursion

The Recursive Query repeats once more, stepping into the last level of the hierarchy.

| cat_id | cat          | ancestor_id | ancestor  | level | lineage                                         |
|--------|--------------|-------------|-----------|-------|-------------------------------------------------|
| 13     | Cougar       | 6           | Acinonyxa | 3     | Felidae > Felinae > Acinonyxa > Cougar          |
| 12     | Cheetah      | 5           | Acinonyx  | 3     | Felidae > Felinae > Acinonyx > Cheetah          |
| 7      | Leopard      | 4           | Panthera  | 3     | Felidae > Pantherinae > Panthera > Leopard      |
| 8      | Lion         | 4           | Panthera  | 3     | Felidae > Pantherinae > Panthera > Lion         |
| 9      | Jaguar       | 4           | Panthera  | 3     | Felidae > Pantherinae > Panthera > Jaguar       |
| 10     | Snow leopard | 4           | Panthera  | 3     | Felidae > Pantherinae > Panthera > Snow leopard |
| 11     | Tiger        | 4           | Panthera  | 3     | Felidae > Pantherinae > Panthera > Tiger        |

At this point, the query has reached the bottom of the hierarchy, where no more rows exist, and returns the accumulated result set.

| cat_id | cat          | ancestor_id | ancestor    | level | lineage                                         |
|--------|--------------|-------------|-------------|-------|-------------------------------------------------|
| 1      | Felidae      | NULL        | NULL        | 0     | Felidae                                         |
| 2      | Pantherinae  | 1           | Felidae     | 1     | Felidae > Pantherinae                           |
| 3      | Felinae      | 1           | Felidae     | 1     | Felidae > Felinae                               |
| 5      | Acinonyx     | 3           | Felinae     | 2     | Felidae > Felinae > Acinonyx                    |
| 6      | Acinonyxa    | 3           | Felinae     | 2     | Felidae > Felinae > Acinonyxa                   |
| 13     | Cougar       | 6           | Acinonyxa   | 3     | Felidae > Felinae > Acinonyxa > Cougar          |
| 12     | Cheetah      | 5           | Acinonyx    | 3     | Felidae > Felinae > Acinonyx > Cheetah          |
| 4      | Panthera     | 2           | Pantherinae | 2     | Felidae > Pantherinae > Panthera                |
| 7      | Leopard      | 4           | Panthera    | 3     | Felidae > Pantherinae > Panthera > Leopard      |
| 8      | Lion         | 4           | Panthera    | 3     | Felidae > Pantherinae > Panthera > Lion         |
| 9      | Jaguar       | 4           | Panthera    | 3     | Felidae > Pantherinae > Panthera > Jaguar       |
| 10     | Snow leopard | 4           | Panthera    | 3     | Felidae > Pantherinae > Panthera > Snow leopard |
| 11     | Tiger        | 4           | Panthera    | 3     | Felidae > Pantherinae > Panthera > Tiger        |

A closer look at the level and lineage columns indicates how SQL Server traversed the hierarchy.

It began with the root of the hierarchy, then traveled down to the Felidae branch before returning and traveling down Pantherinae.

Recursion Limit Error

Due to the potential for Recursive CTEs to run indefinitely, SQL Server has protections that limit how many times it runs the Recursive query. SQL Server sets the default Recursion Limit to 100, i.e., 100 Recursive Query calls.

To modify this, we can include the OPTION clause with the MAXRECURSION option. The MAXRECURSION option takes a number specifying how many calls of the Recursive query to allow.

Setting MAXRECURSION to 0 allows indefinite recursions or as many recursions as required to complete the query.

WITH recursive_cte(list_of_col_names) AS (
    
    anchor_query

    UNION ALL

    recursive_query

    WHERE
    
    break_condition
)
SELECT
    columns
FROM
    recursive_cte

OPTION (MAXRECURSION 0)

When dealing with Recursion, it's a good idea to consider using a Break Condition or setting a limit to an expected value. For example, using Recursion to generate a result set of days in a year (see below), you might set the MAXRECURSION to 365.

Setting a limit on the Recursion helps protect against unexpected behavior due to data changing.

Other uses for Recursive CTEs

Apart from hierarchical data, Recursive CTEs have a couple of other uses.

Generating a sequence with Recursive CTEs

Recursive CTEs can be used to generate a sequence of values.

Here the Anchor Query sets the starting value, and each call of the Recursive Query adds to the iteration before it.

WITH recursive_sequence(sequence_num) AS (
    SELECT
        0
    
    UNION ALL

    SELECT
        recursive_sequence.sequence_num + 1
    FROM 
        recursive_sequence
    WHERE
        (recursive_sequence.sequence_num + 1) < 10)
SELECT
    recursive_sequence.sequence_num
FROM
    recursive_sequence

Notice that we included a Termination Condition in the Recursive Query of the CTE. The Termination Condition tells the Recursive CTE when to stop; without this, the query would continue repeating until hitting the max recursion limit (see above).

Here we apply the same idea to create a simple calendar using a Recursive CTE and the DATEADD function.

DECLARE @calendar_start DATE = '2022-01-01'
DECLARE @calendar_end DATE = DATEADD(YEAR, 1, @calendar_start);

WITH calendar(calendar_date, calendar_month, calendar_weekday) AS (
    SELECT
        @calendar_start
        ,DATENAME(MONTH, @calendar_start)
        ,DATENAME(WEEKDAY, @calendar_start)
    
    UNION ALL
    
    SELECT
        DATEADD(DAY, 1, calendar_date)        ,DATENAME(MONTH, DATEADD(DAY, 1, calendar_date))
        ,DATENAME(WEEKDAY, DATEADD(DAY, 1, calendar_date))
    FROM
        calendar
    WHERE
        calendar_date < @calendar_end)
SELECT
    *
FROM
    calendar
OPTION (MAXRECURSION 0)

Use Recursive CTEs to generate rows from aggregated data

A variation on generating a sequence, we can use Recursive CTEs to generate rows from summarized data. The summarized data needs to include a number column for this to work.

-- Setup data
DECLARE @summary TABLE (
    fruit VARCHAR(10)
    ,is_preferred CHAR
    ,total INT
)
INSERT INTO @summary(fruit, is_preferred, total)
VALUES
('Banana', 'Y', 2)
,('Banana', 'N', 3)
,('Orange', 'Y', 1)
,('Orange', 'N', 0)
,('Mango', 'Y', 4)
,('Mango', 'N', 1);

-- Setup break point, i.e., stop the sequence after this many rows
DECLARE @break_point INT = (SELECT MAX(total) FROM @summary);

-- Generate sequence of rows, stopping at the break point
WITH sequence_cte(sequence_num) AS (
    -- Anchor query
    SELECT
        1
    
    UNION ALL

    -- Recursive query
    SELECT
        sequence_num + 1
    FROM
        sequence_cte
    WHERE
        sequence_cte.sequence_num < @break_point
)

-- Produce disaggregated result set
SELECT
    summary.fruit
    ,summary.is_preferred
FROM
    @summary as summary
        INNER JOIN sequence_cte
            ON sequence_cte.sequence_num <= summary.total
ORDER BY
    summary.fruit
    ,summary.is_preferred DESC
    ,sequence_cte.sequence_num;

First, we use a Recursive CTE to generate a sequence of values (see above) up to the largest number in our summarized data.

DECLARE @break_point INT = (SELECT MAX(total) FROM @summary);
WITH sequence_cte(sequence_num) AS (
    SELECT
        1
    
    UNION ALL

    SELECT
        sequence_num + 1
    FROM
        sequence_cte
    WHERE
        sequence_cte.sequence_num < @break_point)

Then, to generate the rows, we join the sequence of values back to the original summarized data. Note the join condition which causes this to happen.

SELECT
    summary.fruit
    ,summary.is_preferred
FROM
    @summary as summary
        INNER JOIN sequence_cte
            ON sequence_cte.sequence_num <= summary.total

Separate delimited values into rows with a Recursive CTE

Note that this functionality is available from SQL Server 2016 onwards via the STRING_SPLIT() function.

If we have a column of data and the data is separated by a delimiter, e.g., a comma, we can use Recursive CTEs to turn this into rows.

In this situation, we'll need a column that uniquely identifies a row to ensure the separated values are associated with the correct identifier.

First, we use the LEFT() function to get the value before the first comma. Then we take the value before the first comma and use SUBSTRING() remove it from our comma-separated column.

This process repeats in the Recursive Query, with each Recursion working on a smaller and smaller list of comma-separated values. When there are no more values left, the Recursion stops.

DECLARE @csv_table TABLE (
    row_id INT
    ,csv_values NVARCHAR(MAX)
)

INSERT INTO @csv_table(row_id, csv_values)
VALUES
(1, 'Purple,Green,Yellow')
,(2, 'Python')
,(3,'X,Y,Z');

WITH cte(row_id, orig_column, split_column, remaining_after_split) AS (
    -- Anchor query
    SELECT
        row_id
        ,csv_values
        ,LEFT(csv_values, CHARINDEX(',', csv_values + ',') -1 )        ,SUBSTRING(csv_values, CHARINDEX(',', csv_values + ',') +1, LEN(csv_values))    FROM
        @csv_table
    
    UNION ALL

    -- Recursive query
    SELECT
        row_id
        ,orig_column
        ,LEFT(remaining_after_split, CHARINDEX(',', remaining_after_split + ',') -1 )        ,SUBSTRING(remaining_after_split, CHARINDEX(',', remaining_after_split + ',') +1, LEN(remaining_after_split))    FROM
        cte
    WHERE
        remaining_after_split > ''
)
SELECT
    *
FROM
    cte
ORDER BY
    row_id

We can see from the result set how the column of comma-separated values becomes smaller as the Recursion breaks down the task.

| row_id | orig_column         | split_column | remaining_after_split |
|--------|---------------------|--------------|-----------------------|
| 1      | Purple,Green,Yellow | Purple       | Green,Yellow          |
| 1      | Purple,Green,Yellow | Green        | Yellow                |
| 1      | Purple,Green,Yellow | Yellow       |                       |
| 2      | Python              | Python       |                       |
| 3      | X,Y,Z               | X            | Y,Z                   |
| 3      | X,Y,Z               | Y            | Z                     |
| 3      | X,Y,Z               | Z            |                       |

Summary

And that's a wrap on Recursive CTEs. Here's a summary of the things we covered:

  • A Recursive CTE refers to itself, which enables Recursion.
  • Recursive CTEs repeatidly call themselves and break down a task into smaller pieces with each call.
  • Traversing hierarchical data or generating sequences are good problems for a Recursive CTE.
  • Consider Break or Termination Conditions to prevent iterating indefinitely.

Further Reading

© 2024 Andrew Villazon. All rights reserved.