Replacing multiple UNION ALLs with a Temporary Table

When combining result sets from different queries, we often use the UNION ALL operator to bring these together. In this post, we'll look at an alternative approach that uses Temporary Tables instead of UNION ALL.

This approach can help make the code more manageable for complex queries that combine data.

When UNION ALL gets unwieldy

When we combine the result sets of two or more queries into a single result set, a typical approach is to use the UNION ALL operator.

UNION ALL has a couple of basic rules:

  • The number and order of columns are the same
  • Columns are compatible data types.

If a result set doesn't have the same number of columns, the usual solution is to add NULL where missing columns would be to meet the UNION ALL requirement.

See the query below, which combines two arbitrary summaries of the StackOverflow Mini database into a single result set.

SELECT
    'Most viewed post' AS summary_name
    ,Posts.Title AS most_viewed_post
    ,NULL AS posts_per_dayFROM
    StackOverflowMini.dbo.Posts
        INNER JOIN (SELECT MAX(ViewCount) AS view_count FROM StackOverflowMini.dbo.Posts) AS most_viewed
            ON Posts.ViewCount = most_viewed.view_count

UNION ALL

SELECT
    'Posts per day'
    ,NULL    ,COUNT(*)
    /
    DATEDIFF(DAY, MIN(Posts.CreationDate), MAX(Posts.CreationDate)) AS posts_per_day
FROM
    StackOverflowMini.dbo.Posts

Look familiar? Adding a new column to the result set involves adding more NULLs to all SELECTs involved in the union.

SELECT
    'Most viewed post' AS summary_name
    ,Posts.Title AS most_viewed_post
    ,NULL AS posts_per_day    ,NULL AS avg_answer_countFROM
    StackOverflowMini.dbo.Posts
        INNER JOIN (SELECT MAX(ViewCount) AS view_count FROM StackOverflowMini.dbo.Posts) AS most_viewed
            ON Posts.ViewCount = most_viewed.view_count

UNION ALL

SELECT
    'Posts per day'
    ,NULL    ,COUNT(*)
    /
    DATEDIFF(DAY, MIN(Posts.CreationDate), MAX(Posts.CreationDate)) AS posts_per_day
    ,NULLFROM
    StackOverflowMini.dbo.Posts

UNION ALL

SELECT
    'Average answer count'
    ,NULL    ,NULL    ,AVG(Posts.AnswerCount)
FROM
    StackOverflowMini.dbo.Posts
        INNER JOIN StackOverflowMini.dbo.PostTypes
            ON PostTypes.Id = Posts.PostTypeId
WHERE
    PostTypes.[Type] = 'Question'

Adding NULL for a missing column isn't a big deal on small, simple queries. However, adding NULL to every UNION ALL becomes cumbersome as the combined result sets and columns grow.

Use a Temporary Table instead.

Instead of filling our code with NULL, we can take advantage of the default behavior of Temp Tables.

When inserting data into a Temp Table, we can provide a column list that maps table columns to the columns of the proceeding SELECT or VALUES. When a table column is not listed in the INSERT, by default SQL Server fills that column with NULL, eliminating the need to add extra NULLs to each SELECT.

How to use

Recall the query from above that used the UNION ALL operator.

SELECT
    'Most viewed post' AS summary_name
    ,Posts.Title AS most_viewed_post
    ,NULL AS posts_per_day
    ,NULL AS avg_answer_count
FROM
    StackOverflowMini.dbo.Posts
        INNER JOIN (SELECT MAX(ViewCount) AS view_count FROM StackOverflowMini.dbo.Posts) AS most_viewed
            ON Posts.ViewCount = most_viewed.view_count

UNION ALL

SELECT
    'Posts per day'
    ,NULL
    ,COUNT(*)
    /
    DATEDIFF(DAY, MIN(Posts.CreationDate), MAX(Posts.CreationDate)) AS posts_per_day
    ,NULL
FROM
    StackOverflowMini.dbo.Posts

UNION ALL

SELECT
    'Average answer count'
    ,NULL
    ,NULL
    ,AVG(Posts.AnswerCount)
FROM
    StackOverflowMini.dbo.Posts
        INNER JOIN StackOverflowMini.dbo.PostTypes
            ON PostTypes.Id = Posts.PostTypeId
WHERE
    PostTypes.[Type] = 'Question'

Now let's rewrite to use a Temp Table. First, we define the Temp Table that will hold our combined result set.

DROP TABLE IF EXISTS #so_summary
CREATE TABLE #so_summary (
    summary_name VARCHAR(50)
    ,most_view_post VARCHAR(200)
    ,posts_per_day INT
    ,avg_answer_count INT
)

Then we replace each UNION ALL with the required INSERT.

INSERT INTO #so_summary(summary_name, most_view_post)SELECT
    'Most viewed post'
    ,Posts.Title
FROM
    StackOverflowMini.dbo.Posts
        INNER JOIN (SELECT MAX(ViewCount) AS view_count FROM StackOverflowMini.dbo.Posts) AS most_viewed
            ON Posts.ViewCount = most_viewed.view_count

INSERT INTO #so_summary(summary_name, posts_per_day)SELECT
    'Posts per day'
    ,COUNT(*)
    /
    DATEDIFF(DAY, MIN(Posts.CreationDate), MAX(Posts.CreationDate)) AS posts_per_day
FROM
    StackOverflowMini.dbo.Posts

INSERT INTO #so_summary(summary_name, avg_answer_count)SELECT
    'Average answer count'
    ,AVG(Posts.AnswerCount)
FROM
    StackOverflowMini.dbo.Posts
        INNER JOIN StackOverflowMini.dbo.PostTypes
            ON PostTypes.Id = Posts.PostTypeId
WHERE
    PostTypes.[Type] = 'Question'

Lastly, SELECT from the Temp Table.

SELECT * FROM #so_summary

If we want to add more result sets, we only need to extend the Temporary Table definition.

DROP TABLE IF EXISTS #so_summary
CREATE TABLE #so_summary (
    summary_name VARCHAR(50)
    ,most_view_post VARCHAR(200)
    ,posts_per_day INT
    ,avg_answer_count INT
    ,tag VARCHAR(50)    ,tag_count INT    ,ranking INT)


INSERT INTO #so_summary(summary_name, most_view_post)
SELECT ...

INSERT INTO #so_summary(summary_name, posts_per_day)
SELECT ...

INSERT INTO #so_summary(summary_name, avg_answer_count)
SELECT ...

INSERT INTO #so_summary(summary_name,tag ,tag_count ,ranking)SELECT
    TOP 5
    'Top 5 tags'
    ,tag.[value]
    ,COUNT(*) AS tag_count
    ,DENSE_RANK() OVER (ORDER BY COUNT(*) DESC) AS ranking
FROM
    StackOverflowMini.dbo.Posts
        CROSS APPLY string_split(REPLACE(REPLACE(Posts.Tags, '<' ,''), '>', ','), ',') as tag
WHERE
    Posts.Tags IS NOT NULL
    AND tag.[value] <> ''
GROUP BY
    tag.[value]
ORDER BY
    tag_count DESC

Benefits

  • The code becomes cleaner, more concise, and more readable
  • We can modify a single result set, of our combined result set, without needing to adjust every other result set
  • To change or add a result set, we only need to change our Temp Table, not every other SELECT
  • If we need to, we can reference records from the temp table.

Disadvantages

  • We're storing rows in an intermediate structure which can have performance implications.
  • If speed and performance are essential, then this approach is unlikely to be faster than a UNION ALL

Conclusion

Here we've seen an alternative way to combine data that can make your SQL simpler, cleaner, and easier to maintain. Next time you're dealing with an unruly UNION ALL, give this alternative a go!

Demo Data

© 2024 Andrew Villazon. All rights reserved.