How to use SQL Server's OPENJSON function

Thanks to the rise of JavaScript, Rest APIs, and No-SQL databases, JSON has become the data exchange format. However, JSON is a different way of representing data to a traditional relational database where data is stored as tables with rows, and often it can seem like the two paradigms aren't compatible.

In this post, I'll look at how to work with JSON in SQL Server by exploring the OPENJSON function and demonstrating how you can use SQL Server to make use of JSON-based data.

To get the most out of this post, I'm going to assume that you're familiar with JSON and writing SQL queries.

JSON in SQL Server

Before we begin there are a couple of things worth noting when working with JSON in SQL Server.

First of all Microsoft suggest that JSON is stored as the NVARCHAR(MAX) data type. It's possible to encounter issues with JSON text being truncated when using the VARCHAR(MAX) data type while NVARCHAR(MAX) supports up to 2GB of storage per value.

Secondly the OPENJSON function is only available under compatibility level 130.

Lets look at the function in more detail.

What is the OPENJSON function?

OPENJSON parses JSON text and returns JSON objects and properties as rows and columns. In other words, making JSON queryable.

OPENJSON has 1 required argument and 2 optional arguments:

  • jsonExpression: required (Unicode) text containing the JSON to parse
  • path: an optional JSON Path Expression which lets you target specific objects within a JSON text
  • with_clause: an optional WITH clause which defines the outputted result set. The WITH clause effectively maps JSON values to columns in a result set.

The function is set out like this:

OPENJSON( jsonExpression [ , path ] )  [ <with_clause> ]

As you'll see in a moment, the way the OPENJSON is written in a SQL query varies with the query's goal. Each goal changes how we use OPENJSON and the arguments provided to it.

Let's look at the different ways to use OPENJSON.

Inspecting JSON text

To start, let's use the function to inspect some JSON text. We'll pass in the only required argument, a string of JSON text. This triggers the function's default behavior.

To call the function, we include it in the FROM clause of the query.

DECLARE @json_text NVARCHAR(MAX)
SET @json_text = N'
    {
        "name": "Spider-Man",
        "vitals": {
            "species": "Human",
            "height": {
                "measure": 178,
                "unit": "cm"
            }
        },
        "current_team": null,
        "aliases": ["Bag-Man","Black Marvel","Peter Palmer"],
        "is_good": true,
        "created": 1962
    }
'

SELECT
    *
FROM
    OPENJSON(@json_text)ORDER BY
    [type]

Results:

key value type
current_team NULL 0
name Spider-Man 1
created 1962 2
is_good true 3
aliases ["Bag-Man","Black Marvel","Peter Palmer"] 4
vitals {"species": "Human","height": {"measure": 178,"unit": "cm"}} 5

The function's default action is to loop through each key at the top level (root) of the JSON text and return a row, which includes the key name, its value, and an integer indicating its JSON data type.

Notice that the function only loops through the top level of the JSON text.

Targeting an object with a JSON Path Expression

Before we move onto the next use of OPENJSON, let's look at the optional path argument.

The path argument allows us to target a specific object in the JSON text by defining a JSON Path Expression pointing to the object. The use of the path argument becomes essential as we deal with more complex JSON.

The JSON Path Expression has the following syntax:

OPENJSON(@json, 'path_mode $.path.to.an.object')

Paths always begin with the $ (dollar sign), which indicates the top level of the current JSON text.

To step further into an object, we can use the . (dot operator) to access keys inside nested objects. In this way, the path expression acts as a set of steps towards a given object in the JSON text.

It is possible to include a path mode before the path, which sets out how OPENJSON will respond if it cannot find the key given. There are two modes:

  • lax - (default) will return NULL and not raise an error
  • strict - will raise an error

Now that we know about the JSON Path Expression let's see an example of its use.

Using the JSON Path Expression

To target a specific object, we include the JSON Path Expression after the jsonText argument.

DECLARE @json_text NVARCHAR(MAX)
SET @json_text = N'
    {
        "name": "Spider-Man",
        "vitals": {            "species": "Human",            "height": {                "measure": 178,                "unit": "cm"            }        },
        "current_team": null,
        "aliases": ["Bag-Man","Black Marvel","Peter Palmer"],
        "is_good": true,
        "created": 1962
    }
'

SELECT
    *
FROM
    OPENJSON(@json_text, '$.vitals.height')ORDER BY
    [type]

Result:

key value type
unit cm 1
measure 178 2

Notice how the path argument has allowed us to target a specific object, but the function has still returned the same 3 columns from earlier.

Keys with a "$" character

When dealing with real-world JSON text, it's possible to encounter key names that break the JSON Path Expression.

The most common examples I've encountered have been key names that include a $ (dollar sign) or spaces. These will cause the OPENJSON function to error.

The fix to this is surprisingly straightforward. Simply encase the key name in double-quotes.

DECLARE @json_text NVARCHAR(MAX)
SET @json_text = N'
    {
        "$title": "Alice in Wonderland",
        "written by": "Lewis Carroll"
    }
'

SELECT
    *
FROM
    OPENJSON(@json_text)
        WITH (
            -- title VARCHAR(50) '$.$title' -- will cause error
            -- ,author VARCHAR(50) '$.written by' -- will cause error

            -- properly handled with double quotes
            title VARCHAR(50) '$."$title"'
            ,author VARCHAR(50) '$."written by"'
        )

In the next example, we'll see how to build a more useful result set from the JSON text.

Creating a result set from JSON text

So far, we've seen OPENJSON return information about a JSON text. However, the function's real utility comes from its ability to overlay a result set over a JSON text.

To define a result set, we use the other optional argument, the with_clause. The WITH clause comes after the call to OPENJSON and has the following syntax:

OPENJSON(jsonText)
    WITH (
        column_name DATATYPE jsonPathExpression
    )

Lets see an example of this:

DECLARE @json_text NVARCHAR(MAX) = N'
[{
    "name": "Spider-Man",
    "first_appearance": {"title":"Amazing Fantasy", "issue": 15}
},
{
    "name": "Iron Man",
    "first_appearance": {"title":"Tales of Suspence", "issue": 39}
},
{
    "name": "Captain America",
    "first_appearance": {"title":"Captain America Comics", "issue": 1}
}]
'

SELECT
    *
FROM
    OPENJSON(@json_text)        WITH (            [name] varchar(200) '$.name',            first_appeared varchar(200) '$.first_appearance.title',            issue_number int '$.first_appearance.issue'        )

Results:

name first_appearance issue_number
Spider-Man Amazing Fantasy 15
Iron Man Tales of Suspence 39
Captain America Captain America Comics 1

The result set we define in the WITH clause will behave just like a regular table and can be joined, filtered, or modified in the SELECT clause.

Automatically convert Data Types

A helpful feature of the WITH clause is that it supports sensible data type conversions within the schema definition.

The automatic conversion reduces the need to use CAST or CONVERT explicitly in the SELECT statement.

DECLARE @json_text NVARCHAR(MAX)
SET @json_text = N'
{
    "moon_landing": "1969-07-20T02:56:00+00:00",
    "seconds_in_a_day": "86400",
    "console_is_better_than_pc": true,
    "hundred_meter_world_record": "9.58"
}
'

SELECT
    random_facts.moon_landing
    ,random_facts.seconds_in_a_day / 3600 as hours_in_a_day
    ,random_facts.console_is_better_than_pc
    ,(100 / random_facts.hundred_meter_world_record) * 3.6 as hmwr_kph
FROM
    OPENJSON(@json_text)
        WITH (
            moon_landing DATETIME2
            ,seconds_in_a_day INT
            ,console_is_better_than_pc BIT
            ,hundred_meter_world_record FLOAT 
        ) as random_facts

Results:

moon_landing hours_in_a_day console_is_better_than_pc hmwr_kph
1969-07-20 02:56:00.0000000 24 1 37.5782881002088

You may have also noticed above that the JSON Path Expression was not used in the WITH clause. Where column names match keys in the current object then the JSON path is not required.

Query JSON text from a table

All the examples up to this point have used OPENJSON with a variable that stores the JSON text. In reality, JSON text will be stored in a table, so how do we query this?

Lets assume we're using a database called nobel which has a table of nobel laureates aptly named laureates and contains rows of JSON text.

To query JSON text in the table, we use the CROSS APPLY operand to apply OPENJSON per row and pass in the column name containing the JSON text.

SELECT
    js.surname
    ,js.first_name
    ,TRY_CAST(js.dob AS DATE) as dob
    ,js.gender
    ,js.birth_country
FROM 
    nobel.dbo.laureate
        CROSS APPLY OPENJSON(laureate.json_text)
            WITH (
                surname VARCHAR(255) '$.surname'
                ,first_name VARCHAR(100) '$.firstname'
                ,dob VARCHAR(20) '$.born'
                ,gender VARCHAR(10) '$.gender'
                ,birth_country VARCHAR(100) '$.bornCountry'
            ) as js
ORDER BY
    js.surname

Results:

surname first_name dob gender birth_country
Curie Marie 1867-11-07 female Russian Empire (now Poland)

Once applied, OPENJSON can be used in the same way as the examples above, the only difference being it is called per row.

Working with Arrays

The last use of OPENJSON we'll explore is working with Arrays (or list of items).

Often when working with JSON, the data we need to query is in an Array. It can be useful to have this data returned as a result set with a row per Array item.

Let's look at how we can do the same with OPENJSON.

To do this, we target the Array with a JSON Path Expression and return it in a result set using the WITH clause. We also add the AS JSON keyword, which designates the column as a JSON fragment, enabling it to be used with a subsequent call to OPENJSON.

To access the Array items we then use an OUTER APPLY and another OPENJSON call passing in the previously defined JSON fragment. The Array items can be mapped to columns using another WITH clause.

DECLARE @spider_man NVARCHAR(MAX)
SET @spider_man = N'
{
    "name": "Spider-Man",
    "aliases": ["Bag-Man","Black Marvel","Peter Palmer"],
    "power_stats": [
        {"power": "Intelligence", "value": 95},
        {"power": "Strength","value": 55},
        {"power": "Speed", "value": 65},
        {"power": "Durability", "value": 75},
        {"power": "Power", "value": 75},
        {"power": "Combat", "value": 95}
    ]
}'

SELECT
    power_stat.power
    ,power_stat.value
FROM
    OPENJSON(@spider_man)
        WITH (
            power_stats NVARCHAR(MAX) '$.power_stats' AS JSON        ) as root_js
    
    OUTER APPLY OPENJSON(root_js.power_stats)        WITH (            power VARCHAR(20) '$.power'            ,value INT '$.value'        ) as power_stat

Results:

surname first_name
Intelligence 95
Strength 55
Speed 65
etc

Notice that when using the JSON Path Expression, the path is defined relative to the JSON fragment and not original JSON text.

Where the Array does not contain objects, we can simply reference the items with the $ dollar sign.

...

SELECT
    alias.name
FROM
    OPENJSON(@spider_man)
        WITH (
            aliases NVARCHAR(MAX) '$.aliases' AS JSON
        ) as root_js
    
    OUTER APPLY OPENJSON(root_js.aliases)
        WITH (
            name VARCHAR(50) '$'        ) as alias

Array index support

The JSON Path Expression also supports Array Indexing when a path refers to an Array. This is useful in situations where you always want to query the same item in an Array, e.g., the first item in an Array.

To use Array Indexing, use the square bracket notation, [], with the number of the Index you wish to target. As is familiar with programming languages, Arrays are zero-based (start at zero).

DECLARE @meteor_landings NVARCHAR(MAX)
SET @meteor_landings = N'
    [
        {"name": "Aachen","coordinates": [6.08333,50.775]},
        {"name": "Aarhus","coordinates": [10.23333,56.18333]},
        {"name": "Abee","coordinates": [-113,54.21667]}
    ]
'

SELECT
    *
FROM
    OPENJSON(@meteor_landings)
        WITH(
            name VARCHAR(10) '$.name',
            latitude FLOAT '$.coordinates[0]',
            longitude FLOAT '$.coordinates[1]'
        )

Results:

name latitude longitude
Aachen 6.08333 50.775
Aarhus 10.23333 56.18333
Abee -113 54.21667

Array aggregations

At times it's useful to know the length of an Array, i.e., how many items the Array holds.

If you're familiar with programming, this is as simple as a function call or accessing a length property on the Array. We can achieve a similar result in SQL Server by sub-querying the OPENJSON function.

DECLARE @world_cup_winners NVARCHAR(MAX)
SET @world_cup_winners = N'
    [
        {"country": "Brazil","tournaments_won": [1958,1962,1970,1994,2002]},
        {"country": "Germany","tournaments_won": [1954,1974,1990,2014]},
        {"country": "Italy","tournaments_won": [1934,1938,1982,2006]},
        {"country": "Argentina","tournaments_won": [1978,1986]},
        {"country": "France","tournaments_won": [1998,2018]},        
        {"country": "England","tournaments_won": [1966]},
        {"country": "Spain","tournaments_won": [2010]},
        {"country": "Uruguay","tournaments_won": [1930,1950]}
    ]
'

SELECT
    countries.country
    ,tournaments_summary.number_won
FROM
    OPENJSON(@world_cup_winners)
        WITH(
            country VARCHAR(20) '$.country',
            tournaments_won NVARCHAR(MAX) '$.tournaments_won' AS JSON
        ) as countries
    
    OUTER APPLY (        SELECT COUNT(*) as number_won FROM OPENJSON(countries.tournaments_won)    ) as tournaments_summary

We can also apply a similar technique (in this situation where the Array items are integers) to return aggregations from an Array.

...

SELECT
    countries.country
    ,tournaments_summary.*
FROM
    OPENJSON(@world_cup_winners)
        WITH(
            country VARCHAR(20) '$.country',
            tournaments_won NVARCHAR(MAX) '$.tournaments_won' AS JSON
        ) as countries
    
    OUTER APPLY (
        SELECT 
            COUNT(*) as number_won
            ,MAX(tournament_years.year_of_tournament) as last_win        FROM 
            OPENJSON(countries.tournaments_won)                WITH (                    year_of_tournament INT '$'                ) as tournament_years    ) as tournaments_summary

Results:

country number_won
Brazil 5
Germany 4
Italy 4
etc

Notice that in the sub-query we call OPENJSON and include the WITH clause so that aggregations can be performed on the named result set.

Conclusion

I hope this exploration of OPENJSON has been useful.

What makes SQL Server's JSON support effective is it allows you to leverage JSON data sources without leaving SQL. This has been helpful when dealing with large volumes of JSON data where data can be injested and then explored through SQL without resorting to a (generally) more complicated general purpose programming language.

Further Reading

© 2024 Andrew Villazon. All rights reserved.