How the WITH Clause Can Simplify Your SQL Queries

This is a quick post to highlight how you can simplify your SQL using the WITH clause. Many of the posts on this blog use this technique to prepare queries that are easy to understand and work with.

  • Migrated from ken-williams.com

  • Explained recursion and updated the “performance” section.

Benefits of using the WITH Clause

The WITH clause allows you to reduce joins and logically sequence your code. It does this by creating temporary tables (technically they are views) that are usable by your main SQL statement so that you can break your code into smaller and easier to understand snippets.

Syntax of the WITH Clause

NOTE
I am working with Google Analytics 4 data in BigQuery for this post, but the following will be applicable to many other database systems.

The WITH clause is a prefix for your main query. You can use it to create one or more CTEs (also known as "common table expressions". Each CTE can be given a name, and then reference in other places in the same query.

As an example, here is how you might pull the unique ID's of users who visited your site yesterday and then came back again today:

# Step 1: Get a list of users who visited the site yesterday

WITH yesterday AS (
    SELECT
        user_pseudo_id AS yesterday_user_pseudo_id
    FROM `<project name>.<dataset name>.events_08012021`
)

# Step 2: Get a list of users who also visited the site today
# by pulling today's users and then running an inner join
# with those who visited yesterday.

SELECT
    user_pseudo_id
FROM `<project name>.<dataset name>.events_09012021`,yesterday
WHERE
    yesterday.yesterday_user_pseudo_id = user_pseudo_id

In the example above I'm defining a temporary table with the name "yesterday". You could also put a comma at the end to add other temporary tables if you choose, like this:

WITH yesterday AS (
    SELECT
        user_pseudo_id AS yesterday_user_pseudo_id
    FROM `<project name>.<dataset name>.events_08012021`
), last_week AS (
    SELECT
        user_pseudo_id AS yesterday_user_pseudo_id
    FROM `<project name>.<dataset name>.events_02012021`
)

Recursive vs. Non-recursive CTEs

The CTEs that you create using the WITH clause can be recursive (meaning that it references itself) or non-recursive. Virtually all of the CTE's created in this blog are non-recursive to keep them simple and easy to read, but recursive CTEs will declare that they are "RECURSIVE" and be given a keyword like this:

# Recursive CTE in a WITH clause
# The recursive keyword in this example is "n"

WITH RECURSIVE
  T1 AS (
    (SELECT 1 AS n) UNION ALL
    (SELECT n + 2 FROM T1 WHERE n < 4))
SELECT * FROM T1 ORDER BY n

/*---*
 | n |
 +---+
 | 1 |
 | 3 |
 | 5 |
 *---*/

So why am I teaching you about recursive CTE's if I don't use them in this blog? Well it's so you can understand how the WITH clause impacts query performance...

Performance

Recursive CTEs will be materialized in BigQuery, which means that they will be precomputed and the results will be cached. This is great for performance because they will only be executed once even if they are referenced multiple times.

The results of your non-recursive CTEs will not be materialized however. So if you reference the CTE multiple times it will be executed once for each reference.

So, the tl;dr is that recursive CTEs (which you will not find in this blog) are helpful for performance, but non-recursive CTEs are useful to make your code readable. You can read Google's post on this topic HERE

Other Resources for BigQuery and GA4

Here are a few other posts that I’ve created for using BigQuery with data from Google Analytics 4:

Previous
Previous

Integrating Salesforce Sales Cloud with Google Analytics: Benefits and Limitations [Part 1 of 2]

Next
Next

Google Signals and Privacy in Google Analytics 4