SQL Examples: Common Table Expressions (CTEs) and Subqueries¶

This notebook continues my SQL practice with the Chinook database, focusing on building layered and reusable queries through CTEs and subqueries.

These exercises demonstrate practical use of:

  • CTEs to structure multi-step analyses and improve query readability
  • Subqueries embedded in FROM and WHERE clauses to filter and rank
  • Ranking and filtering results (e.g., top customers, most popular tracks) using aggregated metrics
  • Nested aggregation to compare individual items to group averages
  • Combining CTEs and joins

Setup¶

In [1]:
from pathlib import Path
from IPython.display import Image, display
import pandas as pd

# project root
repo_root = Path().resolve().parent  

# database path relative to project root
DB_PATH = repo_root / "data" / "Chinook_Sqlite.sqlite"

assert DB_PATH.exists(), f"Database file not found: {DB_PATH}"

# SQLAlchemy engine
from sqlalchemy import create_engine
engine = create_engine(f"sqlite:///{DB_PATH}")

# image path relative to repo root
image_path = repo_root / "input" / "ChinookDBStructure.png"

# check it exists
assert image_path.exists(), f"Image file not found: {image_path}"

# display
display(Image(filename=str(image_path), width=600, height=400))
No description has been provided for this image

Query Examples¶

1) Create a CTE for top 10 customers by revenue and join it to get their invoices.¶

In [2]:
query = """

WITH top_10_customers AS (
  SELECT 
      c.CustomerId,
      TRIM(c.FirstName || ' ' || c.LastName) AS customer_name,
      SUM(i.Total) AS total_invoice_revenue
  FROM customer AS c
  JOIN invoice AS i 
      ON c.CustomerId = i.CustomerId
  GROUP BY c.CustomerId, c.FirstName, c.LastName
  ORDER BY total_invoice_revenue DESC
  LIMIT 10
)
SELECT 
    ttc.customer_name,
    i.InvoiceId,
    i.InvoiceDate,
    i.Total
FROM invoice AS i
JOIN top_10_customers AS ttc 
    ON i.CustomerId = ttc.CustomerId
ORDER BY ttc.customer_name, i.InvoiceDate ;

"""
df = pd.read_sql_query(query, engine)
df
Out[2]:
customer_name InvoiceId InvoiceDate Total
0 Astrid Gruber 78 2021-12-08 00:00:00 1.98
1 Astrid Gruber 89 2022-01-18 00:00:00 18.86
2 Astrid Gruber 144 2022-09-18 00:00:00 8.91
3 Astrid Gruber 273 2024-04-24 00:00:00 1.98
4 Astrid Gruber 296 2024-07-27 00:00:00 3.96
... ... ... ... ...
65 Victor Stevens 190 2023-04-18 00:00:00 1.98
66 Victor Stevens 201 2023-05-29 00:00:00 18.86
67 Victor Stevens 256 2024-01-27 00:00:00 8.91
68 Victor Stevens 385 2025-09-02 00:00:00 1.98
69 Victor Stevens 408 2025-12-05 00:00:00 3.96

70 rows × 4 columns

2) Using a subquery, find the genres whose tracks are, on average, are the most expensive.¶

In [3]:
query = """

SELECT 
     g.name AS genre_name,
     AVG(t.UnitPrice) AS average_track_price
FROM track AS t
JOIN genre AS g
  ON t.GenreId = g.GenreId
GROUP BY g.GenreId, g.name
  HAVING AVG(t.UnitPrice) = (
    SELECT MAX(average_price) --filtering by genres that have average max price
    FROM (
        SELECT 
             AVG(UnitPrice) AS average_price
        FROM track
        GROUP BY GenreId 
     ) AS genre_averages
);

"""
df = pd.read_sql_query(query, engine)
df
Out[3]:
genre_name average_track_price
0 Science Fiction 1.99
1 TV Shows 1.99
2 Sci Fi & Fantasy 1.99
3 Drama 1.99
4 Comedy 1.99

3) Write a CTE to compute total invoice amount per country, then select countries with revenue > 150.¶

In [5]:
query = """

WITH country_invoice_total AS (
    SELECT 
         c.Country,
         COALESCE(SUM(i.Total),0) AS total_invoice_amount
    FROM invoice AS i
    JOIN customer AS c
      ON i.CustomerId = c.CustomerId
    GROUP BY c.Country
)
SELECT 
     Country,
     total_invoice_amount
FROM country_invoice_total
WHERE total_invoice_amount > 150
ORDER BY total_invoice_amount DESC ;

"""
df = pd.read_sql_query(query, engine)
df
Out[5]:
Country total_invoice_amount
0 USA 523.06
1 Canada 303.96
2 France 195.10
3 Brazil 190.10
4 Germany 156.48

4) Find tracks longer than the average track length (subquery in WHERE).¶

In [22]:
query = """

SELECT 
     Name AS track_name,
     ROUND(Milliseconds/(60 * 1000), 1) AS track_length_minutes 
FROM track
WHERE Milliseconds > (
    SELECT AVG(Milliseconds)
    FROM track
)
ORDER BY track_length_minutes DESC

"""
df = pd.read_sql_query(query, engine)
df
Out[22]:
track_name track_length_minutes
0 Occupation / Precipice 88.0
1 Through a Looking Glass 84.0
2 Battlestar Galactica, Pt. 1 49.0
3 Battlestar Galactica, Pt. 2 49.0
4 The Man With Nine Lives 49.0
... ... ...
489 Concerto for Clarinet in A Major, K. 622: II. ... 6.0
490 Karelia Suite, Op.11: 2. Ballade (Tempo Di Men... 6.0
491 Wellington's Victory or the Battle Symphony, O... 6.0
492 Rehab (Hot Chip Remix) 6.0
493 You Sent Me Flying / Cherry 6.0

494 rows × 2 columns

5) Use a CTE to rank artists by number of tracks.¶

In [19]:
query = """

WITH track_counts AS (   
    SELECT 
         ar.ArtistId,
         ar.Name AS artist_name,
         COUNT(tr.TrackId) as track_count
    FROM artist AS ar
    LEFT JOIN album AS al 
      ON ar.ArtistId = al.ArtistId
    JOIN track AS tr
      ON al.AlbumId = tr.AlbumId
    GROUP BY ar.ArtistId, ar.Name
)
SELECT 
    ArtistId,
    artist_name,
    track_count,
    DENSE_RANK() OVER(ORDER BY track_count DESC, artist_name) AS rank_by_track_counts
FROM track_counts
ORDER BY track_count DESC

"""
df = pd.read_sql_query(query, engine)
df
Out[19]:
ArtistId artist_name track_count rank_by_track_counts
0 90 Iron Maiden 213 1
1 150 U2 135 2
2 22 Led Zeppelin 114 3
3 50 Metallica 112 4
4 58 Deep Purple 92 5
... ... ... ... ...
199 259 The 12 Cellists of The Berlin Philharmonic 1 200
200 231 Ton Koopman 1 201
201 211 Wilhelm Kempff 1 202
202 255 Yehudi Menuhin 1 203
203 212 Yo-Yo Ma 1 204

204 rows × 4 columns

6) Find customers whose invoices exceed the average invoice total.¶

In [33]:
query = """

SELECT DISTINCT
    TRIM(c.FirstName || ' ' || c.LastName) AS customer_name
FROM customer AS c
JOIN invoice AS i
  ON c.CustomerId = i.CustomerId
WHERE i.Total > (SELECT AVG(Total) FROM invoice)
ORDER BY customer_name

"""
df = pd.read_sql_query(query, engine)
df.head(10) # showing just the top 10 customer_names
Out[33]:
customer_name
0 Aaron Mitchell
1 Alexandre Rocha
2 Astrid Gruber
3 Bjørn Hansen
4 Camille Bernard
5 Daan Peeters
6 Dan Miller
7 Diego Gutiérrez
8 Dominique Lefebvre
9 Eduardo Martins

7) List the top 3 most popular tracks per genre using a subquery.¶

In [22]:
# subquery version

query = """
  SELECT *
  FROM (
    SELECT
      g.Name AS genre_name,
      t.Name AS track_name,
      COUNT(il.Quantity) AS track_invoice_count,
      DENSE_RANK() OVER(PARTITION BY g.Name ORDER BY COUNT(il.Quantity) DESC) AS track_rank_by_genre
    FROM genre AS g
    JOIN track AS t
      ON g.GenreId = t.GenreId
    JOIN invoiceline AS il 
      ON t.TrackId = il.TrackId
    GROUP BY g.Name, t.Name
) AS ranked_tracks
WHERE track_rank_by_genre <= 3

"""
df = pd.read_sql_query(query, engine)
df
Out[22]:
genre_name track_name track_invoice_count track_rank_by_genre
0 Alternative Until We Fall 1 1
1 Alternative Sound of a Gun 1 1
2 Alternative Slowness 1 1
3 Alternative Show Me How to Live (Live at the Quart Festival) 1 1
4 Alternative Shape of Things to Come 1 1
... ... ... ... ...
1321 World Din Din Wo (Little Child) 1 1
1322 World Demorou! 1 1
1323 World Cafezinho 1 1
1324 World Aos Leões 1 1
1325 World A Moça e a Chuva 1 1

1326 rows × 4 columns

In [21]:
# CTE version
query = """

WITH top_tracks_by_genre AS(
  SELECT
     g.Name AS genre_name,
     t.Name AS track_name,
     COUNT(il.Quantity) AS track_invoice_count,
     DENSE_RANK() OVER(PARTITION BY g.Name ORDER BY COUNT(il.Quantity) DESC) AS genre_track_rank
  FROM genre AS g
  JOIN track AS t
    ON g.GenreId = t.GenreId
  JOIN invoiceline AS il 
    ON t.TrackId = il.TrackId
  GROUP BY g.Name, t.Name
)
SELECT 
     genre_name,
     track_name,
     track_invoice_count,
     genre_track_rank
FROM top_tracks_by_genre
WHERE genre_track_rank <= 3

"""
df = pd.read_sql_query(query, engine)
df
Out[21]:
genre_name track_name track_invoice_count genre_track_rank
0 Alternative Until We Fall 1 1
1 Alternative Sound of a Gun 1 1
2 Alternative Slowness 1 1
3 Alternative Show Me How to Live (Live at the Quart Festival) 1 1
4 Alternative Shape of Things to Come 1 1
... ... ... ... ...
1321 World Din Din Wo (Little Child) 1 1
1322 World Demorou! 1 1
1323 World Cafezinho 1 1
1324 World Aos Leões 1 1
1325 World A Moça e a Chuva 1 1

1326 rows × 4 columns

In [ ]: