Top results

Get the top X results within each category.

While a plain row number just adds a number for each row, here we assign a row number with a category.

Set up a rank value:

ROW_NUMBER() OVER (
    PARTITION BY my_category
    ORDER BY my_metric DESC
) AS rank

So within each value for my_category, you’d have numbering from 1 to the last item.

Then filter to only the top items in that category.

WHERE rank <= X

Here my_category is the grouping variable.

Note that you will get an error if using ROW_NUMBER outside of a window function, so use a window function as below. i.e. Do a select on the inner select and apply the WHERE clause on the outside.

SELECT
    my_category, 
    my_label,
    my_metric
FROM (
    SELECT
        my_category,
        my_label,
        my_metric,
        ROW_NUMBER() OVER (
            PARTITION BY
                my_category
            ORDER BY my_metric DESC
        ) AS rank
    FROM my_table
)
WHERE rank <= 10

Or use aggregation, like here with GROUP BY.

SELECT
    `Department`,
    `Year`,
    COUNT(*) AS `Count`,
    ROW_NUMBER() OVER (
    PARTITION BY `Year`
        ORDER BY COUNT(*) DESC
    ) AS rank
FROM my_table
GROUP BY
    `Department`,
    `Year`

But, if you try and get top 10 for each, you’ll have to add one of these, you’ll get an error - “misuse of aliased window function rank”

WHERE rank <= 10

HAVING rank <= 10

So then you’ll have to use a window function approach like in the earlier example above, just with GROUP BY added.

Examples

Geo

If you wanted the top 5 biggest cities in each country, you’d get output like:

Country	City	Population	Rank
ABC	abc	1.6M	1
ABC	xyz	1.3M	2
ABC	def	1.1M	3
DEF	ghi	2M	1
DEF	mno	0.5M	2
DEF	def	0.2M	3

Departments

Here we add up the spending by each department per year and then get the top 10 values (and names) in each year.

Using * here means that the rank column is in the output, but you can remove that if you want.

Note the partition is just on year, so we get the top 10 values for each year and their corresponding name.

WITH by_dept_year AS (
  SELECT
    STRFTIME('%Y', `date`) AS year,
    `Department` AS name,
    CAST(
      REPLACE(`Spending`, ",", "")
      AS INT
    ) AS value
  FROM records
),

aggregated AS (
  SELECT
    year,
    name,
    SUM(value) AS value
  FROM by_dept_year
  GROUP BY
    year,
    name
  ORDER BY
    year ASC,
    value DESC
)

SELECT *
FROM (
  SELECT
    year,
    name,
    value,
    ROW_NUMBER() OVER (
      PARTITION BY year
      ORDER BY value DESC
    ) AS rank
  FROM aggregated
)
WHERE rank <= 10