Top results
Get the top X results within each category.
While a plain row number just adds a number for each row, here we assign a row number with a category.
Set up a rank value:
ROW_NUMBER() OVER (
PARTITION BY my_category
ORDER BY my_metric DESC
) AS rank
So within each value for my_category
, you’d have numbering from 1 to the last item.
Then filter to only the top items in that category.
WHERE rank <= X
Here my_category
is the grouping variable.
Note that you will get an error if using ROW_NUMBER
outside of a window function, so use a window function as below. i.e. Do a select on the inner select and apply the WHERE
clause on the outside.
SELECT
my_category,
my_label,
my_metric
FROM (
SELECT
my_category,
my_label,
my_metric,
ROW_NUMBER() OVER (
PARTITION BY
my_category
ORDER BY my_metric DESC
) AS rank
FROM my_table
)
WHERE rank <= 10
Or use aggregation, like here with GROUP BY
.
SELECT
`Department`,
`Year`,
COUNT(*) AS `Count`,
ROW_NUMBER() OVER (
PARTITION BY `Year`
ORDER BY COUNT(*) DESC
) AS rank
FROM my_table
GROUP BY
`Department`,
`Year`
But, if you try and get top 10 for each, you’ll have to add one of these, you’ll get an error - “misuse of aliased window function rank”
WHERE rank <= 10
HAVING rank <= 10
So then you’ll have to use a window function approach like in the earlier example above, just with GROUP BY
added.
Examples
Geo
If you wanted the top 5 biggest cities in each country, you’d get output like:
Country | City | Population | Rank |
---|---|---|---|
ABC | abc | 1.6M | 1 |
ABC | xyz | 1.3M | 2 |
ABC | def | 1.1M | 3 |
DEF | ghi | 2M | 1 |
DEF | mno | 0.5M | 2 |
DEF | def | 0.2M | 3 |
Departments
Here we add up the spending by each department per year and then get the top 10 values (and names) in each year.
Using *
here means that the rank column is in the output, but you can remove that if you want.
Note the partition is just on year
, so we get the top 10 values for each year and their corresponding name.
WITH by_dept_year AS (
SELECT
STRFTIME('%Y', `date`) AS year,
`Department` AS name,
CAST(
REPLACE(`Spending`, ",", "")
AS INT
) AS value
FROM records
),
aggregated AS (
SELECT
year,
name,
SUM(value) AS value
FROM by_dept_year
GROUP BY
year,
name
ORDER BY
year ASC,
value DESC
)
SELECT *
FROM (
SELECT
year,
name,
value,
ROW_NUMBER() OVER (
PARTITION BY year
ORDER BY value DESC
) AS rank
FROM aggregated
)
WHERE rank <= 10