How is the rank function used in Hive?

2 years ago

Benjamin Taylor

The RANK() function in Hive is a window function that assigns a rank to each row of the result set according to the ordering given in its OVER clause. Rows with equal values in the ordering columns receive the same rank, and the ranks that follow skip ahead by the number of tied rows: if two rows tie for rank 2, the next row gets rank 4, not 3.

The syntax of the RANK() function is as follows:

RANK() OVER (
    [ PARTITION BY col1, col2, ... ]
    ORDER BY col3 [ASC|DESC]
)

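The optional PARTITION BY clause specifies the columns used to divide the rows into partitions; ranks are computed independently within each partition and restart at 1. If it is omitted, the entire result set is treated as a single partition. The ORDER BY clause specifies the columns that determine the ranking and the sort direction (ASC, the default, or DESC).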
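
As a minimal sketch of the PARTITION BY clause, the query below ranks students within each class rather than across the whole table. The class_id column and the class_rank alias are assumptions made for illustration; they are not part of the students table used elsewhere in this article.

```sql
-- Assumes a hypothetical class_id column on the students table.
-- Ranks restart at 1 for every distinct class_id value.
SELECT id, name, class_id, score,
       RANK() OVER (PARTITION BY class_id ORDER BY score DESC) AS class_rank
FROM students;
```

With this form, the top scorer of every class gets class_rank 1, regardless of how that score compares with scores in other classes.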

Here is an example without PARTITION BY:

SELECT id, name, score,
       RANK() OVER (ORDER BY score DESC) AS rank
FROM students;

In this example, RANK() orders the rows by the score column in descending order and assigns a rank to each row. The student with the highest score receives rank 1, the next-highest score rank 2, and so on. Students with equal scores share the same rank, and the ranks after a tie skip ahead accordingly.
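
To make the tie behavior concrete, here is a small self-contained sketch that builds the rows inline instead of reading a real table. The names and scores are made up for illustration, and the alias score_rank is used only to avoid reusing the function's own name.

```sql
-- Hypothetical sample data, defined inline so the query needs no existing table.
WITH students AS (
  SELECT 1 AS id, 'Ann' AS name, 95 AS score
  UNION ALL SELECT 2, 'Bob', 90
  UNION ALL SELECT 3, 'Cid', 90
  UNION ALL SELECT 4, 'Dee', 85
)
SELECT id, name, score,
       RANK() OVER (ORDER BY score DESC) AS score_rank
FROM students;
-- Ann gets rank 1, Bob and Cid tie at rank 2, and Dee gets rank 4 (rank 3 is skipped).
```

The window's ORDER BY only determines the ranks. If the rows must also be returned in ranked order, add an ORDER BY clause to the outer query.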