GROUP_CONCAT in Hive: Usage Guide
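GROUP_CONCAT is an aggregate function that concatenates the values in each group into a single string. Note that stock Apache Hive does not ship a built-in GROUP_CONCAT; the function comes from engines such as MySQL and Impala, and in Hive it is available only where a distribution or a user-defined aggregate function provides it. The built-in Hive equivalent is concat_ws(sep, collect_list(expr)).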
The syntax for GROUP_CONCAT is as follows:
GROUP_CONCAT(expr [, sep])
expr is the expression whose values are concatenated; it can be a column name, a constant, or a more complex expression. sep is an optional separator string placed between the values; the default is a comma (,).
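As a brief illustration of both parameters, the sketch below passes an expression rather than a bare column and overrides the separator. It assumes an environment where GROUP_CONCAT is available with the two-argument form described above, and it uses the students(name, course) table from the example that follows.

```sql
-- Assumption: GROUP_CONCAT(expr, sep) is available in your environment.
-- expr is an expression (upper-cased course name); sep is ' | ' instead of ','.
SELECT name,
       GROUP_CONCAT(upper(course), ' | ') AS courses
FROM students
GROUP BY name;
```

Each result string would then hold the upper-cased course names joined by ' | ' instead of a comma.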
Here is an example demonstrating how to use the GROUP_CONCAT function in Hive.
Suppose there is a table named “students” that contains the following data:
+-------+---------+
| name  | course  |
+-------+---------+
| John  | Math    |
| John  | Science |
| John  | English |
| Mary  | Math    |
| Mary  | Science |
| Alice | Math    |
+-------+---------+
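To try the example yourself, the sample table could be created along these lines. This is only a sketch: the STRING column types are an assumption, since the article shows only the column names and values, and INSERT ... VALUES requires Hive 0.14 or later.

```sql
-- Assumed schema: both columns are plain strings.
CREATE TABLE IF NOT EXISTS students (
  name   STRING,
  course STRING
);

-- Load the six sample rows shown in the table above.
INSERT INTO TABLE students VALUES
  ('John',  'Math'),
  ('John',  'Science'),
  ('John',  'English'),
  ('Mary',  'Math'),
  ('Mary',  'Science'),
  ('Alice', 'Math');
```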
The following query groups the rows by name and concatenates each student's courses:
SELECT name, GROUP_CONCAT(course) AS courses
FROM students
GROUP BY name;
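The query returns the following result. Without explicit sorting, neither the row order nor the order of courses within each string is guaranteed: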
+-------+----------------------+
| name  | courses              |
+-------+----------------------+
| John  | Math,Science,English |
| Mary  | Math,Science         |
| Alice | Math                 |
+-------+----------------------+
In this example, GROUP_CONCAT combines each student's courses into a single comma-separated string and returns it in a column named courses.
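On a Hive installation that does not provide GROUP_CONCAT, the same result can be approximated with built-in functions. The sketch below uses collect_list to gather each group's values into an array and concat_ws to join them. The second query is an optional variant that uses collect_set and sort_array to remove duplicates and make the output order deterministic.

```sql
-- Built-in Hive equivalent: gather the courses into an array, then join with ','.
SELECT name,
       concat_ws(',', collect_list(course)) AS courses
FROM students
GROUP BY name;

-- Variant: drop duplicate courses, sort them alphabetically,
-- and sort the result rows by name.
SELECT name,
       concat_ws(',', sort_array(collect_set(course))) AS courses
FROM students
GROUP BY name
ORDER BY name;
```

With the sample data, the second query lists John's courses as English,Math,Science, because sort_array orders the values alphabetically rather than by insertion order.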