Python Group By: Usage & Pandas Examples

2 years ago

William Carter

In Python, group by is an operation that splits a dataset into groups based on the values in one or more columns. It is typically used together with aggregate functions (such as sum, count, or mean) to compute a result for each group.

To use group by, call the groupby method on a DataFrame object from the pandas library. Here is an example:

import pandas as pd

# Create a sample dataset
data = {'Name': ['John', 'Mike', 'Sarah', 'John', 'Mike'],
        'Age': [25, 30, 28, 25, 30],
        'City': ['New York', 'Chicago', 'Los Angeles', 'New York', 'Chicago'],
        'Salary': [50000, 60000, 55000, 50000, 55000]}

df = pd.DataFrame(data)

# Group by the Name column and compute the average salary of each group
grouped = df.groupby('Name')['Salary'].mean()

print(grouped)

The output is:

Name
John     50000.0
Mike     57500.0
Sarah    55000.0
Name: Salary, dtype: float64

In the above example, we used groupby to group the data by the Name column and calculate the average salary for each group. The result is a Series indexed by the group names (the unique values in the Name column), and each entry holds the average salary for that group.
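
Since group by is usually paired with aggregate functions such as sum, count, and mean, it can help to see several of them applied in one step. The following is a minimal sketch that reuses the sample data from above and passes a list of function names to agg; the variable name summary is only illustrative.

```python
import pandas as pd

# Same sample dataset as in the example above
data = {'Name': ['John', 'Mike', 'Sarah', 'John', 'Mike'],
        'Age': [25, 30, 28, 25, 30],
        'City': ['New York', 'Chicago', 'Los Angeles', 'New York', 'Chicago'],
        'Salary': [50000, 60000, 55000, 50000, 55000]}

df = pd.DataFrame(data)

# Apply several aggregate functions to the Salary column of each Name group.
# The result is a DataFrame with one row per group and one column per function.
summary = df.groupby('Name')['Salary'].agg(['sum', 'count', 'mean'])

print(summary)
```

Each row of summary corresponds to one name, and the columns sum, count, and mean hold the total salary, the number of rows, and the average salary for that group.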

You can also group by multiple columns, as shown in the following example:

grouped = df.groupby(['Name', 'City'])['Salary'].sum()

print(grouped)

The output is:

Name   City       
John   New York      100000
Mike   Chicago       115000
Sarah  Los Angeles    55000
Name: Salary, dtype: int64

In this example, we group by the Name and City columns and calculate the total salary for each group. The result is indexed by each unique combination of Name and City (a MultiIndex), and each entry holds the total salary for that group.
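
Because the result is indexed by (Name, City) pairs, a single group can be looked up with a tuple, and the index can be turned back into ordinary columns if a flat table is easier to work with. The sketch below shows one way to do both, assuming the same sample data; the names john_total and flat are illustrative only.

```python
import pandas as pd

# Same sample dataset as in the examples above
data = {'Name': ['John', 'Mike', 'Sarah', 'John', 'Mike'],
        'Age': [25, 30, 28, 25, 30],
        'City': ['New York', 'Chicago', 'Los Angeles', 'New York', 'Chicago'],
        'Salary': [50000, 60000, 55000, 50000, 55000]}

df = pd.DataFrame(data)

grouped = df.groupby(['Name', 'City'])['Salary'].sum()

# Look up the total for one group by its (Name, City) pair
john_total = grouped.loc[('John', 'New York')]
print(john_total)

# Move the Name and City index levels back into regular columns
flat = grouped.reset_index()
print(flat)
```

After reset_index, flat is a regular DataFrame with Name, City, and Salary columns and one row per group.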
