Python Group By: Usage & Pandas Examples
In Python, group by is an operation that splits a dataset into groups based on the values in a specific column. It is typically used together with aggregate functions (such as sum, count, and mean) to compute a result for each group.
To use group by, call the groupby method on a DataFrame object from the pandas library. Here is an example:
import pandas as pd
# Create a sample dataset
data = {'Name': ['John', 'Mike', 'Sarah', 'John', 'Mike'],
        'Age': [25, 30, 28, 25, 30],
        'City': ['New York', 'Chicago', 'Los Angeles', 'New York', 'Chicago'],
        'Salary': [50000, 60000, 55000, 50000, 55000]}
df = pd.DataFrame(data)
# Group by the Name column and compute the average salary of each group
grouped = df.groupby('Name')['Salary'].mean()
print(grouped)
The output is:
Name
John     50000.0
Mike     57500.0
Sarah    55000.0
Name: Salary, dtype: float64
In the above example, groupby groups the rows by the Name column, and mean calculates the average salary of each group. The result is a Series indexed by the group names (the unique values in the Name column), and each entry holds the average salary for that group.
You can also group by multiple columns, as shown in the following example:
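The same grouping can feed several aggregate functions at once. The following is a minimal sketch, assuming the sample data from the example above (reduced to the two columns it needs); the variable name summary is only illustrative. It passes a list of function names to agg, which returns one column per function.

```python
import pandas as pd

# Same sample data as above, limited to the columns used here
df = pd.DataFrame({
    'Name': ['John', 'Mike', 'Sarah', 'John', 'Mike'],
    'Salary': [50000, 60000, 55000, 50000, 55000],
})

# Apply several aggregate functions to each group in one call;
# the result is a DataFrame with one column per function
summary = df.groupby('Name')['Salary'].agg(['sum', 'count', 'mean'])
print(summary)
```

Each row of summary corresponds to one name, so a single value can be read with, for example, summary.loc['Mike', 'sum'].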
grouped = df.groupby(['Name', 'City'])['Salary'].sum()
print(grouped)
The output is:
Name   City
John   New York       100000
Mike   Chicago        115000
Sarah  Los Angeles     55000
Name: Salary, dtype: int64
In this example, the rows are grouped by the Name and City columns, and the total salary is calculated for each group. The result is indexed by each unique combination of Name and City (a MultiIndex), and each entry holds the total salary for that group.
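If ordinary columns are easier to work with than a two-level index, the grouping keys can be kept as columns. The following is a minimal sketch, assuming the same sample data as above; the variable name flat is only illustrative. It uses the as_index=False option of groupby.

```python
import pandas as pd

# Same sample data as above, limited to the columns used here
df = pd.DataFrame({
    'Name': ['John', 'Mike', 'Sarah', 'John', 'Mike'],
    'City': ['New York', 'Chicago', 'Los Angeles', 'New York', 'Chicago'],
    'Salary': [50000, 60000, 55000, 50000, 55000],
})

# as_index=False keeps Name and City as regular columns
# instead of moving them into the index of the result
flat = df.groupby(['Name', 'City'], as_index=False)['Salary'].sum()
print(flat)
```

Calling reset_index() on the indexed result from the previous example is another way to reach the same flat layout.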