What is the method in pandas for removing duplicates ba…
In pandas, duplicate rows are removed with the DataFrame method drop_duplicates(). By default it returns a new DataFrame with the duplicate rows removed and leaves the original unchanged.
The general usage is:
df.drop_duplicates(subset=['column_name'], keep='first', inplace=True)
- The subset parameter specifies the column name, or list of column names, to consider when identifying duplicates. It defaults to None, which means all columns are compared.
- The keep parameter specifies which of the duplicate rows to retain. The options are 'first', 'last', and False. The default is 'first', which keeps the first occurrence of each duplicate; 'last' keeps the last occurrence; False drops every row that has a duplicate.
- The inplace parameter specifies whether to modify the original DataFrame. It defaults to False, which returns a new DataFrame with the duplicates removed. If set to True, the original DataFrame is modified and the method returns None.
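To make the difference between the keep options concrete, here is a minimal sketch. The sample data and the variable names (first, last, none_kept) are made up for illustration and are not part of the original example.

```python
import pandas as pd

# Hypothetical sample data: the value 2 appears twice in column 'A'
df = pd.DataFrame({
    "A": [1, 2, 2, 3],
    "B": ["a", "b", "x", "c"],
})

# keep='first' (the default): keep the first row of each duplicate group
first = df.drop_duplicates(subset=["A"], keep="first")

# keep='last': keep the last row of each duplicate group
last = df.drop_duplicates(subset=["A"], keep="last")

# keep=False: drop every row whose 'A' value occurs more than once
none_kept = df.drop_duplicates(subset=["A"], keep=False)

print(first.index.tolist())      # [0, 1, 3]
print(last.index.tolist())       # [0, 2, 3]
print(none_kept.index.tolist())  # [0, 3]

# inplace was left at its default (False), so df still has all 4 rows
print(len(df))                   # 4
```

Because inplace is left at False here, each call returns a new DataFrame, which makes it easy to compare the three results side by side.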
Here is a complete example:
import pandas as pd
# Create a DataFrame that contains duplicate values
data = {'A': [1, 2, 2, 3, 4, 4],
        'B': ['a', 'b', 'b', 'c', 'd', 'd']}
df = pd.DataFrame(data)
# Remove duplicates based on column 'A', modifying df in place
df.drop_duplicates(subset=['A'], keep='first', inplace=True)
print(df)
Output:
   A  B
0  1  a
1  2  b
3  3  c
4  4  d
In the example above, duplicates were removed based on column 'A', keeping only the first occurrence of each value.
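Note that the printed result keeps the original row labels (0, 1, 3, 4) rather than renumbering them. If a fresh 0-based index is preferred, or if you want to see which rows count as duplicates before dropping them, something like the following sketch may help. It reuses the example data; the ignore_index parameter assumes pandas 1.0 or later, and the names mask and deduped are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 2, 3, 4, 4],
                   "B": ["a", "b", "b", "c", "d", "d"]})

# Boolean mask: True for rows that repeat an earlier value in column 'A'
mask = df.duplicated(subset=["A"], keep="first")
print(mask.tolist())  # [False, False, True, False, False, True]

# ignore_index=True relabels the result rows 0..n-1 (assumes pandas >= 1.0)
deduped = df.drop_duplicates(subset=["A"], keep="first", ignore_index=True)
print(deduped.index.tolist())  # [0, 1, 2, 3]
```

On older pandas versions, calling reset_index(drop=True) on the result should give the same renumbering.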