Pandas merge() – 合并两个DataFrame对象

Pandas的merge()函数用于执行两个DataFrame对象的数据库样式连接操作。连接可以根据列或索引进行。若通过列进行连接,则忽略索引。该函数返回一个新的DataFrame对象,原始的DataFrame对象保持不变。

Pandas 的 DataFrame merge() 函数语法。

merge()函数语法如下:

def merge(
    self,
    right,
    how="inner",
    on=None,
    left_on=None,
    right_on=None,
    left_index=False,
    right_index=False,
    sort=False,
    suffixes=("_x", "_y"),
    copy=True,
    indicator=False,
    validate=None,
)
  • right: The other DataFrame to merge with the source DataFrame.
  • how: {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘inner’. This is the most important parameter to define the merge operation type. These are similar to SQL left outer join, right outer join, full outer join, and inner join.
  • on: Column or index level names to join on. These columns must be present in both the DataFrames. If not provided, the intersection of the columns in both DataFrames are used.
  • left_on: Column or index level names to join on in the left DataFrame.
  • right_on: Column or index level names to join on in the right DataFrame.
  • left_index: Use the index from the left DataFrame as the join key(s).
  • right_index: Use the index from the right DataFrame as the join key.
  • sort: Sort the join keys lexicographically in the result DataFrame.
  • suffixes: Suffix to apply to overlapping column names in the left and right side, respectively.
  • indicator: If True, adds a column to output DataFrame called “_merge” with information on the source of each row.
  • validate: used to validate the merge process. The valid values are {“one_to_one” or “1:1”, “one_to_many” or “1:m”, “many_to_one” or “m:1”, “many_to_many” or “m:m”}.

Pandas DataFrame merge() 的示例

让我们来看一些合并两个DataFrame对象的例子。

默认合并 – 内连接

import pandas as pd

d1 = {'Name': ['Pankaj', 'Meghna', 'Lisa'], 'Country': ['India', 'India', 'USA'], 'Role': ['CEO', 'CTO', 'CTO']}

df1 = pd.DataFrame(d1)

print('DataFrame 1:\n', df1)

df2 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Pankaj', 'Anupam', 'Amit']})
print('DataFrame 2:\n', df2)

df_merged = df1.merge(df2)
print('Result:\n', df_merged)

结果:

DataFrame 1:
      Name Country Role
0  Pankaj   India  CEO
1  Meghna   India  CTO
2    Lisa     USA  CTO
DataFrame 2:
    ID    Name
0   1  Pankaj
1   2  Anupam
2   3    Amit
Result:
      Name Country Role  ID
0  Pankaj   India  CEO   1

2. 使用左连接、右连接和外连接合并DataFrame。

print('Result Left Join:\n', df1.merge(df2, how='left'))
print('Result Right Join:\n', df1.merge(df2, how='right'))
print('Result Outer Join:\n', df1.merge(df2, how='outer'))

输出:

Result Left Join:
      Name Country Role   ID
0  Pankaj   India  CEO  1.0
1  Meghna   India  CTO  NaN
2    Lisa     USA  CTO  NaN
Result Right Join:
      Name Country Role  ID
0  Pankaj   India  CEO   1
1  Anupam     NaN  NaN   2
2    Amit     NaN  NaN   3
Result Outer Join:
      Name Country Role   ID
0  Pankaj   India  CEO  1.0
1  Meghna   India  CTO  NaN
2    Lisa     USA  CTO  NaN
3  Anupam     NaN  NaN  2.0
4    Amit     NaN  NaN  3.0

3. 在特定列上合并DataFrame。

import pandas as pd

d1 = {'Name': ['Pankaj', 'Meghna', 'Lisa'], 'ID': [1, 2, 3], 'Country': ['India', 'India', 'USA'],
      'Role': ['CEO', 'CTO', 'CTO']}
df1 = pd.DataFrame(d1)

df2 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Pankaj', 'Anupam', 'Amit']})

print(df1.merge(df2, on='ID'))
print(df1.merge(df2, on='Name'))

结果:

   Name_x  ID Country Role  Name_y
0  Pankaj   1   India  CEO  Pankaj
1  Meghna   2   India  CTO  Anupam
2    Lisa   3     USA  CTO    Amit

     Name  ID_x Country Role  ID_y
0  Pankaj     1   India  CEO     1

4. 指定合并DataFrame对象时的左右列。

import pandas as pd

d1 = {'Name': ['Pankaj', 'Meghna', 'Lisa'], 'ID1': [1, 2, 3], 'Country': ['India', 'India', 'USA'],
      'Role': ['CEO', 'CTO', 'CTO']}
df1 = pd.DataFrame(d1)

df2 = pd.DataFrame({'ID2': [1, 2, 3], 'Name': ['Pankaj', 'Anupam', 'Amit']})

print(df1.merge(df2))

print(df1.merge(df2, left_on='ID1', right_on='ID2'))

输出;

     Name  ID1 Country Role  ID2
0  Pankaj    1   India  CEO    1

   Name_x  ID1 Country Role  ID2  Name_y
0  Pankaj    1   India  CEO    1  Pankaj
1  Meghna    2   India  CTO    2  Anupam
2    Lisa    3     USA  CTO    3    Amit

5. 使用索引作为合并数据框的连接键。

import pandas as pd

d1 = {'Name': ['Pankaj', 'Meghna', 'Lisa'], 'Country': ['India', 'India', 'USA'], 'Role': ['CEO', 'CTO', 'CTO']}
df1 = pd.DataFrame(d1)

df2 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Pankaj', 'Anupam', 'Amit']})

df_merged = df1.merge(df2)
print('Result Default Merge:\n', df_merged)

df_merged = df1.merge(df2, left_index=True, right_index=True)
print('\nResult Index Merge:\n', df_merged)

输出结果:

Result Default Merge:
      Name Country Role  ID
0  Pankaj   India  CEO   1

Result Index Merge:
    Name_x Country Role  ID  Name_y
0  Pankaj   India  CEO   1  Pankaj
1  Meghna   India  CTO   2  Anupam
2    Lisa     USA  CTO   3    Amit

参考资料

  • Python Pandas Module Tutorial
  • DataFrame merge() API Doc
广告
将在 10 秒后关闭
bannerAds