Pandasのmerge() – 2つのDataFrameオブジェクトを結合

3年 ago

陽, 向宇

4 minutes

パンダのDataFrameのmerge()関数は、データベーススタイルの結合操作を用いて2つのDataFrameオブジェクトを結合するために使用されます。結合は列またはインデックスで行われます。もし結合が列で行われる場合、インデックスは無視されます。この関数は新しいDataFrameを返し、元のDataFrameオブジェクトは変更されません。

パンダのDataFrameのmerge()関数の構文

merge()関数の構文は次のようになります。

def merge(
    self,
    right,
    how="inner",
    on=None,
    left_on=None,
    right_on=None,
    left_index=False,
    right_index=False,
    sort=False,
    suffixes=("_x", "_y"),
    copy=True,
    indicator=False,
    validate=None,
)

right: The other DataFrame to merge with the source DataFrame.
how: {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘inner’. This is the most important parameter to define the merge operation type. These are similar to SQL left outer join, right outer join, full outer join, and inner join.
on: Column or index level names to join on. These columns must be present in both the DataFrames. If not provided, the intersection of the columns in both DataFrames are used.
left_on: Column or index level names to join on in the left DataFrame.
right_on: Column or index level names to join on in the right DataFrame.
left_index: Use the index from the left DataFrame as the join key(s).
right_index: Use the index from the right DataFrame as the join key.
sort: Sort the join keys lexicographically in the result DataFrame.
suffixes: Suffix to apply to overlapping column names in the left and right side, respectively.
indicator: If True, adds a column to output DataFrame called “_merge” with information on the source of each row.
validate: used to validate the merge process. The valid values are {“one_to_one” or “1:1”, “one_to_many” or “1:m”, “many_to_one” or “m:1”, “many_to_many” or “m:m”}.

パンダのデータフレームのmerge()の例

DataFrameオブジェクトを結合するいくつかの例を見てみましょう。

1. デフォルト結合 – 内部結合

import pandas as pd

d1 = {'Name': ['Pankaj', 'Meghna', 'Lisa'], 'Country': ['India', 'India', 'USA'], 'Role': ['CEO', 'CTO', 'CTO']}

df1 = pd.DataFrame(d1)

print('DataFrame 1:\n', df1)

df2 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Pankaj', 'Anupam', 'Amit']})
print('DataFrame 2:\n', df2)

df_merged = df1.merge(df2)
print('Result:\n', df_merged)

出力:

DataFrame 1:
      Name Country Role
0  Pankaj   India  CEO
1  Meghna   India  CTO
2    Lisa     USA  CTO
DataFrame 2:
    ID    Name
0   1  Pankaj
1   2  Anupam
2   3    Amit
Result:
      Name Country Role  ID
0  Pankaj   India  CEO   1

2. データフレームを左結合、右結合、外部結合でマージする

print('Result Left Join:\n', df1.merge(df2, how='left'))
print('Result Right Join:\n', df1.merge(df2, how='right'))
print('Result Outer Join:\n', df1.merge(df2, how='outer'))

出力：

Result Left Join:
      Name Country Role   ID
0  Pankaj   India  CEO  1.0
1  Meghna   India  CTO  NaN
2    Lisa     USA  CTO  NaN
Result Right Join:
      Name Country Role  ID
0  Pankaj   India  CEO   1
1  Anupam     NaN  NaN   2
2    Amit     NaN  NaN   3
Result Outer Join:
      Name Country Role   ID
0  Pankaj   India  CEO  1.0
1  Meghna   India  CTO  NaN
2    Lisa     USA  CTO  NaN
3  Anupam     NaN  NaN  2.0
4    Amit     NaN  NaN  3.0

3. 特定の列を基準にデータフレームを結合します。

import pandas as pd

d1 = {'Name': ['Pankaj', 'Meghna', 'Lisa'], 'ID': [1, 2, 3], 'Country': ['India', 'India', 'USA'],
      'Role': ['CEO', 'CTO', 'CTO']}
df1 = pd.DataFrame(d1)

df2 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Pankaj', 'Anupam', 'Amit']})

print(df1.merge(df2, on='ID'))
print(df1.merge(df2, on='Name'))

出力:

   Name_x  ID Country Role  Name_y
0  Pankaj   1   India  CEO  Pankaj
1  Meghna   2   India  CTO  Anupam
2    Lisa   3     USA  CTO    Amit

     Name  ID_x Country Role  ID_y
0  Pankaj     1   India  CEO     1

４．データフレームオブジェクトのマージのために、左側と右側の列を指定してください。

import pandas as pd

d1 = {'Name': ['Pankaj', 'Meghna', 'Lisa'], 'ID1': [1, 2, 3], 'Country': ['India', 'India', 'USA'],
      'Role': ['CEO', 'CTO', 'CTO']}
df1 = pd.DataFrame(d1)

df2 = pd.DataFrame({'ID2': [1, 2, 3], 'Name': ['Pankaj', 'Anupam', 'Amit']})

print(df1.merge(df2))

print(df1.merge(df2, left_on='ID1', right_on='ID2'))

出力；

     Name  ID1 Country Role  ID2
0  Pankaj    1   India  CEO    1

   Name_x  ID1 Country Role  ID2  Name_y
0  Pankaj    1   India  CEO    1  Pankaj
1  Meghna    2   India  CTO    2  Anupam
2    Lisa    3     USA  CTO    3    Amit

データフレームをマージするための結合キーとして、インデックスを使用する。

import pandas as pd

d1 = {'Name': ['Pankaj', 'Meghna', 'Lisa'], 'Country': ['India', 'India', 'USA'], 'Role': ['CEO', 'CTO', 'CTO']}
df1 = pd.DataFrame(d1)

df2 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Pankaj', 'Anupam', 'Amit']})

df_merged = df1.merge(df2)
print('Result Default Merge:\n', df_merged)

df_merged = df1.merge(df2, left_index=True, right_index=True)
print('\nResult Index Merge:\n', df_merged)

出力：

Result Default Merge:
      Name Country Role  ID
0  Pankaj   India  CEO   1

Result Index Merge:
    Name_x Country Role  ID  Name_y
0  Pankaj   India  CEO   1  Pankaj
1  Meghna   India  CTO   2  Anupam
2    Lisa     USA  CTO   3    Amit

「References」のネイティブな日本語での表現は以下の通りです（一つのオプションのみ提供）：参考文献

Python Pandas Module Tutorial
DataFrame merge() API Doc