
Merging DataFrames: the pandas merge operation


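The difference between merge and concat is that merge combines two frames by matching them on a shared key (a common column or the index), whereas concat simply stacks them.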

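When you call pd.merge() without specifying a key, it automatically uses the column (or columns) whose names appear in both frames as the key for the merge.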

Note that the values in the key column do not have to appear in the same order in both frames.
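The two points above can be sketched as follows. This is a minimal illustration, not the author's code: the names `left`, `right`, and `shared` are made up for the example. With no `on` argument, `pd.merge` joins on the columns present in both frames, and rows are paired by key value rather than by position.

```python
import pandas as pd

# Hypothetical frames for illustration; only 'employee' appears in both.
left = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa'],
                     'group': ['Accounting', 'Engineering', 'Engineering']})
right = pd.DataFrame({'employee': ['Lisa', 'Bob', 'Jake'],
                      'hire_date': [2004, 2008, 2012]})

# The default key is the set of column names shared by both frames.
shared = [c for c in left.columns if c in right.columns]

implicit = pd.merge(left, right)             # key inferred from column names
explicit = pd.merge(left, right, on=shared)  # same key, spelled out

# Rows are paired by the value in 'employee', not by row position,
# so the different row order in `right` does not matter.
print(shared)
print(implicit.equals(explicit))
print(implicit)
```

Spelling the key out with `on=` is usually the safer habit, since a column added to both frames later would otherwise silently become part of the key.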

1. One-to-one merge

    import numpy as np
    import pandas as pd
    from pandas import Series, DataFrame

    # Each employee appears exactly once in both frames.
    df1 = DataFrame({'employee': ['Bob', 'Jake', 'Lisa'],
                     'group': ['Accounting', 'Engineering', 'Engineering']})
    df2 = DataFrame({'employee': ['Lisa', 'Bob', 'Jake'],
                     'hire_date': [2004, 2008, 2012]})
    display(df1, df2)  # display() is provided by IPython/Jupyter

Output:

      employee        group
    0      Bob   Accounting
    1     Jake  Engineering
    2     Lisa  Engineering

      employee  hire_date
    0     Lisa       2004
    1      Bob       2008
    2     Jake       2012

    # 'employee' is the only shared column, so it becomes the key.
    df3 = pd.merge(df1, df2)
    df3

Output:

      employee        group  hire_date
    0      Bob   Accounting       2008
    1     Jake  Engineering       2012
    2     Lisa  Engineering       2004

2. Many-to-one merge

    df3 = DataFrame({'employee': ['Lisa', 'Jake'],
                     'group': ['Accounting', 'Engineering'],
                     'hire_date': [2004, 2016]})
    # 'Engineering' appears twice in df4, so Jake matches two rows.
    df4 = DataFrame({'group': ['Accounting', 'Engineering', 'Engineering'],
                     'supervisor': ['Carly', 'Guido', 'Steve']})
    display(df3, df4, pd.merge(df3, df4))

Output:

      employee        group  hire_date
    0     Lisa   Accounting       2004
    1     Jake  Engineering       2016

             group supervisor
    0   Accounting      Carly
    1  Engineering      Guido
    2  Engineering      Steve

      employee        group  hire_date supervisor
    0     Lisa   Accounting       2004      Carly
    1     Jake  Engineering       2016      Guido
    2     Jake  Engineering       2016      Steve
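If you want pandas to check the key relationship rather than trusting it, `pd.merge` accepts a `validate` argument. The sketch below is an illustration with made-up frame names (`employees`, `supervisors`), not part of the original example. Note that in the vocabulary of `validate`, a left frame with unique keys merged with a right frame that repeats them is called 'one_to_many'.

```python
import pandas as pd

# Hypothetical frames: 'group' is unique on the left, repeated on the right.
employees = pd.DataFrame({'employee': ['Lisa', 'Jake'],
                          'group': ['Accounting', 'Engineering']})
supervisors = pd.DataFrame({'group': ['Accounting', 'Engineering', 'Engineering'],
                            'supervisor': ['Carly', 'Guido', 'Steve']})

# Passes: the key is unique in the left frame. Each left row is repeated
# once per matching right row, so Jake appears twice.
merged = pd.merge(employees, supervisors, validate='one_to_many')
print(merged)

# Claiming the key is unique on both sides fails with MergeError,
# because 'Engineering' is duplicated in the right frame.
try:
    pd.merge(employees, supervisors, validate='one_to_one')
    check_failed = False
except pd.errors.MergeError as exc:
    check_failed = True
    print('not one-to-one:', exc)
```

Using `validate` turns an unexpected duplicate key into an immediate error instead of a silently larger result.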

3. Many-to-many merge

    df1 = DataFrame({'employee': ['Bob', 'Jake', 'Lisa'],
                     'group': ['Accounting', 'Engineering', 'Engineering']})
    df5 = DataFrame({'group': ['Engineering', 'Engineering', 'HR'],
                     'supervisor': ['Carly', 'Guido', 'Steve']})
    display(df1, df5, pd.merge(df1, df5))  # many-to-many
    display(pd.concat([df1, df5]))         # for comparison: concat only stacks the rows

Output:

      employee        group
    0      Bob   Accounting
    1     Jake  Engineering
    2     Lisa  Engineering

             group supervisor
    0  Engineering      Carly
    1  Engineering      Guido
    2           HR      Steve

      employee        group supervisor
    0     Jake  Engineering      Carly
    1     Jake  Engineering      Guido
    2     Lisa  Engineering      Carly
    3     Lisa  Engineering      Guido

      employee        group supervisor
    0      Bob   Accounting        NaN
    1     Jake  Engineering        NaN
    2     Lisa  Engineering        NaN
    0      NaN  Engineering      Carly
    1      NaN  Engineering      Guido
    2      NaN           HR      Steve
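The row count of a many-to-many merge can be predicted from the key counts on each side. The following sketch reuses the same data; the helper names `left_counts`, `right_counts`, and `expected_rows` are introduced here for illustration only. For the default inner join, each key contributes (occurrences on the left) × (occurrences on the right) rows, and keys found on only one side contribute none.

```python
import pandas as pd

df1 = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa'],
                    'group': ['Accounting', 'Engineering', 'Engineering']})
df5 = pd.DataFrame({'group': ['Engineering', 'Engineering', 'HR'],
                    'supervisor': ['Carly', 'Guido', 'Steve']})

merged = pd.merge(df1, df5)

# Count how often each key occurs on each side. A key missing from one
# side is filled with 0, so it adds no rows to the inner join.
left_counts = df1['group'].value_counts()
right_counts = df5['group'].value_counts()
expected_rows = int(left_counts.mul(right_counts, fill_value=0).sum())

print(len(merged), expected_rows)

# concat does no key matching at all: it stacks the rows of both frames
# and fills the columns a frame lacks with NaN.
stacked = pd.concat([df1, df5])
print(len(stacked))
```

Here 'Engineering' occurs twice on each side and yields 2 × 2 = 4 rows, while 'Accounting' and 'HR' each occur on one side only and are dropped.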

(1) Normalizing the key

Use on= to specify explicitly which column serves as the key.
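As a rough sketch of why this matters (the frames `a` and `b` are hypothetical and separate from the example that follows): when two frames share more than one column name, the default key is all of them, and `on=` narrows it to the column you actually mean.

```python
import pandas as pd

# Hypothetical frames: both 'employee' and 'group' appear on each side.
a = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa'],
                  'group': ['Accounting', 'Engineering', 'Engineering']})
b = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa'],
                  'group': ['Accounting', 'Product', 'Engineering']})

# Default: every shared column is part of the key, so Jake is dropped
# because his ('employee', 'group') pair differs between the frames.
by_both = pd.merge(a, b)

# on='employee': only that column is the key. The other shared column
# is kept from both sides with pandas' default suffixes _x and _y.
by_employee = pd.merge(a, b, on='employee')

print(by_both)
print(by_employee)
```

The `suffixes` argument of `pd.merge` can replace `_x` and `_y` with more descriptive labels.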

df1 = DataFrame({'employee':['Jack',"Summer