Which of Python's three most common Excel-reading modules is fastest? Detailed example code included

Earlier posts covered driving PowerPoint and Word from Python. Excel belongs to the same family, so it gets the same treatment here.

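Whole books have been written about Excel, so a single article cannot cover everything. But Xing Ge remembers that, years ago, the hardest part of processing data was getting the Excel data imported in the first place, because the later cleaning and extraction can always be done step by step. With tutorials that disagreed and text encodings that varied, importing nearly became the first hurdle on my road from beginner to giving up.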

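So this article introduces three powerful Python modules for reading Excel. The example data comes from an earlier post, "An analysis of 12,000 Python job postings from 2020"; have a look at it if you are interested.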

1.pandas

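matplotlib, numpy, and pandas are the three foundational modules that anyone entering data analysis has to master. This section shows how to import an Excel file with pandas. Installation is simple with pip (pandas reads .xlsx files through openpyxl, so install that too if it is missing):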

$ pip3 install pandas

When the installation finishes, the message "Successfully installed" means it worked.

# 1. Import the pandas module
import pandas as pd

# 2. Read the data in the Excel file into pandas
df = pd.read_excel('Python招聘資料（全）.xlsx')
print(df)
# 3. Read one specific sheet of the Excel file
df = pd.read_excel('Python招聘資料（全）.xlsx', sheet_name='Sheet1')
print(df)
# 4. Get the column headers
print(df.columns)
# 5. Get the row index
print(df.index)
# 6. Print one chosen column
print(df["工資水準"])
# 7. Describe the data
print(df.describe())

The describe function summarizes the overall salary picture, so you can tell Xing Ge whether you are above the 50% mark.
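As a minimal sketch of what the 50% row in the describe output means, assuming the salary column is numeric: the figures below are invented for illustration and are not taken from the job-postings data, and the column name `salary` is hypothetical.

```python
import pandas as pd

# Hypothetical salary figures, invented for illustration only.
df = pd.DataFrame({"salary": [8000, 12000, 15000, 20000, 30000]})

summary = df.describe()

# The "50%" row of describe() is the median of each numeric column.
median_salary = summary.loc["50%", "salary"]

print(summary)
print("median salary:", median_salary)
```

Comparing your own figure with `median_salary` answers the "above 50%" question directly.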

Iterating over the whole Excel file with a for loop, the 12,000 rows take 2.6 s in total:

import time
t1 = time.time()
for indexs in df.index:
    # Print every value in the row except the last column
    print(df.loc[indexs].values[0:-1])
t2 = time.time()
print("Iterating over 12000 rows with pandas took %.2f s" % (t2 - t1))
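The loop above looks up each row with df.loc, which builds a new Series for every row. As a rough sketch of an alternative, itertuples walks the same rows as plain tuples and is generally cheaper per row. The small DataFrame and its column names are hypothetical stand-ins for the job-postings sheet.

```python
import pandas as pd

# Hypothetical stand-in for the job-postings sheet.
df = pd.DataFrame({
    "title": ["Python developer", "Data analyst", "Backend engineer"],
    "city": ["Beijing", "Shanghai", "Shenzhen"],
    "salary": [15000, 12000, 20000],
})

# Label-based lookup, as in the loop above: one Series is built per row.
rows_via_loc = [tuple(df.loc[i].values) for i in df.index]

# itertuples yields plain tuples without building a Series per row.
rows_via_itertuples = [
    tuple(row) for row in df.itertuples(index=False, name=None)
]

print(rows_via_itertuples[0])
```

Both lists hold the same rows, so the choice only affects speed, not the result.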

2.openpyxl

Xiao Wu calls this the handiest Python library for working with Excel spreadsheets. The openpyxl reading methods are shown below. Installation is simple with pip:

$ pip3 install openpyxl

from openpyxl import load_workbook
# 1. Open the Excel workbook and get the sheet names
workbook = load_workbook(filename="Python招聘資料（全）.xlsx")
print(workbook.sheetnames)
# 2. Get a sheet by its name
sheet = workbook["Sheet1"]
print(sheet)
# 3. Get the sheet dimensions, i.e. how many rows and columns of data it holds.
#    The dimensions are per sheet, not per workbook.
print(sheet.dimensions)
# 4. Get the data in a single cell
# 4.1 The sheet["A1"] way
cell1 = sheet["A1"]
cell2 = sheet["C11"]
print(cell1.value, cell2.value)
"""
workbook.active returns the active sheet; sheet["A1"] returns cell A1; cell.value returns the value stored in the cell.
"""
# 4.2 The sheet.cell(row=, column=) way
cell1 = sheet.cell(row=1, column=1)
cell2 = sheet.cell(row=11, column=3)
print(cell1.value, cell2.value)

# 5. Get a range of cells
# Get the values in the region A1:C2
cell = sheet["A1:C2"]
print(cell)
for i in cell:
    for j in i:
        print(j.value)
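The two access styles above address the same cells: sheet["C11"] and sheet.cell(row=11, column=3) both point at column C, row 11. A small sketch of that mapping, using helper functions from openpyxl.utils.cell (no workbook is needed to run it):

```python
from openpyxl.utils.cell import coordinate_to_tuple, get_column_letter

# "C11" -> (row, column), both counted from 1.
row, column = coordinate_to_tuple("C11")

# And back again: column number -> letter, then append the row number.
label = f"{get_column_letter(column)}{row}"

print(row, column, label)
```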

With openpyxl, iterating over the 12,000 rows in a for loop and printing them takes only 0.47 s:

import time
t1 = time.time()
for i in sheet.iter_rows(min_row=1, max_row=12256, min_col=1, max_col=10):
    for j in i:
        print(j.value)
t2 = time.time()
print("Iterating over 12000 rows with openpyxl took %.2f s" % (t2 - t1))
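The loop above hard-codes max_row=12256 and max_col=10. As a hedged sketch, assuming a reasonably recent openpyxl, the sheet can be walked without those bounds: read_only=True streams the file instead of loading every cell object, and values_only=True makes iter_rows yield plain tuples of values. The throwaway file demo.xlsx and its contents are hypothetical, created only so the example runs on its own.

```python
import os
import shutil
import tempfile

from openpyxl import Workbook, load_workbook

# Build a small throwaway workbook so the sketch is self-contained.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "demo.xlsx")  # hypothetical file name
wb_out = Workbook()
ws_out = wb_out.active
ws_out.append(["title", "city", "salary"])
ws_out.append(["Python developer", "Beijing", 15000])
ws_out.append(["Data analyst", "Shanghai", 12000])
wb_out.save(path)

# Reopen it in read-only mode and collect every row as a tuple of values.
wb = load_workbook(filename=path, read_only=True)
sheet = wb.active
rows = [row for row in sheet.iter_rows(values_only=True)]
wb.close()  # read-only workbooks keep the file open until closed

shutil.rmtree(tmp_dir, ignore_errors=True)

for row in rows:
    print(row)
```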

3.xlrd

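xlrd is one of a trio of libraries, xlrd, xlwt, and xlutils:

xlrd: reads Excel files; xlwt: writes Excel files; xlutils: utilities for working with Excel files, such as copying, splitting, and filtering.

Installation is simple: install all three libraries with pip: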

$ pip3 install xlrd xlwt xlutils

When the installation finishes, the message "Successfully installed xlrd-1.2.0 xlutils-2.0.0 xlwt-1.3.0" means it worked.

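Next, let's read the Excel file. Without further ado, here is the code:

# Import the xlrd library
import xlrd
# Open the workbook (xlrd 1.2.0, as installed above, reads .xlsx; xlrd 2.0 and later read only .xls)
wb = xlrd.open_workbook("Python招聘資料（全）.xlsx")
# Get and print the number of sheets
print("number of sheets:", wb.nsheets)
# Get and print the sheet names
print("sheet names:", wb.sheet_names())
# Get a sheet by its index
sh1 = wb.sheet_by_index(0)
# A sheet can also be fetched by name
# sh = wb.sheet_by_name('Sheet1')
# Get and print the number of rows and columns in the sheet
print("sheet %s has %d rows and %d columns" % (sh1.name, sh1.nrows, sh1.ncols))
# Get and print the value of one cell
print("value at row 1, column 2:", sh1.cell_value(0, 1))
# Get the values of a whole row or a whole column
rows = sh1.row_values(0)  # contents of the first row
cols = sh1.col_values(1)  # contents of the second column
# Print the row and column values
print("values in row 1:", rows)
print("values in column 2:", cols)
# Get the data type of a cell's content
print("type of the value at row 2, column 1:", sh1.cell(1, 0).ctype)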

With xlrd, iterating over the 12,000 rows in a for loop and printing them takes only 0.35 s:

# Iterate over the contents of every sheet
import time
t1 = time.time()
for sh in wb.sheets():
    for r in range(sh.nrows):
        # Print the given row
        print(sh.row(r))
t2 = time.time()
print("Iterating over 12000 rows with xlrd took %.2f s" % (t2 - t1))
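Since xlrd 2.0 and later read only .xls files, a script that has to handle both formats needs to choose its reader by file extension. The following is only a sketch: pick_engine and ENGINE_BY_SUFFIX are hypothetical names made up for this example, not part of any library.

```python
from pathlib import Path

# Hypothetical mapping, for illustration: which reader suits which extension.
ENGINE_BY_SUFFIX = {
    ".xls": "xlrd",        # legacy binary format
    ".xlsx": "openpyxl",   # Office Open XML
    ".xlsm": "openpyxl",   # macro-enabled Office Open XML
}


def pick_engine(path):
    """Return the reader name for an Excel file, based on its extension."""
    suffix = Path(path).suffix.lower()
    try:
        return ENGINE_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"unsupported Excel extension: {suffix!r}") from None


print(pick_engine("Python招聘資料（全）.xlsx"))
```

The returned string matches the values accepted by the engine parameter of pd.read_excel, so it could be passed along as pd.read_excel(path, engine=pick_engine(path)).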

4.Summary

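Category	xlrd & xlwt & xlutils	pandas	OpenPyXL
Read	Supported
Write
Modify
xls			Not supported
xlsx	Supported in xlrd 1.2.0, dropped in 2.0
Large files
Speed	Fast
Features	Weak	Powerful	Average
Iteration time	0.35 s	2.60 s	0.47 s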

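The table above compares the three modules. For iteration time, xlrd is the fastest; for sheer functionality, I pick pandas; and for data volume, I would have to pick mysql, hadoop, or spark 🐶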
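All three timings in this article include a print call for every row, and printing is likely a large part of each figure, so they are only roughly comparable. A minimal sketch of a harness that only consumes the rows, leaving printing out of the measurement: time_iteration is a hypothetical helper name, and range(12000) merely stands in for a real row iterator.

```python
import time


def time_iteration(rows):
    """Consume an iterable of rows; return (row count, elapsed seconds)."""
    start = time.perf_counter()
    count = 0
    for _ in rows:
        count += 1
    return count, time.perf_counter() - start


# range(12000) is a placeholder for a real row iterator.
count, seconds = time_iteration(range(12000))
print(f"{count} rows in {seconds:.4f} s")
```

With the objects from the sections above, the same helper could be fed df.itertuples(index=False), sheet.iter_rows(values_only=True), or (sh1.row_values(r) for r in range(sh1.nrows)).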


Life is short, I use Python

