
Offline Installation of a Python 3 Web-Scraping Environment on Windows

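Contents

1. Offline installation of Python 3.6.8

2. Downloading the dependency modules for offline use

3. Installing the scraping modules offline

4. Downloading and installing browser drivers

5. Verifying versions and dependencies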

Python download page 1: https://www.python.org/downloads/

Python download page 2: https://www.python.org/ftp/python/3.6.8/

Windows installer: python-3.6.8-amd64.exe

Windows portable (embeddable) build: python-3.6.8-embed-amd64.zip

Source archive (for compiling yourself): Python-3.6.8.tgz
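Once one of the builds above is installed, it can help to confirm that the interpreter really is the 64-bit 3.6.x build that the cp36 / win_amd64 packages listed later expect. A minimal sketch using only the standard library; the function name `interpreter_info` is made up for this example:

```python
import struct
import sys


def interpreter_info():
    """Return ((major, minor, micro), pointer_size_in_bits) for the running Python."""
    bits = struct.calcsize("P") * 8  # size of a C pointer: 32 or 64
    return tuple(sys.version_info[:3]), bits


if __name__ == "__main__":
    version, bits = interpreter_info()
    print("Python %d.%d.%d, %d-bit" % (version + (bits,)))
    if version[:2] != (3, 6) or bits != 64:
        print("Warning: the cp36 / win_amd64 wheels in this guide will not match.")
```

Run it with the freshly installed python.exe; on the setup described here the first line should report a 3.6.8, 64-bit interpreter.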

PyPI search for modules that support Python 3.6: https://pypi.org/search/?c=Programming+Language+%3A%3A+Python+%3A%3A+3.6

Unofficial Windows binaries for Python extension packages: https://www.lfd.uci.edu/~gohlke/pythonlibs/

Selenium documentation (Chinese): https://python-selenium-zh.readthedocs.io/zh_CN/latest/

Python scraping dependency modules

| Purpose | Module | Official page | Package file |
| --- | --- | --- | --- |
| pip dependency | setuptools | https://pypi.org/project/setuptools/ | setuptools-51.0.0-py3-none-any.whl |
| Module installer | pip | https://pypi.org/project/pip/ | pip-20.3.3-py2.py3-none-any.whl |
| requests dependency | certifi | https://pypi.org/project/certifi/ | certifi-2020.12.5-py2.py3-none-any.whl |
| requests dependency | chardet | https://pypi.org/project/chardet/ | chardet-4.0.0-py2.py3-none-any.whl |
| requests dependency | idna | https://pypi.org/project/idna/ | idna-2.10-py2.py3-none-any.whl |
| requests dependency | urllib3 | https://pypi.org/project/urllib3/ | urllib3-1.26.2-py2.py3-none-any.whl |
| HTTP library | requests | https://pypi.org/project/requests/ | requests-2.25.1-py2.py3-none-any.whl |
| XML parsing library | lxml | https://pypi.org/project/lxml/ | lxml-4.6.2-cp36-cp36m-win_amd64.whl |
| Browser automation framework | selenium | https://pypi.org/project/selenium/ | selenium-3.141.0-py2.py3-none-any.whl |
| OCR (text recognition) library | pytesseract | https://pypi.org/project/pytesseract/ | pytesseract-0.3.7.tar.gz |
| tesserocr dependency | tesseract | https://pypi.org/project/tesseract/ | tesseract-0.1.3.tar.gz |
| Image recognition library | tesserocr | https://pypi.org/project/tesserocr/, https://github.com/simonflueckiger/tesserocr-windows_build/releases | tesserocr-2.5.1.tar.gz, tesserocr-2.4.0-cp36-cp36m-win_amd64.whl |
| OCR engine | tesseract-ocr | https://digi.bib.uni-mannheim.de/tesseract/ | tesseract-ocr-w64-setup-v4.0.0.20181030.exe |
| Matrix and array computation library | numpy | https://pypi.org/project/numpy/ | numpy-1.19.4-cp36-cp36m-win_amd64.whl |
| Computer vision library | opencv-python | https://pypi.org/project/opencv-python/ | opencv_python-4.4.0.46-cp36-cp36m-win_amd64.whl |
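The wheel file names above encode which interpreter they fit: the last three dash-separated fields are the Python tag, the ABI tag, and the platform tag (for example cp36, cp36m, win_amd64). Before copying files to the offline machine, a rough check like the following can catch a wheel downloaded for the wrong Python version. This is a simplified sketch, not pip's full compatibility logic, and the helper names are invented for illustration:

```python
def parse_wheel_tags(filename):
    """Split a wheel file name into (python_tag, abi_tag, platform_tag)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[:-len(".whl")].split("-")
    # Layout: name-version[-build]-python-abi-platform
    if len(parts) < 5:
        raise ValueError("unexpected wheel file name: " + filename)
    return tuple(parts[-3:])


def fits_cpython36_win64(filename):
    """Rough check that a wheel suits 64-bit CPython 3.6 on Windows."""
    python_tag, abi_tag, platform_tag = parse_wheel_tags(filename)
    # Each tag may be a dot-separated set, e.g. "py2.py3".
    python_ok = any(t in ("py3", "py36", "cp36") for t in python_tag.split("."))
    abi_ok = any(t in ("none", "cp36m") for t in abi_tag.split("."))
    platform_ok = any(t in ("any", "win_amd64") for t in platform_tag.split("."))
    return python_ok and abi_ok and platform_ok


if __name__ == "__main__":
    for name in ("requests-2.25.1-py2.py3-none-any.whl",
                 "lxml-4.6.2-cp36-cp36m-win_amd64.whl"):
        print(name, fits_cpython36_win64(name))
```

Pure-Python wheels (py3-none-any) fit any Python 3, while the compiled ones (lxml, numpy, opencv-python, tesserocr) have to carry the cp36 and win_amd64 tags to install on this setup.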

1. Offline installation of .whl packages
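One way to install wheels without network access is to point pip at the folder holding the downloaded files, using its `--no-index` and `--find-links` options. The sketch below builds that command from Python; the folder path and the helper names are assumptions for illustration, not something the original text specifies:

```python
import subprocess
import sys


def build_offline_install_command(wheel_dir, packages):
    """Return the pip command that installs packages from a local folder only."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-index",               # never contact PyPI
        "--find-links", wheel_dir,  # look for packages in this folder instead
    ] + list(packages)


def install_offline(wheel_dir, packages):
    """Run the command; raises CalledProcessError if pip reports a failure."""
    subprocess.check_call(build_offline_install_command(wheel_dir, packages))


if __name__ == "__main__":
    # Hypothetical folder containing the downloaded .whl files.
    command = build_offline_install_command(r"D:\offline\whl",
                                            ["requests", "lxml", "selenium"])
    print(" ".join(command))
```

With `--find-links`, pip resolves dependencies such as certifi, chardet, idna, and urllib3 from the same folder, so they do not have to be installed one by one as long as all the files are present. A single wheel can also be installed by passing its file path to `pip install` directly.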

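2. Offline installation of .tar.gz packages

After extracting the archive, cd into the extracted directory and run the install command there.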
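The extract-then-install step can be sketched as follows. The install command is not shown in the text above, so `python setup.py install` is an assumption here (it is the usual command for source archives of this era); the helper names are also invented for the example:

```python
import os
import subprocess
import sys
import tarfile


def extract_sdist(archive_path, dest_dir):
    """Unpack a .tar.gz source archive and return the top-level folder it created."""
    # Only unpack archives downloaded from a source you trust.
    with tarfile.open(archive_path, "r:gz") as archive:
        top_level = archive.getnames()[0].split("/")[0]
        archive.extractall(dest_dir)
    return os.path.join(dest_dir, top_level)


def install_sdist(source_dir):
    """Run the assumed install command from inside the extracted folder."""
    subprocess.check_call([sys.executable, "setup.py", "install"], cwd=source_dir)
```

For example, `install_sdist(extract_sdist("pytesseract-0.3.7.tar.gz", "build"))` would unpack the archive into a `build` folder and install it from there. Passing the archive path straight to `pip install` is an alternative that skips the manual extraction.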

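3. Installing tesseract-ocr

Tutorial on installing tesserocr for Python: https://jingyan.baidu.com/article/6b18230972e3e6fb59e15909.html

(1) During installation, select the option to download the additional language data.

(2) Add the Tesseract-OCR directory to the system environment variables (PATH).

(3) After installation, copy the tessdata folder from the Tesseract-OCR root directory into the Python root directory; otherwise an error is raised.

(4) Set the tesseract_cmd variable to the path of the installed tesseract.exe file.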
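Step (4) can be sketched for pytesseract, which exposes the setting as `pytesseract.pytesseract.tesseract_cmd`. The install paths below are assumed defaults and may differ on your machine, and `find_tesseract` and `configure_pytesseract` are hypothetical helper names:

```python
import os


def find_tesseract(candidates):
    """Return the first candidate path that is an existing file, else None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def configure_pytesseract(exe_path):
    """Point pytesseract at tesseract.exe (requires pytesseract to be installed)."""
    import pytesseract
    pytesseract.pytesseract.tesseract_cmd = exe_path


if __name__ == "__main__":
    # Assumed install locations; adjust to where the installer put Tesseract-OCR.
    exe = find_tesseract([
        r"C:\Program Files\Tesseract-OCR\tesseract.exe",
        r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
    ])
    if exe:
        configure_pytesseract(exe)
        print("Using", exe)
    else:
        print("tesseract.exe not found; set the path manually.")
```

Setting the path explicitly this way also works when the Tesseract-OCR directory has not been added to PATH.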

Selenium WebDriver downloads

| Browser | Where to check the version | Download / mirror URLs | Driver file |
| --- | --- | --- | --- |
| Chrome | chrome://version/ | http://chromedriver.storage.googleapis.com/index.html, http://npm.taobao.org/mirrors/chromedriver | chromedriver_win32.zip |
| Firefox | about:support | https://npm.taobao.org/mirrors/geckodriver, https://github.com/mozilla/geckodriver/releases | geckodriver-v0.26.0-win64.zip |
| Microsoft Edge | edge://version/ | https://developer.microsoft.com/en-us/microsoft-edge/tools/webdriver/ | edgedriver_win64.zip |
| Opera | | https://github.com/operasoftware/operachromiumdriver/releases | operadriver_win64.zip |
| Internet Explorer | Settings > About Internet Explorer | http://selenium-release.storage.googleapis.com/index.html | IEDriverServer_x64_3.9.0.zip |
| PhantomJS | | https://phantomjs.org/download.html, https://bitbucket.org/ariya/phantomjs/downloads | phantomjs-2.1.1-windows.zip |
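The reason for checking the browser version first is that the driver has to suit the installed browser; for Chrome, the driver's major version is generally expected to match the browser's major version. The sketch below compares the two and then starts Chrome through selenium 3.141.0 (the version listed earlier), whose `webdriver.Chrome` accepts an `executable_path` argument. The version strings and the helper names are illustrative assumptions, and the comparison does not apply to geckodriver, which uses its own version numbering:

```python
def major_version(version):
    """Return the leading number of a dotted version string."""
    return int(version.strip().split(".")[0])


def driver_matches_browser(browser_version, driver_version):
    """True when browser and driver share the same major version."""
    return major_version(browser_version) == major_version(driver_version)


def open_with_chrome(driver_path, url):
    """Start Chrome via chromedriver, load url, and return the page title."""
    from selenium import webdriver  # selenium 3.x style API
    browser = webdriver.Chrome(executable_path=driver_path)
    try:
        browser.get(url)
        return browser.title
    finally:
        browser.quit()  # always close the browser and the driver process


if __name__ == "__main__":
    # Sample version strings: read the real ones from chrome://version/
    # and from the name of the driver release you downloaded.
    print(driver_matches_browser("87.0.4280.88", "87.0.4280.20"))
```

Calling `open_with_chrome` with the path to the unzipped chromedriver.exe and any reachable URL gives a quick end-to-end check that browser, driver, and selenium work together.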