天天看点

Offline installation of a Python 3 web crawler environment on Windows

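Contents

I. Offline installation of Python 3.6.8

II. Downloading the dependency modules for offline use

III. Installing the crawler modules offline

IV. Downloading and installing browser drivers

V. Verifying versions and dependencies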

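I. Offline installation of Python 3.6.8

Python download page 1: https://www.python.org/downloads/

Python download page 2: https://www.python.org/ftp/python/3.6.8/

Windows installer: python-3.6.8-amd64.exe

Windows portable (embeddable) build: python-3.6.8-embed-amd64.zip

Source code tarball (to compile yourself): Python-3.6.8.tgz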
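The compiled packages listed later in this guide carry the cp36 and win_amd64 tags, so they only install on 64-bit CPython 3.6. After running the installer, a quick check that the interpreter really matches can save confusing pip errors later. A minimal sketch using only the standard library; the expected version (3, 6) and 64-bit target are taken from this guide and can be changed:

```python
import platform
import struct
import sys


def pointer_bits():
    """Word size of the running interpreter: 64 on a 64-bit build, 32 on a 32-bit one."""
    return struct.calcsize("P") * 8


def matches_target(version_info=None, bits=None, implementation=None,
                   expected=(3, 6), expected_bits=64):
    """True if the interpreter is CPython <expected> with <expected_bits>-bit pointers.

    The first three arguments default to the running interpreter; they are
    parameters only so the check can be tried with other values.
    """
    if version_info is None:
        version_info = sys.version_info
    if bits is None:
        bits = pointer_bits()
    if implementation is None:
        implementation = platform.python_implementation()
    return (implementation == "CPython"
            and tuple(version_info[:2]) == tuple(expected)
            and bits == expected_bits)


if __name__ == "__main__":
    print(platform.python_implementation(), platform.python_version(),
          "%d-bit" % pointer_bits())
    print("Matches CPython 3.6 / 64-bit:", matches_target())
```

Run it with the newly installed python.exe; with python-3.6.8-amd64.exe the second line should print True.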

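II. Downloading the dependency modules for offline use

Search PyPI for Python 3.6 packages: https://pypi.org/search/?c=Programming+Language+%3A%3A+Python+%3A%3A+3.6

Unofficial Windows binaries for Python extension packages (Christoph Gohlke): https://www.lfd.uci.edu/~gohlke/pythonlibs/

Selenium documentation in Chinese: https://python-selenium-zh.readthedocs.io/zh_CN/latest/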

Python crawler dependency modules

Purpose | Module | Official page | Package file
pip dependency | setuptools | https://pypi.org/project/setuptools/ | setuptools-51.0.0-py3-none-any.whl
Package installer | pip | https://pypi.org/project/pip/ | pip-20.3.3-py2.py3-none-any.whl
requests dependency | certifi | https://pypi.org/project/certifi/ | certifi-2020.12.5-py2.py3-none-any.whl
requests dependency | chardet | https://pypi.org/project/chardet/ | chardet-4.0.0-py2.py3-none-any.whl
requests dependency | idna | https://pypi.org/project/idna/ | idna-2.10-py2.py3-none-any.whl
requests dependency | urllib3 | https://pypi.org/project/urllib3/ | urllib3-1.26.2-py2.py3-none-any.whl
HTTP library | requests | https://pypi.org/project/requests/ | requests-2.25.1-py2.py3-none-any.whl
XML parsing library | lxml | https://pypi.org/project/lxml/ | lxml-4.6.2-cp36-cp36m-win_amd64.whl
Browser automation framework | selenium | https://pypi.org/project/selenium/ | selenium-3.141.0-py2.py3-none-any.whl
Text recognition (OCR) library | pytesseract | https://pypi.org/project/pytesseract/ | pytesseract-0.3.7.tar.gz
tesserocr dependency | tesseract | https://pypi.org/project/tesseract/ | tesseract-0.1.3.tar.gz
Image recognition library | tesserocr | https://pypi.org/project/tesserocr/ (source), https://github.com/simonflueckiger/tesserocr-windows_build/releases (Windows wheels) | tesserocr-2.5.1.tar.gz (source), tesserocr-2.4.0-cp36-cp36m-win_amd64.whl (Windows wheel)
Text recognition (OCR engine) | tesseract-ocr | https://digi.bib.uni-mannheim.de/tesseract/ | tesseract-ocr-w64-setup-v4.0.0.20181030.exe
Matrix and array computation library | numpy | https://pypi.org/project/numpy/ | numpy-1.19.4-cp36-cp36m-win_amd64.whl
Computer vision library | opencv-python | https://pypi.org/project/opencv-python/ | opencv_python-4.4.0.46-cp36-cp36m-win_amd64.whl
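The package filenames in this table follow the wheel naming convention of PEP 427, so the tags show which interpreter and platform each file fits: py2.py3-none-any is pure Python and installs anywhere, while cp36-cp36m-win_amd64 only installs on 64-bit CPython 3.6. Before copying the files to the offline machine, they can be checked against that target. A minimal sketch; the accepted tag sets are a simplification assumed here (pip's own rules, implemented in the packaging library, accept more combinations), and check_folder is an illustrative helper name:

```python
import collections
import os

WheelTags = collections.namedtuple(
    "WheelTags", ["name", "version", "build", "python", "abi", "platform"])

# Tags accepted for CPython 3.6 on 64-bit Windows. Simplified assumption:
# pip's real compatibility rules accept more tag combinations than these.
PY_TAGS = {"cp36", "py36", "py3"}
ABI_TAGS = {"cp36m", "abi3", "none"}
PLATFORM_TAGS = {"win_amd64", "any"}


def parse_wheel_name(filename):
    """Split a wheel filename into its PEP 427 fields.

    Layout: {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl, where
    each tag field may hold several dot-separated tags (e.g. py2.py3).
    """
    base = os.path.basename(filename)
    if not base.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = base[:-len(".whl")].split("-")
    if len(parts) == 5:
        name, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        name, version, build, py, abi, plat = parts
    else:
        raise ValueError("unexpected wheel filename: " + filename)
    return WheelTags(name, version, build,
                     frozenset(py.split(".")), frozenset(abi.split(".")),
                     frozenset(plat.split(".")))


def is_compatible(filename):
    """True if the wheel's tags overlap the CPython 3.6 / win_amd64 sets above."""
    tags = parse_wheel_name(filename)
    return bool(tags.python & PY_TAGS and tags.abi & ABI_TAGS
                and tags.platform & PLATFORM_TAGS)


def check_folder(folder):
    """Return the .whl files in *folder* that would not install on the target."""
    return sorted(f for f in os.listdir(folder)
                  if f.endswith(".whl") and not is_compatible(f))
```

An empty list from check_folder means every wheel in the folder carries tags this sketch accepts for 64-bit CPython 3.6.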

III. Installing the crawler modules offline

1. Offline installation of .whl packages
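Step 1 gives no command. One common approach, assumed here rather than taken from the original, is pip's --no-index and --find-links options, which make pip resolve packages and their dependencies only from a local folder. A single file can also be installed directly, e.g. python -m pip install lxml-4.6.2-cp36-cp36m-win_amd64.whl. A minimal sketch; the folder name packages used in the usage note is hypothetical:

```python
import subprocess
import sys


def offline_install_command(wheel_dir, packages, upgrade=False):
    """Build a pip command that installs *packages* from *wheel_dir* only.

    --no-index keeps pip from contacting PyPI; --find-links makes it look in
    the local folder, where it also resolves dependencies such as certifi,
    chardet, idna and urllib3 for requests.
    """
    cmd = [sys.executable, "-m", "pip", "install", "--no-index",
           "--find-links", wheel_dir]
    if upgrade:
        cmd.append("--upgrade")
    return cmd + list(packages)


def offline_install(wheel_dir, packages, upgrade=False):
    """Run the pip command; raises subprocess.CalledProcessError if pip fails."""
    subprocess.check_call(offline_install_command(wheel_dir, packages, upgrade))
```

For example, offline_install("packages", ["pip", "setuptools"], upgrade=True) updates pip and setuptools first, then offline_install("packages", ["requests", "lxml", "selenium", "numpy", "opencv-python"]) installs the rest, provided every .whl file from the table sits in that folder. Using sys.executable ensures the packages go into the same interpreter that runs the script.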

2. Offline installation of .tar.gz packages

After extracting the archive, cd into the extracted directory and run:
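The command to run after cd is missing from the original. For a source archive that contains a setup.py (assumed here for packages such as pytesseract-0.3.7.tar.gz), the traditional command is python setup.py install. A minimal sketch of the extract, cd and install steps driven from Python; extract_sdist and install_from_source are illustrative names, not part of any library:

```python
import os
import subprocess
import sys
import tarfile


def extract_sdist(archive, dest):
    """Extract a .tar.gz source archive into *dest* and return the package directory.

    Source archives normally unpack into one top-level folder such as
    pytesseract-0.3.7/, which is where setup.py lives.
    """
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
        top_levels = {m.name.split("/")[0] for m in tar.getmembers()}
    if len(top_levels) != 1:
        raise ValueError("expected one top-level folder, found: %s" % sorted(top_levels))
    return os.path.join(dest, top_levels.pop())


def install_from_source(package_dir):
    """Equivalent of cd <package_dir> followed by python setup.py install."""
    if not os.path.isfile(os.path.join(package_dir, "setup.py")):
        raise FileNotFoundError("no setup.py in " + package_dir)
    # cwd= plays the role of the cd step; sys.executable targets this interpreter.
    subprocess.check_call([sys.executable, "setup.py", "install"], cwd=package_dir)
```

Only extract archives from trusted sources, since extractall writes whatever paths the archive contains.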

3. Installing tesseract-ocr

Python tesserocr installation tutorial: https://jingyan.baidu.com/article/6b18230972e3e6fb59e15909.html

(1) During installation, choose to download the additional language data.

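(2) Add the Tesseract-OCR installation folder to the PATH environment variable.

(3) After installation, copy the tessdata folder from the Tesseract-OCR root directory into the Python root directory; otherwise an error is raised.

(4) Set the variable tesseract_cmd to the path of the installed tesseract.exe.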
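Steps (2) to (4) can be scripted. In pytesseract, the variable from step (4) is pytesseract.pytesseract.tesseract_cmd. A minimal sketch, assuming the installer's default folder C:\Program Files\Tesseract-OCR (edit the candidate list otherwise) and treating the folder containing python.exe as the Python root; the helper names are illustrative:

```python
import os
import shutil
import sys

# Assumed install locations; edit this list if Tesseract-OCR was installed elsewhere.
DEFAULT_CANDIDATES = [
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
]


def find_tesseract(candidates=DEFAULT_CANDIDATES, search_path=True):
    """Return the full path of tesseract.exe, or None if it cannot be found.

    Step (2) puts Tesseract-OCR on PATH, so PATH is checked first.
    """
    if search_path:
        found = shutil.which("tesseract")
        if found:
            return found
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


def copy_tessdata(tesseract_root, python_root=None):
    """Step (3): copy <tesseract_root>/tessdata into the Python root folder.

    Returns the destination path; an existing tessdata folder is left untouched.
    """
    python_root = python_root or os.path.dirname(sys.executable)
    src = os.path.join(tesseract_root, "tessdata")
    dst = os.path.join(python_root, "tessdata")
    if not os.path.isdir(dst):
        shutil.copytree(src, dst)
    return dst


def configure_pytesseract():
    """Step (4): point pytesseract at the installed tesseract.exe."""
    path = find_tesseract()
    if path is None:
        raise RuntimeError("tesseract.exe not found; check step (2)")
    import pytesseract  # imported here so the helpers above work without it
    pytesseract.pytesseract.tesseract_cmd = path
    return path
```

Call configure_pytesseract() once before the first pytesseract call in a script.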

IV. Downloading and installing browser drivers

Selenium WebDriver downloads

Browser | Check the browser version at | Download page / mirror | Driver package
Google Chrome | chrome://version/ | http://chromedriver.storage.googleapis.com/index.html, http://npm.taobao.org/mirrors/chromedriver | chromedriver_win32.zip
Firefox | about:support | https://npm.taobao.org/mirrors/geckodriver, https://github.com/mozilla/geckodriver/releases | geckodriver-v0.26.0-win64.zip
Microsoft Edge | edge://version/ | https://developer.microsoft.com/en-us/microsoft-edge/tools/webdriver/ | edgedriver_win64.zip
Opera | - | https://github.com/operasoftware/operachromiumdriver/releases | operadriver_win64.zip
Internet Explorer | Settings > About Internet Explorer | http://selenium-release.storage.googleapis.com/index.html | IEDriverServer_x64_3.9.0.zip
PhantomJS | - | https://phantomjs.org/download.html, https://bitbucket.org/ariya/phantomjs/downloads | phantomjs-2.1.1-windows.zip
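Each driver has to match the installed browser version, which is what the version-check column is for. ChromeDriver uses the same version numbering as Chrome, so the number shown at chrome://version/ identifies which folder to download from the index. A sketch that picks a driver version from a list of available ones (for example, folder names copied from the mirror index; the list in the usage note is made up), plus a start-up helper for the Selenium 3.141.0 release in the table. The headless flag is an assumption, not part of the original:

```python
def parse_version(text):
    """Turn '87.0.4280.88' into (87, 0, 4280, 88); return None for non-version names."""
    try:
        return tuple(int(part) for part in text.strip().strip("/").split("."))
    except ValueError:
        return None


def pick_chromedriver(chrome_version, available):
    """Choose the ChromeDriver version to download for a given Chrome version.

    Prefers a driver with the same MAJOR.MINOR.BUILD as the browser and falls
    back to the newest driver with the same major version. Entries that are
    not version numbers (e.g. 'icons') are ignored. Returns None if nothing fits.
    """
    chrome = parse_version(chrome_version)
    if chrome is None:
        raise ValueError("unrecognised Chrome version: %r" % chrome_version)
    parsed = [(parse_version(v), v) for v in available]
    same_major = [(p, v) for p, v in parsed if p is not None and p[0] == chrome[0]]
    same_build = [(p, v) for p, v in same_major if p[:3] == chrome[:3]]
    pool = same_build or same_major
    return max(pool)[1] if pool else None


def start_chrome(driver_path, headless=True):
    """Start Chrome through Selenium 3.141.0 using a downloaded chromedriver.exe.

    executable_path is the Selenium 3 keyword; Selenium 4 replaced it with a
    Service object, so this helper assumes the 3.141.0 release from the table.
    """
    from selenium import webdriver  # imported here so the helpers above run without Selenium

    options = webdriver.ChromeOptions()
    if headless:
        options.add_argument("--headless")
    return webdriver.Chrome(executable_path=driver_path, options=options)
```

For example, with Chrome 87.0.4280.88, pick_chromedriver("87.0.4280.88", ["86.0.4240.22", "87.0.4280.20", "87.0.4280.88"]) returns "87.0.4280.88"; unzip chromedriver_win32.zip from that folder and pass the path of chromedriver.exe to start_chrome.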