Preparing to install pip. Python 3.6 is present, but neither pip nor pip3 is on the PATH yet.
[[email protected] ~]# python36 --version
Python 3.6.3
[[email protected] ~]# command -v pi
pic pidof pifconfig pinentry-curses pinentry-gtk-2 ping pinky pivot_root
piconv pidstat pinentry pinentry-gtk pinfo ping6 pitchplay
[[email protected] ~]# command -v pip
[[email protected] ~]# command -v pip3
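The same check can be scripted. The following is a minimal sketch, assuming a Python 3 interpreter is available; the helper name find_commands is made up for illustration. It uses shutil.which, which searches the PATH much as `command -v` does for external programs.

```python
import shutil


def find_commands(names):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


# Report which of the pip launchers exist on this machine.
for name, path in sorted(find_commands(["pip", "pip3"]).items()):
    print("{}: {}".format(name, path or "not found"))
```

On the machine in this walkthrough both entries would be reported as not found at this point, matching the empty output above.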
Search for pip and install it. pip can be installed with yum; the drawback is that the packaged version is not the latest one.
[[email protected] ~]# yum list *pip*
Loaded plugins: fastestmirror, langpacks
Repodata is over 2 weeks old. Install yum-cron? Or run: yum makecache fast
Loading mirror speeds from cached hostfile
* base: mirror.bit.edu.cn
* epel: mirrors.tongji.edu.cn
* extras: mirrors.zju.edu.cn
* updates: mirror.bit.edu.cn
Installed Packages
libpipeline.x86_64 1.2.3-3.el7 @anaconda
Available Packages
aespipe.x86_64 2.4d-2.el7 epel
globus-xio-pipe-driver.x86_64 3.10-1.el7 epel
globus-xio-pipe-driver-devel.x86_64 3.10-1.el7 epel
libpipeline.i686 1.2.3-3.el7 base
libpipeline-devel.i686 1.2.3-3.el7 base
libpipeline-devel.x86_64 1.2.3-3.el7 base
nodejs-unpipe.noarch 1.0.0-2.el7 epel
pdns-backend-pipe.x86_64 3.4.11-4.el7 epel
perl-IO-Pipely.noarch 0.005-4.el7 epel
pipelight-selinux.noarch 0.1.0-2.el7 epel
python-apipkg.noarch 1.2-7.el7 epel
python-django-pipeline.noarch 1.3.27-1.el7 epel
python2-pip.noarch 8.1.2-6.el7 epel
python34-pip.noarch 8.1.2-6.el7 epel
rubygem-apipie-bindings.noarch 0.0.10-2.el7 epel
rubygem-apipie-bindings-doc.noarch 0.0.10-2.el7 epel
uwsgi-logger-pipe.x86_64 2.0.16-1.el7 epel
vanessa_socket-pipe.x86_64 0.0.12-3.el7 epel
[[email protected] ~]# yum install python34-pip.noarch
Loaded plugins: fastestmirror, langpacks
Repodata is over 2 weeks old. Install yum-cron? Or run: yum makecache fast
base | 3.6 kB 00:00:00
epel/x86_64/metalink | 6.6 kB 00:00:00
epel | 3.2 kB 00:00:00
extras | 3.4 kB 00:00:00
updates | 3.4 kB 00:00:00
zabbix | 2.9 kB 00:00:00
zabbix-non-supported | 951 B 00:00:00
(1/5): extras/7/x86_64/primary_db | 149 kB 00:00:00
epel/x86_64/primary FAILED 11% [======== ] 201 kB/s | 814 kB 00:00:29 ETA
http://mirror1.ku.ac.th/fedora/epel/7/x86_64/repodata/916333309cde50b4a977b81421d0043801ca99a6627933cc9270f48a30e61b57-primary.xml.gz: [Errno 14] HTTP Error 404 - Not Found4 kB 00:00:29 ETA
Trying other mirror.
To address this issue please refer to the below knowledge base article
https://access.redhat.com/articles/1320623
If above article doesn't help to resolve this issue please create a bug on https://bugs.centos.org/
(2/5): zabbix/x86_64/primary_db | 87 kB 00:00:03
(3/5): updates/7/x86_64/primary_db | 2.0 MB 00:00:13
(4/5): epel/x86_64/updateinfo | 930 kB 00:00:16
epel/x86_64/primary FAILED
http://ftp.kddilabs.jp/Linux/packages/fedora/epel/7/x86_64/repodata/916333309cde50b4a977b81421d0043801ca99a6627933cc9270f48a30e61b57-primary.xml.gz: [Errno 14] HTTP Error 404 - Not Found ETA
Trying other mirror.
(5/5): epel/x86_64/primary | 3.5 MB 00:00:35
Loading mirror speeds from cached hostfile
* base: mirror.bit.edu.cn
* epel: mirrors.tuna.tsinghua.edu.cn
* extras: mirrors.zju.edu.cn
* updates: mirror.bit.edu.cn
epel 12583/12583
Resolving Dependencies
--> Running transaction check
---> Package python34-pip.noarch 0:8.1.2-6.el7 will be installed
--> Processing Dependency: python(abi) = 3.4 for package: python34-pip-8.1.2-6.el7.noarch
--> Processing Dependency: python34-setuptools for package: python34-pip-8.1.2-6.el7.noarch
--> Processing Dependency: /usr/bin/python3.4 for package: python34-pip-8.1.2-6.el7.noarch
--> Running transaction check
---> Package python34.x86_64 0:3.4.8-1.el7 will be installed
--> Processing Dependency: python34-libs(x86-64) = 3.4.8-1.el7 for package: python34-3.4.8-1.el7.x86_64
--> Processing Dependency: libpython3.4m.so.1.0()(64bit) for package: python34-3.4.8-1.el7.x86_64
---> Package python34-setuptools.noarch 0:19.2-3.el7 will be installed
--> Running transaction check
---> Package python34-libs.x86_64 0:3.4.8-1.el7 will be installed
--> Finished Dependency Resolution
Dependencies Resolved
===============================================================================================================================================================================================
Package Arch Version Repository Size
===============================================================================================================================================================================================
Installing:
python34-pip noarch 8.1.2-6.el7 epel 1.7 M
Installing for dependencies:
python34 x86_64 3.4.8-1.el7 epel 51 k
python34-libs x86_64 3.4.8-1.el7 epel 8.3 M
python34-setuptools noarch 19.2-3.el7 epel 373 k
Transaction Summary
===============================================================================================================================================================================================
Install 1 Package (+3 Dependent packages)
Total download size: 10 M
Installed size: 37 M
Is this ok [y/d/N]: y
Downloading packages:
(1/4): python34-3.4.8-1.el7.x86_64.rpm | 51 kB 00:00:02
(2/4): python34-pip-8.1.2-6.el7.noarch.rpm | 1.7 MB 00:00:05
(3/4): python34-setuptools-19.2-3.el7.noarch.rpm | 373 kB 00:00:09
(4/4): python34-libs-3.4.8-1.el7.x86_64.rpm | 8.3 MB 00:03:54
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total 45 kB/s | 10 MB 00:03:54
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Installing : python34-libs-3.4.8-1.el7.x86_64 1/4
Installing : python34-3.4.8-1.el7.x86_64 2/4
Installing : python34-setuptools-19.2-3.el7.noarch 3/4
Installing : python34-pip-8.1.2-6.el7.noarch 4/4
Verifying : python34-setuptools-19.2-3.el7.noarch 1/4
Verifying : python34-3.4.8-1.el7.x86_64 2/4
Verifying : python34-pip-8.1.2-6.el7.noarch 3/4
Verifying : python34-libs-3.4.8-1.el7.x86_64 4/4
Installed:
python34-pip.noarch 0:8.1.2-6.el7
Dependency Installed:
python34.x86_64 0:3.4.8-1.el7 python34-libs.x86_64 0:3.4.8-1.el7 python34-setuptools.noarch 0:19.2-3.el7
Complete!
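After the installation, verify that pip3 is on the PATH. It is. Note that this pip3 belongs to the Python 3.4 that yum pulled in as a dependency, not to the python36 interpreter checked at the start.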
[[email protected] ~]# command -v pip3
/usr/bin/pip3
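To see which interpreter a given pip is bound to, pip can be run as a module of that interpreter. The sketch below is an illustration only; the helper name pip_version is hypothetical. It runs `-m pip --version` with the interpreter executing the script, and the output normally includes the site-packages path and the Python version pip serves.

```python
import subprocess
import sys


def pip_version(python=sys.executable):
    """Run '<python> -m pip --version' and return (exit_code, output)."""
    proc = subprocess.Popen(
        [python, "-m", "pip", "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold error text into the same stream
        universal_newlines=True,   # decode bytes to str
    )
    output, _ = proc.communicate()
    return proc.returncode, output.strip()


code, text = pip_version()
print(code, text)
```

A non-zero exit code means that interpreter has no pip module of its own, which is what one would expect for python36 here, since only python34-pip was installed.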
Next, install the requests library.
[[email protected] ~]# pip3 install requests
Collecting requests
Downloading https://files.pythonhosted.org/packages/65/47/7e02164a2a3db50ed6d8a6ab1d6d60b69c4c3fdf57a284257925dfc12bda/requests-2.19.1-py2.py3-none-any.whl (91kB)
100% |████████████████████████████████| 92kB 194kB/s
Collecting certifi>=2017.4.17 (from requests)
Downloading https://files.pythonhosted.org/packages/7c/e6/92ad559b7192d846975fc916b65f667c7b8c3a32bea7372340bfe9a15fa5/certifi-2018.4.16-py2.py3-none-any.whl (150kB)
100% |████████████████████████████████| 153kB 464kB/s
Collecting chardet<3.1.0,>=3.0.2 (from requests)
Downloading https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl (133kB)
100% |████████████████████████████████| 143kB 864kB/s
Collecting urllib3<1.24,>=1.21.1 (from requests)
Downloading https://files.pythonhosted.org/packages/bd/c9/6fdd990019071a4a32a5e7cb78a1d92c53851ef4f56f62a3486e6a7d8ffb/urllib3-1.23-py2.py3-none-any.whl (133kB)
100% |████████████████████████████████| 143kB 943kB/s
Collecting idna<2.8,>=2.5 (from requests)
Downloading https://files.pythonhosted.org/packages/4b/2a/0276479a4b3caeb8a8c1af2f8e4355746a97fab05a372e4a2c6a6b876165/idna-2.7-py2.py3-none-any.whl (58kB)
100% |████████████████████████████████| 61kB 92kB/s
Installing collected packages: certifi, chardet, urllib3, idna, requests
Successfully installed certifi-2018.4.16 chardet-3.0.4 idna-2.7 requests-2.19.1 urllib3-1.23
You are using pip version 8.1.2, however version 10.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
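Before using the library, it can help to confirm that the interpreter you are about to start can actually find it. A minimal sketch, assuming Python 3.4 or later; is_installed is a made-up helper name:

```python
import importlib.util


def is_installed(module_name):
    """Return True if module_name can be located by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


print("requests installed:", is_installed("requests"))
```

Run it with the same interpreter that pip3 installs into (python3 in this walkthrough); a different interpreter may report False even though the install succeeded.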
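Start the Python interactive interpreter and fetch the 163 home page with get(). The call fails with a NameError because the requests library has not been imported yet.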
[[email protected] ~]# python3
Python 3.4.8 (default, Mar 23 2018, 10:04:27)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> r = requests.get("http://www.163.com")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'requests' is not defined
After importing requests, fetch the page again. This time it succeeds.
>>> import requests
>>> r = requests.get("http://www.163.com")
>>> r.status_code
200
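In a script, the import belongs at the top of the file so the NameError above cannot occur. The following is a sketch rather than a definitive implementation: fetch_status and its client parameter are names introduced here for illustration, and the timeout of 10 seconds is an arbitrary assumption.

```python
try:
    import requests  # third-party; installed earlier with pip3
except ImportError:
    requests = None  # lets the script report a clear error later


def fetch_status(url, client=None, timeout=10):
    """Return the HTTP status code for url.

    client is anything with a requests-style get(); it defaults to the
    requests module itself.
    """
    client = client or requests
    if client is None:
        raise RuntimeError("requests is not installed; run 'pip3 install requests'")
    response = client.get(url, timeout=timeout)
    return response.status_code
```

Calling fetch_status("http://www.163.com") performs the same request as the session above. Passing a timeout is a design choice worth keeping, because requests.get waits indefinitely by default.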
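The 163 page has too much content to show, so fetch a different page instead. The first r.text is garbled because the body was decoded with the wrong encoding; after setting r.encoding to 'utf-8', the Chinese text becomes readable.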
>>> r = requests.get("http://www.baidu.com")
>>> r.status_code
200
>>> r.text
'<!DOCTYPE html>\r\n<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta content=always name=referrer><link rel=stylesheet type=text/css href=http://s1.bdstatic.com/r/www/cache/bdorz/baidu.min.css><title>ç\x99¾åº¦ä¸\x80ä¸\x8bï¼\x8cä½\xa0å°±ç\x9f¥é\x81\x93</title></head> <body link=#0000cc> <div id=wrapper> <div id=head> <div class=head_wrapper> <div class=s_form> <div class=s_form_wrapper> <div id=lg> <img hidefocus=true src=//www.baidu.com/img/bd_logo1.png width=270 height=129> </div> <form id=form name=f action=//www.baidu.com/s class=fm> <input type=hidden name=bdorz_come value=1> <input type=hidden name=ie value=utf-8> <input type=hidden name=f value=8> <input type=hidden name=rsv_bp value=1> <input type=hidden name=rsv_idx value=1> <input type=hidden name=tn value=baidu><span class="bg s_ipt_wr"><input id=kw name=wd class=s_ipt value maxlength=255 autocomplete=off autofocus></span><span class="bg s_btn_wr"><input type=submit id=su value=ç\x99¾åº¦ä¸\x80ä¸\x8b class="bg s_btn"></span> </form> </div> </div> <div id=u1> <a href=http://news.baidu.com name=tj_trnews class=mnav>æ\x96°é\x97»</a> <a href=http://www.hao123.com name=tj_trhao123 class=mnav>hao123</a> <a href=http://map.baidu.com name=tj_trmap class=mnav>å\x9c°å\x9b¾</a> <a href=http://v.baidu.com name=tj_trvideo class=mnav>è§\x86é¢\x91</a> <a href=http://tieba.baidu.com name=tj_trtieba class=mnav>è´´å\x90§</a> <noscript> <a href=http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u=http%3A%2F%2Fwww.baidu.com%2f%3fbdorz_come%3d1 name=tj_login class=lb>ç\x99»å½\x95</a> </noscript> <script>document.write(\'<a href="http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u=\'+ encodeURIComponent(window.location.href+ (window.location.search === " target="_blank" rel="external nofollow" target="_blank" rel="external nofollow" " ? "?" 
: "&")+ "bdorz_come=1")+ \'" name="tj_login" class="lb">ç\x99»å½\x95</a>\');</script> <a href=//www.baidu.com/more/ name=tj_briicon class=bri style="display: block;">æ\x9b´å¤\x9a产å\x93\x81</a> </div> </div> </div> <div id=ftCon> <div id=ftConw> <p id=lh> <a href=http://home.baidu.com>å\x85³äº\x8eç\x99¾åº¦</a> <a href=http://ir.baidu.com>About Baidu</a> </p> <p id=cp>©2017 Baidu <a href=http://www.baidu.com/duty/>使ç\x94¨ç\x99¾åº¦å\x89\x8då¿\x85读</a> <a href=http://jianyi.baidu.com/ class=cp-feedback>æ\x84\x8fè§\x81å\x8f\x8dé¦\x88</a> 京ICPè¯\x81030173å\x8f· <img src=//www.baidu.com/img/gs.gif> </p> </div> </div> </div> </body> </html>\r\n'
>>> r.encoding = 'utf-8'
>>> r.text
'<!DOCTYPE html>\r\n<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta content=always name=referrer><link rel=stylesheet type=text/css href=http://s1.bdstatic.com/r/www/cache/bdorz/baidu.min.css><title>百度一下,你就知道</title></head> <body link=#0000cc> <div id=wrapper> <div id=head> <div class=head_wrapper> <div class=s_form> <div class=s_form_wrapper> <div id=lg> <img hidefocus=true src=//www.baidu.com/img/bd_logo1.png width=270 height=129> </div> <form id=form name=f action=//www.baidu.com/s class=fm> <input type=hidden name=bdorz_come value=1> <input type=hidden name=ie value=utf-8> <input type=hidden name=f value=8> <input type=hidden name=rsv_bp value=1> <input type=hidden name=rsv_idx value=1> <input type=hidden name=tn value=baidu><span class="bg s_ipt_wr"><input id=kw name=wd class=s_ipt value maxlength=255 autocomplete=off autofocus></span><span class="bg s_btn_wr"><input type=submit id=su value=百度一下 class="bg s_btn"></span> </form> </div> </div> <div id=u1> <a href=http://news.baidu.com name=tj_trnews class=mnav>新闻</a> <a href=http://www.hao123.com name=tj_trhao123 class=mnav>hao123</a> <a href=http://map.baidu.com name=tj_trmap class=mnav>地图</a> <a href=http://v.baidu.com name=tj_trvideo class=mnav>视频</a> <a href=http://tieba.baidu.com name=tj_trtieba class=mnav>贴吧</a> <noscript> <a href=http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u=http%3A%2F%2Fwww.baidu.com%2f%3fbdorz_come%3d1 name=tj_login class=lb>登录</a> </noscript> <script>document.write(\'<a href="http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u=\'+ encodeURIComponent(window.location.href+ (window.location.search === " target="_blank" rel="external nofollow" target="_blank" rel="external nofollow" " ? "?" 
: "&")+ "bdorz_come=1")+ \'" name="tj_login" class="lb">登录</a>\');</script> <a href=//www.baidu.com/more/ name=tj_briicon class=bri style="display: block;">更多产品</a> </div> </div> </div> <div id=ftCon> <div id=ftConw> <p id=lh> <a href=http://home.baidu.com>关于百度</a> <a href=http://ir.baidu.com>About Baidu</a> </p> <p id=cp>©2017 Baidu <a href=http://www.baidu.com/duty/>使用百度前必读</a> <a href=http://jianyi.baidu.com/ class=cp-feedback>意见反馈</a> 京ICP证030173号 <img src=//www.baidu.com/img/gs.gif> </p> </div> </div> </div> </body> </html>\r\n'
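The garbled text in the first r.text is what UTF-8 bytes look like when decoded as ISO-8859-1. That is presumably what happened here, assuming the response did not declare a charset and requests fell back to ISO-8859-1. The effect can be reproduced without any network access:

```python
title = "百度一下"

# Simulate the bug: UTF-8 bytes decoded with the wrong codec.
garbled = title.encode("utf-8").decode("iso-8859-1")
print(repr(garbled))

# Undo it: recover the raw bytes, then decode them as UTF-8.
repaired = garbled.encode("iso-8859-1").decode("utf-8")
print(repaired)
```

Setting r.encoding = 'utf-8' avoids the round trip altogether, because r.text is then decoded from the raw bytes with the right codec.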