Adding the BeautifulSoup4 Module to QEMU aarch32

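Environment

QEMU: 2.8.0

Board: vexpress-ca9

Overview

In the previous post we got the board to ping Baidu successfully. Python is also known for its strong networking support, and Beautiful Soup is a Python library whose main job is extracting data from web pages. It is not part of the standard library, so it has to be installed separately.

Walkthrough

I. Try Python's built-in urllib library first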

net.py3: the Python 3 version

#!/usr/bin/env python3

from urllib.request import urlopen
html = urlopen("http://www.pythonscraping.com/pages/page1.html")
print(html.read())

net.py2: the Python 2 version

#!/usr/bin/env python2

from urllib2 import urlopen
html = urlopen("http://www.pythonscraping.com/pages/page1.html");
print(html.read())

Run both and compare the results:

[root@vexpress ~]# ./net.py3 
b'<html>\n<head>\n<title>A Useful Page</title>\n</head>\n<body>\n<h1>An Interesting Title</h1>\n<div>\nLorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.\n</div>\n</body>\n</html>\n'
[root@vexpress ~]# 
[root@vexpress ~]# ./net.py2
<html>
<head>
<title>A Useful Page</title>
</head>
<body>
<h1>An Interesting Title</h1>
<div>
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
</div>
</body>
</html>      
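
The b'...' prefix in the Python 3 output appears because read() returns bytes there, while Python 2 returns a plain str. The following is a minimal sketch of a single script that runs under either interpreter and always yields text. The helper name fetch_text, the opener parameter, and the UTF-8 default are assumptions made for illustration; they are not part of the original scripts.

```python
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2


def fetch_text(url, opener=urlopen, encoding="utf-8"):
    """Fetch url and return the body as text (hypothetical helper).

    opener is injectable so the function can be exercised without a network.
    The UTF-8 default is an assumption; a real page may declare another charset.
    """
    response = opener(url)
    try:
        raw = response.read()  # bytes on Python 3, str on Python 2
    finally:
        close = getattr(response, "close", None)
        if close is not None:
            close()
    return raw.decode(encoding)
```

Calling print(fetch_text("http://www.pythonscraping.com/pages/page1.html")) on the board should then print the page as text under both python2 and python3, provided the page really is UTF-8.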

Python ships a tool called 2to3 that converts Python 2 code to Python 3. Let's try it on the board.

Running it, however, reports that 2to3 cannot be found:

[root@vexpress ~]# 2to3 net.py2 
-/bin/sh: 2to3: not found      

Yet the which command shows that 2to3 does exist:

[root@vexpress ~]# which 2to3
/usr/bin/2to3      

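Opening /usr/bin/2to3 reveals the problem:

#!/home/pengdonglin/src/qemu/python_cross_compile/Python2/aarch32/bin/python2.7

import sys
from lib2to3.main import main
sys.exit(main("lib2to3.fixes"))

The problem is the first line: it names the interpreter path from the cross-compile build host, which does not exist on the board. Change it as follows: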

#!/usr/bin/env python2

import sys
from lib2to3.main import main
sys.exit(main("lib2to3.fixes"))
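
The shell's "not found" message here refers to the interpreter named on the #! line, not to the script itself, and other scripts installed by the same cross-compile may carry the same stale path. A rough sketch for finding them is given below; the function names are hypothetical, and the script only inspects the first 256 bytes of each file.

```python
import os


def shebang_interpreter(path):
    """Return the interpreter named on a script's #! line, or None."""
    try:
        with open(path, "rb") as handle:
            first_line = handle.read(256).split(b"\n", 1)[0]
    except (IOError, OSError):
        return None  # unreadable file: nothing to report
    if not first_line.startswith(b"#!"):
        return None
    fields = first_line[2:].split()
    return fields[0].decode("utf-8", "replace") if fields else None


def find_broken_shebangs(directory):
    """List (script, interpreter) pairs whose interpreter path does not exist."""
    broken = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        interpreter = shebang_interpreter(path)
        if interpreter is not None and not os.path.exists(interpreter):
            broken.append((path, interpreter))
    return broken
```

On the board, find_broken_shebangs("/usr/bin") would be expected to list 2to3 before the fix and to omit it afterwards.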

Then run it again:

[root@vexpress ~]# 2to3 net.py2 
RefactoringTool: Skipping optional fixer: buffer
RefactoringTool: Skipping optional fixer: idioms
RefactoringTool: Skipping optional fixer: set_literal
RefactoringTool: Skipping optional fixer: ws_comma
RefactoringTool: Refactored net.py2
--- net.py2    (original)
+++ net.py2    (refactored)
@@ -1,7 +1,7 @@
 #!/usr/bin/env python2
 
-from urllib2 import urlopen
+from urllib.request import urlopen
 
 html = urlopen("http://www.pythonscraping.com/pages/page1.html");
 
-print(html.read())
+print((html.read()))
RefactoringTool: Files that need to be modified:
RefactoringTool: net.py2

The lines starting with + are the Python 3 versions. The following command writes the converted file out automatically:

[root@vexpress ~]# 2to3 net.py2 -w -n -o /tmp/
lib2to3.main: Output in '/tmp/' will mirror the input directory '' layout.
RefactoringTool: Skipping optional fixer: buffer
RefactoringTool: Skipping optional fixer: idioms
RefactoringTool: Skipping optional fixer: set_literal
RefactoringTool: Skipping optional fixer: ws_comma
RefactoringTool: Refactored net.py2
--- net.py2    (original)
+++ net.py2    (refactored)
@@ -1,6 +1,6 @@
 #!/usr/bin/env python2
 
-from urllib2 import urlopen
+from urllib.request import urlopen
 
 html = urlopen("http://www.pythonscraping.com/pages/page1.html");
 
RefactoringTool: Writing converted net.py2 to /tmp/net.py2.
RefactoringTool: Files that were modified:
RefactoringTool: net.py2

/tmp/net.py2 now holds the Python 3 version:

[root@vexpress ~]# cat /tmp/net.py2 

#!/usr/bin/env python2
from urllib.request import urlopen
html = urlopen("http://www.pythonscraping.com/pages/page1.html");
print((html.read()))

Of course, the shebang line still names python2 and has to be changed by hand.
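
As the output shows, 2to3 rewrites the code but leaves the shebang untouched. If many converted files need the same fix, a small helper along these lines could do it; retarget_shebang is a hypothetical name, and the sketch assumes the scripts use the /usr/bin/env form seen in this post.

```python
def retarget_shebang(source, interpreter="python3"):
    """Return source with its shebang line pointed at interpreter.

    Text without a leading #! line is returned unchanged.
    """
    lines = source.splitlines(True)  # keep the line endings
    if lines and lines[0].startswith("#!"):
        newline = "\n" if lines[0].endswith("\n") else ""
        lines[0] = "#!/usr/bin/env " + interpreter + newline
    return "".join(lines)
```

Reading /tmp/net.py2, passing its contents through retarget_shebang, and writing the result back would then leave a script that can be run directly with Python 3.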

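II. Adding the BeautifulSoup4 module

Because this module is implemented in pure Python, we can install it on the PC first and then copy it to the board; pure Python code does not depend on the target platform.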
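
Copying a package from an x86 PC to an ARM board only works if it contains no compiled extension modules. One way to check that assumption before copying is sketched below; native_extensions is a hypothetical helper, and the suffix list is a heuristic rather than an exhaustive definition of native code.

```python
import os

# File suffixes that usually indicate compiled, platform-specific code.
NATIVE_SUFFIXES = (".so", ".pyd", ".dll", ".dylib")


def native_extensions(package_dir):
    """Return the sorted paths of compiled files found under package_dir."""
    found = []
    for root, _dirs, files in os.walk(package_dir):
        for name in files:
            if name.endswith(NATIVE_SUFFIXES):
                found.append(os.path.join(root, name))
    return sorted(found)
```

An empty list for the bs4 directory suggests it can be copied as-is; any entries would have to be cross-compiled for the board instead.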

1. Install BeautifulSoup4 on the PC

sudo apt-get install python-pip
sudo apt-get install python3-pip
sudo apt-get install python-bs4
sudo pip install beautifulsoup4
sudo pip3 install beautifulsoup4

The bs4 directory then appears under dist-packages of the corresponding Python version:

$ ls /usr/lib/python2.7/dist-packages/bs4 
builder/  dammit.py  dammit.pyc  diagnose.py  diagnose.pyc  element.py  element.pyc  __init__.py  __init__.pyc  testing.py  testing.pyc  tests/
$ ls /usr/lib/python3/dist-packages/bs4/
builder/  dammit.py  diagnose.py  element.py  __init__.py  __pycache__/  testing.py  tests/

Sometimes it is installed under site-packages instead. Copy the two bs4 directories to the shared directory:

$ cp -raf /usr/lib/python2.7/dist-packages/bs4 /nfsroot/bs4_python2
$ cp -raf /usr/lib/python3/dist-packages/bs4 /nfsroot/bs4_python3
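
Since the install location varies between dist-packages and site-packages, it can be easier to ask the interpreter where it found the package than to guess the path. A small sketch follows; package_dir is a hypothetical helper, and it assumes the package can be imported by the interpreter running it.

```python
import importlib
import os


def package_dir(name):
    """Return the directory the named package or module was imported from."""
    module = importlib.import_module(name)
    return os.path.dirname(os.path.abspath(module.__file__))
```

Running print(package_dir("bs4")) once under python2 and once under python3 on the PC gives the two source directories to pass to cp.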

If that runs into problems, you can also install from source. The latest BeautifulSoup4 release can be downloaded from

https://www.crummy.com/software/BeautifulSoup/bs4/download/

I downloaded https://www.crummy.com/software/BeautifulSoup/bs4/download/4.5/beautifulsoup4-4.5.3.tar.gz and unpacked it:

$ tar -xf beautifulsoup4-4.5.3.tar.gz 
$ ls beautifulsoup4-4.5.3
AUTHORS.txt  beautifulsoup4.egg-info/  bs4/  convert-py3k*  COPYING.txt  doc/  doc.zh/  MANIFEST.in  NEWS.txt  PKG-INFO  README.txt  scripts/  setup.cfg  setup.py  test-all-versions*  TODO.txt

The bs4 directory at the top level is for Python 2; the convert-py3k tool generates the Python 3 version from it:

cd beautifulsoup4-4.5.3/
./convert-py3k      

The bs4 under the py3k directory is the one for Python 3. Copy each of the two bs4 directories to the shared directory:

$cp -raf bs4 /nfsroot/bs4_python2
$cp -raf py3k/bs4 /nfsroot/bs4_python3      

Also copy them into place on the PC:

sudo cp -raf bs4 /usr/local/lib/python2.7/site-packages/
sudo cp -raf py3k/bs4 /usr/local/lib/python3.6/site-packages/      

2. Put the matching bs4 on the board

[root@vexpress ~]# mount -t nfs -o nolock 192.168.1.100:/nfsroot /mnt
[root@vexpress ~]# cp -raf /mnt/bs4_python2 /usr/lib/python2.7/site-packages/bs4
[root@vexpress ~]# cp -raf /mnt/bs4_python3/ /usr/lib/python3.6/site-packages/bs4

To verify, run import bs4 in each interpreter:

[root@vexpress ~]# python2
Python 2.7.13 (default, Mar 24 2017, 17:04:57) 
[GCC 4.8.3 20140320 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import bs4
>>> 
[root@vexpress ~]# python3
Python 3.6.0 (default, Mar 24 2017, 17:02:49) 
[GCC 4.8.3 20140320 (prerelease)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import bs4
>>> 

If the import raises no error, everything is working.

3. Write a test program

bs4.py3: the Python 3 version

#!/usr/bin/env python3

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://www.pythonscraping.com/pages/page1.html")
bsObj = BeautifulSoup(html.read(), "html.parser")
print(bsObj.h1)

bs4.py2: the Python 2 version

#!/usr/bin/env python2

from urllib2 import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://www.pythonscraping.com/pages/page1.html")
bsObj = BeautifulSoup(html.read(), "html.parser")
print(bsObj.h1)

Run them:

[root@vexpress ~]# ./bs4.py3
<h1>An Interesting Title</h1>
[root@vexpress ~]# ./bs4.py2
<h1>An Interesting Title</h1>
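
The test above depends on the board's network being up. To check the bs4 installation on its own, the parsing step can be run against an inline string, as in the sketch below. SAMPLE_PAGE is a shortened stand-in for the real page, and summarize is a hypothetical helper; the guarded import only exists so that the script reports a clear error when bs4 has not been copied to the interpreter in use.

```python
try:
    from bs4 import BeautifulSoup
except ImportError:  # bs4 is missing from this interpreter's site-packages
    BeautifulSoup = None

# Shortened stand-in for http://www.pythonscraping.com/pages/page1.html
SAMPLE_PAGE = (
    "<html><head><title>A Useful Page</title></head>"
    "<body><h1>An Interesting Title</h1><div>Lorem ipsum</div></body></html>"
)


def summarize(html):
    """Return the text of the <title> and first <h1> of an HTML document."""
    if BeautifulSoup is None:
        raise RuntimeError("bs4 is not installed for this interpreter")
    soup = BeautifulSoup(html, "html.parser")
    title, h1 = soup.title, soup.h1
    return {
        "title": title.get_text() if title is not None else None,
        "h1": h1.get_text() if h1 is not None else None,
    }
```

Calling summarize(SAMPLE_PAGE) under python2 and python3 exercises the same BeautifulSoup calls as bs4.py2 and bs4.py3 without touching the network.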

The end.