Continuing yesterday's topic: scraping Bing HD wallpapers.
Today I optimized yesterday's code, and below I walk through the optimized code piece by piece.
Code analysis
- Preliminaries
import requests
from lxml import etree
base_url = 'https://bing.ioliu.cn/?p={}'
header = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36',
'Referer': 'https://bing.ioliu.cn'
}
page = 0
Here we need to import the requests and lxml libraries.
base_url is the address of the gallery page; the URL changes with the page number, but it varies in a predictable pattern. For example:
Page 1: https://bing.ioliu.cn/?p=1
Page 2: https://bing.ioliu.cn/?p=2
and so on, as demonstrated in the small snippet below.
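The per-page URL is simply built by filling the page number into base_url with str.format; a quick check of the first two pages:

base_url = 'https://bing.ioliu.cn/?p={}'
# Filling in the page number yields the per-page address
for n in range(1, 3):
    print(base_url.format(n))
# https://bing.ioliu.cn/?p=1
# https://bing.ioliu.cn/?p=2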
- Downloading the web page
# Download a gallery page and return its HTML text
def get_html(page_url):
    html = requests.get(page_url, headers=header).text
    return html
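requests.get as written can wait indefinitely on a stalled connection and will happily return the body of an error page; a minimal hardening sketch (get_html_safe is my own hypothetical variant, not part of the post's code) adds a timeout and a status check:

import requests

# Hypothetical hardened variant of get_html: fail fast on stalls and HTTP errors
def get_html_safe(page_url, header):
    resp = requests.get(page_url, headers=header, timeout=10)
    resp.raise_for_status()  # raise if the server returned a 4xx/5xx status
    return resp.text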
- Getting the image addresses
# Get the image addresses from the page
def get_url(html):
    etree_html = etree.HTML(html)
    img_url = etree_html.xpath('//img/@src')
    return img_url
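Note that //img/@src matches every <img> on the page, so the returned list may also include icons or other non-wallpaper images. A simple optional filter (assuming the wallpaper links end in .jpg, which I have not verified against the site) would be:

# Hypothetical helper: keep only addresses that look like JPEG images
def filter_jpg(img_urls):
    return [url for url in img_urls if url.lower().endswith('.jpg')]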
- Downloading images and saving them to a local folder
# Download one image and save it in the local picture folder
def get_img(img_url):
    global page
    page += 1
    img_name = 'picture\\{}.jpg'.format(page)  # the picture folder must already exist
    img = requests.get(img_url, headers=header).content
    with open(img_name, 'wb') as save_img:
        save_img.write(img)
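One thing to watch: open() raises FileNotFoundError if the picture folder does not exist, so it has to be created beforehand. A small sketch that creates it once and builds the path portably:

import os

# Create the output folder once if it is missing
os.makedirs('picture', exist_ok=True)
# os.path.join builds the path portably, e.g. picture/1.jpg or picture\1.jpg
img_name = os.path.join('picture', '{}.jpg'.format(1))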
- Main function
# Main function: download pages one to four
if __name__ == '__main__':
    for n in range(1, 5):
        print('Downloading page {}'.format(n))
        html = get_html(base_url.format(n))
        img_list = get_url(html)
        for img in img_list:
            get_img(img)
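The loop above fires requests back to back; to be gentler on the site you can pause between downloads. A sketch of the same main block with time.sleep added (the one-second delay is an arbitrary choice of mine):

import time

if __name__ == '__main__':
    for n in range(1, 5):
        print('Downloading page {}'.format(n))
        html = get_html(base_url.format(n))
        img_list = get_url(html)
        for img in img_list:
            get_img(img)
            time.sleep(1)  # pause one second between image downloads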
- Complete code
import requests
from lxml import etree

base_url = 'https://bing.ioliu.cn/?p={}'
header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36',
    'Referer': 'https://bing.ioliu.cn'
}
page = 0

# Download a gallery page and return its HTML text
def get_html(page_url):
    html = requests.get(page_url, headers=header).text
    return html

# Get the image addresses from the page
def get_url(html):
    etree_html = etree.HTML(html)
    img_url = etree_html.xpath('//img/@src')
    return img_url

# Download one image and save it in the local picture folder
def get_img(img_url):
    global page
    page += 1
    img_name = 'picture\\{}.jpg'.format(page)  # the picture folder must already exist
    img = requests.get(img_url, headers=header).content
    with open(img_name, 'wb') as save_img:
        save_img.write(img)

# Main function: download pages one to four
if __name__ == '__main__':
    for n in range(1, 5):
        print('Downloading page {}'.format(n))
        html = get_html(base_url.format(n))
        img_list = get_url(html)
        for img in img_list:
            get_img(img)
Here the script downloads pages 1 through 4 (range(1, 5) stops before 5); you can change the range in the main function as needed, for example:
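# Grab the first ten pages instead of four
for n in range(1, 11):  # pages 1 through 10
    ...                 # same loop body as in the main function above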