Python Crawler: Image Download App 2.0

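Last time I covered using Python to search for and download images. Today's update: as we know, https://www.pexels.com/ only accepts English search terms, but some people are not comfortable with English and have to translate what they want before they can search. This version calls the 斯必克 API from API Store to translate automatically, so you can now type your search in Chinese!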

Here is the modified code:

from bs4 import BeautifulSoup
import requests
import json

headers ={
    'accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Cookie':'__cfduid=dcb472bad94316522ad55151de6879acc1479632720; locale=en; _ga=GA1.2.1575445427.1479632759; _gat=1; _hjIncludedInSample=1',
    'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36'
}

url_path = 'https://www.pexels.com/search/'
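# Read the search term, which may be typed in Chinese
word = input('Enter the image you want to download: ')
# Ask the translation API for the English equivalent
url_tra = 'http://howtospeak.org:443/api/e2c?user_key=dfcacb6404295f9ed9e430f67b641a8e&notrans=0&text=' + word
english_data = requests.get(url_tra)
# The API returns JSON; the translation is in the 'english' field
js_data = json.loads(english_data.text)
content = js_data['english']
# Build the search URL from the translated term
url = url_path + content + '/'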
wb_data = requests.get(url,headers=headers)
soup = BeautifulSoup(wb_data.text,'lxml')
imgs = soup.select('a > img')
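# Collect the src attribute of every image found inside a link
photo_urls = []
for img in imgs:
    photo = img.get('src')
    photo_urls.append(photo)

path = 'C://Users/Administrator/Desktop/photo/'

i = 1
for item in photo_urls:
    if item is None:
        # Skip <img> tags that have no src attribute
        continue
    data = requests.get(item, headers=headers)
    if '?' in item:
        # URLs with a query string have no usable file name, so number them
        filename = content + str(i) + '.jpeg'
        i = i + 1
    else:
        # Otherwise reuse the last 10 characters of the URL as the file name
        filename = item[-10:]
    # The with block closes the file even if the write fails
    with open(path + filename, 'wb') as fp:
        fp.write(data.content)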
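
The translation step above assumes the API always answers with valid JSON containing an `english` field. A minimal sketch of a more defensive version follows. The helper name `extract_translation` is hypothetical, and only the `english` key is taken from the code in this post; how the real API reports errors is not known here, so anything unexpected is simply treated as "no translation".

```python
import json


def extract_translation(response_text, key='english'):
    """Parse the API response body and return the translated text, or None.

    `key` defaults to 'english', the field read by the script in this post.
    """
    try:
        data = json.loads(response_text)
    except json.JSONDecodeError:
        # The body was not JSON at all (for example an HTML error page)
        return None
    if not isinstance(data, dict):
        # Valid JSON, but not the object shape the script expects
        return None
    value = data.get(key)
    if isinstance(value, str) and value.strip():
        return value.strip()
    return None
```

In the script, `content = extract_translation(english_data.text)` could replace the two parsing lines, with a check for `None` before building the search URL.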

Summary

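1 Calling an API: many APIs now return JSON, so I used the json library to parse the response and took the English translation I needed to build the search page URL.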

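2 Shortcoming: the search page is sometimes not found, because the site's search URLs are not all built the same way. How to match them intelligently is something to work out later.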
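
One possible way to soften the second point is to try several search terms in order: the full translated phrase first, then each word on its own. The sketch below is only an assumption about what might help, not a tested fix for pexels.com. All function names are hypothetical, and `get_status` stands in for any callable that takes a URL and returns an HTTP status code.

```python
from urllib.parse import quote

SEARCH_ROOT = 'https://www.pexels.com/search/'


def normalize(term):
    """Lowercase the term and collapse runs of whitespace."""
    return ' '.join(term.strip().lower().split())


def build_search_url(term):
    """Turn a phrase into a search URL, percent-encoding unsafe characters."""
    return SEARCH_ROOT + quote(normalize(term)) + '/'


def candidate_terms(translation):
    """Return the full phrase first, then each word as a fallback, without duplicates."""
    cleaned = normalize(translation)
    candidates = []
    for term in [cleaned] + cleaned.split(' '):
        if term and term not in candidates:
            candidates.append(term)
    return candidates


def first_working_url(translation, get_status):
    """Return the first candidate URL whose status is 200, or None if all fail."""
    for term in candidate_terms(translation):
        url = build_search_url(term)
        if get_status(url) == 200:
            return url
    return None
```

With the script above, `get_status` could be `lambda u: requests.get(u, headers=headers).status_code`. Falling back to single words trades precision for a better chance of finding some page.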