
Python Web Scraping: CAPTCHA Recognition (Basics)

Without further ado, here is the code ლ(′◉❥◉`ლ)!!!

# coding=utf-8
import requests
import pytesseract
from PIL import Image


class checkcode():
    def __init__(self):  # initialize parameters
        self.start_url = 'http://jxjy.dwjtaq.com/random.xhtml'
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '
                          'AppleWebKit/537.36 (KHTML, like Gecko) '
                          'Chrome/62.0.3202.62 Safari/537.36'
        }

    def parse(self):  # request the CAPTCHA image
        ret = requests.get(self.start_url, headers=self.headers, timeout=10)
        ret.raise_for_status()  # do not save an error page as an image
        return ret.content

    def save_code(self, content):  # save the CAPTCHA to disk
        with open('./code.png', 'wb') as f:
            f.write(content)

    def check_code(self):  # run OCR on the saved image
        image = Image.open('./code.png')
        text = pytesseract.image_to_string(image)
        return text

    def run(self):  # fetch and recognize 10 CAPTCHAs
        for _ in range(10):
            ret = self.parse()
            self.save_code(ret)
            text = self.check_code()
            print("#")  # separator between results
            print(text)


if __name__ == '__main__':
    start = checkcode()
    start.run()
           

This CAPTCHA is very regular, so the recognition rate is 100% (just bragging; even I don't believe that).
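
In practice the string returned by OCR often carries whitespace or stray symbols, so a raw result rarely matches the CAPTCHA exactly. A minimal cleanup sketch, assuming (hypothetically) that the CAPTCHA is 4 letters or digits; `clean_captcha_text` and `expected_length` are illustrative names, not part of pytesseract:

```python
import re


def clean_captcha_text(raw, expected_length=4):
    """Keep only ASCII letters and digits; return None if the length is off.

    expected_length=4 is an assumption about this CAPTCHA; adjust as needed.
    """
    cleaned = re.sub(r'[^0-9A-Za-z]', '', raw)
    if len(cleaned) != expected_length:
        return None  # recognition probably failed; request a new CAPTCHA
    return cleaned


print(clean_captcha_text(' 8k3F\n\x0c'))  # 8k3F
print(clean_captcha_text('8 k'))          # None
```

A `None` result is a cheap signal to fetch a fresh CAPTCHA rather than submit a guess.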

The idea behind the code above: request the CAPTCHA directly, save the response as an image, and run OCR on it.
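
Saving to disk is not strictly required, since Pillow can open an image from an in-memory buffer. A minimal sketch, assuming the response body is a valid image; `bytes_to_image` is an illustrative helper name, and the demo uses a generated PNG instead of a real network response:

```python
import io

from PIL import Image


def bytes_to_image(content):
    """Decode raw image bytes (e.g. ret.content) without writing a file."""
    image = Image.open(io.BytesIO(content))
    image.load()  # force decoding now, so corrupt data fails here
    return image


# Demo: build PNG bytes in memory, then decode them again.
buffer = io.BytesIO()
Image.new('RGB', (60, 20), 'white').save(buffer, format='PNG')
image = bytes_to_image(buffer.getvalue())
print(image.size)  # (60, 20)
```

The returned image can be passed to `pytesseract.image_to_string` in place of the one opened from `./code.png`.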

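Steps:

Install two modules, pytesseract and PIL (PIL is distributed as the Pillow package; pytesseract also needs the Tesseract OCR engine installed on the system).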

import pytesseract
from PIL import Image
           

Key statements:

image = Image.open('./code.png')   # load the image
pytesseract.image_to_string(image)  # convert the image to text
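
For noisier CAPTCHAs, a common preprocessing step before `image_to_string` is to convert the image to grayscale and binarize it. A minimal sketch using Pillow only; the `binarize` helper and the threshold of 140 are assumptions to tune per CAPTCHA, not something taken from the original code:

```python
from PIL import Image


def binarize(image, threshold=140):
    """Return a black-and-white copy: pixels brighter than threshold become white."""
    gray = image.convert('L')  # 8-bit grayscale
    return gray.point(lambda p: 255 if p > threshold else 0)


# Demo on a synthetic image: light-gray background with one dark pixel.
sample = Image.new('RGB', (4, 4), (200, 200, 200))
sample.putpixel((1, 1), (30, 30, 30))
result = binarize(sample)
print(result.getpixel((0, 0)), result.getpixel((1, 1)))  # 255 0
```

The binarized image can then be handed to `pytesseract.image_to_string` like any other Pillow image.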
           

The example above is for reference and learning only; it has no depth and is meant purely as an introduction.
