Python Web Scraping

2021-10-06 12:35:00

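Python Web Scraping

Scraping steps:

Environment

First program

HTTP status codes (worth memorizing)

Import paths for packages

* parameters

XHR

Logging in to the campus network

Experiment 2

Scraping static web pages

The robots protocol

File storage

PyMySQL

Storing scraped content in a database

BS4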

Scraping steps

Step 1. Download the page's HTML: requests.get / requests.post

Step 2. Parse the HTML from step 1: lxml.etree.HTML()

Step 3. Save the parsed data from step 2
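
The three steps above can be sketched roughly as follows. This is a minimal sketch, not the author's code: the function names, the XPath `//a/text()`, the output file name, and the example URL are all assumptions chosen for illustration.

```python
def download(url):
    """Step 1: download the page's HTML with requests.get."""
    import requests  # third-party: pip install requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx status codes
    response.encoding = response.apparent_encoding  # guess the charset from the body
    return response.text


def parse_link_texts(html_text):
    """Step 2: parse the HTML with lxml.etree.HTML and collect link texts."""
    from lxml import etree  # third-party: pip install lxml

    tree = etree.HTML(html_text)
    # Hypothetical XPath: the text of every <a> element on the page.
    return [text.strip() for text in tree.xpath("//a/text()") if text.strip()]


def save(items, path):
    """Step 3: save the parsed data, one item per line."""
    with open(path, "w", encoding="utf-8") as f:
        for item in items:
            f.write(item + "\n")


# Example usage (placeholder URL, not from the original post):
# save(parse_link_texts(download("https://example.com/")), "links.txt")
```

Keeping each step in its own function makes it easy to swap the parser or the storage format later without touching the download code.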

Saving to a database:

Steps for connecting to a MySQL database with pymysql:

1. Get a connection object

2. Get a cursor object from the connection

3. Execute the SQL statement through the cursor

4. Commit the transaction (or roll back on failure) and close the resources (connection, cursor)
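
A minimal sketch of these four steps, assuming a local MySQL server. The credentials, the database name `spider`, and the `notices` table with `title` and `url` columns are all hypothetical placeholders, not taken from the original post.

```python
def open_connection():
    """Step 1: get a connection object (all credentials are placeholders)."""
    import pymysql  # third-party: pip install pymysql

    return pymysql.connect(
        host="localhost",
        user="root",
        password="change-me",
        database="spider",
        charset="utf8mb4",
    )


def save_notices(conn, rows):
    """Steps 2-4: insert rows, commit or roll back, then close the connection."""
    sql = "INSERT INTO notices (title, url) VALUES (%s, %s)"
    try:
        with conn.cursor() as cursor:      # step 2: cursor from the connection
            cursor.executemany(sql, rows)  # step 3: run the parameterized SQL
        conn.commit()                      # step 4: commit the transaction
    except Exception:
        conn.rollback()                    # step 4, failure path: undo the changes
        raise
    finally:
        conn.close()                       # release the connection either way


# Example usage (needs a running MySQL server and the hypothetical table):
# save_notices(open_connection(), [("Some title", "http://example.com/1")])
```

Passing values through `%s` placeholders instead of formatting them into the SQL string lets the driver escape them, which matters when the text comes from scraped pages.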

cmd window

python --version
pip --version
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
pip install pymysql
pip list | find "requests"
python -m pip install pip -U   (upgrades pip itself)

VS Code

Install: Python, Kite, Jupyter

Chinese character encoding lookup (GB2312, BIG5, GBK, GB18030, Unicode): qqxiuzi.cn
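
Scraped Chinese pages come out garbled when their bytes are decoded with the wrong character set. A small illustration using only Python's built-in codecs; the sample string is an arbitrary choice:

```python
text = "中文"

gbk_bytes = text.encode("gbk")     # 2 bytes per character here
utf8_bytes = text.encode("utf-8")  # 3 bytes per character here

print(gbk_bytes)   # b'\xd6\xd0\xce\xc4'
print(utf8_bytes)  # b'\xe4\xb8\xad\xe6\x96\x87'

# Decoding with the matching codec restores the original text.
print(gbk_bytes.decode("gbk"))  # 中文
```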

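Imported packages are located in the site folder (site-packages) under the lib directory of the Python installation path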
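
To check where that directory is on a given machine, something like the following should work. It relies only on the standard library; the variable name is just for illustration.

```python
import sys
import sysconfig

# Directory where pip installs pure-Python third-party packages
# for the interpreter that is running this script.
site_packages = sysconfig.get_paths()["purelib"]
print(site_packages)

# Every directory Python searches, in order, when resolving an import.
for entry in sys.path:
    print(entry)
```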

One * : arguments are collected as a tuple

Two ** : arguments are collected as a dict
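
A short illustration of both forms. The function name `describe` and the argument values are made up for this example.

```python
def describe(*args, **kwargs):
    """Return the positional and keyword arguments as Python packed them."""
    return args, kwargs


args, kwargs = describe(1, 2, name="spider", retries=3)
print(args)    # (1, 2)
print(kwargs)  # {'name': 'spider', 'retries': 3}

# The same symbols unpack a sequence or a mapping at the call site.
params = {"name": "spider"}
print(describe(*[1, 2], **params))  # ((1, 2), {'name': 'spider'})
```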

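XHR stands for XMLHttpRequest: the requests a page sends in the background, listed under the XHR filter of the browser's network panel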

Plain-text (.txt) storage

CSV file storage
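
Both storage formats can be handled with the standard library. A minimal sketch; the function names and the choice of `utf-8-sig` for the CSV file are assumptions for illustration, not the author's code.

```python
import csv


def save_txt(lines, path):
    """Write each item on its own line of a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(line + "\n" for line in lines)


def save_csv(rows, path, header):
    """Write a header row followed by the data rows."""
    # newline="" lets the csv module control line endings,
    # which avoids blank rows between records on Windows.
    # utf-8-sig adds a BOM so Excel recognizes Chinese text as UTF-8.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
```

The csv module quotes fields that contain commas or line breaks, so scraped titles do not need to be cleaned by hand before writing.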

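PyMySQL in 4 steps

Create the database connection object, con

Get the cursor object, cursor

Execute the SQL statement

Commit the transaction and close the connection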

Experiment

Code

Scrape the announcements on http://ccgp-shaanxi.gov.cn/

This article is from Cnblogs; author: chn-tiancx. When reposting, please cite the original link: https://www.cnblogs.com/tiancx/p/15371028.html
