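I'm new to web scraping and am scraping the hot comments list on Jianshu (简书). Everything works except the like count, which cannot be inserted into the database. The error message is: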
pymysql.err.ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'like,reward) values ('462','7')' at line 1")
import requests
from bs4 import BeautifulSoup
from lxml import etree
from multiprocessing import Pool
import pymysql

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 '
                         '(KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36'}
conn = pymysql.connect(host='localhost', user='root', passwd='123456', db='mydb', port=3306, charset='utf8')
cursor = conn.cursor()


def get_jianshu_info(url):
    res = requests.get(url, headers=headers)
    selector = etree.HTML(res.text)
    infos = selector.xpath('//ul[@class="note-list"]/li')
    for info in infos:
        try:
            title = info.xpath('div/a/text()')[0]
            author = info.xpath('div/div/a[1]/text()')[0]
            content = info.xpath('div/p/text()')[0].strip()
            comment = info.xpath('div/div/a/text()')[2].strip()
            if len(comment) == 0:
                comment = '无'  # placeholder for a missing value
            like = info.xpath('div/div/span[1]/text()')[0].strip()
            if len(like) == 0:
                like = '无'
            reward = info.xpath('div/div/span[2]/text()')
            if len(reward) == 0:
                reward = '无'
            else:
                reward = reward[0].strip()
            # This is the statement that raises the 1064 error shown above.
            cursor.execute("insert into jianshureping (like,reward) "
                           "values (%s,%s)",
                           (str(like), str(reward))
                           )
            conn.commit()
            print('ok')
        except IndexError:
            print('error')


if __name__ == '__main__':
    urls = ['https://www.jianshu.com/c/bDHhpK?order_by=commented_at&page={}'.format(str(i)) for i in range(1, 3)]
    for url in urls:
        get_jianshu_info(url)
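If I drop the like count from the insert, the import works. Scraping the like counts on their own, I found that they contain newline characters, which I have also filtered out with a conditional ('无' is the placeholder I store when a value is missing). The scraped values are:
Like counts: ['462', '21349', '无', '2885', '118', '60', '17', '436', '18', '4']
Comment counts: ['112', '12572', '20', '179', '17', '46', '23', '237', '10', '6']
Reward counts: ['7', '121', '58', '8', '无', '2', '无', '2', '无', '1']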
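
One thing worth checking, judging from the error text: MySQL reports the syntax error starting exactly at `like,reward)`, and `LIKE` is a reserved word in MySQL, so an unquoted column named `like` is parsed as the operator rather than as a column name. The values themselves (for example '462') look fine. Below is a minimal sketch of one way around it, assuming the column really is named `like` in the `jianshureping` table: wrap identifiers in backticks. The helpers `quote_identifier` and `build_insert` are hypothetical names made up for this illustration and are not part of pymysql.

```python
def quote_identifier(name):
    """Wrap a MySQL identifier in backticks, doubling any embedded backtick."""
    return '`' + name.replace('`', '``') + '`'


def build_insert(table, columns):
    """Build a parameterized INSERT statement with every identifier quoted."""
    cols = ', '.join(quote_identifier(c) for c in columns)
    placeholders = ', '.join(['%s'] * len(columns))
    return 'INSERT INTO {} ({}) VALUES ({})'.format(
        quote_identifier(table), cols, placeholders)


# Table and column names are taken from the question.
sql = build_insert('jianshureping', ['like', 'reward'])
print(sql)
# With the cursor from the question, the statement would then be run as:
#     cursor.execute(sql, (str(like), str(reward)))
```

The values still go through the `%s` placeholders, so only the identifiers change. Renaming the column to something that is not reserved, such as `likes`, would avoid the quoting altogether.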