Simulated login with Python 3 (using Douban as the example)

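First, let me cry for a bit.

This simulated login took me a long time, because I had no experience with this kind of thing.

The target site for the login is:

www.douban.com

Douban!

Let's start with the approach!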

"""

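Requirements:

Python libraries:

requests (third-party; install it with pip)

json (standard library; the code below parses JSON through response.json(), so no separate import is needed)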

"""

Approach:

# The most basic way to get past anti-crawler checks: set User-Agent and Referer!
headers = {"User-Agent":"Mozilla/5.0 (Windows NT 6.1; W…) Gecko/20100101 Firefox/59.0",
           "Referer":"https://www.douban.com/account…w.douban.com/people/153954087/"}

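# Second, set data!
# Logging in is almost always a POST request, which means submitting a form.
def login(CaptchaToken):
    data = {"captcha-id": CaptchaToken,
            "captcha-solution": input("Enter the captcha: "),
            "form_email": "your own account",
            "form_password": "your own password",
            "login": "登录",
            "redir": "https://www.douban.com/people/153954087/",
            "source": "None"
            }
    # What? What is POST?
    # Searching for a definition tends to be hard to follow; it is easier to try it yourself.
    # How? Watch this: https://v.qq.com/x/page/p0525lpt4x0.html
    # The form data is submitted via POST; if POST is still unclear, the video linked above covers it.
    # Pass data and headers by keyword: the third positional parameter of
    # requests.post is json, so a positional headers dict would never be sent as headers.
    response = requests.post("https://accounts.douban.com/login", data=data, headers=headers)
    print(response.url)
    return response.text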
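To make "submitting a form via POST" more concrete, here is a minimal sketch of what the request body looks like once the form is encoded. It uses only the standard library and sends nothing over the network. The field names mirror the login form above; the values are made-up placeholders.

```python
from urllib.parse import urlencode, parse_qs

# Placeholder values for illustration; the field names mirror the login form.
form = {
    "form_email": "user@example.com",
    "form_password": "secret",
    "login": "登录",
}

# A form POST carries its fields in the request body, encoded like a query string.
body = urlencode(form)
print(body)

# Decoding the body gives back the same fields, which is what the server sees.
decoded = {key: values[0] for key, values in parse_qs(body).items()}
print(decoded == form)
```

When a dict is passed as data= to requests.post, requests builds a body of this kind for you.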

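# The last part is Douban's captcha.
# This part is not hard to follow either. The basic idea:
# fetch the URL of the captcha image; the URL itself stays the same, and only the id in it changes.
# Captcha:
def CaptchaCode(url):
    response = requests.get(url)
    result = response.json()  # parse the JSON response into a dict
    CaptchaUrl = result["url"]
    CaptchaToken = result["token"]
    # Pass headers by keyword: the second positional parameter of requests.get is params.
    CodeImg = requests.get("https:" + CaptchaUrl, headers=headers).content
    # Save the image so you can open it and read the captcha by eye.
    with open("douban_code.png", "wb") as f:
        f.write(CodeImg)
    return CaptchaToken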
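The JSON handling in the captcha step can be tried without touching the network. The sketch below assumes the response has the same two keys used above ("url" and "token") and that "url" is protocol-relative (it starts with "//"), which is why the code above prepends "https:". The helper name parse_captcha and the sample payload are made up for illustration.

```python
import json
from urllib.parse import urljoin


def parse_captcha(payload, base="https://www.douban.com/"):
    """Return (image_url, token) from a captcha payload dict (hypothetical helper)."""
    # urljoin takes the scheme from `base` when the URL starts with "//".
    image_url = urljoin(base, payload["url"])
    return image_url, payload["token"]


# Made-up payload; real values come from the server.
sample = json.loads('{"url": "//example.com/captcha?id=abc123", "token": "abc123"}')
image_url, token = parse_captcha(sample)
print(image_url)
print(token)
```

Using urljoin instead of string concatenation also keeps working if the server ever returns a full URL.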

Complete code:

import requests

"""
Approach:
1. Set headers to get past basic anti-crawler checks
2. Set the form data
3. Handle the captcha
"""

headers = {"User-Agent":"Mozilla/5.0 (Windows NT 6.1; W…) Gecko/20100101 Firefox/59.0",
           "Referer":"https://www.douban.com/account…w.douban.com/people/153954087/"}

# Captcha:
def CaptchaCode(url):
    response = requests.get(url)
    result = response.json()  # parse the JSON response into a dict
    CaptchaUrl = result["url"]
    CaptchaToken = result["token"]
    # headers must be passed by keyword; positionally it would be treated as params.
    CodeImg = requests.get("https:" + CaptchaUrl, headers=headers).content
    with open("douban_code.png", "wb") as f:
        f.write(CodeImg)
    # Return the captcha ID
    return CaptchaToken

# Login
def login(CaptchaToken):
    data = {"captcha-id": CaptchaToken,
            "captcha-solution": input("Enter the captcha: "),
            "form_email": "xxxxx",
            "form_password": "xxxxxx",
            "login": "登录",
            "redir": "https://www.douban.com/people/153954087/",
            "source": "None"
            }
    # data and headers are passed by keyword so that headers is really sent as headers.
    response = requests.post("https://accounts.douban.com/login", data=data, headers=headers)
    print(response.url)
    return response.text

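CaptchaToken = CaptchaCode("https://www.douban.com/j/misc/captcha")
response = login(CaptchaToken)

# "iven" is presumably a fragment of the author's own nickname on the profile page;
# replace it with text that only appears once you are logged in.
if "iven" in response:
    print("Login succeeded!")
else:
    print("Login failed!")
    print(response)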
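Searching the page for a substring is fragile. Since login() already prints response.url, another option is to compare the final URL with the redir value. This is only a sketch and rests on an assumption: that a successful login ends on the redir page while a failed one stays on the login page. The helper name landed_on is made up.

```python
from urllib.parse import urlsplit


def landed_on(final_url, expected_url):
    """True when both URLs share host and path, ignoring a trailing slash."""
    final, expected = urlsplit(final_url), urlsplit(expected_url)
    return (final.netloc, final.path.rstrip("/")) == (
        expected.netloc,
        expected.path.rstrip("/"),
    )


REDIR = "https://www.douban.com/people/153954087/"
print(landed_on("https://www.douban.com/people/153954087", REDIR))
print(landed_on("https://accounts.douban.com/login", REDIR))
```

In the script above this would be called with response.url, which means login() would need to return the response object rather than only its text.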

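One last point: simulating a login means making your crawler look as much like a browser as possible when it visits the target site.

That takes experience and time to build up.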
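One small step toward browser-like behavior is requests.Session, which keeps cookies between requests and sends the same headers every time. The sketch below only prepares a request and inspects it, so nothing goes over the network. The header values are placeholders, and build_session is a made-up helper name.

```python
import requests

# Placeholder values; substitute the real User-Agent and Referer strings.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (example placeholder)",
    "Referer": "https://www.douban.com/",
}


def build_session(headers):
    """Return a Session that sends `headers` with every request (hypothetical helper)."""
    session = requests.Session()
    session.headers.update(headers)
    return session


session = build_session(HEADERS)

# Prepare, but do not send, a request to see what would go out.
prepared = session.prepare_request(
    requests.Request("GET", "https://www.douban.com/j/misc/captcha")
)
print(prepared.headers["User-Agent"])
print(prepared.url)
```

With a session, the captcha request and the login POST would share one cookie jar (session.get and session.post), which separate requests.get and requests.post calls do not.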

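I also came away with one takeaway about the difference between understanding something and not.

Understanding does not mean you can do it. You may follow the idea and call that understanding, but when you actually try to implement it you end up completely lost.

So while learning this kind of thing, take notes and keep your learning moving.

That's all!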
