Python crawler that downloads images, with the page URLs read from a file

2015-03-30 23:50:00

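Steps for downloading images from the web with Python:

1. Fetch the page source for a given URL.

2. Use a regular expression to filter the image URLs out of that source.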
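The pattern in getimg() assumes that the src attribute of each wanted image is immediately followed by a pic_ext attribute. A lazy `.+?` on its own does not keep a match inside one attribute: matching starts at the first `src="` on the page and stretches until it reaches `.jpg" pic_ext`, even across other tags. The sketch below runs the original pattern and a tighter `[^"]` variant on a made-up two-image snippet (the HTML and the example.com URL are assumptions for illustration only).

```python
import re

# Made-up sample: a .png image, then a .jpg image carrying a pic_ext attribute.
html = ('<img src="logo.png">'
        '<img src="http://example.com/a.jpg" pic_ext="jpeg">')

# Pattern as written in the post: lazy, but free to cross quotes and tags.
loose = r'src="(.+?\.jpg)" pic_ext'
# Tighter variant: [^"] keeps the capture inside a single attribute value.
tight = r'src="([^"]+?\.jpg)" pic_ext'

loose_hits = re.findall(loose, html)
tight_hits = re.findall(tight, html)
print(loose_hits)  # ['logo.png"><img src="http://example.com/a.jpg']
print(tight_hits)  # ['http://example.com/a.jpg']
```

Only the tighter pattern returns a URL that urlretrieve() can actually fetch.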

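```python
import os
import re
import urllib.request


def gethtml(url):
    # Step 1: fetch the page source. The post calls gethtml() without
    # showing it; this is an assumed minimal implementation.
    with urllib.request.urlopen(url) as page:
        charset = page.headers.get_content_charset() or "utf-8"
        return page.read().decode(charset, errors="replace")


def getimg(html, x):
    # Step 2: find the image URLs and save them into a "pictures" folder
    # in the current directory, numbered 0.jpg, 1.jpg, ...
    # [^"] keeps each match inside a single src="..." attribute.
    reg = r'src="([^"]+?\.jpg)" pic_ext'
    imglist = re.findall(reg, html)
    if not imglist:
        print("not found")
        return x
    filepath = os.path.join(os.getcwd(), "pictures")  # portable, not just Windows
    print(filepath)
    os.makedirs(filepath, exist_ok=True)
    for imgurl in imglist:
        temp = os.path.join(filepath, "%s.jpg" % x)
        print(imgurl)
        urllib.request.urlretrieve(imgurl, temp)
        x += 1
    return x  # next free file number


x = 0  # file counter shared across all pages, so names never collide
with open("img_path.txt") as fp:  # all page URLs live in this file, one per line
    for line in fp:
        outline = line.strip()  # also removes '\r' from Windows line endings
        if not outline:
            continue  # skip blank lines instead of stopping at the first one
        print(outline)
        x = getimg(gethtml(outline), x)
```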
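In the loop above, a single broken image link makes urlretrieve() raise an exception and ends the whole run. Below is a minimal sketch of a more forgiving download step, assuming you would rather log and skip failed URLs; safe_download is a hypothetical helper name, not part of the original script.

```python
import os
import tempfile
import urllib.request


def safe_download(url, dest):
    """Save url to dest; return True on success, False if it failed.

    Hypothetical helper: reports the failed URL so the caller can move on.
    """
    try:
        urllib.request.urlretrieve(url, dest)
        return True
    except (OSError, ValueError) as exc:
        # urllib.error.URLError (HTTP errors, DNS failures, unknown schemes)
        # is a subclass of OSError; ValueError is raised for strings that
        # are not URLs at all.
        print("skipped %s: %s" % (url, exc))
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "0.jpg")
        # A data: URL needs no network, which makes it handy for a quick check.
        print(safe_download("data:,hello", target))  # True
        print(safe_download("not a url", target))    # False
```

To use it, call safe_download(imgurl, temp) in place of urlretrieve() inside getimg() and advance the file counter x only when it returns True, so the numbering has no gaps.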
