Main functions of the re library

2023-07-31 14:16:19

#coding=utf-8
import re


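# re.search(pattern, string, flags=0): scans the string for the first location where the pattern matches; returns a match object, or None if nothing matches
match = re.search(r'[1-9]\d{5}', 'BIT 100081')
if match:
    print(match.group(0))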

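A match object carries more than `group(0)`. The short sketch below, assuming Python 3 and reusing the same postal-code pattern, also prints where the match sits in the string and shows the `None` result that a failed search returns. The variable names `m` and `no_match` are only illustrative.

```python
import re

# Find the first six-digit postal code (first digit 1-9) anywhere in the string
m = re.search(r'[1-9]\d{5}', 'BIT 100081')

if m is not None:
    print(m.group(0))          # the matched text
    print(m.start(), m.end())  # where the match begins and ends (end is exclusive)
    print(m.span())            # the same two indices as a tuple

# When nothing matches, re.search returns None instead of a match object
no_match = re.search(r'[1-9]\d{5}', 'BIT')
print(no_match)
```

For `'BIT 100081'` the postal code occupies indices 4 through 9, so `m.span()` is `(4, 10)`.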

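# re.match(pattern, string, flags=0): tries to match the pattern only at the beginning of the string; returns a match object, or None
# With this argument there is no match, because the string starts with 'BIT' rather than a digit, so re.match returns None
#match = re.match(r'[1-9]\d{5}', 'BIT 100081')
# Adjusted argument: the postal code now sits at the start of the string
match = re.match(r'[1-9]\d{5}', '100081BIT')
if match:
    print(match.group(0))
# Error from the original (Python 2) run when re.match returned None and group() was called without the if check: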
# print match.group(0)
# Traceback (most recent call last):
#   File "C:/Users/Administrator/PythonWorkSpace/python14.py", line 10, in <module>
#     print match.group(0)
# AttributeError: 'NoneType' object has no attribute 'group'

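The difference between `re.search` and `re.match` is easiest to see on the same input. The sketch below, assuming Python 3, runs both on `'BIT 100081'`, then shows that `re.match` anchors only the start of the string, so trailing text after the match is allowed. The names `searched`, `matched` and `tail` are illustrative.

```python
import re

pattern = r'[1-9]\d{5}'
text = 'BIT 100081'

# re.search scans the whole string, so it finds the code after 'BIT '
searched = re.search(pattern, text)
# re.match only tries position 0, where 'B' cannot match [1-9], so it returns None
matched = re.match(pattern, text)

print(searched.group(0) if searched else None)
print(matched.group(0) if matched else None)

# re.match anchors only the start: text after the match does not matter
tail = re.match(pattern, '100081BIT')
print(tail.group(0) if tail else None)
```

Guarding with `if ... else None` avoids the `AttributeError` recorded in the traceback above.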

# re.findall(pattern, string, flags=0): returns every non-overlapping matching substring as a list
ls = re.findall(r'[1-9]\d{5}', 'BIT100081 TSU100084')
print(ls)  # ['100081', '100084']

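What `re.findall` puts in the list depends on the capturing groups in the pattern, and the `flags` parameter listed in the signature changes how the pattern is matched. The sketch below, assuming Python 3, shows the no-group, one-group and two-group cases, a case-insensitive search via `re.IGNORECASE`, and the empty list returned when nothing matches. The variable names are illustrative.

```python
import re

text = 'BIT100081 TSU100084'

# No groups: each element is the whole matched substring
codes = re.findall(r'[1-9]\d{5}', text)

# One group: each element is that group's text only
codes_after_letters = re.findall(r'[A-Z]+([1-9]\d{5})', text)

# Two or more groups: each element is a tuple of the group texts
pairs = re.findall(r'([A-Z]+)([1-9]\d{5})', text)

# The flags argument, e.g. re.IGNORECASE, changes how the pattern is matched
bit_only = re.findall(r'bit\d+', text, flags=re.IGNORECASE)

# No match at all gives an empty list, not None
empty = re.findall(r'[1-9]\d{5}', 'no codes here')

print(codes, codes_after_letters, pairs, bit_only, empty)
```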

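# re.split(pattern, string, maxsplit=0, flags=0): splits the string wherever the pattern matches; returns a list
# maxsplit: the maximum number of splits (0 means no limit)
ls = re.split(r'[1-9]\d{5}', 'BIT100081 TSU100084')
print(ls)  # ['BIT', ' TSU', '']
ls = re.split(r'[1-9]\d{5}', 'BIT100081 TSU100084', maxsplit=1)
print(ls)  # only the first match is used as a split point: ['BIT', ' TSU100084']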

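By default `re.split` drops the text it splits on. If the pattern contains a capturing group, the matched separators are kept in the result list as well. The sketch below, assuming Python 3, compares the two forms on the same string and passes `maxsplit` as a keyword, which is clearer than a positional argument. The variable names are illustrative.

```python
import re

text = 'BIT100081 TSU100084'

# Without groups the matched postal codes are dropped from the result
plain = re.split(r'[1-9]\d{5}', text)

# With a capturing group, each matched separator is kept between the pieces
kept = re.split(r'([1-9]\d{5})', text)

# maxsplit limits the number of splits; the rest stays in the last element
limited = re.split(r'[1-9]\d{5}', text, maxsplit=1)

print(plain)
print(kept)
print(limited)
```

The trailing `''` appears because the string ends with a match, leaving an empty piece after it.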

# re.finditer(pattern, string, flags=0): returns an iterator over all matches; every element it yields is a match object
for m in re.finditer(r'[1-9]\d{5}', 'BIT100081 TSU100084'):
    print(m.group(0))

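Because `re.finditer` yields full match objects rather than plain strings, it can report where each match was found. The sketch below, assuming Python 3, collects each postal code together with its start and end index. The list name `positions` is illustrative.

```python
import re

text = 'BIT100081 TSU100084'

# finditer yields one match object per non-overlapping match, left to right
positions = []
for m in re.finditer(r'[1-9]\d{5}', text):
    positions.append((m.group(0), m.start(), m.end()))
    print(m.group(0), m.span())

# The matched strings are the same ones findall returns for a group-free pattern
matched_texts = [code for code, _, _ in positions]
print(matched_texts)
```

Choosing `finditer` over `findall` is mainly useful when positions or groups of each match are needed, or when the input is large and matches should be processed one at a time.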

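# re.sub(pattern, repl, string, count=0, flags=0): replaces every substring that matches the pattern; returns the resulting string
# repl: the string that replaces each match
# count: the maximum number of replacements (0 means replace all)
print(re.sub(r'[1-9]\d{5}', ':zipcode', 'BIT100081 TSU100084'))  # BIT:zipcode TSU:zipcode
print(re.sub(r'[1-9]\d{5}', ':zipcode', 'BIT100081 TSU100084', count=1))  # BIT:zipcode TSU100084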
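The `repl` argument of `re.sub` does not have to be a fixed string. The sketch below, assuming Python 3, shows two other forms: a backreference such as `\1`, which inserts the text of a capturing group, and a function, which is called with each match object and returns the replacement. The helper `mask` is a hypothetical example that keeps the first two digits and hides the rest.

```python
import re

text = 'BIT100081 TSU100084'

# Backreference: \1 in repl is replaced by the text of the first capturing group
bracketed = re.sub(r'([1-9]\d{5})', r'[\1]', text)

# Hypothetical helper: receives each match object, returns its replacement text
def mask(m):
    code = m.group(0)
    return code[:2] + '*' * (len(code) - 2)  # keep the first two digits, hide the rest

masked = re.sub(r'[1-9]\d{5}', mask, text)

print(bracketed)
print(masked)
```

A raw string (`r'[\1]'`) keeps Python from interpreting `\1` as an escape before `re` sees it.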
