A Summary of Common Methods in Python's re Regular Expression Module

2023-08-07 03:32:01

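1. Matching

Regular expressions (the re module) come up constantly when scraping web pages with Python. Different patterns match different kinds of characters; the most common ones are: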

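(1) '.' matches any single character (except a newline)

(2) '\' is the escape character

(3) '[a-zA-Z0-9]' matches any single upper- or lowercase letter or digit

(4) '[^abc]' matches any character other than a, b, or c; a leading '^' inside the brackets negates the set

(5) The pipe symbol '|' separates alternatives; for example, 'python|perl' matches only python or perl

(6) (pattern)* lets the pattern repeat zero or more times

(7) (pattern)+ lets the pattern repeat one or more times

(8) (pattern){m,n} lets the pattern repeat between m and n times

(9) (.+) matches one or more characters (greedy)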

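Example: against '* i ** like ** you *', the greedy pattern '\*(.+)\*' captures everything between the first and the last asterisk.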

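(10) (.+?) matches one or more characters (non-greedy)

(11) group(1, 2, ...) returns the text matched by the given subgroups

(12) start(group) gives the start position, end(group) the end position, and span(group) the (start, end) pair

(13) \d matches a single digit, equivalent to [0-9]

(14) \D matches a non-digit, equivalent to [^0-9]

(15) \s matches any whitespace character, equivalent to [ \t\n\r\f\v]

(16) \S matches any non-whitespace character, equivalent to [^ \t\n\r\f\v]

(17) \w matches a single digit, letter, or underscore, equivalent to [a-zA-Z0-9_]

(18) \W matches any character that is not a digit, letter, or underscore, equivalent to [^a-zA-Z0-9_]

Note: for str patterns in Python 3 these equivalences are exact only under the re.ASCII flag; by default \d, \s, and \w also match the corresponding Unicode characters.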
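As a quick check of the list above, here is a minimal sketch that exercises the character classes, alternation, and quantifiers. The sample strings and variable names are made up for illustration and are not from the original post.

```python
import re

# (1) '.' matches any single character except a newline.
dot = re.findall(r"p.t", "pat pet p\nt")

# (3)/(4) A character class and a negated class.
alnum = re.findall(r"[a-zA-Z0-9]+", "ab_12-Cd")
not_abc = re.findall(r"[^abc]", "aXbYc")

# (5) Alternation: either 'python' or 'perl', nothing else.
langs = re.findall(r"python|perl", "perl, python, ruby")

# (6)-(8) Quantifiers: * allows zero repeats, + needs at least one,
# {m,n} bounds the repeat count.
star_matches_empty = re.fullmatch(r"(ab)*", "") is not None
plus_matches_empty = re.fullmatch(r"(ab)+", "") is not None
two_or_three_digits = re.findall(r"\b\d{2,3}\b", "1 22 333 4444")

# (13)-(18) Shorthand classes.
digits = re.findall(r"\d", "a1 b2")
words = re.findall(r"\w+", "snake_case, x9!")
spaces = re.findall(r"\s", "a b\tc\n")

print(dot, alnum, not_abc, langs)
print(star_matches_empty, plus_matches_empty, two_or_three_digits)
print(digits, words, spaces)
```

Raw strings (the r prefix) are used so that backslashes reach the regex engine unchanged.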

Example:

import re

string = '* i ** like ** you *'

# Raw strings keep the backslashes intact for the regex engine
stri = re.compile(r'\*(.+)\*')    # greedy
stri2 = re.compile(r'\*(.+?)\*')  # non-greedy

print(stri.findall(string))

print(stri2.findall(string))

stri3 = re.search(r'\*(.+?)\*\*(.+?)\*\*(.+?)\*', string)

print(stri3.groups())
print(stri3.group(1, 3))  # first and third subgroups
print([stri3.start(), stri3.end()])

Output:

[' i ** like ** you ']
[' i ', ' like ', ' you ']
(' i ', ' like ', ' you ')
(' i ', ' you ')
[0, 20]
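Items (11) and (12) also accept a group number, which the example above only uses in part. The following sketch reuses the same sample string and pattern to show how start, end, and span behave for a single subgroup; the variable names are illustrative.

```python
import re

string = '* i ** like ** you *'
m = re.search(r'\*(.+?)\*\*(.+?)\*\*(.+?)\*', string)

whole = m.group(0)       # group 0 is the entire match
second = m.group(2)      # text of the second subgroup
pair = m.group(1, 3)     # several groups at once come back as a tuple
second_span = m.span(2)  # same as (m.start(2), m.end(2))

print(repr(second), second_span)

# The positions index into the original string, so slicing with them
# gives back the subgroup text.
print(string[m.start(2):m.end(2)] == second)
```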

2. Functions

compile, search, match, split, findall, sub, escape

For what each of them does, see https://www.cnblogs.com/dyfblog/p/5880728.html

Example:

s = '''first line
second line
third line'''

# match only matches at the start of the string, so nothing is found
print(re.match(r'i\w+', s))
# output> None

# search places no restriction on where the match starts
print(re.search(r'i\w+', s))
# output> <re.Match object; span=(1, 5), match='irst'>

print(re.search(r'i\w+', s).group())
# output> irst

print(re.findall(r'\w+', s))
# output> ['first', 'line', 'second', 'line', 'third', 'line']

s = '''first 111 line
second 222 line
third 333 line'''

# split on runs of digits
print(re.split(r'\d+', s))
# output> ['first ', ' line\nsecond ', ' line\nthird ', ' line']

# \.+ matches nothing, so the result is a list holding the whole string
print(re.split(r'\.+', s))
# output> ['first 111 line\nsecond 222 line\nthird 333 line']

# the maxsplit parameter: split at most once
print(re.split(r'\d+', s, maxsplit=1))
# output> ['first ', ' line\nsecond 222 line\nthird 333 line']
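The example above does not touch compile, sub, or escape from the function list. The sketch below is one way to try them on the same three-line string; the replacement text and the '1+1=2' sample are assumptions chosen for illustration.

```python
import re

s = "first 111 line\nsecond 222 line\nthird 333 line"

# compile: build the pattern once and reuse the compiled object.
number = re.compile(r"\d+")

# sub: replace every match; count limits how many are replaced.
masked = number.sub("#", s)
masked_once = number.sub("#", s, count=1)

# sub also accepts a function that receives each match object.
doubled = number.sub(lambda m: str(int(m.group()) * 2), s)

# escape: turn text with special characters into a literal pattern,
# so '+' here means a plus sign rather than "one or more".
literal = re.escape("1+1=2")
found = re.search(literal, "so 1+1=2 holds") is not None

print(masked)
print(masked_once)
print(doubled)
print(found)
```

Without escape, the pattern '1+1=2' would treat '+' as a quantifier and fail to match the literal text.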

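3. String methods

split: splits a string into a list

strip: returns the string with whitespace removed from both ends

By default the whitespace removed includes '\n', '\t', '\r', and ' '

lstrip: removes leading whitespace

rstrip: removes trailing whitespace

join: concatenates strings with a separator; the inverse of split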
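A short sketch of these string methods working together; the sample string is made up for illustration.

```python
raw = "  \tfirst, second ,third\r\n"

stripped = raw.strip()   # whitespace removed from both ends
left = raw.lstrip()      # only leading whitespace removed
right = raw.rstrip()     # only trailing whitespace removed

# split on commas, then strip the padding left around each piece.
parts = [p.strip() for p in stripped.split(",")]

# join reverses split: the separator goes between the pieces.
joined = ",".join(parts)

# strip also accepts the set of characters to remove.
trimmed = "xxhixx".strip("x")

print(parts)
print(joined)
print(trimmed)
```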
