14. The re module


A regular expression is a set of matching rules for strings. Most programming languages support them; in Python the corresponding module is re.

Common pattern syntax

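'.'     Matches any single character except \n by default; with the DOTALL flag it matches any character, including a newline
'^'     Matches the start of the string; with the MULTILINE flag it also matches at the start of each line, so re.search(r"^a", "\nabc\neee", flags=re.MULTILINE) finds a match
'$'     Matches the end of the string; with the MULTILINE flag it also matches at the end of each line, so re.search('foo.$', 'foo1\nfoo2\n', re.MULTILINE).group() returns 'foo1'
'*'     Matches the preceding character 0 or more times; re.search('a*', 'aaaabac').group() returns 'aaaa'
'+'     Matches the preceding character 1 or more times; re.findall("ab+", "ab+cd+abb+bba") returns ['ab', 'abb']
'?'     Matches the preceding character 0 or 1 times; re.search('b?', 'alex').group() matches b zero times and returns ''
'{m}'   Matches the preceding character exactly m times; re.search('b{3}', 'alexbbbs').group() returns 'bbb'
'{n,m}' Matches the preceding character n to m times; re.findall("ab{1,3}", "abb abc abbcbbb") returns ['abb', 'ab', 'abb']
'|'     Matches the expression on the left or on the right of |; re.search("abc|ABC", "ABCBabcCD").group() returns 'ABC'
'(...)' Grouping; re.search("(abc){2}a(123|45)", "abcabca456c").group() returns 'abcabca45'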
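The quantifier rules above can be checked with a short sketch. The input strings are made up for illustration; the point is that '*' may match nothing at all, that quantifiers are greedy unless followed by '?', and that a quantifier after a group repeats the whole group.

```python
import re

# '*' allows zero repetitions, so it can match the empty string at position 0.
empty = re.search(r"a*", "baaa").group()

# Quantifiers are greedy by default; a trailing '?' makes them lazy.
greedy = re.search(r"a{2,4}", "aaaaa").group()
lazy = re.search(r"a{2,4}?", "aaaaa").group()

# A quantifier placed after a group repeats the whole group.
grouped = re.search(r"(ab){2}", "ababab").group()

print(repr(empty), greedy, lazy, grouped)
```

Because search() returns the first match from the left, 'a*' on "baaa" stops at the empty match before 'b' instead of moving on to the run of a's.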


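'\A'    Matches only at the start of the string; re.search(r"\Aabc", "alexabc") finds nothing, which is equivalent to re.match('abc', "alexabc") or to ^ without MULTILINE
'\Z'    Matches only at the end of the string; similar to $, except that $ also matches just before a trailing newline
'\d'    Matches a digit, 0-9
'\D'    Matches a non-digit
'\w'    Matches [A-Za-z0-9_] (for str patterns, other Unicode word characters as well unless re.ASCII is set)
'\W'    Matches any character that \w does not match
'\s'    Matches whitespace such as a space, \t, \n, \r; re.search(r"\s+", "ab\tc1\n3").group() returns '\t'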
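A minimal sketch of the character classes and anchors listed above, using a made-up input string. It shows that \w includes the underscore and that $ and \Z differ when the string ends with a newline.

```python
import re

text = "id_42 ok\n"

digits = re.findall(r"\d+", text)    # runs of digits
words = re.findall(r"\w+", text)     # \w covers letters, digits and '_'

# '$' also matches just before a trailing newline; '\Z' only at the very end.
dollar = re.search(r"ok$", text)
strict_end = re.search(r"ok\Z", text)

print(digits, words, dollar is not None, strict_end is None)
```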

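'(?P<name>...)' Named group; re.search("(?P<province>[0-9]{4})(?P<city>[0-9]{2})(?P<birthday>[0-9]{4})", "371481199306143242").groupdict() returns {'province': '3714', 'city': '81', 'birthday': '1993'}. The optional argument of groupdict() is the default value used for groups that did not take part in the match.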
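As a sketch of how named groups are read back, the example below reuses the province/city/birthday pattern and adds an optional group called suffix, which is a hypothetical name introduced only to show what the default argument of groupdict() does.

```python
import re

# 'suffix' is a hypothetical optional group added for illustration.
pattern = re.compile(
    r"(?P<province>\d{4})(?P<city>\d{2})(?P<birthday>\d{4})(?P<suffix>X)?"
)
m = pattern.match("3714811993")

by_name = m.group("city")        # access by group name
by_index = m.group(2)            # the same group by position
defaults = m.groupdict("n/a")    # unmatched groups get the default value

print(by_name, by_index, defaults)
```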

The re module provides the following matching functions

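re.match     matches only at the start of the string
re.search    finds the first match anywhere in the string
re.findall   returns all matches as the elements of a list
re.split     splits the string, using the matches as separators
re.sub       replaces the matches
re.fullmatch matches only if the whole string matches the pattern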
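The difference between the first two functions and fullmatch is easiest to see on a single input. This is a small sketch with a made-up string, not an exhaustive comparison.

```python
import re

text = "abc123"

at_start = re.match(r"\d+", text)            # digits are not at the start
anywhere = re.search(r"\d+", text)           # scans forward until it finds digits
whole = re.fullmatch(r"[a-z]+\d+", text)     # the pattern covers the entire string
partial = re.fullmatch(r"[a-z]+", text)      # leftover '123' makes this fail

print(at_start, anywhere.group(), whole.group(), partial)
```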

re.compile(pattern, flags=0)

Compile a regular expression pattern into a regular expression object, which can be used for matching using its match(), search() and other methods, described below.

The sequence

prog = re.compile(pattern)
result = prog.match(string)

is equivalent to

result = re.match(pattern, string)

but using re.compile() and saving the resulting regular expression object for reuse is more efficient when the expression will be used several times in a single program.
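A minimal sketch of reusing a compiled pattern across several strings. The sample data is invented; the second part uses the optional pos argument that the methods of a compiled pattern accept.

```python
import re

number = re.compile(r"\d+")   # compiled once, reused below

lines = ["a1", "bb", "c22d333"]
found = [number.findall(line) for line in lines]

# Methods of a compiled pattern take an optional start position.
later = number.search("12 ab 34", 2).group()

print(found, later)
```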

re.match(pattern, string, flags=0)

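Tries to match the pattern at the start of the string and returns a single match object, or None if there is no match.

pattern  the regular expression
string   the string to match against
flags    flags that control how the regular expression is matched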

import re
obj = re.match(r'\d+', '123uuasf')
if obj:
    print(obj.group())  # 123

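Flags

re.I (re.IGNORECASE): ignore case (the full name is given in parentheses, likewise below)
re.M (re.MULTILINE): multi-line mode; changes the behavior of '^' and '$'
re.S (re.DOTALL): changes the behavior of '.' so that it matches any character at all, including a newline; without this flag, '.' matches anything except a newline
re.X (re.VERBOSE): lets you add whitespace and comments to a pattern to make it more readable; the following two patterns mean the same thing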

a = re.compile(r"""\d + # the integral part
                \. # the decimal point
                \d * # some fractional digits""",
                re.X)

b = re.compile(r"\d+\.\d*")
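The effect of the other flags can be sketched the same way. The text below is made up for illustration; flags are combined with the | operator.

```python
import re

text = "Foo\nfoo bar\nBAZ"

case_insensitive = re.findall(r"foo", text, re.I)
line_starts = re.findall(r"^\w+", text, re.M)      # '^' matches at each line start

without_dotall = re.search(r"Foo.foo", text)       # '.' does not match '\n'
with_dotall = re.search(r"Foo.foo", text, re.S).group()

combined = re.findall(r"^b.*$", text, re.I | re.M)

print(case_insensitive, line_starts, without_dotall, repr(with_dotall), combined)
```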

re.search(pattern, string, flags=0)

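Scans through the string for the first location where the pattern matches and returns a single match object, or None if there is no match.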

import re
obj = re.search(r'\d+', 'u123uu888asf')
if obj:
    print(obj.group())  # 123

re.findall(pattern, string, flags=0)

match and search each return a single match, that is, only one occurrence in the string. To get every part of the string that matches the pattern, use findall.

import re
obj = re.findall(r'\d+', 'fa123uu888asf')
print(obj)  # ['123', '888']
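One detail worth sketching: when the pattern contains groups, findall returns the groups rather than the whole match, and re.finditer yields match objects when positions are needed. The example reuses the string from the snippet above.

```python
import re

text = "fa123uu888asf"

plain = re.findall(r"\d+", text)              # whole matches
pairs = re.findall(r"([a-z]+)(\d+)", text)    # one tuple of groups per match

# finditer yields match objects, which also carry positions.
positions = [(m.group(), m.start()) for m in re.finditer(r"\d+", text)]

print(plain, pairs, positions)
```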

re.sub(pattern, repl, string, count=0, flags=0)

Replaces the parts of the string that match the pattern

>>> re.sub(r'[a-z]+', 'sb', '武配齊是abc123')
'武配齊是sb123'

>>> re.sub(r'\d+', '|', 'alex22wupeiqi33oldboy55', count=2)
'alex|wupeiqi|oldboy55'

This is more powerful than str.replace, which can only replace fixed substrings.
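A short sketch of what re.sub can do beyond str.replace: the replacement may refer back to groups, or be a function called for each match, and re.subn also reports how many replacements were made. The inputs are illustrative.

```python
import re

# Backreferences \1 and \2 reuse the captured groups in the replacement.
swapped = re.sub(r"(\w+)@(\w+)", r"\2@\1", "user@host")

# A function receives each match object and returns the replacement text.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "a1b22")

# subn returns the new string together with the number of replacements.
result, count = re.subn(r"\d+", "|", "alex22wupeiqi33oldboy55")

print(swapped, doubled, result, count)
```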

re.split(pattern, string, maxsplit=0, flags=0)

>>> s = '9-2*5/3+7/3*99/4*2998+10*568/14'
>>> re.split(r'[*\-/+]', s)
['9', '2', '5', '3', '7', '3', '99', '4', '2998', '10', '568', '14']

>>> re.split(r'[*\-/+]', s, maxsplit=3)
['9', '2', '5', '3+7/3*99/4*2998+10*568/14']
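If the operators themselves are needed, for example to evaluate the expression later, wrapping the separator pattern in a capture group keeps them in the result. A minimal sketch on a shortened expression:

```python
import re

s = "9-2*5/3"

operands = re.split(r"[*\-/+]", s)      # separators are dropped
tokens = re.split(r"([*\-/+])", s)      # a capture group keeps the separators

print(operands, tokens)
```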

re.fullmatch(pattern, string, flags=0)

Returns a match object if the whole string matches the pattern; otherwise returns None

re.fullmatch('\[email protected]\w+\.(com|cn|edu)',"[email protected]")
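A sketch of fullmatch used for simple input validation. Both the pattern and the address below are assumptions chosen for illustration (a deliberately loose email-like shape, not a complete email validator); the contrast with match shows why fullmatch suits validation.

```python
import re

# Hypothetical, deliberately simple pattern: word chars, '@', word chars, a known suffix.
EMAIL = re.compile(r"\w+@\w+\.(com|cn|edu)")

ok = EMAIL.fullmatch("someone@example.com")
bad = EMAIL.fullmatch("someone@example.com!")       # trailing '!' is not covered
prefix_only = EMAIL.match("someone@example.com!")   # match() accepts a valid prefix

print(ok is not None, bad, prefix_only.group())
```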
