Using the re Module in Python

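The re module is Python's standard-library module for working with regular expressions. A regular expression is a compact pattern language for matching, searching, and replacing specific patterns in strings.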

1. Basic functions of the re module

1.1 re.compile()

Compiles a regular expression pattern into a pattern object that can be reused.

import re

pattern = re.compile(r'\d+')  # match one or more digits

1.2 re.match()

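Tries to match the pattern at the start of the string. If the pattern does not match there, it returns None, even when a match exists later in the string.

result = re.match(r'hello', 'hello world')
if result:
    print("Match found:", result.group())  # Output: Match found: hello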

1.3 re.search()

Scans the whole string and returns the first successful match, or None if there is none.

result = re.search(r'world', 'hello world')
if result:
    print("Match found:", result.group())  # Output: Match found: world
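The difference between re.match() and re.search() is easiest to see on the same input. The short sketch below also uses re.fullmatch(), which is not covered above but belongs to the same family; the sample string is only an illustration.

```python
import re

text = "hello world"

# re.match anchors at index 0, so a pattern that occurs later does not match.
at_start = re.match(r"world", text)

# re.search scans forward and returns the first match anywhere in the string.
anywhere = re.search(r"world", text)

# re.fullmatch requires the pattern to cover the entire string.
whole = re.fullmatch(r"hello", text)

print(at_start)         # None
print(anywhere.span())  # (6, 11)
print(whole)            # None
```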

1.4 re.findall()

Finds all non-overlapping substrings that match the pattern and returns them as a list.

numbers = re.findall(r'\d+', '12 apples, 34 oranges, 56 bananas')
print(numbers)  # Output: ['12', '34', '56']
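One detail that often surprises people is that the shape of the list returned by re.findall() depends on how many capturing groups the pattern has. A small sketch, reusing the sample sentence from this section:

```python
import re

text = "12 apples, 34 oranges, 56 bananas"

# No groups: each item is the whole match.
no_groups = re.findall(r"\d+ \w+", text)

# One group: each item is the text captured by that group.
one_group = re.findall(r"(\d+) \w+", text)

# Several groups: each item is a tuple with one entry per group.
two_groups = re.findall(r"(\d+) (\w+)", text)

print(no_groups)   # ['12 apples', '34 oranges', '56 bananas']
print(one_group)   # ['12', '34', '56']
print(two_groups)  # [('12', 'apples'), ('34', 'oranges'), ('56', 'bananas')]
```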

1.5 re.finditer()

Similar to findall, but returns an iterator that yields a Match object for each match.

matches = re.finditer(r'\d+', '12 apples, 34 oranges, 56 bananas')
for match in matches:
    print(match.group(), match.span())

1.6 re.sub()

Replaces the matches in a string with a replacement.

text = re.sub(r'\d+', 'NUM', '12 apples, 34 oranges, 56 bananas')
print(text)  # Output: NUM apples, NUM oranges, NUM bananas
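Beyond a fixed replacement string, re.sub() also accepts a callable and a count argument, and re.subn() additionally reports how many replacements were made. The sketch below shows these variants; the helper name double is only an illustration.

```python
import re

text = "12 apples, 34 oranges, 56 bananas"

def double(match):
    # The callable receives a Match object and must return a string.
    return str(int(match.group()) * 2)

# Replacement computed per match.
doubled = re.sub(r"\d+", double, text)

# Replace only the first occurrence.
first_only = re.sub(r"\d+", "NUM", text, count=1)

# subn returns the new string together with the number of replacements.
replaced, n = re.subn(r"\d+", "NUM", text)

print(doubled)      # 24 apples, 68 oranges, 112 bananas
print(first_only)   # NUM apples, 34 oranges, 56 bananas
print(replaced, n)  # NUM apples, NUM oranges, NUM bananas 3
```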

1.7 re.split()

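Splits the string at every match of the pattern and returns the pieces as a list. Note the trailing empty string in the output below: the final '.' is itself a separator, so an empty piece follows it.

words = re.split(r'\W+', 'Hello, world! This is a test.')
print(words)  # Output: ['Hello', 'world', 'This', 'is', 'a', 'test', '']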
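A few variations on re.split() that come up in practice: limiting the number of splits with maxsplit, keeping the separators by wrapping the pattern in a capturing group, and dropping empty pieces. This is a sketch on sample inputs, not an exhaustive list of options.

```python
import re

text = "Hello, world! This is a test."

# maxsplit limits how many splits are performed; the rest is returned as is.
limited = re.split(r"\W+", text, maxsplit=2)

# A capturing group in the pattern keeps the separators in the result.
kept = re.split(r"(,)", "a,b,c")

# Filter out empty strings produced by separators at the edges.
cleaned = [w for w in re.split(r"\W+", text) if w]

print(limited)  # ['Hello', 'world', 'This is a test.']
print(kept)     # ['a', ',', 'b', ',', 'c']
print(cleaned)  # ['Hello', 'world', 'This', 'is', 'a', 'test']
```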

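2. Methods of compiled pattern objects

A compiled pattern object offers methods that mirror the module-level functions:

pattern = re.compile(r'\d+')
string = '12 apples, 34 oranges'  # sample input for illustration
repl = 'NUM'                      # sample replacement for illustration

# Equivalent methods
pattern.match(string)      # same as re.match()
pattern.search(string)     # same as re.search()
pattern.findall(string)    # same as re.findall()
pattern.finditer(string)   # same as re.finditer()
pattern.sub(repl, string)  # same as re.sub()
pattern.split(string)      # same as re.split()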
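The pattern methods are not exact copies of the module functions: match(), search(), findall(), and finditer() on a compiled pattern also accept optional pos and endpos arguments that restrict the region of the string being searched. A minimal sketch on a sample string:

```python
import re

pattern = re.compile(r"\d+")
text = "12 apples, 34 oranges"

# Start searching at index 2, skipping the leading "12".
later = pattern.search(text, 2)
print(later.group(), later.span())  # 34 (11, 13)

# match() anchors at pos instead of at index 0.
anchored = pattern.match(text, 11)
print(anchored.group())  # 34

# Only look at the slice text[0:5].
window = pattern.findall(text, 0, 5)
print(window)  # ['12']
```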

3. Methods and attributes of Match objects

A successful match returns a Match object with the following commonly used methods and attributes:

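3.1 Methods

group([group1, ...]): returns one or more subgroups of the match; with no argument, or with 0, it returns the whole match.

groups(): returns a tuple containing all subgroups of the match.

groupdict(): returns a dictionary of all named subgroups, keyed by group name.

start([group]): returns the start index of the substring matched by the group (the whole match by default).

end([group]): returns the end index of the substring matched by the group (the whole match by default).

span([group]): returns the tuple (start, end) for the group.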
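The methods listed above can be tried together on a single match. The pattern and sample string below are assumptions chosen only to have two named groups to inspect.

```python
import re

m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "released 2023-05")

print(m.group())         # 2023-05
print(m.group(1, 2))     # ('2023', '05')
print(m.group("year"))   # 2023
print(m.groups())        # ('2023', '05')
print(m.groupdict())     # {'year': '2023', 'month': '05'}
print(m.start(), m.end())  # 9 16
print(m.span("month"))   # (14, 16)
```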

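3.2 Attributes

pos: the index in the string at which the search started.

endpos: the index in the string beyond which the engine did not search.

lastindex: the integer index of the last matched capturing group, or None if no group matched.

lastgroup: the name of the last matched capturing group, or None if that group has no name or no group matched.

re: the pattern object that produced this match.

string: the string that was passed to match() or search().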
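A quick way to get a feel for these attributes is to print them for one match. The sketch below uses a made-up pattern with an optional second group so that lastindex and lastgroup refer to the first group only.

```python
import re

pattern = re.compile(r"(?P<word>[a-z]+)(\d+)?")
text = "id: abc"

# Start the search at index 4 so that pos differs from 0.
m = pattern.search(text, 4)

print(m.pos, m.endpos)  # 4 7
print(m.lastindex)      # 1  (group 2 did not take part in the match)
print(m.lastgroup)      # word
print(m.re is pattern)  # True
print(m.string)         # id: abc
```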

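4. Regular expression pattern syntax

4.1 Common metacharacters

. matches any character except a newline (any character at all with re.DOTALL).

^ matches at the start of the string (and after each newline with re.MULTILINE).

$ matches at the end of the string or just before a trailing newline (and before each newline with re.MULTILINE).

* matches zero or more repetitions of the preceding element.

+ matches one or more repetitions of the preceding element.

? matches zero or one repetition of the preceding element.

{m} matches exactly m repetitions of the preceding element.

{m,n} matches from m to n repetitions of the preceding element.

[...] matches any one character from the set; [^...] matches any character not in the set.

| matches either the expression on its left or the one on its right.

(...) groups a sub-pattern and captures the text it matches.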
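The metacharacters above are easier to remember with one tiny example each. The inputs below are arbitrary sample strings.

```python
import re

# "." matches any single character except a newline.
dot = re.findall(r"c.t", "cat cot cut ct")

# "?" makes the preceding element optional.
optional = re.findall(r"colou?r", "color colour")

# "{m,n}" bounds the number of repetitions (greedy, so it takes up to 3).
counted = re.findall(r"a{2,3}", "a aa aaa aaaa")

# "[...]" matches one character from the set.
vowels = re.findall(r"[aeiou]", "regex")

# "|" chooses between alternatives.
either = re.findall(r"cat|dog", "cat, dog, bird")

# "^" and "$" anchor the pattern to the start and end of the string.
all_digits = re.search(r"^\d+$", "12345") is not None
not_all_digits = re.search(r"^\d+$", "123a") is not None

print(dot)       # ['cat', 'cot', 'cut']
print(optional)  # ['color', 'colour']
print(counted)   # ['aa', 'aaa', 'aaa']
print(vowels)    # ['e', 'e']
print(either)    # ['cat', 'dog']
print(all_digits, not_all_digits)  # True False
```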

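4.2 Special sequences

\d matches a digit.

\D matches any character that is not a digit.

\s matches a whitespace character (space, tab, newline, and so on).

\S matches any character that is not whitespace.

\w matches a word character (letter, digit, or underscore).

\W matches any character that is not a word character.

\b matches the empty string at a word boundary.

\B matches the empty string where there is no word boundary.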
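As a rough illustration of the special sequences, the sketch below applies several of them to small sample strings. The word-boundary cases are the ones worth studying, since \b and \B match positions rather than characters.

```python
import re

digits = re.findall(r"\d+", "room 101, floor 3")
fields = re.split(r"\s+", "a  b\tc\nd")
words = re.findall(r"\w+", "snake_case x2")

# \b requires a word boundary on both sides, so "concat" and "cats" are skipped.
whole_word = re.findall(r"\bcat\b", "cat concat cats")

# \B requires that there is no boundary, so only the "cat" inside "concat" matches.
inside = re.findall(r"\Bcat", "cat concat")

print(digits)      # ['101', '3']
print(fields)      # ['a', 'b', 'c', 'd']
print(words)       # ['snake_case', 'x2']
print(whole_word)  # ['cat']
print(inside)      # ['cat']
```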

5. Advanced usage

5.1 Groups and capturing

pattern = re.compile(r'(\d{3})-(\d{3})-(\d{4})')  # phone number format
match = pattern.search('My phone number is 123-456-7890')
if match:
    print("Full match:", match.group(0))     # 123-456-7890
    print("Area code:", match.group(1))      # 123
    print("First three:", match.group(2))    # 456
    print("Last four:", match.group(3))      # 7890
    print("All groups:", match.groups())     # ('123', '456', '7890')

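5.2 Non-capturing groups

(?:...) is a non-capturing group: it groups and matches a sub-pattern but does not capture the matched text.

pattern = re.compile(r'(?:\d{3})-(\d{3})-(\d{4})')
match = pattern.search('123-456-7890')
print(match.groups())  # Output: ('456', '7890')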

5.3 Named groups

(?P<name>...) assigns a name to a group.

pattern = re.compile(r'(?P<area>\d{3})-(?P<first>\d{3})-(?P<last>\d{4})')
match = pattern.search('123-456-7890')
print(match.groupdict())  # {'area': '123', 'first': '456', 'last': '7890'}
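Named groups can also be referenced later, both inside the pattern with (?P=name) and inside a replacement string with \g<name>. The following sketch illustrates both on sample inputs; the group names are arbitrary.

```python
import re

# (?P=word) matches the same text that the group "word" captured earlier.
doubled = re.search(r"\b(?P<word>\w+) (?P=word)\b", "this is is a test")
print(doubled.group())        # is is
print(doubled.group("word"))  # is

# \g<name> refers to a named group inside the replacement string.
date = re.sub(
    r"(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})",
    r"\g<m>/\g<d>/\g<y>",
    "2023-05-15",
)
print(date)  # 05/15/2023
```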

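5.4 Lookahead and lookbehind

(?=...) positive lookahead: matches only if ... matches next, without consuming it.

(?!...) negative lookahead: matches only if ... does not match next.

(?<=...) positive lookbehind: matches only if the current position is preceded by a match for ....

(?<!...) negative lookbehind: matches only if the current position is not preceded by a match for ....

# Match "foo" followed by "bar"
re.findall(r'foo(?=bar)', 'foobar foobaz')  # ['foo']

# Match "foo" not followed by "bar"
re.findall(r'foo(?!bar)', 'foobar foobaz')  # ['foo']

# Match "bar" preceded by "foo"
re.findall(r'(?<=foo)bar', 'foobar bazbar')  # ['bar']

# Match "bar" not preceded by "foo"
re.findall(r'(?<!foo)bar', 'foobar bazbar')  # ['bar']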
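Because lookarounds match a position without consuming text, they are handy for extracting a value without its surrounding marker, or for inserting text at computed positions. Two small sketches on sample data follow. Note that in Python's re module a lookbehind must match strings of a fixed length.

```python
import re

# Extract amounts that directly follow a dollar sign, without the sign itself.
prices = re.findall(r"(?<=\$)\d+(?:\.\d+)?", "apple $3.50, pear 20, kiwi $4")
print(prices)  # ['3.50', '4']

# Insert thousands separators: match the empty position that has a digit
# before it and a multiple of three digits after it up to the end.
grouped = re.sub(r"(?<=\d)(?=(?:\d{3})+$)", ",", "1234567")
print(grouped)  # 1,234,567
```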

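6. Flags

The re module supports several flags that modify how a regular expression behaves:

re.IGNORECASE (re.I): performs case-insensitive matching.

re.MULTILINE (re.M): makes ^ and $ match at the start and end of each line.

re.DOTALL (re.S): makes . match any character, including a newline.

re.VERBOSE (re.X): allows whitespace and comments inside the pattern.

re.ASCII (re.A): makes \w, \W, \b, \B, \d, \D, \s, and \S match ASCII only.

re.DEBUG: prints debug information about the compiled expression.

# Ignore case
re.findall(r'hello', 'Hello World', re.I)  # ['Hello']

# Multiline mode
text = """first line
second line
third line"""
re.findall(r'^\w+', text, re.M)  # ['first', 'second', 'third']

# Verbose mode allows comments and whitespace inside the pattern
pattern = re.compile(r"""
    \d{3}   # area code
    -       # separator
    \d{3}   # first three digits
    -       # separator
    \d{4}   # last four digits
""", re.VERBOSE)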
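The remaining flags can be illustrated in the same way. The sketch below shows re.DOTALL, combining flags with the | operator, the inline form (?i), and the effect of re.ASCII on \w; all inputs are sample strings.

```python
import re

text = "<p>line one\nline two</p>"

# Without DOTALL, "." stops at the newline, so nothing matches.
no_flag = re.search(r"<p>(.*)</p>", text)

# With DOTALL, "." also matches the newline.
dotall = re.search(r"<p>(.*)</p>", text, re.DOTALL).group(1)

# Flags are combined with the bitwise OR operator.
combined = re.findall(r"^line", "Line a\nLINE b", re.I | re.M)

# A flag can also be set inside the pattern itself.
inline = re.findall(r"(?i)hello", "Hello HELLO")

# With re.ASCII, \w no longer matches non-ASCII letters such as U+00E9.
ascii_only = re.findall(r"\w+", "caf\u00e9", re.ASCII)
unicode_default = re.findall(r"\w+", "caf\u00e9")

print(no_flag)          # None
print(repr(dotall))     # 'line one\nline two'
print(combined)         # ['Line', 'LINE']
print(inline)           # ['Hello', 'HELLO']
print(ascii_only)       # ['caf']
print(unicode_default)  # ['café']
```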

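7. Practical examples

7.1 Validating an email address

def is_valid_email(email):
    # Simplified check; the dot before the top-level domain is escaped
    # so that it matches a literal '.' rather than any character.
    pattern = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')
    return bool(pattern.match(email))

print(is_valid_email('user@example.com'))  # True
print(is_valid_email('invalid.email@'))    # False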

7.2 Extracting URLs

text = "Visit my site https://www.example.com or http://test.org"
urls = re.findall(r'https?://[^\s]+', text)
print(urls)  # ['https://www.example.com', 'http://test.org']

7.3 Reformatting dates

text = "Today is 2023-05-15, tomorrow is 2023-05-16"
new_text = re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\2/\3/\1', text)
print(new_text)  # Today is 05/15/2023, tomorrow is 05/16/2023

7.4 Checking password strength

def check_password_strength(password):
    if len(password) < 8:
        return "Password is too short"
    if not re.search(r'[A-Z]', password):
        return "At least one uppercase letter is required"
    if not re.search(r'[a-z]', password):
        return "At least one lowercase letter is required"
    if not re.search(r'\d', password):
        return "At least one digit is required"
    if not re.search(r'[^A-Za-z0-9]', password):
        return "At least one special character is required"
    return "Password is strong enough"

print(check_password_strength('Weak123'))             # Password is too short
print(check_password_strength('Strong@Password123'))  # Password is strong enough

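8. Performance considerations

Precompile regular expressions: when the same pattern is used many times, compile it once with re.compile() and reuse the pattern object.

Avoid unnecessary greedy matching: where the shortest match is what you want, use the non-greedy .*? instead of the greedy .*.

Use atomic groups: (?>...) prevents backtracking into the group once it has matched. Python supports this syntax from version 3.11.

Avoid overusing capturing groups: when the captured text is not needed, use a non-capturing group (?:...).

# Precompilation example
large_text_collection = ['12 apples', '34 oranges']  # sample data for illustration
pattern = re.compile(r'\d+')  # compile once
for text in large_text_collection:
    matches = pattern.findall(text)  # reuse many times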
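The difference between greedy and non-greedy quantifiers affects results as much as speed. The sketch below contrasts the two on a sample string and shows a negated character class, which is often a simpler alternative; the tag-like input is only an illustration, not a recommendation to parse HTML with regular expressions.

```python
import re

html = "<b>one</b> and <b>two</b>"

# Greedy: ".*" runs to the last "</b>", so both elements end up in one match.
greedy = re.findall(r"<b>.*</b>", html)

# Non-greedy: ".*?" stops at the first "</b>" it can.
lazy = re.findall(r"<b>.*?</b>", html)

# A negated character class avoids the backtracking question entirely.
negated = re.findall(r"<b>([^<]*)</b>", html)

print(greedy)   # ['<b>one</b> and <b>two</b>']
print(lazy)     # ['<b>one</b>', '<b>two</b>']
print(negated)  # ['one', 'two']
```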

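Regular expressions are a powerful tool, but they can also become complex and hard to maintain. For especially complicated patterns, consider breaking them into several simpler regular expressions or writing a dedicated parser.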