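Python Regular Expressions

Regular expressions (regex for short) are available in Python through the standard-library `re` module. They are used for pattern matching, searching, replacing, and parsing strings, which makes them a practical tool for processing text data.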

1. `re` module basics

import re

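Method	Purpose
`re.match(pattern, string)`	Matches only at the start of the string; returns a `Match` object or `None`
`re.search(pattern, string)`	Scans the string for the first match; returns a `Match` object or `None`
`re.findall(pattern, string)`	Returns all non-overlapping matches as a list
`re.finditer(pattern, string)`	Returns an iterator of `Match` objects
`re.sub(pattern, repl, string)`	Replaces matches with `repl` and returns the new string
`re.split(pattern, string)`	Splits the string at each match
`re.compile(pattern)`	Compiles the pattern into a reusable pattern object, which helps when the same pattern is used many times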
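
Several of the functions above return a `Match` object. The following is a minimal sketch of how such an object can be inspected; the sample string and the group names `year` and `month` are made up for illustration.

```python
import re

# Named groups (?P<name>...) are optional; plain (...) groups work the same way.
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "order 2024-05 shipped")

if m is not None:
    print(m.group())        # 2024-05  (the whole match)
    print(m.group("year"))  # 2024     (one group, by name)
    print(m.groups())       # ('2024', '05')
    print(m.span())         # (6, 13)  (start and end index of the match)

# When nothing matches, the result is None, so check before calling .group().
missing = re.search(r"\d{4}-\d{2}", "no date here")
print(missing)              # None
```

Checking for `None` before calling `.group()` avoids an `AttributeError` when the pattern does not match.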

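2. Regular expression syntax

Syntax	Meaning	Example
`.`	Any single character except a newline	`a.c` matches `"abc"`, `"adc"`
`^`	Start of the string	`^abc` matches `"abc123"` but not `"123abc"`
`$`	End of the string	`abc$` matches `"123abc"` but not `"abc123"`
`*`	Zero or more repetitions	`a*` matches `"a"`, `"aaa"`, `""`
`+`	One or more repetitions	`a+` matches `"a"`, `"aaa"` but not `""`
`?`	Zero or one repetition	`a?` matches `""` or `"a"`
`{n}`	Exactly `n` repetitions	`a{3}` matches `"aaa"`
`{n,}`	At least `n` repetitions	`a{2,}` matches `"aa"`, `"aaa"`
`{n,m}`	At least `n` and at most `m` repetitions	`a{2,4}` matches `"aa"`, `"aaa"`, `"aaaa"`
`[]`	Character class	`[abc]` matches `"a"`, `"b"`, `"c"`
`|`	Alternation (OR)	`cat|dog` matches `"cat"` or `"dog"`
`\d`	A digit (equivalent to `[0-9]` with the `re.ASCII` flag; by default, `str` patterns also match other Unicode decimal digits)	`\d+` matches `"123"`
`\w`	A word character: letter, digit, or underscore (equivalent to `[a-zA-Z0-9_]` with the `re.ASCII` flag; by default, `str` patterns also match Unicode letters and digits)	`\w+` matches `"hello_123"`
`\s`	A whitespace character	`\s+` matches `"\t "`
`\b`	Word boundary	`\bword\b` matches `"word"` as a whole word, not inside `"wording"`
`\D`	A non-digit character	`\D+` matches `"abc"`
`\W`	A non-word character (anything other than a letter, digit, or underscore)	`\W+` matches `"#@!"`
`\S`	A non-whitespace character	`\S+` matches `"abc123"`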
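
As a quick way to check a few rows of the table, here is a small sketch that runs some of the patterns with `re.search` and `re.fullmatch`. The test strings (including the full-width digits and the accented word) are illustrative choices, not part of the original examples.

```python
import re

# (pattern, text, whether re.search is expected to find a match)
checks = [
    (r"a.c", "abc", True),
    (r"^abc", "123abc", False),
    (r"abc$", "123abc", True),
    (r"^a{2,4}$", "aaaaa", False),   # five "a"s exceed the upper bound
    (r"cat|dog", "hotdog", True),
    (r"[abc]", "xyz", False),
    (r"\bword\b", "wording", False),
]
results = [bool(re.search(p, t)) for p, t, _ in checks]
for (pattern, text, _), found in zip(checks, results):
    print(f"{pattern!r} in {text!r}: {found}")

# For str patterns, \d and \w are Unicode-aware unless re.ASCII is given.
fullwidth = "\uff11\uff12\uff13"  # full-width digits one, two, three
unicode_digits = re.fullmatch(r"\d+", fullwidth) is not None          # True
ascii_digits = re.fullmatch(r"\d+", fullwidth, re.ASCII) is not None  # False
unicode_word = re.fullmatch(r"\w+", "h\u00e9llo") is not None           # True
ascii_word = re.fullmatch(r"\w+", "h\u00e9llo", re.ASCII) is not None   # False
```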

3. Regular expression examples

3.1 `re.match()` – match at the start

import re

pattern = r"hello"
text = "hello world"
match = re.match(pattern, text)

if match:
    print("Match found:", match.group())  # prints: Match found: hello
else:
    print("No match")
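
Because `re.match()` only anchors at the start, it is easy to confuse with matching the whole string. The sketch below contrasts it with `re.fullmatch()`, which is also part of the `re` module; the variable names are illustrative.

```python
import re

not_at_start = re.match(r"hello", "say hello")       # None: "hello" is not at index 0
prefix_only = re.match(r"hello", "hello world")      # matches the prefix "hello"
whole_fails = re.fullmatch(r"hello", "hello world")  # None: trailing text remains
whole_ok = re.fullmatch(r"hello", "hello")           # matches the entire string

print(not_at_start, whole_fails)              # None None
print(prefix_only.group(), whole_ok.group())  # hello hello
```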

3.2 `re.search()` – match anywhere

import re

pattern = r"world"
text = "hello world"
match = re.search(pattern, text)

if match:
    print("Found:", match.group())  # prints: Found: world

3.3 `re.findall()` – find all matches

import re

pattern = r"\d+"
text = "price 100 yuan, tax 20 yuan"
matches = re.findall(pattern, text)

print(matches)  # prints: ['100', '20']
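
One detail worth knowing is that the return shape of `re.findall()` changes when the pattern contains capture groups. A short sketch, using a made-up sample string:

```python
import re

text = "price 100 CNY, tax 20 CNY"

# No groups: a list of whole matches.
whole = re.findall(r"\d+", text)
print(whole)    # ['100', '20']

# One group: a list of that group's text only.
amounts = re.findall(r"(\d+) CNY", text)
print(amounts)  # ['100', '20']

# Several groups: a list of tuples, one tuple per match.
pairs = re.findall(r"(\w+) (\d+)", text)
print(pairs)    # [('price', '100'), ('tax', '20')]
```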

3.4 `re.finditer()` – iterate over matches

import re

pattern = r"\d+"
text = "order 123, price 456"
matches = re.finditer(pattern, text)

for match in matches:
    print(match.group())  # prints 123, then 456
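
Unlike `re.findall()`, `re.finditer()` yields full `Match` objects, so the position of each match is available too. A minimal sketch (the `positions` list is only there to collect the results):

```python
import re

text = "order 123, price 456"

# Each item is (matched text, start index, end index).
positions = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+", text)]

for value, start, end in positions:
    print(value, start, end)
# 123 6 9
# 456 17 20
```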

3.5 `re.sub()` – replace matches

import re

pattern = r"\d+"
text = "price 100 yuan"
new_text = re.sub(pattern, "XXX", text)

print(new_text)  # prints: price XXX yuan
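
The replacement passed to `re.sub()` does not have to be a fixed string. The sketch below shows three common variations: backreferences to capture groups, a function as the replacement, and the `count` argument. The sample strings and the helper `double` are illustrative.

```python
import re

# Backreferences \1, \2, \3 refer to the capture groups of the pattern.
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", "due 2024-05-17")
print(swapped)     # due 17/05/2024

# A function receives each Match object and returns the replacement text.
def double(m):
    return str(int(m.group()) * 2)

doubled = re.sub(r"\d+", double, "price 100, tax 20")
print(doubled)     # price 200, tax 40

# count limits how many matches are replaced.
first_only = re.sub(r"\d+", "N", "1 2 3", count=1)
print(first_only)  # N 2 3
```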

3.6 `re.split()` – split a string by a pattern

import re

pattern = r"\s+"
text = "hello  world  python"
words = re.split(pattern, text)

print(words)  # prints: ['hello', 'world', 'python']
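
A few edge cases of `re.split()` are easy to trip over, so here is a small sketch of them with made-up inputs: empty strings at the edges, capture groups in the pattern, and `maxsplit`.

```python
import re

# A separator at the start or end produces empty strings in the result.
edges = re.split(r"\s+", "  hello  world ")
print(edges)    # ['', 'hello', 'world', '']

# If the pattern has a capture group, the captured separators are kept.
kept = re.split(r"\s*([,;])\s*", "a , b;c")
print(kept)     # ['a', ',', 'b', ';', 'c']

# maxsplit limits the number of splits; the rest stays in the last item.
limited = re.split(r"\s+", "a b c d", maxsplit=2)
print(limited)  # ['a', 'b', 'c d']
```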

3.7 `re.compile()` – precompile a pattern

import re

pattern = re.compile(r"\d+")
text = "order 123 price 456"

# Call methods directly on the compiled pattern object
print(pattern.findall(text))  # prints: ['123', '456']
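
A compiled pattern also carries its flags, which is convenient when the same pattern is applied in several places. The sketch below assumes a hypothetical log format with lines starting with `error:`; the name `ERROR_LINE` and the sample log are made up.

```python
import re

# IGNORECASE matches "error" in any case; MULTILINE makes ^ and $ work per line.
ERROR_LINE = re.compile(r"^error:\s*(.+)$", re.IGNORECASE | re.MULTILINE)

log = "ok\nError: disk full\nERROR: timeout"

messages = ERROR_LINE.findall(log)
print(messages)  # ['disk full', 'timeout']

# The same object can be reused with other methods, such as search().
first = ERROR_LINE.search(log)
print(first.group(1))  # disk full
```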

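4. Greedy and non-greedy matching

Quantifier	Description	Example
`.*`	Greedy (matches as many characters as possible)	`<.*>` matches all of `"<div>hello</div>"`
`.*?`	Non-greedy (matches as few characters as possible)	`<.*?>` matches only `"<div>"` in the same string

import re

text = "<div>hello</div>"
greedy = re.search(r"<.*>", text)  # greedy match
lazy = re.search(r"<.*?>", text)  # non-greedy match

print(greedy.group())  # prints: <div>hello</div>
print(lazy.group())  # prints: <div>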
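
The same distinction applies to the other quantifiers (`+?`, `??`, `{n,m}?`). As a rough sketch, the code below also shows a negated character class, which is a common alternative to a non-greedy quantifier for this kind of input; it is offered as one option, not as the only correct approach.

```python
import re

text = "<div>hello</div>"

greedy_all = re.findall(r"<.*>", text)
print(greedy_all)   # ['<div>hello</div>']

lazy_all = re.findall(r"<.*?>", text)
print(lazy_all)     # ['<div>', '</div>']

# [^>]* cannot run past the first ">", so no backtracking over ">" is needed.
negated_all = re.findall(r"<[^>]*>", text)
print(negated_all)  # ['<div>', '</div>']

# A trailing "?" makes a bounded quantifier stop at its minimum.
shortest = re.search(r"a{2,4}?", "aaaa").group()
print(shortest)     # aa
```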

Python regular expressions are powerful, and fluency with them makes text processing considerably easier! 🚀

lichongyang
