# Regex Patterns
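A practical regular-expression cheat sheet. It covers patterns for validating, parsing, extracting, and refactoring text in JavaScript, Python, Go, and command-line tools.

## When to Use

- Validating user input (email addresses, URLs, IPs, phone numbers, dates)
- Parsing log lines or structured text
- Extracting data from strings (IDs, numbers, tokens)
- Search and replace in code (renaming variables, updating imports)
- Filtering lines in files or command output
- Debugging regular expressions that do not behave as expected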
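
## Quick Reference

### Metacharacters

| Pattern | Matches | Example |
|---|---|---|
| `.` | Any character except a newline | `a.c` matches `abc`, `a1c` |
| `\d` | Digit `[0-9]` | `\d{3}` matches `123` |
| `\w` | Word character `[a-zA-Z0-9_]` | `\w+` matches `hello_123` |
| `\s` | Whitespace `[ \t\n\r\f]` | `\s+` matches spaces/tabs |
| `\b` | Word boundary | `\bcat\b` matches `cat` but not `scatter` |
| `^` | Start of string (start of line in multiline mode and in line-based tools such as grep) | `^Error` matches lines starting with `Error` |
| `$` | End of string (end of line in multiline mode and in line-based tools such as grep) | `\.js$` matches lines ending in `.js` |
| `\D`, `\W`, `\S` | Negations: non-digit, non-word character, non-whitespace | |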
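
### Quantifiers

| Pattern | Meaning |
|---|---|
| `*` | 0 or more times (greedy) |
| `+` | 1 or more times (greedy) |
| `?` | 0 or 1 time (optional) |
| `{3}` | Exactly 3 times |
| `{2,5}` | 2 to 5 times |
| `{3,}` | 3 or more times |
| `*?`, `+?` | Lazy (match as few as possible) |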
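
### Groups and Alternation

| Pattern | Meaning |
|---|---|
| `(abc)` | Capturing group |
| `(?:abc)` | Non-capturing group |
| `(?P<name>abc)` | Named group (Python, Go) |
| `(?<name>abc)` | Named group (JavaScript; Go 1.22 and later) |
| `a\|b` | Alternation (a or b) |
| `[abc]` | Character class (a, b, or c) |
| `[^abc]` | Negated class (anything but a, b, or c) |
| `[a-z]` | Range |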
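
### Lookahead and Lookbehind

| Pattern | Meaning |
|---|---|
| `(?=abc)` | Positive lookahead (followed by abc) |
| `(?!abc)` | Negative lookahead (not followed by abc) |
| `(?<=abc)` | Positive lookbehind (preceded by abc) |
| `(?<!abc)` | Negative lookbehind (not preceded by abc) |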
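
As a rough illustration of the four lookaround forms in the table, the Python sketch below applies each one to a made-up sample string. The variable `prices` and its contents are hypothetical and exist only for this example.

```python
import re

# Hypothetical sample input, chosen only for illustration.
prices = "apple $30, pear €25, plum $7"

# Positive lookahead: digit runs followed by a comma (the comma is not consumed).
followed_by_comma = re.findall(r"\d+(?=,)", prices)

# Negative lookahead: digit runs not followed by a comma or another digit.
not_followed_by_comma = re.findall(r"\d+(?![\d,])", prices)

# Positive lookbehind: digit runs preceded by a dollar sign.
dollar_amounts = re.findall(r"(?<=\$)\d+", prices)

# Negative lookbehind: digit runs not preceded by a dollar sign or another digit.
non_dollar_amounts = re.findall(r"(?<![\d$])\d+", prices)

print(followed_by_comma, not_followed_by_comma, dollar_amounts, non_dollar_amounts)
```

The extra `\d` inside the negative assertions keeps the engine from matching only part of a number after backtracking.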

## Validation Patterns

### Email

```
# Basic (covers most real-world addresses)
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

# Stricter (local part must start and end with an alphanumeric; consecutive dots are still accepted)
^[a-zA-Z0-9]([a-zA-Z0-9._%+-]*[a-zA-Z0-9])?@[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?(\.[a-zA-Z]{2,})+$
```
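
One way the basic pattern might be wrapped in a reusable check is sketched below. The helper name `is_plausible_email` is an assumption made for this example, and the function only tests the shape of the string, not whether the address exists.

```python
import re

# The "basic" pattern from above, compiled once for reuse.
BASIC_EMAIL = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

def is_plausible_email(value):
    """Return True if the whole string matches the basic email pattern."""
    # fullmatch anchors both ends, so explicit ^ and $ are not needed.
    return BASIC_EMAIL.fullmatch(value) is not None

print(is_plausible_email("user@example.com"))
print(is_plausible_email("user@example"))
```

Using `fullmatch` instead of `$` also rejects a trailing newline, which `$` would tolerate in Python.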

### URL

```
# HTTP/HTTPS URLs
https?://[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?(\.[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?)*(/[^\s]*)?

# With optional port, query, and fragment
https?://[^\s/]+(/[^\s?#]*)?(\?[^\s#]*)?(#[^\s]*)?
```

### IP Addresses

```
# IPv4
\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\b

# IPv4 (simple, allows invalid values like 999.999.999.999)
\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b

# IPv6 (simplified: full eight-group form only, no :: compression)
(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}
```
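
The sketch below shows one way the strict IPv4 pattern could be built from a single octet fragment, next to a check based on Python's standard `ipaddress` module. The helper names `is_ipv4_regex` and `is_ipv4_stdlib` are hypothetical. When the input is a whole field rather than free text, the standard-library check is usually the simpler option.

```python
import ipaddress
import re

# One octet: 0-255, the same alternation used in the strict pattern above.
OCTET = r"(?:25[0-5]|2[0-4]\d|[01]?\d\d?)"
IPV4 = re.compile(rf"(?:{OCTET}\.){{3}}{OCTET}")

def is_ipv4_regex(value):
    """Shape-and-range check using the regex only."""
    return IPV4.fullmatch(value) is not None

def is_ipv4_stdlib(value):
    """Same question answered by the standard library parser."""
    try:
        ipaddress.IPv4Address(value)
        return True
    except ValueError:
        return False

for candidate in ("192.168.1.1", "256.1.1.1", "1.2.3"):
    print(candidate, is_ipv4_regex(candidate), is_ipv4_stdlib(candidate))
```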

### Phone Numbers

```
# US phone (various formats)
(?:\+1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}
# Matches: +1 (555) 123-4567, 555.123.4567, 5551234567

# International (E.164)
\+[1-9]\d{6,14}
```

### Dates and Times

```
# ISO 8601 date
\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])

# ISO 8601 datetime
\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})

# US date (MM/DD/YYYY)
(?:0[1-9]|1[0-2])/(?:0[1-9]|[12]\d|3[01])/\d{4}

# Time (HH:MM:SS, 24h)
(?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d
```
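
The ISO date pattern checks the shape of a date but not the calendar, so a string such as `2026-02-31` still matches. A minimal sketch of combining the regex with `datetime.date` is shown below; `parse_iso_date` is a hypothetical helper name.

```python
import re
from datetime import date

# ISO 8601 date pattern from above, with capture groups added for the three parts.
ISO_DATE = re.compile(r"(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])")

def parse_iso_date(value):
    """Return a date for a valid YYYY-MM-DD string, otherwise None."""
    m = ISO_DATE.fullmatch(value)
    if m is None:
        return None  # wrong shape
    year, month, day = (int(part) for part in m.groups())
    try:
        return date(year, month, day)  # rejects impossible dates such as Feb 31
    except ValueError:
        return None

print(parse_iso_date("2026-02-03"))
print(parse_iso_date("2026-02-31"))
```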

### Password (Strength Check)

```
# At least 8 chars, 1 upper, 1 lower, 1 digit, 1 special
^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*()_+=-]).{8,}$
```
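
Each lookahead in the pattern above is an independent requirement, which means the same rules can be checked one at a time when a specific error message is wanted. The sketch below assumes that goal; `CHECKS`, the labels, and `missing_requirements` are names invented for this example.

```python
import re

# One entry per lookahead in the combined pattern; labels are illustrative.
CHECKS = [
    ("lowercase letter", r"[a-z]"),
    ("uppercase letter", r"[A-Z]"),
    ("digit", r"\d"),
    ("special character", r"[!@#$%^&*()_+=-]"),
]

def missing_requirements(password):
    """Return the labels of the requirements the password does not meet."""
    missing = [label for label, pattern in CHECKS if not re.search(pattern, password)]
    if len(password) < 8:
        missing.append("at least 8 characters")
    return missing

print(missing_requirements("Abcdef1!"))
print(missing_requirements("abcdefgh"))
```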
### UUID

```
[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}
```

### Semantic Versions

```
\bv?(\d+)\.(\d+)\.(\d+)(?:-([\w.]+))?(?:\+([\w.]+))?\b
# Captures: major, minor, patch, prerelease, build
# Matches: 1.2.3, v1.0.0-beta.1, 2.0.0+build.123
```
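
To show how the five capture groups might be consumed, here is a small Python sketch. `parse_version` is a hypothetical helper, and returning a tuple is just one possible design.

```python
import re

# Semantic version pattern from above.
SEMVER = re.compile(r"\bv?(\d+)\.(\d+)\.(\d+)(?:-([\w.]+))?(?:\+([\w.]+))?\b")

def parse_version(text):
    """Return (major, minor, patch, prerelease, build) for the first version found, or None."""
    m = SEMVER.search(text)
    if m is None:
        return None
    major, minor, patch, prerelease, build = m.groups()
    # Optional groups that did not participate in the match come back as None.
    return (int(major), int(minor), int(patch), prerelease, build)

print(parse_version("v1.0.0-beta.1"))
print(parse_version("2.0.0+build.123"))
```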

## Parsing Patterns

### Log Lines

```bash
# Apache/Nginx access log
# Format: IP - - [date] "METHOD /path HTTP/x.x" status size
grep -oP '(\S+) - - \[([^\]]+)\] "(\w+) (\S+) \S+" (\d+) (\d+)' access.log

# Extract IP and status code
grep -oP '^\S+|"\s\K\d{3}' access.log

# Syslog format
# Format: Mon DD HH:MM:SS hostname process[pid]: message
grep -oP '^\w+\s+\d+\s[\d:]+\s(\S+)\s(\S+)\[(\d+)\]:\s(.*)' syslog

# JSON log: extract a field
grep -oP '"level"\s*:\s*"\K[^"]+' app.log
grep -oP '"message"\s*:\s*"\K[^"]+' app.log
```
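
`grep -o` prints the whole match, so the capture groups in the access-log pattern are not individually accessible from the shell. A rough Python sketch of the same pattern with named groups follows; the sample line and the name `parse_access_line` are made up for illustration.

```python
import re

# Access-log pattern from above, with each capture group given a name.
ACCESS_LINE = re.compile(
    r'(?P<ip>\S+) - - \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\w+) (?P<path>\S+) \S+" '
    r'(?P<status>\d+) (?P<size>\d+)'
)

# Made-up sample line in the format described above.
sample = '203.0.113.7 - - [03/Feb/2026:12:30:00 +0000] "GET /index.html HTTP/1.1" 200 1024'

def parse_access_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    m = ACCESS_LINE.match(line)
    return m.groupdict() if m else None

fields = parse_access_line(sample)
print(fields)
```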

### Code Patterns

```bash
# Find function definitions (JavaScript/TypeScript)
grep -nP '(?:function\s+\w+|(?:const|let|var)\s+\w+\s*=\s*(?:async\s*)?\([^)]*\)\s*=>|(?:async\s+)?function\s*\()' src/*.ts

# Find class definitions
grep -nP 'class\s+\w+(?:\s+extends\s+\w+)?' src/*.ts

# Find import statements
grep -nP '^import\s+.*\s+from\s+' src/*.ts

# Find TODO/FIXME/HACK comments
grep -rnP '(?:TODO|FIXME|HACK|XXX|WARN)(?:\([^)]+\))?:?\s+' src/

# Find console.log left in code
grep -rnP 'console\.(log|debug|info|warn|error)\(' src/ --include='*.ts' --include='*.js'
```

### Data Extraction

```bash
# Extract all email addresses from a file
grep -oP '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' file.txt

# Extract all URLs
grep -oP 'https?://[^\s<>"]+' file.html

# Extract all quoted strings
grep -oP '"[^"\\]*(?:\\.[^"\\]*)*"' file.json

# Extract numbers (integer and decimal)
# The -- stops option parsing so the leading "-" in the pattern is not read as a flag
grep -oP -- '-?\d+\.?\d*' data.txt

# Extract key-value pairs (key=value)
grep -oP '\b(\w+)=([^\s&]+)' query.txt

# Extract hashtags
grep -oP '#\w+' posts.txt

# Extract hex colors
grep -oP '#[0-9a-fA-F]{3,8}\b' styles.css
```

## Language-Specific Usage

### JavaScript

```javascript
// Test if a string matches
const emailRegex = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;
emailRegex.test('user@example.com'); // true

// Extract with capture groups
const match = '2026-02-03T12:30:00Z'.match(/(\d{4})-(\d{2})-(\d{2})/);
// match[1] = '2026', match[2] = '02', match[3] = '03'

// Named groups
const m = 'John Doe, age 30'.match(/(?<name>[A-Za-z ]+), age (?<age>\d+)/);
// m.groups.name = 'John Doe', m.groups.age = '30'

// Find all matches (matchAll returns an iterator)
const text = 'Call 555-1234 or 555-5678';
const matches = [...text.matchAll(/\d{3}-\d{4}/g)];
// [{0: '555-1234', index: 5}, {0: '555-5678', index: 17}]

// Replace with callback
'hello world'.replace(/\b\w/g, c => c.toUpperCase()); // 'Hello World'

// Replace with named groups
'2026-02-03'.replace(/(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/, '$<m>/$<d>/$<y>'); // '02/03/2026'

// Split with regex
'one, two; three'.split(/[,;]\s*/); // ['one', 'two', 'three']
```
### Python

```python
import re

# `text` and `log_text` below stand for input strings defined elsewhere.

# Match (anchored to start)
m = re.match(r'^(\w+)@(\w+)\.(\w+)$', 'user@example.com')
if m:
    print(m.group(1))  # 'user'

# Search (find first match anywhere)
m = re.search(r'\d{3}-\d{4}', 'Call 555-1234 today')
print(m.group())  # '555-1234'

# Find all matches
emails = re.findall(r'[\w.+-]+@[\w.-]+\.\w{2,}', text)

# Named groups
m = re.match(r'(?P<name>\w+)\s+(?P<age>\d+)', 'Alice 30')
print(m.group('name'))  # 'Alice'

# Substitution
result = re.sub(r'\bfoo\b', 'bar', 'foo foobar foo')  # 'bar foobar bar'

# Sub with callback
result = re.sub(r'\b\w', lambda m: m.group().upper(), 'hello world')  # 'Hello World'

# Compile for reuse (faster in loops)
pattern = re.compile(r'\d{4}-\d{2}-\d{2}')
dates = pattern.findall(log_text)

# Multiline and DOTALL
re.findall(r'^ERROR.*$', text, re.MULTILINE)  # ^ and $ match line boundaries
re.search(r'start.*end', text, re.DOTALL)     # . matches newlines

# Verbose mode (readable complex patterns)
pattern = re.compile(r'''
    ^                   # Start of string
    (?P<year>\d{4})     # Year
    -(?P<month>\d{2})   # Month
    -(?P<day>\d{2})     # Day
    $                   # End of string
''', re.VERBOSE)
```

### Go

```go
import (
	"fmt"
	"regexp"
	"strings"
)

// The snippets below assume `text` is a string already in scope.

// Compile pattern (panics on invalid regex)
dateRe := regexp.MustCompile(`\d{4}-\d{2}-\d{2}`)

// Match test
dateRe.MatchString("2026-02-03") // true

// Find first match
dateRe.FindString("Date: 2026-02-03 and 2026-03-01") // "2026-02-03"

// Find all matches
dateRe.FindAllString(text, -1) // []string of all matches

// Capture groups
emailRe := regexp.MustCompile(`(\w+)@(\w+)\.(\w+)`)
match := emailRe.FindStringSubmatch("user@example.com")
// match[0] = "user@example.com", match[1] = "user", match[2] = "example", match[3] = "com"

// Named groups
namedRe := regexp.MustCompile(`(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})`)
parts := namedRe.FindStringSubmatch("2026-02-03")
for i, name := range namedRe.SubexpNames() {
	if name != "" {
		fmt.Printf("%s: %s\n", name, parts[i])
	}
}

// Replace
numRe := regexp.MustCompile(`\d+`)
numRe.ReplaceAllString("foo123bar", "NUM") // "fooNUMbar"

// Replace with function (uppercases every match)
emailRe.ReplaceAllStringFunc(text, strings.ToUpper)

// Note: Go uses RE2 syntax, so lookahead and lookbehind are not available
```

### Command Line (grep/sed)

```bash
# grep -P uses PCRE (Perl-compatible, full feature set; a GNU grep option)
# grep -E uses extended regex (no lookahead/lookbehind)

# Find lines matching a pattern
grep -P '\d{3}-\d{4}' file.txt

# Extract only the matching part
grep -oP '\d{3}-\d{4}' file.txt

# Invert match (lines NOT matching)
grep -vP 'DEBUG|TRACE' app.log

# sed replacement
sed 's/oldPattern/newText/g' file.txt        # Basic
sed -E 's/foo_([a-z]+)/bar_\1/g' file.txt    # Extended with capture group

# Perl one-liner (most powerful)
perl -pe 's/(?<=price:\s)\d+/0/g' file.txt   # Lookbehind works in Perl
```

## Search-and-Replace Patterns

### Code Refactoring

```bash
# Note: `sed -i` as written is GNU sed; BSD/macOS sed needs `sed -i ''`

# Rename a variable across files
grep -rlP '\boldName\b' src/ | xargs sed -i 's/\boldName\b/newName/g'

# Convert var to const (JavaScript)
sed -i -E 's/\bvar\b/const/g' src/*.js

# Convert single quotes to double quotes
sed -i "s/'/\"/g" src/*.ts

# Add trailing commas to object properties
sed -i -E 's/^(\s+\w+:.+[^,])$/\1,/' config.json

# Update import paths
sed -i 's|from '\''../old-path/|from '\''../new-path/|g' src/*.ts

# Convert snake_case to camelCase (Python → JavaScript naming)
perl -pe 's/_([a-z])/uc($1)/ge' file.txt
```

### Text Cleanup

```bash
# Remove trailing whitespace
sed -i 's/[[:space:]]*$//' file.txt

# Remove blank lines
sed -i '/^$/d' file.txt

# Remove duplicate blank lines (keep at most one)
sed -i '/^$/N;/^\n$/d' file.txt

# Trim leading and trailing whitespace from each line
sed -i 's/^[[:space:]]*//;s/[[:space:]]*$//' file.txt

# Remove HTML tags
sed 's/<[^>]*>//g' file.html

# Remove ANSI color codes
sed 's/\x1b\[[0-9;]*m//g' output.txt
```

## Common Pitfalls

### Greedy vs. Lazy Matching

```
Pattern: <.*>
Input:   <b>bold</b>
Greedy matches: <b>bold</b>   (everything from the first < to the last >)

Pattern: <.*?>  (lazy version)
Lazy matches:   <b>           (stops at the first >)
```
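
The difference is easy to confirm in Python. The short sketch below runs both patterns on the same input and also tries a negated character class, a common alternative to the lazy quantifier; the variable names are arbitrary.

```python
import re

html = "<b>bold</b>"

# Greedy: .* runs to the end, then backs off to the last ">".
greedy = re.findall(r"<.*>", html)

# Lazy: .*? grows one character at a time and stops at the first ">".
lazy = re.findall(r"<.*?>", html)

# Negated class: cannot cross a ">" at all, so no backtracking is needed.
negated = re.findall(r"<[^>]*>", html)

print(greedy, lazy, negated)
```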

### Escaping Special Characters

```
Characters that need escaping in regex:
. * + ? ^ $ { } [ ] ( ) | \

In character classes []: only ] - ^ \ need escaping

# To match a literal dot: \.
# To match a literal *:   \*
# To match a literal \:   \\
# To match [ or ]:        \[ or \]
```
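
When the literal text comes from a variable, escaping by hand is error-prone. In Python, `re.escape` does it for you. The sketch below uses made-up strings to show the difference between an escaped and an unescaped dot.

```python
import re

# Hypothetical literal text that contains several regex metacharacters.
literal = "price: $5.00 (approx.)"
haystack = "Listed price: $5.00 (approx.) per unit"

# re.escape backslash-escapes every character that is special in a pattern.
found = re.search(re.escape(literal), haystack)

# An unescaped dot matches any character, so "5.00" also matches "5x00".
unescaped_hit = re.search(r"5.00", "5x00") is not None
escaped_hit = re.search(re.escape("5.00"), "5x00") is not None

print(found.group(), unescaped_hit, escaped_hit)
```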

### Newlines and Multiline Mode

```
By default . does NOT match newline.
By default ^ and $ match start/end of STRING.

# To make . match newlines:
JavaScript: /pattern/s (dotAll flag)
Python:     re.DOTALL or re.S
Go:         (?s) inline flag

# To make ^ $ match line boundaries:
JavaScript: /pattern/m (multiline flag)
Python:     re.MULTILINE or re.M
Go:         (?m) inline flag
```
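
A short Python sketch of both flags in action is given below. The sample `text` is invented for the example.

```python
import re

# Made-up multi-line input.
text = "INFO start\nERROR disk full\nERROR timeout\nINFO end"

# Without MULTILINE, ^ only matches at the very start of the string.
default_hits = re.findall(r"^ERROR.*$", text)

# With MULTILINE, ^ and $ match at every line boundary.
multiline_hits = re.findall(r"^ERROR.*$", text, re.MULTILINE)

# Without DOTALL, . stops at a newline, so the match cannot span lines.
single_line = re.search(r"start.*end", text)

# With DOTALL, . also matches newlines.
spanning = re.search(r"start.*end", text, re.DOTALL)

print(default_hits, multiline_hits, single_line, spanning is not None)
```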

### Backtracking and Performance

```
# Catastrophic backtracking (avoid these patterns on untrusted input):
(a+)+        # Nested quantifiers
(a|a)+       # Overlapping alternation
(.*a){10}    # Ambiguous .* with repetition

# Safe alternatives:
a+           # Instead of (a+)+
a+           # Instead of (a|a)+
[^a]*a       # Negated class instead of .*a
```
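
To get a feel for the cost, the sketch below times the nested pattern against its flat equivalent on an input that cannot match. The helper `time_match` and the input length of 18 are arbitrary choices for this example. Actual timings depend on the machine and the regex engine, so treat the numbers as indicative only.

```python
import re
import time

def time_match(pattern, text):
    """Return (matched, elapsed_seconds) for a full-string match attempt."""
    start = time.perf_counter()
    matched = re.fullmatch(pattern, text) is not None
    return matched, time.perf_counter() - start

# A run of "a" followed by "b" can never fully match either pattern.
# A backtracking engine may try many ways of splitting the run before giving up on (a+)+.
text = "a" * 18 + "b"

nested_result, nested_seconds = time_match(r"(a+)+", text)
flat_result, flat_seconds = time_match(r"a+", text)

print(f"(a+)+ : matched={nested_result}, {nested_seconds:.4f}s")
print(f"a+    : matched={flat_result}, {flat_seconds:.4f}s")
```

Keep the input short when experimenting: on a backtracking engine, each extra `a` can roughly double the time the nested pattern needs.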
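
## Tips

- Start simple and add complexity step by step. `\d+` is almost always enough; you rarely need `[0-9]+`.
- Test your regex on real data, not just the happy path. Edge cases (empty strings, special characters, Unicode) break naive patterns.
- Use a non-capturing group `(?:...)` when you do not need the captured value. It is faster and cleaner.
- In JavaScript, always use the `g` flag with `matchAll` and with a global `replace`. Without it, `matchAll` throws a `TypeError` and `replace` only replaces the first match.
- Go's `regexp` package uses RE2, which does not support lookahead or lookbehind. If you need them, restructure the match or use the `regexp2` package.
- `grep -P` (PCRE) is the most powerful regex flavor on the command line. Use it instead of `grep -E` when you need lookarounds, `\K`, or `\d`.
- For complex patterns, use verbose mode (`re.VERBOSE` in Python, `/x` in Perl) and add comments explaining each part.
- Regex is the wrong tool for parsing HTML, XML, or JSON. Use a proper parser. Regex is fine for pulling simple values out of these formats, not for structural parsing.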