Regex Patterns

Introduction

# Regex Patterns

Practical regular expression cookbook. Patterns for validation, parsing, extraction, and refactoring across JavaScript, Python, Go, and command-line tools.

## When to Use

- Validating user input (email, URL, IP, phone, dates) - Parsing log lines or structured text - Extracting data from strings (IDs, numbers, tokens) - Search-and-replace in code (rename variables, update imports) - Filtering lines in files or command output - Debugging regexes that don't match as expected

## Quick Reference

### Metacharacters

| Pattern | Matches | Example | |---|---|---| | `.` | Any character (except newline) | `a.c` matches `abc`, `a1c` | | `\d` | Digit `[0-9]` | `\d{3}` matches `123` | | `\w` | Word char `[a-zA-Z0-9_]` | `\w+` matches `hello_123` | | `\s` | Whitespace `[ \t\n\r\f]` | `\s+` matches spaces/tabs | | `\b` | Word boundary | `\bcat\b` matches `cat` not `scatter` | | `^` | Start of line | `^Error` matches line starting with Error | | `$` | End of line | `\.js$` matches line ending with .js | | `\D`, `\W`, `\S` | Negated: non-digit, non-word, non-space | |

### Quantifiers

| Pattern | Meaning | |---|---| | `*` | 0 or more (greedy) | | `+` | 1 or more (greedy) | | `?` | 0 or 1 (optional) | | `{3}` | Exactly 3 | | `{2,5}` | Between 2 and 5 | | `{3,}` | 3 or more | | `*?`, `+?` | Lazy (match as few as possible) |

### Groups and Alternation

| Pattern | Meaning | |---|---| | `(abc)` | Capture group | | `(?:abc)` | Non-capturing group | | `(?P<name>abc)` | Named group (Python) | | `(?<name>abc)` | Named group (JS/Go) | | `a\|b` | Alternation (a or b) | | `[abc]` | Character class (a, b, or c) | | `[^abc]` | Negated class (not a, b, or c) | | `[a-z]` | Range |

### Lookahead and Lookbehind

| Pattern | Meaning | |---|---| | `(?=abc)` | Positive lookahead (followed by abc) | | `(?!abc)` | Negative lookahead (not followed by abc) | | `(?<=abc)` | Positive lookbehind (preceded by abc) | | `(?<!abc)` | Negative lookbehind (not preceded by abc) |

## Validation Patterns

### Email

``` # Basic (covers 99% of real emails) ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

# Stricter (no consecutive dots, no leading/trailing dots in local part) ^[a-zA-Z0-9]([a-zA-Z0-9._%+-]*[a-zA-Z0-9])?@[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?(\.[a-zA-Z]{2,})+$ ```

### URL

``` # HTTP/HTTPS URLs https?://[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?(\.[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?)*(/[^\s]*)?

# With optional port and query https?://[^\s/]+(/[^\s?]*)?(\?[^\s#]*)?(#[^\s]*)? ```

### IP Addresses

``` # IPv4 \b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\b

# IPv4 (simple, allows invalid like 999.999.999.999) \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b

# IPv6 (simplified) (?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4} ```

### Phone Numbers

``` # US phone (various formats) (?:\+1[-.\s]?)?$?\d{3}$?[-.\s]?\d{3}[-.\s]?\d{4} # Matches: +1 (555) 123-4567, 555.123.4567, 5551234567

# International (E.164) \+[1-9]\d{6,14} ```

### Dates and Times

``` # ISO 8601 date \d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])

# ISO 8601 datetime \d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})

# US date (MM/DD/YYYY) (?:0[1-9]|1[0-2])/(?:0[1-9]|[12]\d|3[01])/\d{4}

# Time (HH:MM:SS, 24h) (?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d ```

### Passwords (Strength Check)

``` # At least 8 chars, 1 upper, 1 lower, 1 digit, 1 special ^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*()_+=-]).{8,}$ ```

### UUIDs

``` [0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12} ```

### Semantic Version

``` \bv?(\d+)\.(\d+)\.(\d+)(?:-([\w.]+))?(?:\+([\w.]+))?\b # Captures: major, minor, patch, prerelease, build # Matches: 1.2.3, v1.0.0-beta.1, 2.0.0+build.123 ```

## Parsing Patterns

### Log Lines

```bash # Apache/Nginx access log # Format: IP - - [date] "METHOD /path HTTP/x.x" status size grep -oP '(\S+) - - \[([^\]]+)\] "(\w+) (\S+) \S+" (\d+) (\d+)' access.log

# Extract IP and status code grep -oP '^\S+|"\s\K\d{3}' access.log

# Syslog format # Format: Mon DD HH:MM:SS hostname process[pid]: message grep -oP '^\w+\s+\d+\s[\d:]+\s(\S+)\s(\S+)\[(\d+)\]:\s(.*)' syslog

# JSON log — extract a field grep -oP '"level"\s*:\s*"\K[^"]+' app.log grep -oP '"message"\s*:\s*"\K[^"]+' app.log ```

### Code Patterns

```bash # Find function definitions (JavaScript/TypeScript) grep -nP '(?:function\s+\w+|(?:const|let|var)\s+\w+\s*=\s*(?:async\s*)?$[^)]*$\s*=>|(?:async\s+)?function\s*\()' src/*.ts

# Find class definitions grep -nP 'class\s+\w+(?:\s+extends\s+\w+)?' src/*.ts

# Find import statements grep -nP '^import\s+.*\s+from\s+' src/*.ts

# Find TODO/FIXME/HACK comments grep -rnP '(?:TODO|FIXME|HACK|XXX|WARN)(?:$[^)]+$)?:?\s+' src/

# Find console.log left in code grep -rnP 'console\.(log|debug|info|warn|error)\(' src/ --include='*.ts' --include='*.js' ```

### Data Extraction

```bash # Extract all email addresses from a file grep -oP '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' file.txt

# Extract all URLs grep -oP 'https?://[^\s<>"]+' file.html

# Extract all quoted strings grep -oP '"[^"\\]*(?:\\.[^"\\]*)*"' file.json

# Extract numbers (integer and decimal) grep -oP '-?\d+\.?\d*' data.txt

# Extract key-value pairs (key=value) grep -oP '\b(\w+)=([^\s&]+)' query.txt

# Extract hashtags grep -oP '#\w+' posts.txt

# Extract hex colors grep -oP '#[0-9a-fA-F]{3,8}\b' styles.css ```

## Language-Specific Usage

### JavaScript

```javascript // Test if a string matches const emailRegex = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/; emailRegex.test('[email protected]'); // true

// Extract with capture groups const match = '2026-02-03T12:30:00Z'.match(/(\d{4})-(\d{2})-(\d{2})/); // match[1] = '2026', match[2] = '02', match[3] = '03'

// Named groups const m = 'John Doe, age 30'.match(/(?<name>[A-Za-z ]+), age (?<age>\d+)/); // m.groups.name = 'John Doe', m.groups.age = '30'

// Find all matches (matchAll returns iterator) const text = 'Call 555-1234 or 555-5678'; const matches = [...text.matchAll(/\d{3}-\d{4}/g)]; // [{0: '555-1234', index: 5}, {0: '555-5678', index: 18}]

// Replace with callback 'hello world'.replace(/\b\w/g, c => c.toUpperCase()); // 'Hello World'

// Replace with named groups '2026-02-03'.replace(/(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/, '$<m>/$<d>/$<y>'); // '02/03/2026'

// Split with regex 'one, two; three'.split(/[,;]\s*/); // ['one', 'two', 'three'] ```

### Python

```python import re

# Match (anchored to start) m = re.match(r'^(\w+)@(\w+)\.(\w+)$', '[email protected]') if m: print(m.group(1)) # 'user'

# Search (find first match anywhere) m = re.search(r'\d{3}-\d{4}', 'Call 555-1234 today') print(m.group()) # '555-1234'

# Find all matches emails = re.findall(r'[\w.+-]+@[\w.-]+\.\w{2,}', text)

# Named groups m = re.match(r'(?P<name>\w+)\s+(?P<age>\d+)', 'Alice 30') print(m.group('name')) # 'Alice'

# Substitution result = re.sub(r'\bfoo\b', 'bar', 'foo foobar foo') # 'bar foobar bar'

# Sub with callback result = re.sub(r'\b\w', lambda m: m.group().upper(), 'hello world') # 'Hello World'

# Compile for reuse (faster in loops) pattern = re.compile(r'\d{4}-\d{2}-\d{2}') dates = pattern.findall(log_text)

# Multiline and DOTALL re.findall(r'^ERROR.*$', text, re.MULTILINE) # ^ and $ match line boundaries re.search(r'start.*end', text, re.DOTALL) # . matches newlines

# Verbose mode (readable complex patterns) pattern = re.compile(r''' ^ # Start of string (?P<year>\d{4}) # Year -(?P<month>\d{2}) # Month -(?P<day>\d{2}) # Day $ # End of string ''', re.VERBOSE) ```

### Go

```go import "regexp"

// Compile pattern (panics on invalid regex) re := regexp.MustCompile(`\d{4}-\d{2}-\d{2}`)

// Match test re.MatchString("2026-02-03") // true

// Find first match re.FindString("Date: 2026-02-03 and 2026-03-01") // "2026-02-03"

// Find all matches re.FindAllString(text, -1) // []string of all matches

// Capture groups re := regexp.MustCompile(`(\w+)@(\w+)\.(\w+)`) match := re.FindStringSubmatch("[email protected]") // match[0] = "[email protected]", match[1] = "user", match[2] = "example"

// Named groups re := regexp.MustCompile(`(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})`) match := re.FindStringSubmatch("2026-02-03") for i, name := range re.SubexpNames() { if name != "" { fmt.Printf("%s: %s\n", name, match[i]) } }

// Replace re.ReplaceAllString("foo123bar", "NUM") // "fooNUMbar"

// Replace with function re.ReplaceAllStringFunc(text, strings.ToUpper)

// Note: Go uses RE2 syntax — no lookahead/lookbehind ```

### Command Line (grep/sed)

```bash # grep -P uses PCRE (Perl-compatible — full features) # grep -E uses Extended regex (no lookahead/lookbehind)

# Find lines matching a pattern grep -P '\d{3}-\d{4}' file.txt

# Extract only the matching part grep -oP '\d{3}-\d{4}' file.txt

# Invert match (lines NOT matching) grep -vP 'DEBUG|TRACE' app.log

# sed replacement sed 's/oldPattern/newText/g' file.txt # Basic sed -E 's/foo_([a-z]+)/bar_\1/g' file.txt # Extended with capture group

# Perl one-liner (most powerful) perl -pe 's/(?<=price:\s)\d+/0/g' file.txt # Lookbehind works in Perl ```

## Search-and-Replace Patterns

### Code Refactoring

```bash # Rename a variable across files grep -rlP '\boldName\b' src/ | xargs sed -i 's/\boldName\b/newName/g'

# Convert var to const (JavaScript) sed -i -E 's/\bvar\b/const/g' src/*.js

# Convert single quotes to double quotes sed -i "s/'/\"/g" src/*.ts

# Add trailing commas to object properties sed -i -E 's/^(\s+\w+:.+[^,])$/\1,/' config.json

# Update import paths sed -i 's|from '\''../old-path/|from '\''../new-path/|g' src/*.ts

# Convert snake_case to camelCase (Python → JavaScript naming) perl -pe 's/_([a-z])/uc($1)/ge' file.txt ```

### Text Cleanup

```bash # Remove trailing whitespace sed -i 's/[[:space:]]*$//' file.txt

# Remove blank lines sed -i '/^$/d' file.txt

# Remove duplicate blank lines (keep at most one) sed -i '/^$/N;/^\n$/d' file.txt

# Trim leading and trailing whitespace from each line sed -i 's/^[[:space:]]*//;s/[[:space:]]*$//' file.txt

# Remove HTML tags sed 's/<[^>]*>//g' file.html

# Remove ANSI color codes sed 's/\x1b\[[0-9;]*m//g' output.txt ```

## Common Gotchas

### Greedy vs lazy matching

``` Pattern: <.*> Input: bold Greedy matches: bold (entire string between first < and last >) Lazy matches: (stops at first >) Pattern: <.*?> (lazy version) ```

### Escaping special characters

``` Characters that need escaping in regex: . * + ? ^ $ { } [ ] ( ) | \ In character classes []: only ] - ^ \ need escaping

# To match a literal dot: \. # To match a literal *: \* # To match a literal \: \\ # To match [ or ]: \[ or \] ```

### Newlines and multiline

``` By default . does NOT match newline. By default ^ and $ match start/end of STRING.

# To make . match newlines: JavaScript: /pattern/s (dotAll flag) Python: re.DOTALL or re.S Go: (?s) inline flag

# To make ^ $ match line boundaries: JavaScript: /pattern/m (multiline flag) Python: re.MULTILINE or re.M Go: (?m) inline flag ```

### Backtracking and performance

``` # Catastrophic backtracking (avoid these patterns on untrusted input): (a+)+ # Nested quantifiers (a|a)+ # Overlapping alternation (.*a){10} # Ambiguous .* with repetition

# Safe alternatives: [a]+ # Instead of (a+)+ a+ # Instead of (a|a)+ [^a]*a # Possessive/atomic instead of .*a ```

## Tips

- Start simple and add complexity. `\d+` is almost always enough — you rarely need `[0-9]+`. - Test your regex on real data, not just the happy path. Edge cases (empty strings, special characters, Unicode) break naive patterns. - Use non-capturing groups `(?:...)` when you don't need the captured value. It's slightly faster and cleaner. - In JavaScript, always use the `g` flag for `matchAll` and global `replace`. Without it, only the first match is found/replaced. - Go's `regexp` package uses RE2 (no lookahead/lookbehind). If you need those, use a different approach or the `regexp2` package. - `grep -P` (PCRE) is the most powerful command-line regex. Use it over `grep -E` when you need lookahead, `\d`, or `\b`. - For complex patterns, use verbose mode (`re.VERBOSE` in Python, `/x` in Perl) with comments explaining each part. - Regex is the wrong tool for parsing HTML, XML, or JSON. Use a proper parser. Regex works for extracting simple values from these formats, not for structural parsing.

Back

Introduction

More Products

Tavily Web Search

Humanize AI text

Humanizer