regex.pcre Module Documentation
The regex.pcre module is a high-performance Virtual Machine (VM) based regular expression engine for V.
Key Features
- Non-recursive VM: Safe execution that avoids stack overflows on complex patterns.
- Zero-Allocation Search: Reuses a pre-allocated `Machine` workspace for search operations.
- Fast ASCII Path: Optimized path for characters < 128 to bypass heavy UTF-8 decoding.
- Bitmap Lookups: ASCII character classes use a 128-bit bitset for O(1) matching.
- Instruction Merging: Consecutive character matches are merged into string blocks for faster execution.
- NFA Virtual Machine: Executes bytecode instructions to simulate pattern matching.
- Dynamic Stack Growth: Automatically expands the backtracking stack to prevent false negatives.
- Anchored Optimization: Patterns starting with '^' skip the scanning loop.
- Prefix Skipping: Uses Boyer-Moore-like skipping for literal prefixes.
Supported Syntax
| Feature | Syntax | Description |
|---|---|---|
| Literals | `abc` | Matches exact characters (UTF-8 supported). |
| Wildcard | `.` | Matches any character (excluding `\n` unless the `(?s)` flag is used). |
| Alternation | `\|` | Matches the left OR right expression (e.g., `cat\|dog`). |
| Quantifiers | `*`, `+`, `?` | Matches 0+, 1+, or 0-1 times. |
| Lazy | `*?`, `+?`, `??` | Non-greedy versions of the above. |
| Repetition | `{m,n}` | Matches between m and n times. `{m,}` for m or more. |
| Groups | `(...)` | Capturing group. |
| | `(?:...)` | Non-capturing group. |
| | `(?P<name>...)` | Named capturing group. |
| Anchors | `^`, `$` | Start/End of string (or line with `(?m)`). |
| | `\b`, `\B` | Word boundary and non-word boundary. |
| Classes | `[abc]`, `[^abc]` | Character set and negated character set. |
| | `[a-z]` | Range of characters. |
| | `\w`, `\W` | Word / Non-word (`[a-zA-Z0-9_]`). |
| | `\d`, `\D` | Digit / Non-digit. |
| | `\s`, `\S` | Whitespace / Non-whitespace (` \t\n\r\v\f`). |
| | `\a`, `\A` | Lowercase / Uppercase ASCII character class. |
| Flags | `(?i)` | Case-insensitive matching. |
| | `(?m)` | Multiline mode (`^` and `$` match start/end of lines). |
| | `(?s)` | Dot-all mode (`.` matches newlines). |
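To make the difference between greedy and lazy quantifiers concrete, here is a minimal sketch that uses only `compile` and `find` as documented on this page. The helper `first_match` is a hypothetical name introduced for illustration; it is not part of the module.

```v
import regex.pcre

// first_match is a hypothetical helper (not part of regex.pcre): it compiles
// `pattern` and returns the text of the first match in `text`, or an empty
// string if compilation fails or nothing matches.
fn first_match(pattern string, text string) string {
	r := pcre.compile(pattern) or { return '' }
	m := r.find(text) or { return '' }
	return m.text
}

fn main() {
	html := '<a><b>'
	println(first_match(r'<.+>', html)) // greedy: takes as much as possible
	println(first_match(r'<.+?>', html)) // lazy: stops at the first '>'
	println(first_match(r'(?i)HELLO', 'Say hello')) // (?i) ignores case
}
```

With the usual greedy and lazy semantics, the first call should yield `<a><b>` and the second `<a>`.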
Structs
Regex
The compiled regular expression object.
pub struct Regex {
pub:
	pattern      string         // The original pattern
	prog         []Inst         // Compiled VM bytecode
	total_groups int            // Number of capture groups
	group_map    map[string]int // Map for named groups
}
Match
Represents the result of a successful search.
pub struct Match {
pub:
	text   string   // The full substring that matched
	start  int      // Byte index where match starts
	end    int      // Byte index where match ends
	groups []string // Text captured by each group
}
Core Functions
compile
Compiles a pattern into a Regex object.
fn compile(pattern string) !Regex
find
Finds the first match in the text. Returns none if no match is found.
fn (r &Regex) find(text string) ?Match
find_all
Returns all non-overlapping matches in a string.
fn (r &Regex) find_all(text string) []Match
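As an illustration of iterating over the result of `find_all`, the following sketch collects the text of every match. The helper name `all_numbers` is an assumption made for this example only.

```v
import regex.pcre

// all_numbers is a hypothetical helper (not part of regex.pcre) that collects
// the text of every non-overlapping match of \d+ in `text`.
fn all_numbers(text string) []string {
	r := pcre.compile(r'\d+') or { return []string{} }
	mut out := []string{}
	for m in r.find_all(text) {
		out << m.text
	}
	return out
}

fn main() {
	println(all_numbers('a1 b22 c333'))
}
```

Each element of the returned slice is a full `Match`, so `m.start`, `m.end` and `m.groups` are available in the loop as well.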
replace
Replaces the first match in text with repl. Supports backreferences like $1, $2.
fn (r &Regex) replace(text string, repl string) string
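A small sketch of backreferences in `replace`, assuming `$1` and `$2` expand to the first and second capture groups as described above. The helper `swap_first_pair` is hypothetical. The replacement is written as a raw string so that the `$` characters reach `replace` untouched.

```v
import regex.pcre

// swap_first_pair is a hypothetical helper (not part of regex.pcre): it swaps
// the first pair of space-separated words in `text`.
fn swap_first_pair(text string) string {
	r := pcre.compile(r'(\w+) (\w+)') or { return text }
	// Raw string: `$2 $1` is passed to replace() verbatim.
	return r.replace(text, r'$2 $1')
}

fn main() {
	println(swap_first_pair('hello world'))
	// Only the first match is replaced, so later word pairs stay as they are.
	println(swap_first_pair('a b c d'))
}
```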
change_stack_depth
Updates the maximum backtracking depth for the VM. Default is 1024. Use this if your pattern is extremely complex and returns none prematurely.
fn (mut r Regex) change_stack_depth(depth int)
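The sketch below shows one way to raise the limit before searching. The helper `find_with_depth` is a hypothetical name, and the depth value 8192 is arbitrary. Note that the `Regex` value must be declared `mut`, because `change_stack_depth` has a mutable receiver.

```v
import regex.pcre

// find_with_depth is a hypothetical helper (not part of regex.pcre): it
// compiles `pattern`, applies a custom backtracking depth, and returns the
// text of the first match, or none.
fn find_with_depth(pattern string, text string, depth int) ?string {
	mut r := pcre.compile(pattern) or { return none }
	r.change_stack_depth(depth)
	m := r.find(text)?
	return m.text
}

fn main() {
	// 8192 is an illustrative value, not a recommendation.
	res := find_with_depth(r'(a|b)*c', 'abababc', 8192) or { 'no match' }
	println(res)
}
```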
Named Groups Example
import regex.pcre
fn main() {
	r := pcre.compile(r'(?P<year>\d{4})-(?P<month>\d{2})')!
	m := r.find('Date: 2026-02') or { return }
	year := r.group_by_name(m, 'year')
	month := r.group_by_name(m, 'month')
	println('Year: ${year}, Month: ${month}') // Year: 2026, Month: 02
}
PCRE Compatibility Layer
To facilitate easier migration from other engines, a compatibility layer is provided:
| Function | Equivalent To |
|---|---|
| `new_regex(pattern, flags)` | `compile(pattern)` |
| `r.match_str(text, start, flags)` | `r.find_from(text, start)` |
| `m.get(idx)` | Retrieves match text (0) or capture group (1+). |
| `m.get_all()` | Returns `[full_match, group1, group2, ...]` |
Example:
import regex.pcre
r := pcre.new_regex(r'(\w+) (\w+)', 0)!
if m := r.match_str('hello world', 0, 0) {
	println(m.get(0)?) // "hello world"
	println(m.get(1)?) // "hello"
	println(m.get(2)?) // "world"
}
Performance Note
The implementation relies on the following optimizations:
- Raw Pointer Access: The VM bypasses standard array bounds checking by using `unsafe` pointer arithmetic for both the instruction set and the string text, significantly speeding up the hot loop.
- Zero-Allocation Search: The `Machine` struct pre-allocates the backtracking stack and capture arrays, so running a search creates no new heap allocations and adds no garbage-collection pressure.
- Fast ASCII Path: The code checks whether a byte is `< 128` before decoding. If it is ASCII, it skips the expensive UTF-8 decoding logic entirely.
- Bitmap Class Lookups: Character classes (like `\w`, `\d`, `[a-z]`) use a 128-bit bitset. Checking whether an ASCII character matches a class is a single O(1) bitwise operation.
- Instruction Merging: The compiler groups consecutive literal characters into a single `string` instruction (e.g., `a`, `b`, `c` becomes `"abc"`), reducing the number of VM cycles required.
- Prefix Skipping: If a pattern starts with a literal string, the engine scans ahead for that substring (Boyer-Moore style) before initializing the VM, avoiding useless execution.
- Anchored Optimization: If the pattern starts with `^`, the engine only attempts a match at the start of the string (or line), skipping the character-by-character scan of the rest of the text.
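The bitmap class lookup can be sketched independently of the engine. The code below is not the module's implementation. `AsciiSet`, `add`, `has` and `digit_set` are hypothetical names, and the layout (two 64-bit words) is an assumption chosen to total 128 bits.

```v
// AsciiSet is a hypothetical stand-in for a 128-bit class bitmap:
// two 64-bit words, one bit per ASCII code point.
struct AsciiSet {
mut:
	bits [2]u64
}

// add marks byte `c` as a member of the set (non-ASCII bytes are ignored).
fn (mut s AsciiSet) add(c u8) {
	if c < 128 {
		s.bits[int(c >> 6)] |= u64(1) << (c & 63)
	}
}

// has tests membership with one shift and one mask, whatever the set size.
fn (s AsciiSet) has(c u8) bool {
	if c >= 128 {
		return false
	}
	return ((s.bits[int(c >> 6)] >> (c & 63)) & 1) == 1
}

// digit_set builds the set for a class such as \d.
fn digit_set() AsciiSet {
	mut s := AsciiSet{}
	for c := u8(`0`); c <= `9`; c++ {
		s.add(c)
	}
	return s
}

fn main() {
	digits := digit_set()
	println(digits.has(u8(`7`))) // true
	println(digits.has(u8(`x`))) // false
}
```

The upper bits of the byte (`c >> 6`) select the word and the low six bits (`c & 63`) select the bit, so the cost of a lookup does not depend on how many characters the class contains.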
fn compile #
fn compile(pattern string) !Regex
compile transforms a regex pattern string into a Regex object.
fn new_regex #
fn new_regex(pattern string, _ int) !Regex
new_regex is a helper wrapper to compile a regex pattern.
struct Machine #
struct Machine {
mut:
	stack    []int // Backtracking stack stores: [capture_states..., string_ptr, next_pc]
	captures []int // Flat array of [start, end] byte indices for groups
}
Machine provides a workspace for VM execution. To ensure thread safety, this is created per top-level API call.
struct Match #
struct Match {
pub:
	text   string   // The full text of the match
	start  int      // Starting byte index in the source string
	end    int      // Ending byte index in the source string
	groups []string // Sub-strings captured by groups
}
Match contains the results of a successful regex match.
fn (Match) get #
fn (m Match) get(idx int) ?string
get returns the matched text for a specific group index. Index 0 returns the full match, 1..n returns capture groups.
fn (Match) get_all #
fn (m Match) get_all() []string
get_all returns a list of all captured strings, starting with the full match at index 0.
struct Regex #
struct Regex {
pub:
	pattern      string         // The original regex string
	prog         []Inst         // Compiled bytecode
	total_groups int            // Number of capture groups defined in pattern
	group_map    map[string]int // Mapping of names to indices for (?P<name>...)
	prefix_lit   string         // Pre-calculated literal prefix for fast-skip optimization
	has_prefix   bool           // Whether a literal prefix exists
	anchored     bool           // True if pattern starts with '^' (optimization hint)
pub mut:
	max_stack_depth int // User-defined stack limit hint
}
Regex is the compiled regular expression object.
fn (Regex) new_machine #
fn (r &Regex) new_machine() Machine
new_machine allocates a new VM state machine. This isolates the runtime memory (stack/captures) from the compiled regex, allowing thread-safe usage.
fn (Regex) find #
fn (r &Regex) find(text string) ?Match
find returns the first match of the regex in text.
fn (Regex) find_from #
fn (r &Regex) find_from(text string, start_index int) ?Match
find_from returns the first match starting from start_index. Optimized with fast prefix skipping and anchor checks.
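A minimal sketch of starting a search at a byte offset. `find_after` is a hypothetical helper name, and the pattern and input are invented for illustration.

```v
import regex.pcre

// find_after is a hypothetical helper (not part of regex.pcre): it returns
// the text of the first match found when the search begins at byte offset
// `start`, or none.
fn find_after(text string, start int) ?string {
	r := pcre.compile(r'id=\d') or { return none }
	m := r.find_from(text, start)?
	return m.text
}

fn main() {
	text := 'id=1 id=2'
	println(find_after(text, 0) or { 'no match' })
	// Offset 4 is the space after 'id=1', so the first hit is skipped.
	println(find_after(text, 4) or { 'no match' })
}
```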
fn (Regex) find_all #
fn (r &Regex) find_all(text string) []Match
find_all returns all non-overlapping matches in text.
fn (Regex) replace #
fn (r &Regex) replace(text string, repl string) string
replace finds the first match and replaces it using repl. Supports $1, $2 backreferences.
fn (Regex) fullmatch #
fn (r &Regex) fullmatch(text string) ?Match
fullmatch returns a Match only if the pattern matches the entire text.
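`fullmatch` is a natural fit for input validation. The sketch below assumes only the documented behaviour (a `Match` is returned when the whole text matches, otherwise none). `is_all_digits` is a hypothetical helper name.

```v
import regex.pcre

// is_all_digits is a hypothetical validator (not part of regex.pcre): it
// reports whether `text` consists entirely of one or more digits.
fn is_all_digits(text string) bool {
	r := pcre.compile(r'\d+') or { return false }
	if m := r.fullmatch(text) {
		return m.text == text
	}
	return false
}

fn main() {
	println(is_all_digits('12345'))
	println(is_all_digits('123abc')) // a prefix match alone is not enough
}
```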
fn (Regex) group_by_name #
fn (r &Regex) group_by_name(m Match, name string) string
group_by_name retrieves text captured by a named group.
fn (Regex) match_str #
fn (r &Regex) match_str(text string, start_index int, _ int) ?Match
match_str is a compatibility alias for find_from.