Regular Expression
Introduction
Regular expressions (regex) fall into the following categories, depending on the standard they follow.
Type | Abbr | Full Name |
---|---|---|
Basic Regular Expressions | BRE | Basic Regular Expressions |
Extended Regular Expressions | ERE | Extended Regular Expressions |
Perl Regular Expressions | PCRE | Perl-Compatible Regular Expressions |
POSIX Regular Expressions | BRE & ERE | BRE & ERE |
BRE and ERE are the two regular expression flavors defined by the POSIX standard. BRE is the more basic of the two and requires a backslash before certain metacharacters; ERE extends it with additional metacharacters and needs no such escaping. PCRE is a more powerful and flexible flavor modeled on Perl's syntax; modern programming languages such as Python, Ruby, and JavaScript use similar Perl-style regular expressions.
SED Command
Supports BRE and ERE, defaults to BRE.
BRE Pattern
In this mode, some metacharacters take on their special meaning only when preceded by a backslash. For example, a grouping `)` is written as `\)`, and the alternation `|` is written as `\|`.
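The escaping rule above can be sketched as follows. This assumes GNU `sed` (the `\|` alternation in BRE is a GNU extension); the sample input and the variable names `out_bre` and `out_literal` are made up for illustration.

```shell
# BRE (sed's default): ( ) and | only act as metacharacters when escaped.
out_bre=$(printf 'cat\ndog\nbird\n' | sed -n '/\(cat\|dog\)/p')
echo "$out_bre"       # prints cat and dog, one per line

# Without the backslashes, the same characters are matched literally.
out_literal=$(printf 'cat\n(cat|dog)\n' | sed -n '/(cat|dog)/p')
echo "$out_literal"   # prints (cat|dog)
```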
ERE Pattern
Enable ERE with the `-E` or `-r` option; metacharacters then need no escaping.
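For comparison, a small sketch of the same kind of match in ERE mode. It assumes a `sed` that accepts `-E`; the input lines and variable names are illustrative only.

```shell
# ERE: ( ) and | are metacharacters without any backslash.
out_ere=$(printf 'cat\ndog\nbird\n' | sed -n -E '/(cat|dog)/p')
echo "$out_ere"           # prints cat and dog, one per line

# In ERE, a backslash turns them back into literal characters.
out_ere_literal=$(printf 'cat\n(cat|dog)\n' | sed -n -E '/\(cat\|dog\)/p')
echo "$out_ere_literal"   # prints (cat|dog)
```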
GAWK
Defaults to ERE mode.
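A minimal sketch of an ERE pattern in awk, with no option and no escaping needed. The `AWK` variable is an assumption of this example: it picks `gawk` when installed and otherwise falls back to whatever `awk` is on the system, which is assumed to be POSIX-compliant.

```shell
# Prefer gawk; fall back to the system awk (assumed to support ERE).
AWK=$(command -v gawk || command -v awk)

# Grouping and alternation work unescaped, as in sed -E.
out_awk=$(printf 'cat\ndog\nbird\n' | "$AWK" '/(cat|dog)/ { print }')
echo "$out_awk"   # prints cat and dog, one per line
```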
Special Characters
Characters with a special meaning must be escaped with a backslash to be matched literally.
Although `/` is not a special character in regular expressions, it also needs to be escaped in `sed` and `gawk`, where it serves as the pattern delimiter.
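Both kinds of escaping might look like this in `sed`; the sample lines and variable names are invented for the example.

```shell
# A literal dot is escaped; unescaped, "." would also match "axb".
out_dot=$(printf 'a.b\naxb\n' | sed -n '/a\.b/p')
echo "$out_dot"     # prints a.b

# "/" delimits the pattern, so a literal slash inside it is escaped.
out_slash=$(printf '/usr/bin\n/opt/bin\n' | sed -n '/\/usr\/bin/p')
echo "$out_slash"   # prints /usr/bin
```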
Line Start ^
Matches the start of a line.
In BRE, a `^` that is not at the beginning of the pattern is treated as an ordinary character and does not need to be escaped.
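A short sketch of both behaviors, assuming GNU `sed` in its default BRE mode and made-up input:

```shell
# "^" at the start of the pattern anchors the match to the line start.
out_anchor=$(printf 'cat\nbobcat\n' | sed -n '/^cat/p')
echo "$out_anchor"   # prints cat

# In the middle of a BRE pattern, "^" is an ordinary character.
out_caret=$(printf 'a^b\nab\n' | sed -n '/a^b/p')
echo "$out_caret"    # prints a^b
```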
Line End $
Matches the end of a line.
In BRE, a `$` that is not at the end of the pattern is treated as an ordinary character and does not need to be escaped.
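The same idea for `$`, again assuming GNU `sed` in BRE mode; the input text is illustrative. The patterns are single-quoted so the shell does not expand `$`.

```shell
# "$" at the end of the pattern anchors the match to the line end.
out_end=$(printf 'cat\ncatalog\n' | sed -n '/cat$/p')
echo "$out_end"      # prints cat

# Elsewhere in a BRE pattern, "$" matches a literal dollar sign.
out_dollar=$(printf 'cost $5 total\ncost 5\n' | sed -n '/$5/p')
echo "$out_dollar"   # prints cost $5 total
```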
Dot Character .
Matches any single character except newline.
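As an illustration with invented input, the dot requires exactly one character in its position:

```shell
# "c.t" needs one character between c and t, so "ct" does not match.
out_any=$(printf 'cat\ncut\nct\n' | sed -n '/c.t/p')
echo "$out_any"   # prints cat and cut, one per line
```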
Character Group []
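A character class; matches any single character listed inside the brackets.
A leading `^` inside the brackets (`[^...]`) negates the class, so it matches any character that is not listed.
A range such as `[c-e]` matches any character between `c` and `e`.
Ranges can be combined: `[c-e0-9]` matches any character between `c` and `e` or between `0` and `9`.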
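The four forms of bracket expression could be tried out like this; the word lists and variable names are made up for the sketch.

```shell
# Any one of the listed characters.
out_class=$(printf 'cat\ncut\ncot\n' | sed -n '/c[au]t/p')
echo "$out_class"        # prints cat and cut

# Negated class: any character except the listed ones.
out_neg=$(printf 'cat\ncut\ncot\n' | sed -n '/c[^au]t/p')
echo "$out_neg"          # prints cot

# A range of characters.
out_range=$(printf 'ace\nade\nabe\n' | sed -n '/a[c-e]e/p')
echo "$out_range"        # prints ace and ade

# Two ranges in one class.
out_two_ranges=$(printf 'a1\nad\nab\n' | sed -n '/a[c-e0-9]/p')
echo "$out_two_ranges"   # prints a1 and ad
```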
Asterisk *
Matches the character before the `*` zero or more times.
For example, `ca*t` matches `ct`, `cat`, and `caaat`.
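A quick way to check this behavior, using illustrative input:

```shell
# "a*" allows zero or more "a", so ct, cat and caaat match; cut does not.
out_star=$(printf 'ct\ncat\ncaaat\ncut\n' | sed -n '/ca*t/p')
echo "$out_star"   # prints ct, cat and caaat, one per line
```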
Question Mark ?
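Matches the character before the `?` zero or one time.
Because a pattern such as `c?at` is unanchored, it also matches inside longer strings such as `ccat`. Anchoring it with `^`, as in `^c?at`, limits matches to lines that begin with `at` or `cat`.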
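The effect of anchoring can be sketched as below. The example uses `sed -E` because an unescaped `?` is a literal character in BRE; the input lines are invented.

```shell
# Unanchored: "c?at" is found inside every line, including ccat.
out_q=$(printf 'at\ncat\nccat\n' | sed -n -E '/c?at/p')
echo "$out_q"            # prints at, cat and ccat

# Anchored with "^": the line must start with at or cat.
out_q_anchored=$(printf 'at\ncat\nccat\n' | sed -n -E '/^c?at/p')
echo "$out_q_anchored"   # prints at and cat
```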
Plus +
Matches the character before the `+` one or more times.
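A small sketch with made-up input, run in ERE mode since an unescaped `+` is literal in BRE:

```shell
# "a+" needs at least one "a", so ct is rejected.
out_plus=$(printf 'ct\ncat\ncaaat\n' | sed -n -E '/ca+t/p')
echo "$out_plus"   # prints cat and caaat, one per line
```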
Interval
Specifies how many times the character before the `{}` may repeat: `{n}` means exactly n times, `{n,}` at least n times, and `{n,m}` between n and m times.
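The three interval forms might be exercised as follows. This sketch uses `sed -E` (in BRE the braces would be written `\{` and `\}`) and anchors each pattern so that the repeat count is checked against the whole line; input and variable names are illustrative.

```shell
words='ct\ncat\ncaat\ncaaat\n'

# Exactly two "a".
out_exact=$(printf "$words" | sed -n -E '/^ca{2}t$/p')
echo "$out_exact"     # prints caat

# One or two "a".
out_between=$(printf "$words" | sed -n -E '/^ca{1,2}t$/p')
echo "$out_between"   # prints cat and caat

# Two or more "a".
out_atleast=$(printf "$words" | sed -n -E '/^ca{2,}t$/p')
echo "$out_atleast"   # prints caat and caaat
```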
Vertical Line |
Represents logical OR: matches either the expression on its left or the one on its right.
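As an illustration, alternation can drive a substitution as well as a match. The sentence and the replacement word are made up, and `sed -E` is assumed.

```shell
# Replace the first occurrence of either "cat" or "dog" on each line.
out_alt=$(printf 'I have a cat\nI have a dog\nI have a bird\n' | sed -E 's/cat|dog/pet/')
echo "$out_alt"
# prints:
#   I have a pet
#   I have a pet
#   I have a bird
```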
Grouping ()
Parentheses group a subexpression so that it is treated as a single unit, for example by a quantifier that follows it.
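The difference a group makes to a following quantifier can be sketched like this, with `sed -E` and invented input:

```shell
# With a group, "+" repeats the whole sequence "ab".
out_group=$(printf 'ab\nabab\nabb\n' | sed -n -E '/^(ab)+$/p')
echo "$out_group"     # prints ab and abab

# Without it, "+" repeats only the "b".
out_nogroup=$(printf 'ab\nabab\nabb\n' | sed -n -E '/^ab+$/p')
echo "$out_nogroup"   # prints ab and abb
```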