Regular Expression

Introduction

Regular Expressions (regex) can be divided into the following categories based on different standards.

Type                          Abbr       Full Name
Basic Regular Expressions     BRE        Basic Regular Expressions
Extended Regular Expressions  ERE        Extended Regular Expressions
Perl Regular Expressions      PCRE       Perl-Compatible Regular Expressions
POSIX Regular Expressions     BRE & ERE  BRE & ERE

BRE and ERE are the two regular expression flavors defined by the POSIX standard. BRE is the more basic flavor: several metacharacters become special only when escaped with a backslash. ERE provides more metacharacters and lets them be written without escaping. PCRE is a more powerful and flexible flavor; similar Perl-style syntax is widely used in modern programming languages such as Python, Ruby, and JavaScript.

SED Command

sed supports both BRE and ERE, and defaults to BRE.

BRE Pattern

In this mode some metacharacters are special only when escaped with a backslash, for example:

  • ( and ): grouping is written \( and \)
  • |: alternation is written \| (a GNU extension to BRE)
echo 'abc' | sed 's/\(b\|c\)/p/g'
app
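
As a small sketch of the other side of this rule: without the backslashes, ( | ) are ordinary characters in BRE and match themselves. The input string here is made up for illustration.

```shell
# In BRE, unescaped ( | ) are literal, so this pattern matches the text "(b|c)" itself.
literal=$(echo 'a(b|c)d' | sed 's/(b|c)/X/')
echo "$literal"   # aXd

# With backslashes they become grouping and alternation (\| is a GNU extension).
special=$(echo 'abc' | sed 's/\(b\|c\)/p/g')
echo "$special"   # app
```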

ERE Pattern

Enable ERE with the -E option (-r is the older GNU spelling). In this mode the metacharacters do not need to be escaped.

echo 'abc' | sed -E 's/(b|c)/p/g'
app
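
The escaping rule is reversed in ERE: a backslash turns a metacharacter back into a literal. A minimal sketch, using a made-up input string:

```shell
# In ERE, \( and \) match literal parentheses.
literal=$(echo 'a(b)c' | sed -E 's/\(b\)/X/')
echo "$literal"   # aXc

# Unescaped, the parentheses only group; the literal ( and ) in the input are left alone.
grouped=$(echo 'a(b)c' | sed -E 's/(b)/X/')
echo "$grouped"   # a(X)c
```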

GAWK

gawk uses ERE by default.

echo 'abc' | gawk '{gsub(/(b|c)/, "p"); print }'
app

Special Characters

The following characters have special meanings and must be escaped with a backslash to match them literally.

.*[]^${}\+?|()

Although / is not a special character in regular expressions, it delimits the pattern in sed and gawk, so a literal / inside a /.../ pattern must also be escaped.
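
The following sketch shows both kinds of escaping on made-up inputs. The last command uses | as the delimiter of the s command instead of /, which sed allows and which avoids escaping the slashes.

```shell
# Escape . so it matches a literal dot rather than any character.
dot=$(echo 'a.c abc' | sed 's/a\.c/X/g')
echo "$dot"       # X abc

# Escape / because it is the delimiter of the s command.
escaped=$(echo '/usr/bin' | sed 's/\/usr/\/opt/')
echo "$escaped"   # /opt/bin

# Same substitution with | as the delimiter, so / needs no escaping.
alt=$(echo '/usr/bin' | sed 's|/usr|/opt|')
echo "$alt"       # /opt/bin
```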

Line Start ^

Matches the start position of a line.

echo 'aa bb' | sed -n '/^aa/p'

In BRE, a ^ that is not at the beginning of the pattern is treated as a normal character and does not need to be escaped.

echo 'aa b^b' | sed -n '/b^/p'
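
A short sketch of the anchor on several lines at once (the input lines are made up). In ERE, ^ is an anchor wherever it appears, so the second command escapes it to match a literal caret.

```shell
# Only the line that starts with "aa" is printed.
start=$(printf 'aa bb\nbb aa\n' | sed -n '/^aa/p')
echo "$start"     # aa bb

# With -E, write \^ to match a literal ^ in the middle of a pattern.
literal=$(echo 'aa b^b' | sed -En '/b\^b/p')
echo "$literal"   # aa b^b
```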

Line End $

Matches the end position of a line.

echo 'aa bb' | sed -n '/bb$/p'

In BRE, a $ that is not at the end of the pattern is treated as a normal character and does not need to be escaped.

echo 'aa b$b' | sed -n '/b$b/p'
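
Combining ^ and $ restricts a pattern to the whole line. The sketch below uses made-up input; the second command escapes $ to match a literal dollar sign at the end of the pattern.

```shell
# ^cat$ matches only a line that is exactly "cat".
whole=$(printf 'cat\ncats\nbobcat\n' | sed -n '/^cat$/p')
echo "$whole"     # cat

# \$ matches a literal dollar sign, even in last position.
dollar=$(printf 'cost 5$\ncost 5\n' | sed -n '/5\$/p')
echo "$dollar"    # cost 5$
```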

Dot Character .

Matches any single character. sed and gawk read input one line at a time, so the text being matched normally contains no newline.

echo 'abc' | sed -n '/a.c/p'
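
A sketch with made-up input lines shows that the dot requires exactly one character: "ac" has nothing between a and c, so it is not printed.

```shell
# a.c needs exactly one character between a and c.
matched=$(printf 'abc\na-c\nac\n' | sed -n '/a.c/p')
echo "$matched"
# abc
# a-c
```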

Character Group []

A character class (bracket expression) matches any single character listed inside the brackets.

echo 'cat' | sed -n '/[ch]at/p'
echo 'yes' | sed -n '/[Yy][Ee][Ss]/p'

A leading ^ inside the brackets negates the class, so it matches any single character that is not listed.

echo 'bat' | sed -n '/[^ch]at/p'

A range matches any single character from c to e.

echo 'cat' | sed -n '/[c-e]at/p'

Ranges can be combined: this class matches any single character from c to e or from 0 to 9.

echo 'cat' | sed -n '/[c-e0-9]at/p'
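
The sketch below applies a range and a negated class to made-up input, to show which items are affected and which are left alone.

```shell
# [0-9] matches one digit, so "fileA" is left unchanged.
digits=$(echo 'file1 fileA file9' | sed 's/file[0-9]/X/g')
echo "$digits"    # X fileA X

# [^ch] matches any character except c and h, so only "bat" is printed.
negated=$(printf 'cat\nhat\nbat\n' | sed -n '/^[^ch]at/p')
echo "$negated"   # bat
```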

Asterisk *

Matches the character (or character class) before the * 0 or more times.

echo '24' | sed -n '/23*4/p'
echo '234' | sed -n '/23*4/p'
echo '2334' | sed -n '/23*4/p'
echo 'bat' | sed -n '/b[ae]*/p'
echo 'baaeeaet' | sed -n '/b[ae]*/p'

All of the examples above match.
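
To see exactly which text the pattern consumes, the following sketch wraps the match in brackets. It relies on & in the replacement part of sed's s command, which stands for the matched text.

```shell
# Zero occurrences of 3 still match.
zero=$(echo '24' | sed 's/23*4/[&]/')
echo "$zero"      # [24]

# Two occurrences of 3 match as well.
many=$(echo '2334' | sed 's/23*4/[&]/')
echo "$many"      # [2334]

# [ae]* takes as many a/e characters as it can and stops at t.
class=$(echo 'baaeeaet' | sed 's/b[ae]*/[&]/')
echo "$class"     # [baaeeae]t
```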

Question Mark ?

Matches the character before the ? 0 or 1 time.

echo 'at' | sed -En '/c?at/p'
echo 'ccbbat' | sed -En '/c?at/p'

Both examples above match, because the pattern can match anywhere in the line. Anchor it with ^ to restrict where it matches.

echo 'ccbbat' | sed -En '/^c?at/p'

The anchored pattern matches only lines that start with at or cat, so the line above is not printed.
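
A sketch of the anchored pattern against three made-up lines: zero or one c before at is accepted, two are not.

```shell
# "ccat" is rejected because ^c?at allows at most one c at the start.
matched=$(printf 'at\ncat\nccat\n' | sed -En '/^c?at/p')
echo "$matched"
# at
# cat
```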

Plus +

Matches the character before the + 1 or more times. The example below prints nothing, because at contains no c.

echo 'at' | sed -En '/c+at/p'
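
For comparison, a sketch running the same pattern over several made-up lines: only lines with at least one c directly before at are printed.

```shell
# "at" is rejected because + requires at least one c.
matched=$(printf 'at\ncat\nccat\n' | sed -En '/c+at/p')
echo "$matched"
# cat
# ccat
```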

Interval {}

Specifies how many times the character before {} must repeat: {n} means exactly n times, and {n,m} means between n and m times.

echo 'cat' | sed -En '/^c{1}at/p'
echo 'ccat' | sed -En '/^c{1,2}at/p'
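
The sketch below checks the bounds on made-up input, and also shows the BRE spelling, where the braces are escaped as \{ and \}.

```shell
# ERE: one or two c characters at the start, then "at".
ere=$(printf 'at\ncat\nccat\ncccat\n' | sed -En '/^c{1,2}at/p')
echo "$ere"
# cat
# ccat

# BRE: the same interval written with escaped braces.
bre=$(printf 'at\ncat\nccat\ncccat\n' | sed -n '/^c\{1,2\}at/p')
echo "$bre"
# cat
# ccat
```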

Vertical Line |

Represents alternation (logical OR): the pattern on either side of | may match.

echo 'cat' | sed -En '/cat|hat/p'
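
A sketch of alternation as a filter over several made-up lines:

```shell
# Lines containing either "cat" or "hat" are printed; "bat" is not.
matched=$(printf 'cat\nhat\nbat\n' | sed -En '/cat|hat/p')
echo "$matched"
# cat
# hat
```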

Grouping ()

A group is treated as a single unit, so alternation and quantifiers such as ? apply to the whole group.

echo 'cat' | sed -En '/(c|h)at/p'
echo 'Sun' | sed -En '/(S|s)un(day)?/p'
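
The sketch below anchors the last pattern with ^ and $ (an addition for illustration) so that the optional group is visible: "sunny" is rejected because the only text allowed after "un" is "day". The second command applies + to a whole group.

```shell
# (day)? makes the whole word "day" optional.
days=$(printf 'Sun\nsunday\nsunny\n' | sed -En '/^(S|s)un(day)?$/p')
echo "$days"
# Sun
# sunday

# (ab)+ repeats the two-character group, not just the b.
repeat=$(echo 'ababab!' | sed -E 's/(ab)+/X/')
echo "$repeat"    # X!
```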