GAWK-2

Regular Expression

gawk uses POSIX Extended Regular Expressions (ERE) by default, plus a few GNU extensions such as \w, \< and \>.

Basic Usage

First, create a text file named foo, then print the first field of every line that contains 1,a.

cat <<EOF > foo
a1,a2,a3
b1,b2,b3
EOF
gawk -F, '/1,a/{ print $1 }' foo
a1

A bare /regex/ pattern is tested against the whole line, so it is equivalent to $0 ~ /regex/.

Field-Specific Matching

The ~ operator tests a regex against a specific expression; $2 ~ /regex/ matches against the second field only.

gawk 'BEGIN{ FS="," } $2 ~ /^[ab]2/{ print $2 }' foo
a2
b2

Sub

sub() (substitute) replaces only the first match in the target and returns the number of replacements made (0 or 1).

Basic Syntax

sub(regex, replacement [, target])
  • regex: The regular expression to match.
  • replacement: The string that replaces the match.
  • target: Optional. The variable, field, or array element to modify.

If target is omitted, sub() operates on the whole line, $0.

Basic Usage

echo "aa bb aa" | gawk '{ sub(/aa/, "cc"); print }'
cc bb aa

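Pass $3 as the target to restrict the replacement to the third field. Because a field was modified, gawk rebuilds $0 from its fields, joined by OFS.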

echo "aa bb aa" | gawk '{ sub(/aa/, "cc", $3); print }'
aa bb cc

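Special Symbol &

In the replacement string, & stands for the text that the regex matched. \w is a gawk extension that matches a letter, digit, or underscore.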

echo "app cat" | gawk '{ sub(/\w+/, "[&]"); print }'
[app] cat

Gsub

gsub() (global substitute) replaces every non-overlapping match in the target and returns the number of replacements made.

Basic Syntax

gsub(regex, replacement [, target])
  • regex: The regular expression to match. A string such as "aa" is also accepted and is treated as a regex.
  • replacement: The string that replaces each match.
  • target: Optional. The variable, field, or array element to modify.

If target is omitted, gsub() operates on the whole line, $0.

Basic Usage

echo 'aa bb aa' | gawk '{ gsub("aa", "cc"); print }'
cc bb cc

Pass $3 as the target to restrict the replacement to the third field.

echo 'aa bb aa' | gawk '{ gsub("aa", "cc", $3); print }'
aa bb cc

Special Symbol &

As in sub(), & in the replacement stands for the matched text; with gsub() every match is wrapped.

echo "app cat" | gawk '{ gsub(/\w+/, "[&]"); print }'
[app] [cat]

First Character of a Word

\< matches the empty string at the start of a word (a gawk extension).

echo 'app cat' | gawk '{ gsub(/\<[a-z]/, "[&]"); print }'
[a]pp [c]at

Last Character of a Word

\> matches the empty string at the end of a word (a gawk extension).

echo 'app cat' | gawk '{ gsub(/[a-z]\>/, "[&]"); print }'
ap[p] ca[t]

Gensub

gensub() (general substitution) is a gawk extension that is more flexible than sub and gsub.

  • Supports capture groups in the replacement.
  • Can replace either all matches or one specific match.
  • Does not modify the target; it returns the resulting string instead.
  • sub and gsub cannot refer to capture groups; they only offer & for the whole match.

Basic Syntax

gensub(regex, replacement, how [, target])
  • regex: The regular expression to match.
  • replacement: The replacement string; it can refer to capture groups with \1 to \9 and to the whole match with \0 or &.
  • how: "g" or "G" to replace all matches, or a number N to replace only the Nth match.
  • target: Optional. The string to search; defaults to $0.

Basic Usage

Pass "g" as the third argument to replace every match.

echo "aa aa aa" | gawk '{ print gensub(/aa/, "bb", "g") }'
bb bb bb

Pass a number to replace only that match; here, the second one.

echo "aa aa aa" | gawk '{ print gensub(/aa/, "bb", "2") }'
aa bb aa

Using Capture Groups

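\1 refers to the text matched by the first parenthesized group, \2 to the second, and so on. Inside a string literal the backslash must be doubled, hence "\\1".

echo "aa-bb" | gawk '{ print gensub(/(\w+)-(\w+)/, "\\2:\\1", "g") }'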
bb:aa