GAWK-1

GNU Awk

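Gawk is the GNU implementation of the awk text-processing tool. On many GNU/Linux distributions awk is a link to gawk, so there is usually no difference in daily use; some distributions ship a different implementation, such as mawk, by default. You can check what awk resolves to: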

readlink -f /usr/bin/awk
/usr/bin/gawk

Gawk regular expressions use ERE (extended regular expression) syntax by default.

Basic Syntax

gawk [OPTIONS] program file
  • OPTIONS: command options.
  • program: the awk program text, usually enclosed in single quotes.
  • file: the file to process; if omitted, input is read from STDIN.

If file is omitted, Gawk reads from standard input; when run in a terminal, each line you type is processed as soon as it is entered.

Execution Process

  • Read a record (by default, one line) and check it against each rule:
    • If the rule has a pattern:
      • If the record matches: run the rule's action.
      • If the record does not match: skip the action.
    • If the rule has no pattern: run the action for every record.

Basic Usage

Create the foo file.

echo -e 'aa 11\nbb 22' > foo

By default, Gawk splits each line into fields on runs of spaces and tabs.

  • $N: represents the Nth field.
  • $0: represents the entire line of data.
gawk '{print $1}' foo
aa
bb

BEGIN/END Structure

  • BEGIN: initialization, executed once before any input is read.
  • BODY: executed once for each record.
  • END: cleanup, executed once after all input has been processed.

Note the quoted delimiter 'EOF' in the here-document: it stops the shell from expanding special characters such as $ while the file is created.

cat <<'EOF' > foo.gawk
BEGIN {
    FS=":"
    print "User\tHome"
    print "-------\t-------"
}
{
    # Field 6 of /etc/passwd is the home directory (the login shell is field 7).
    print $1 "\t" $6
}
END {
    print "-------\t-------"
}
EOF
head -n 3 /etc/passwd | gawk -f foo.gawk
User    Home
------- -------
root    /root
daemon  /usr/sbin
bin     /bin
------- -------

Common Options

Specify Separator

The -F option sets the field separator.

gawk -F: '{print $1}' /etc/passwd | head -n 1

Specify File

The -f option reads the program from a file instead of the command line.

echo '{print $1 "-dir:" $6}' > foo.gawk
gawk -F: -f foo.gawk /etc/passwd | head -n 1
root-dir:/root

Assign Variable Parameters

The -v option can assign values to variables before BEGIN.

gawk -v n=2 'BEGIN{print 2*n}'
4

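A variable can also be assigned with a var=value argument placed after the program, without -v. Such an assignment only takes effect when Gawk reaches it in the argument list, so the value is not yet set in BEGIN.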

echo 'a b c' | gawk '{print $n}' n=2
b

Built-in Variables

Variable $N

$N can also be assigned to. String values must be written in double quotes; an unquoted word is treated as a variable name.

echo 'hey man' | gawk '{$2="bro"; print $0}'
hey bro

Variable FS

Field Separator: the input field separator.

gawk 'BEGIN{FS=":"} {print $1}' /etc/passwd | head -n 1

Variable NF

Number of Fields: the number of fields in the current record. Because NF is a number, $NF refers to the last field.

gawk -F: '$1=="root"{print $1":"$NF}' /etc/passwd
root:/bin/bash

Variable NR

Number of Records: the number of the record currently being processed. NR starts at 0 and is incremented each time a record is read, so it is 1 while the first line is processed. This makes it handy for skipping a header line.

cat <<EOF > foo
name score
foo  90
bar  80
EOF
gawk '{if (NR>1) {if ($2>85) {print $1,$2}}}' foo
foo 90

Variable RS

Record Separator: the input record separator. The default is \n, so each line is one record. Setting RS to "" switches to paragraph mode, where records are separated by one or more blank lines. The following text is therefore split into two records, an upper and a lower one.

cat <<EOF > foo
apple
sweet
red

banana
sweet
yellow
EOF

Set FS="\n" so that each line of a record becomes a field, accessible as $N. RS and FS are often used together like this.

gawk 'BEGIN{RS=""; FS="\n"} {print $1"\t"$3}' foo
apple  red
banana yellow

Variable OFS

Output Field Separator: the string that print places between comma-separated arguments.

echo 'aa,bb' | gawk 'BEGIN{FS=","; OFS="-"} {print $1,$2}'
aa-bb

Variable FIELDWIDTHS

A Gawk extension that splits each record into fixed-width fields; the value is a space-separated list of field widths in characters.

echo 'abbc' | gawk 'BEGIN{FIELDWIDTHS="1 2 1"} {print $1,$2,$3}'
a bb c

Conditions and Control Structures

Conditional Expression

Comparison operators: ==, !=, <, <=, >, >=.

gawk -F: '$7=="/bin/bash"{print $1}' /etc/passwd

This prints every user whose login shell is /bin/bash.

Conditional Statement

A single statement inside if does not need {}.

echo -e '10\n20' | gawk '{if ($1>15) print $1}'

Multiple statements inside if need {}.

echo -e '10\n20' | gawk '{if ($1>15) {x=2*$1; print x}}'

In a single-line if/else, the statement before else must end with a ;.

echo -e '10\n20' | gawk '{if ($1>15) print $1; else print "no"}'

When the branches are written as multi-line blocks in {}, no semicolon is needed.

echo -e '10\n20' | gawk '{
if ($1>15) {
    x=2*$1
    print x
} else {
    print "no"
}
}'

FOR Statement

Calculate the sum of the fields in each line. Both += and ++ are supported.

echo '1 2 3' | gawk '{
total=0
for (i=1; i<=NF; i++) {
    total += $i
}
print total
}'

WHILE Statement

Calculate the sum of the fields in each line.

echo '1 2 3' | gawk '{
i=1
total=0
while (i<=NF) {
    total += $i
    i++
}
print total
}'

DO-WHILE Statement

Calculate the sum of the fields in each line. Unlike while, the body of do-while always runs at least once.

echo '1 2 3' | gawk '{
i=1
total=0
do {
    total += $i
    i++
} while(i<=NF)
print total
}'

Built-in Functions

  • int(x): take the integer part of x.
  • exp(x): e raised to the power x.
  • sqrt(x): square root of x.
  • rand(): a random number r with 0 <= r < 1.
  • length(x): length of string x.
  • tolower(x): convert x to lowercase.
  • toupper(x): convert x to uppercase.

There are many more, such as gensub, gsub.

Custom Functions

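Custom functions are defined at the top level of the program, outside any rule. They may appear before or after the rules that call them; placing them first is a common convention.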

gawk '
function random(ts, num) {
    srand(ts)
    return int(num * rand())
}
BEGIN {
    ts=systime()
    print ts
    print random(ts, 10)
}'

You can also keep functions in a library file and load it alongside the program.

cat <<'EOF' > funclib.gawk
function random(ts, num) {
    srand(ts)
    return int(num * rand())
}
EOF

The gawk program file is as follows.

cat <<'EOF' > test.gawk
BEGIN {
    ts=systime()
    print ts
    print random(ts, 10)
}
EOF

Use the -f option to reference two files.

gawk -f funclib.gawk -f test.gawk

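Once -f is used, Gawk no longer treats the first non-option argument as program text, so a library cannot be combined with a plain inline program. Either put the main program in a second -f file as above, or pass it with Gawk's -e (--source) option.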

Other Examples

Custom Variables

Gawk supports arithmetic on floating-point numbers, which bash's built-in integer arithmetic does not 🤪.

gawk 'BEGIN{a=2; a=a*2/3; print a}'
1.33333

Array Operations

Gawk arrays are associative: they map keys to values like a dictionary, and they are unordered.

gawk 'BEGIN{arr["name"]="foo"; print arr["name"]}'

Numeric subscripts also work; they are converted to strings, so the array is still a dictionary.

gawk 'BEGIN{arr[3]="foo"; print arr[3]}'

Traverse the array and delete elements. The order in which for (k in arr) visits the keys is unspecified.

gawk 'BEGIN{
arr["a"]=1
arr[2]=2
arr["c"]="cat"
delete arr[2]
for (k in arr) {
    print "key:",k," val:", arr[k]
}
}
'
key: a  val: 1
key: c  val: cat

Formatted Printing

Format floating-point numbers.

gawk 'BEGIN{printf "%.2f\n", 2/3}'
0.67

Specify width.

echo -e 'foo\nfoobar' | gawk '{printf "%8s\n", $1}'

Left alignment.

echo -e 'foo\nfoobar' | gawk '{printf "%-8s\n", $1}'