GAWK-1
GNU Awk
Gawk is the GNU implementation of awk, the text-processing tool. In most GNU/Linux distributions, gawk is the default awk implementation, so in daily use there is usually no difference between the two.
The gawk command uses ERE (POSIX extended regular expression) syntax by default.
Basic Syntax
OPTIONS: command-line options.
program: the awk program, i.e. the pattern-action rules to run.
file: the file to process; if omitted, input is read from STDIN.
Omitting file while running at a terminal enters interactive mode: each line you type is processed as soon as you press Enter, and Ctrl-D ends the input.
Execution Process
- Read a line (record) of input.
- For each rule, in program order:
  - If the rule has a pattern:
    - If the line matches: execute the rule's action.
    - If it does not match: skip the action.
  - If the rule has no pattern: execute the action for every line.
- Repeat until the input is exhausted.
Basic Usage
Create the foo file.
For each line of data, gawk by default splits fields on runs of spaces and tabs; leading and trailing blanks are ignored.
$N: the Nth field.
$0: the entire line.
BEGIN/END Structure
- BEGIN: initialization; executed once, before any input is read.
- BODY: executed once for each record.
- END: executed once, after all input has been processed.
When the file is created with a here-document, note the single quotes around the delimiter ('EOF'): they stop the shell from expanding special characters such as $ in the file contents.
Common Options
Specify Separator
The -F option sets the input field separator (the value of FS).
Specify File
The -f option reads the awk program from a file instead of from the command line.
Assign Variable Parameters
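The -v option assigns a value to a variable before the BEGIN block runs.
If the variable is not needed in BEGIN, you can omit -v and pass the assignment (var=value) as an argument after the program; it then takes effect after BEGIN, just before the following input is read.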
Built-in Variables
Variable $N
$N can also be assigned to. String values must be written in double quotes; an unquoted word is treated as a variable name.
Variable FS
Field Separator: the input field separator. The default, a single space, splits fields on runs of spaces and tabs.
Variable NF
Number of Fields: the number of fields in the current record, so $NF is the last field.
Variable NR
Number of Records: the number of records read so far, i.e. the number of the record currently being processed. It is 0 in BEGIN, 1 while the first line is processed, and increases by 1 as each record is read.
Because NR is 1 for the first line, it can be used to skip that line (for example, a header).
Variable RS
Record Separator: the input record separator. The default value is "\n", so each line is a separate record.
Setting RS to "" (the empty string) switches to paragraph mode: records are separated by one or more blank lines. For example, two blocks of text separated by a blank line are read as two records.
If you also set FS="\n", each line of a record becomes one field, accessible as $1, $2, and so on. This is why RS and FS are usually set together.
Variable OFS
Output Field Separator: the string that print places between its comma-separated arguments; the default is a single space.
Variable FIELDWIDTHS
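Specifies fixed character widths for splitting fields, as a space-separated list such as "4 2 2". When it is set, gawk splits each record by position instead of using FS. This variable is a gawk extension.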
Conditions and Control Structures
Conditional Expression
Comparison operators: ==, <, <=, >, >=.
Output all users whose login shell is bash.
Conditional Statement
A single statement inside if does not need {}.
Multiple statements inside if need {}.
When if and else are written on one line and the if branch is a single statement without braces, that statement must end with a ; before else.
When else starts on a new line, no semicolon is needed.
FOR Statement
Calculate the sum of all fields on each line. Both += and ++ are supported.
WHILE Statement
Calculate the sum of all fields on each line.
DO-WHILE Statement
Calculate the sum of all fields on each line.
Functions
Built-in Functions
int(x): the integer part of x (truncated toward zero).
exp(x): e raised to the power x.
sqrt(x): the square root of x.
rand(): a random number r with 0 <= r < 1.
length(x): the length of string x.
tolower(x): x converted to lowercase.
toupper(x): x converted to uppercase.
There are many more, such as gensub and gsub.
Custom Functions
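Function definitions can appear anywhere at the top level of the program, outside any rule; placing them before the BEGIN block is a common convention rather than a requirement.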
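Functions can also be kept in a separate library file and called from the main gawk program file.
Use the -f option once for each file; gawk combines all the files into a single program.
When a library is loaded with -f, the main program cannot be given as inline text on the command line, so it must also be loaded with -f. (gawk's -e option, which mixes inline program text with -f files, is an exception.)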
Other Examples
Custom Variables
Supports arithmetic on floating-point numbers, something bash's built-in integer arithmetic cannot do 🤪.
Array Operations
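Features: arrays are associative, similar to dictionaries, and unordered; the order in which for (key in array) visits elements is unspecified.
Numeric subscripts can be used, but they are converted to strings, so the array is still a dictionary underneath.
Arrays are traversed with for (key in array), and elements are removed with delete.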
Formatted Printing
Formatting floating-point numbers.
Specifying a field width.
Left alignment.