Hello everyone,
In this post I will share what I have learned about awk.
A brief introduction to awk
AWK is a domain-specific language designed for text processing and
typically used as a data extraction and reporting tool. Like sed and
grep, it is a filter, and is a standard feature of most Unix-like
operating systems.
awk has many more uses than this post covers, and they are all well documented at
https://www.gnu.org/software/gawk/manual/gawk.html
All the code (there is more in the GitHub repo than in this post, because I am too lazy to reproduce all my work here :( ), along with the sample files used in this post, is committed to GitHub: https://github.com/rajagennu/awk_tutorial
AWK IF-Else
File: answers.txt
a,1,1
b,3,4
c,5,2
d,6,1
e,3,3
f,3,7
awk -F ',' '{if($2==$3){print $1","$2","$3} else {print "No Duplicates"}}' answers.txt
output
a,1,1
No Duplicates
No Duplicates
No Duplicates
e,3,3
No Duplicates
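The same if/else can be chained with else if. Here is a minimal sketch, assuming the first three rows of answers.txt are piped in on stdin so the snippet runs on its own; the messages are my own wording, not from the repo.

```shell
# Rows copied from answers.txt, fed on stdin so no file is needed.
out=$(printf 'a,1,1\nb,3,4\nc,5,2\n' | awk -F',' '{
    if ($2 == $3)      print $1 ": equal"
    else if ($2 < $3)  print $1 ": second column is smaller"
    else               print $1 ": second column is larger"
}')
printf '%s\n' "$out"
```

Because the fields look like numbers, awk compares them numerically here.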
AWK While
First, let's understand what NF is in awk. As per the documentation, ‘NF is a predefined variable whose value is the number of fields in the current record. awk automatically updates the value of NF each time it reads a record. No matter how many fields there are, the last field in a record can be represented by $NF’.
File: top10CEO.txt
Rich Lesser, Boston Consulting Group,99%
Shantanu Narayen, Adobe,99%
Peter Pisters, MD Anderson Cancer Center,99%
Gary C. Kelly, Southwest Airlines,98%
Alfred F. Kelly, Jr. Visa Inc.,97%
Satya Nadella, Microsoft,97%
Charles C. Butt, H.E.B.,97%
Ed Bastian, Delta Air Lines,97%
Paul Cormier, Red Hat,97%
Horacio D. Rozanski, Booz Allen Hamilton,97%
awk -F',' '{i=0; while(i<=NF) { print i ":"$i; i++;}}' top10CEO.txt
0:Rich Lesser, Boston Consulting Group,99%
1:Rich Lesser
2: Boston Consulting Group
3:99%
0:Shantanu Narayen, Adobe,99%
1:Shantanu Narayen
2: Adobe
3:99%
0:Peter Pisters, MD Anderson Cancer Center,99%
1:Peter Pisters
2: MD Anderson Cancer Center
3:99%
0:Gary C. Kelly, Southwest Airlines,98%
1:Gary C. Kelly
2: Southwest Airlines
3:98%
0:Alfred F. Kelly, Jr. Visa Inc.,97%
1:Alfred F. Kelly
2: Jr. Visa Inc.
3:97%
0:Satya Nadella, Microsoft,97%
1:Satya Nadella
2: Microsoft
3:97%
0:Charles C. Butt, H.E.B.,97%
1:Charles C. Butt
2: H.E.B.
3:97%
0:Ed Bastian, Delta Air Lines,97%
1:Ed Bastian
2: Delta Air Lines
3:97%
0:Paul Cormier, Red Hat,97%
1:Paul Cormier
2: Red Hat
3:97%
0:Horacio D. Rozanski, Booz Allen Hamilton,97%
1:Horacio D. Rozanski
2: Booz Allen Hamilton
3:97%
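Item 0 in the output above is $0, the whole record, because the loop starts at i=0. If you only want the individual fields, a small sketch that starts the counter at 1 (one sample line piped in on stdin) could look like this:

```shell
# One row from top10CEO.txt; %% is how printf emits a literal percent sign.
out=$(printf 'Satya Nadella, Microsoft,97%%\n' | awk -F',' '{
    i = 1                      # start at 1 so $0 (the full record) is skipped
    while (i <= NF) {          # NF is the number of fields in this record
        print i ":" $i
        i++
    }
}')
printf '%s\n' "$out"
```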
AWK for loop
awk '{for (i = 1; i <= 3; i++) print $i}' top10CEO.txt
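Note that the command above has no -F',' so awk splits each line on whitespace, not on commas. A quick sketch to compare the two behaviours on a single sample line (the variable names by_space and by_comma are just for illustration):

```shell
line='Satya Nadella, Microsoft,97%'

# Default field separator: runs of spaces and tabs.
by_space=$(printf '%s\n' "$line" | awk '{for (i = 1; i <= NF; i++) print i ":" $i}')

# Comma as the field separator.
by_comma=$(printf '%s\n' "$line" | awk -F',' '{for (i = 1; i <= NF; i++) print i ":" $i}')

printf 'whitespace split:\n%s\n\ncomma split:\n%s\n' "$by_space" "$by_comma"
```

Using i <= NF instead of a hard-coded 3 keeps the loop correct when a line has more or fewer fields.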
AWK Selectors
Selectors (patterns) decide whether a particular awk action is executed for a record or not.
For example, display only the CEOs whose name starts with ‘S’:
awk -F',' '$1 ~ /^S/ {print $0}' top10CEO.txt
Shantanu Narayen, Adobe,99%
Satya Nadella, Microsoft,97%
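A regex selector can test any field, not only $1. As a sketch, this one matches on the company column instead and prints the CEO name; three rows from top10CEO.txt are piped in so it runs on its own.

```shell
out=$(printf '%s\n' \
    'Gary C. Kelly, Southwest Airlines,98%' \
    'Satya Nadella, Microsoft,97%' \
    'Ed Bastian, Delta Air Lines,97%' |
  awk -F',' '$2 ~ /Air/ {print $1}')   # keep rows whose company contains "Air"
printf '%s\n' "$out"
```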
Relational expressions
awk -F',' '$3 > "98%" {print $0}' top10CEO.txt
Rich Lesser, Boston Consulting Group,99%
Shantanu Narayen, Adobe,99%
Peter Pisters, MD Anderson Cancer Center,99%
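One thing to watch: $3 > "98%" is a string comparison, so a value like 100% would sort before 98% and be dropped. A hedged sketch of a numeric comparison is below; adding 0 makes awk read the leading number and ignore the trailing %. The 100% row is a made-up example and is not in top10CEO.txt.

```shell
out=$(printf '%s\n' \
    'Rich Lesser, Boston Consulting Group,99%' \
    'Satya Nadella, Microsoft,97%' \
    'Hypothetical Person, Example Corp,100%' |
  awk -F',' '($3 + 0) > 98 {print $1, $3}')   # $3 + 0 turns "99%" into the number 99
printf '%s\n' "$out"
```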
Range patterns
awk -F',' '/Peter Pisters/,/Satya Nadella/ {print $1 $3}' top10CEO.txt
Peter Pisters99%
Gary C. Kelly98%
Alfred F. Kelly97%
Satya Nadella97%
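The two ends of a range do not have to be regexes. As a small sketch, the range below uses the record number NR, and both ends are included in the output. The four input lines are throwaway sample data.

```shell
# Print records 2 through 3, inclusive.
out=$(printf 'one\ntwo\nthree\nfour\n' | awk 'NR == 2, NR == 3 {print NR ": " $0}')
printf '%s\n' "$out"
```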
BEGIN…END
Special expression patterns include BEGIN and END which denote program initialization and end. The BEGIN pattern matches the beginning of the input, before the first record is processed. The END pattern matches the end of the input, after the last record has been processed.
awk -F',' 'BEGIN { print "starting list of top 10 CEOs" }; {print $1 $2 $3} END{print "end list of top 10 CEOs"}' top10CEO.txt
starting list of top 10 CEOs
Rich Lesser Boston Consulting Group99%
Shantanu Narayen Adobe99%
Peter Pisters MD Anderson Cancer Center99%
Gary C. Kelly Southwest Airlines98%
Alfred F. Kelly Jr. Visa Inc.97%
Satya Nadella Microsoft97%
Charles C. Butt H.E.B.97%
Ed Bastian Delta Air Lines97%
Paul Cormier Red Hat97%
Horacio D. Rozanski Booz Allen Hamilton97%
end list of top 10 CEOs
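BEGIN and END are also handy for totals: initialise in BEGIN, accumulate per record, and report in END. A minimal sketch, assuming three rows from top10CEO.txt on stdin (the variable name total is my own):

```shell
out=$(printf '%s\n' \
    'Rich Lesser, Boston Consulting Group,99%' \
    'Gary C. Kelly, Southwest Airlines,98%' \
    'Satya Nadella, Microsoft,97%' |
  awk -F',' '
    BEGIN { total = 0 }        # runs once, before any input is read
    { total += $3 + 0 }        # "99%" + 0 is 99: the trailing % is ignored
    END { printf "rows=%d average=%.1f%%\n", NR, total / NR }   # runs once, after the last record
  ')
printf '%s\n' "$out"
```

In the END block NR still holds the number of records read, which is why it can be used as the row count.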
AWK ‘&&’ ‘||’ ‘!’
AWK supports the logical operators && (AND), || (OR) and ! (NOT).
For example, I would like to see the CEOs with a score greater than 97% whose name starts with the letter S:
awk -F',' '$3 > "97%" && $1 ~/^S/ { print $1}' top10CEO.txt
Shantanu Narayen
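The example above only uses &&. Here is a sketch of || and ! on four rows from top10CEO.txt; the output labels are my own.

```shell
out=$(printf '%s\n' \
    'Rich Lesser, Boston Consulting Group,99%' \
    'Shantanu Narayen, Adobe,99%' \
    'Gary C. Kelly, Southwest Airlines,98%' \
    'Satya Nadella, Microsoft,97%' |
  awk -F',' '
    # || : the name starts with R or with G
    $1 ~ /^R/ || $1 ~ /^G/      { print "R or G: " $1 }
    # !  : the score is 99% and the name does NOT start with S
    $3 == "99%" && !($1 ~ /^S/) { print "99%, not S: " $1 }
  ')
printf '%s\n' "$out"
```

Both rules are checked for every record, so a line such as the first one can be printed twice.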
AWK variables
- $0 -> The full line (the current record)
- $1, $2… -> Field 1, Field 2…
- NR -> Number of records read so far, so it is usually used to print the current line number
echo -e "Hello\nGoodMorning" | awk '{print NR"\t" $0}'
1 Hello
2 GoodMorning
- NF -> Number of fields in the current record. NF itself gives the field count, while $NF gives the last field, so $NF is the easiest way to grab the last field of every line
echo -e "123,459,905\n456,544,345" | awk -F',' '{print $NF}'
echo -e "123,459,905\n456,544,345" | awk -F',' '{print NF}'
905
345
3
3
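NR, NF and $NF can be combined in one print. A small sketch with two sample lines of different lengths, to show that NF is recomputed for every record:

```shell
out=$(printf '123,459,905\n456,544\n' |
  awk -F',' '{print "record " NR ": " NF " fields, last=" $NF}')
printf '%s\n' "$out"
```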
Don't mess with FS (the field separator) and RS (the record separator) unless you know what you are doing.
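For reference, awk's separator variables are FS (input field separator), OFS (output field separator) and RS (record separator). Here is a hedged sketch of what changing them does, on throwaway sample input:

```shell
# FS splits input fields, OFS joins output fields.
fs_out=$(printf '123,459,905\n' |
  awk 'BEGIN { FS = ","; OFS = " | " } { $1 = $1; print }')   # $1 = $1 forces the record to be rebuilt with OFS

# RS splits the input into records; here every ";" ends a record.
rs_out=$(printf 'a;b;c' |
  awk 'BEGIN { RS = ";" } { print NR ": " $0 }')

printf '%s\n%s\n' "$fs_out" "$rs_out"
```

Setting FS in BEGIN has the same effect as passing -F',' on the command line.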
AWK Length
awk -F',' '{print "number of chars in line " NR "=" length($0)}' top10CEO.txt
number of chars in line 1=40
number of chars in line 2=27
number of chars in line 3=44
number of chars in line 4=37
number of chars in line 5=34
number of chars in line 6=28
number of chars in line 7=27
number of chars in line 8=31
number of chars in line 9=25
number of chars in line 10=44
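length also works on a single field. As a sketch, this finds the longest CEO name among three rows from top10CEO.txt (max and name are my own variable names):

```shell
out=$(printf '%s\n' \
    'Satya Nadella, Microsoft,97%' \
    'Horacio D. Rozanski, Booz Allen Hamilton,97%' \
    'Ed Bastian, Delta Air Lines,97%' |
  awk -F',' '
    length($1) > max { max = length($1); name = $1 }   # remember the longest name seen so far
    END { print name " (" max " chars)" }
  ')
printf '%s\n' "$out"
```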
Bottom Line:
AWK is a great text and file processing tool with many use cases. I have only covered what I have learned so far. If you like this post, please subscribe to my blog and add a star on my Git repo: https://github.com/rajagennu/awk_tutorial
Thank you.