How To - Shell Scripting - Bash - Basics - 2 - AWK command usage

Hello Everyone,

In this post I will share what I have learned about awk.

A brief introduction to awk

AWK is a domain-specific language designed for text processing and
typically used as a data extraction and reporting tool. Like sed and
grep, it is a filter, and is a standard feature of most Unix-like
operating systems.

awk has many more uses than I cover here, and all of them are well documented at
https://www.gnu.org/software/gawk/manual/gawk.html

All the code, along with the sample files I am using in this blog post, is committed to GitHub: https://github.com/rajagennu/awk_tutorial [there is more code in the GitHub repo than in the blog, because I am too lazy to reproduce all my work here :( ]

AWK IF-Else

File: answers.txt

a,1,1
b,3,4
c,5,2
d,6,1
e,3,3
f,3,7

awk -F',' '{if($2==$3){print $1","$2","$3} else {print "No Duplicates"}}' answers.txt

output

a,1,1
No Duplicates
No Duplicates
No Duplicates
e,3,3
No Duplicates
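
The same idea extends to if / else if / else. The snippet below is only a small sketch: the rows are copied from answers.txt, while the variable names `data` and `result` and the label texts are my own choices for illustration.

```shell
# A few rows in the same format as answers.txt.
data='a,1,1
b,3,4
c,5,2'

# Compare field 2 with field 3 and label each row.
# The fields look like numbers, so awk compares them numerically.
result=$(printf '%s\n' "$data" | awk -F',' '{
    if ($2 == $3)      print $1 ": equal"
    else if ($2 < $3)  print $1 ": second is smaller"
    else               print $1 ": second is larger"
}')

printf '%s\n' "$result"
```

This should label row a as equal, row b as smaller and row c as larger.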

AWK While

First, let's understand what NF is in awk. As per the documentation: ‘NF is a predefined variable whose value is the number of fields in the current record. awk automatically updates the value of NF each time it reads a record. No matter how many fields there are, the last field in a record can be represented by $NF’. Note that the loop below starts at i=0, so for each record it prints $0 (the whole line) first and then fields 1 through NF.

File: top10CEO.txt

Rich Lesser, Boston Consulting Group,99%
Shantanu Narayen, Adobe,99%
Peter Pisters, MD Anderson Cancer Center,99%
Gary C. Kelly, Southwest Airlines,98%
Alfred F. Kelly, Jr. Visa Inc.,97%
Satya Nadella, Microsoft,97%
Charles C. Butt, H.E.B.,97%
Ed Bastian, Delta Air Lines,97%
Paul Cormier, Red Hat,97%
Horacio D. Rozanski, Booz Allen Hamilton,97%

awk -F',' '{i=0; while(i<=NF) { print i ":"$i; i++;}}' top10CEO.txt

0:Rich Lesser, Boston Consulting Group,99%
1:Rich Lesser
2: Boston Consulting Group
3:99%
0:Shantanu Narayen, Adobe,99%
1:Shantanu Narayen
2: Adobe
3:99%
0:Peter Pisters, MD Anderson Cancer Center,99%
1:Peter Pisters
2: MD Anderson Cancer Center
3:99%
0:Gary C. Kelly, Southwest Airlines,98%
1:Gary C. Kelly
2: Southwest Airlines
3:98%
0:Alfred F. Kelly, Jr. Visa Inc.,97%
1:Alfred F. Kelly
2: Jr. Visa Inc.
3:97%
0:Satya Nadella, Microsoft,97%
1:Satya Nadella
2: Microsoft
3:97%
0:Charles C. Butt, H.E.B.,97%
1:Charles C. Butt
2: H.E.B.
3:97%
0:Ed Bastian, Delta Air Lines,97%
1:Ed Bastian
2: Delta Air Lines
3:97%
0:Paul Cormier, Red Hat,97%
1:Paul Cormier
2: Red Hat
3:97%
0:Horacio D. Rozanski, Booz Allen Hamilton,97%
1:Horacio D. Rozanski
2: Booz Allen Hamilton
3:97%
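
If you only want the individual fields, the loop can start at 1 instead of 0. Here is a minimal sketch of that, using one line from top10CEO.txt. The variable names `line` and `result` and the "last field:" label are just for illustration.

```shell
# One record from top10CEO.txt.
line='Satya Nadella, Microsoft,97%'

# Walk fields 1..NF with a while loop, then print the last field via $NF.
result=$(printf '%s\n' "$line" | awk -F',' '{
    i = 1
    while (i <= NF) {
        print i ":" $i
        i++
    }
    print "last field:" $NF
}')

printf '%s\n' "$result"
```

Starting at 1 skips $0, so the whole line is not repeated before the fields.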

AWK for loop

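The for loop below prints the first three comma-separated fields of each line, one per line. Without -F',' awk would split on whitespace instead and print the first three words.

awk -F',' '{for (i = 1; i <= 3; i++) print $i}' top10CEO.txt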
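
Instead of hard-coding the upper limit, a for loop can run up to NF so it works for any number of fields. A small sketch, reusing the numeric sample input from the NF example later in this post. The names `sum` and `result` and the "row ... total=" label are my own.

```shell
# Sum every field in each record; NF gives the loop its upper bound.
result=$(printf '123,459,905\n456,544,345\n' | awk -F',' '{
    sum = 0
    for (i = 1; i <= NF; i++) sum += $i
    print "row " NR " total=" sum
}')

printf '%s\n' "$result"
```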

AWK Selectors

Selectors (patterns) decide whether a particular awk action should be executed for a record or not.
For example, display only the CEOs whose name starts with ‘S’:

awk  -F','  '$1 ~ /^S/ {print $0}' top10CEO.txt

Shantanu Narayen, Adobe,99%
Satya Nadella, Microsoft,97%

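Relational expressions

Here $3 is compared with the string "98%", so awk does a string comparison. It works for this file because every score has two digits.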

awk  -F','  '$3 > "98%" {print $0}' top10CEO.txt

Rich Lesser, Boston Consulting Group,99%
Shantanu Narayen, Adobe,99%
Peter Pisters, MD Anderson Cancer Center,99%
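
A string comparison like this would stop working for a score of 100%, because "100%" sorts before "98%" as text. One possible workaround, sketched below, is to add 0 to the field so awk converts its leading digits to a number. The third row is hypothetical and is not in top10CEO.txt. I added it only to show the difference.

```shell
# Two real rows plus one HYPOTHETICAL row (not in top10CEO.txt) with a 100% score.
data='Rich Lesser, Boston Consulting Group,99%
Gary C. Kelly, Southwest Airlines,98%
Hypothetical Person, Example Corp,100%'

# String comparison: "100%" is less than "98%" because "1" sorts before "9".
string_cmp=$(printf '%s\n' "$data" | awk -F',' '$3 > "98%" {print $1}')

# Numeric comparison: $3 + 0 turns "99%" into 99, "100%" into 100.
numeric_cmp=$(printf '%s\n' "$data" | awk -F',' '$3 + 0 > 98 {print $1}')

printf 'string:  %s\n' "$string_cmp"
printf 'numeric: %s\n' "$numeric_cmp"
```

The string version finds only Rich Lesser, while the numeric version also finds the 100% row.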

Range patterns

awk  -F','  '/Peter Pisters/,/Satya Nadella/ {print $1 $3}' top10CEO.txt

Peter Pisters99%
Gary C. Kelly98%
Alfred F. Kelly97%
Satya Nadella97%
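
In the output above the name and the score are glued together, because print $1 $3 concatenates the two fields. A small sketch of the same kind of range pattern with a comma in print, which puts the output field separator (a space by default) between them. The rows are from top10CEO.txt. The shorter patterns /Gary/ and /Satya/ are my own choice.

```shell
# Four rows from top10CEO.txt.
data='Peter Pisters, MD Anderson Cancer Center,99%
Gary C. Kelly, Southwest Airlines,98%
Satya Nadella, Microsoft,97%
Ed Bastian, Delta Air Lines,97%'

# The range starts at the first line matching /Gary/ and ends at the
# next line matching /Satya/ (both lines included).
result=$(printf '%s\n' "$data" | awk -F',' '/Gary/,/Satya/ {print $1, $3}')

printf '%s\n' "$result"
```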

BEGIN…END

Special expression patterns include BEGIN and END which denote program initialization and end. The BEGIN pattern matches the beginning of the input, before the first record is processed. The END pattern matches the end of the input, after the last record has been processed.

awk  -F','  'BEGIN { print "starting list of top 10 CEOs" }; {print $1 $2 $3} END{print "end list of top 10 CEOs"}' top10CEO.txt

starting list of top 10 CEOs
Rich Lesser Boston Consulting Group99%
Shantanu Narayen Adobe99%
Peter Pisters MD Anderson Cancer Center99%
Gary C. Kelly Southwest Airlines98%
Alfred F. Kelly Jr. Visa Inc.97%
Satya Nadella Microsoft97%
Charles C. Butt H.E.B.97%
Ed Bastian Delta Air Lines97%
Paul Cormier Red Hat97%
Horacio D. Rozanski Booz Allen Hamilton97%
end list of top 10 CEOs
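
END is also a handy place to print totals collected while reading the file. The following is only a sketch: it uses three rows from top10CEO.txt, and the variable name `total` and the output format are my own.

```shell
# Three rows from top10CEO.txt.
data='Rich Lesser, Boston Consulting Group,99%
Gary C. Kelly, Southwest Airlines,98%
Satya Nadella, Microsoft,97%'

# BEGIN runs once before any input, the middle block runs per record,
# END runs once after the last record (NR then holds the record count).
result=$(printf '%s\n' "$data" | awk -F',' '
    BEGIN { total = 0 }
    { total += $3 + 0 }          # "+ 0" converts "99%" to the number 99
    END { print "rows=" NR " average=" (total / NR) "%" }
')

printf '%s\n' "$result"
```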

AWK ‘&&’ ‘||’ ‘!’

AWK supports the logical operators && (and), || (or) and ! (not).
For example, I would like to see the CEOs with a score greater than 97% whose name starts with the letter S:

awk  -F','  '$3 > "97%" && $1 ~/^S/ { print $1}' top10CEO.txt

Shantanu Narayen
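
The example above only uses &&. Here is a short sketch of || and ! on a few rows from top10CEO.txt. The variable names `either` and `not_s` are my own.

```shell
# Four rows from top10CEO.txt.
data='Rich Lesser, Boston Consulting Group,99%
Shantanu Narayen, Adobe,99%
Gary C. Kelly, Southwest Airlines,98%
Satya Nadella, Microsoft,97%'

# ||: score is exactly 98% OR the name starts with S.
either=$(printf '%s\n' "$data" | awk -F',' '$3 == "98%" || $1 ~ /^S/ {print $1}')

# !: names that do NOT start with S.
not_s=$(printf '%s\n' "$data" | awk -F',' '!($1 ~ /^S/) {print $1}')

printf 'either:\n%s\n' "$either"
printf 'not S:\n%s\n' "$not_s"
```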

AWK variables

  • $0 -> The full current line (record)
  • $1, $2… -> Field 1, Field 2…
  • NR -> Number of records read so far, usually used to print the current row number
echo  -e  "Hello\nGoodMorning"  |  awk  '{print NR"\t" $0}'
1   Hello
2   GoodMorning
  • NF -> Number of fields in the current record. NF gives the field count and $NF gives the last field, so $NF is the best choice when you need the last field of a line
echo  -e  "123,459,905\n456,544,345"  |  awk  -F','  '{print $NF}'
echo  -e  "123,459,905\n456,544,345"  |  awk  -F','  '{print NF}'

905
345
3
3

Don't mess with FS (the field separator) and RS (the record separator) unless you know what you are doing.
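
To give an idea of what these separators do, here is a minimal sketch that sets the input field separator FS and the output field separator OFS in a BEGIN block. The " | " separator is an arbitrary choice of mine, and the input is the same sample used for NF above.

```shell
# FS controls how input is split into fields; OFS is placed between
# fields when awk rebuilds or prints a record.
result=$(printf '123,459,905\n456,544,345\n' | awk '
    BEGIN { FS = ","; OFS = " | " }
    { $1 = $1; print }    # assigning to a field forces $0 to be rebuilt with OFS
')

printf '%s\n' "$result"
```

Setting FS in BEGIN has the same effect as passing -F',' on the command line.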

AWK Length

awk  -F','  '{print "number of chars in line " NR "=" length($0)}' top10CEO.txt

number of chars in line 1=40
number of chars in line 2=27
number of chars in line 3=44
number of chars in line 4=37
number of chars in line 5=34
number of chars in line 6=28
number of chars in line 7=27
number of chars in line 8=31
number of chars in line 9=25
number of chars in line 10=44
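
length can also be used inside a pattern. As a rough sketch, the snippet below finds the longest line among three rows from top10CEO.txt. The variable names `max` and `longest` are my own.

```shell
# Three rows from top10CEO.txt.
data='Shantanu Narayen, Adobe,99%
Rich Lesser, Boston Consulting Group,99%
Paul Cormier, Red Hat,97%'

# Remember the longest line seen so far; an unset max counts as 0.
result=$(printf '%s\n' "$data" | awk '
    length($0) > max { max = length($0); longest = $0 }
    END { print max ": " longest }
')

printf '%s\n' "$result"
```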

Bottom Line:
AWK is a great tool for text and file processing, with many use cases. I have only covered what I have learned so far. If you like this post, please subscribe to my blog and add a star on my Git repo: https://github.com/rajagennu/awk_tutorial

Thank you.
