Linux

A regular expression is a sequence of characters that specifies a pattern. Use one when you want to search for specific lines of text containing a particular pattern.

Sample.txt

vim, sed, grep, more

Anchors are used to specify the position of the pattern in relation to a line of text

^ - matches the beginning of the line
- The "^" is only an anchor if it is the first character in a regular expression
$ - matches the end of the line
- The "$" is only an anchor if it is the last character
If you need to match a "^" at the beginning of the line, or a "$" at the end of a line, you must escape the special characters with a backslash
\< and \>, represent the start and end of a word
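
A minimal sketch of the anchors above, using grep. The file contents are made-up sample data, since the real Sample.txt is not reproduced in these notes.

```shell
# Hypothetical sample data (assumption: not the real Sample.txt).
sample=$(mktemp)
printf '%s\n' 'Fred apple 20' 'Susy pear 15' 'Fred grape 12' \
    '^caret first' 'price in $' > "$sample"

# ^ anchors the pattern to the start of the line.
starts_with_fred=$(grep -c '^Fred' "$sample")
# $ anchors the pattern to the end of the line.
ends_with_15=$(grep -c '15$' "$sample")
# Escaped with a backslash, ^ and $ match the literal characters.
literal_caret=$(grep -c '^\^' "$sample")
literal_dollar=$(grep -c '\$$' "$sample")

echo "$starts_with_fred $ends_with_15 $literal_caret $literal_dollar"
rm -f "$sample"
```

Each count is the number of matching lines, so the first value counts the two lines that begin with "Fred".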

Character Sets match one or more characters in a single position

. (dot) - any single character
Specifying a Range of Characters with [...]
- [0-9], a single digit
- [a-zA-Z], a single letter
- [^agd] - the character is not one of those included within the square brackets
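- [0-9-z], any digit plus a second hyphen whose meaning is not portable: POSIX leaves a range that starts at the end point of the previous range undefined, so avoid this form
- [0-9\-a\]], any digit, or a "-", an "a", or a "]" (backslash escapes inside brackets work in vim and awk; in POSIX grep and sed write []0-9a-] instead)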
Remember pattern
- \( and \) allow us to remember a pattern; recall the remembered pattern with "\" followed by a single digit
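
A short sketch of character sets and remembered patterns with grep and sed. The data is hypothetical and only stands in for Sample.txt.

```shell
# Hypothetical sample data (assumption: not the real Sample.txt).
sample=$(mktemp)
printf '%s\n' 'Fred apple 20' 'Susy pear 15' 'Mary melon 8' \
    'SusySusy plum 30' > "$sample"

# Two bracket expressions in a row require two digits side by side.
two_digits=$(grep -c '[0-9][0-9]' "$sample")
# [^FS] matches any character except "F" or "S".
not_f_or_s=$(grep -c '^[^FS]' "$sample")
# The dot stands for any single character.
any_char=$(grep -c '^.red' "$sample")
# \( \) remembers "Susy"; \1 requires the same text again.
doubled=$(grep -c '\(Susy\)\1' "$sample")
# Two remembered patterns swapped on the first line only.
swapped=$(sed -n '1s/^\([A-Za-z]*\) \([a-z]*\)/\2 \1/p' "$sample")

echo "$two_digits $not_f_or_s $any_char $doubled"
echo "$swapped"
rm -f "$sample"
```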

Modifiers specify how many times the previous character set is repeated

* - the preceding character matches 0 or more times
\{n,m\} - the preceding character matches at least n times and not more than m times, any numbers between 0 and 255 can be used. The second number may be omitted, which removes the upper limit

modifiers like "*" and "\{1,5\}" only act as modifiers if they follow a character set

^*, any line starting with an asterisk
^\{4,8\}, any line starting with "{4,8}"
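
The modifiers can be tried out with grep on a few made-up lines. This is only a sketch; the strings are invented for illustration.

```shell
# Hypothetical test strings (assumption: chosen only to show the modifiers).
sample=$(mktemp)
printf '%s\n' 'ac' 'abc' 'abbbc' 'abbbbbbc' '*star' > "$sample"

# b* : zero or more "b" between "a" and "c".
zero_or_more=$(grep -c 'ab*c' "$sample")
# b\{1,3\} : at least one and at most three "b".
one_to_three=$(grep -c 'ab\{1,3\}c' "$sample")
# b\{2,\} : two or more "b", no upper limit.
two_or_more=$(grep -c 'ab\{2,\}c' "$sample")
# With nothing before it, "*" is an ordinary character.
literal_star=$(grep -c '^*' "$sample")

echo "$zero_or_more $one_to_three $two_or_more $literal_star"
rm -f "$sample"
```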

awk, egrep

( | ), match a choice of patterns

'^(From|Subject)', lines starting with "From" or "Subject"; a POSIX basic regular expression has no alternation, so "^[FS][ru][ob][mj]e*c*t*" is only a loose approximation

? - the preceding character matches 0 or 1 times only

+ - the preceding character matches 1 or more times
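
A small sketch of the extended operators, using "grep -E" (the same expression syntax as egrep). The input lines are invented, and "-o" is a GNU/BSD option rather than a POSIX one.

```shell
# Hypothetical input (assumption: made up to exercise |, ? and +).
sample=$(mktemp)
printf '%s\n' 'From: fred@example.com' 'Subject: prices' \
    'To: susy@example.com' 'color colour colouur colouuur' > "$sample"

# ( | ) : either alternative may follow the anchor.
headers=$(grep -E -c '^(From|Subject)' "$sample")
# u? : zero or one "u"; -o prints each match on its own line.
optional_u=$(grep -E -o 'colou?r' "$sample" | wc -l | tr -d ' ')
# u+ : one or more "u".
one_or_more_u=$(grep -E -o 'colou+r' "$sample" | wc -l | tr -d ' ')

echo "$headers $optional_u $one_or_more_u"
rm -f "$sample"
```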

\w, matches word characters

\W, matches nonword characters

\s, whitespace

\S, nonwhitespace

\d, digit

\D, nondigit

\A, beginning of a string

\b, word boundary

\B, nonword boundary

[:alnum:], alphanumeric

[:cntrl:], control character

[:lower:], lower case character

[:space:], whitespace

[:alpha:], alphabetic

[:digit:], digit

[:print:], printable character

[:upper:], upper case character

[:blank:], space and tab

[:graph:], printable and visible characters

[:punct:], punctuation

[:xdigit:], hexadecimal digit

grep "[[:digit:]]" sample.txt
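
As the grep line above shows, a class name only works inside a bracket expression, hence the double brackets. A sketch with a few more classes, on hypothetical data:

```shell
# Hypothetical sample data (assumption: not the real sample.txt).
sample=$(mktemp)
printf '%s\n' 'Fred apple 20' 'susy pear' 'MARY 8' '   ' > "$sample"

# Lines containing at least one digit.
with_digit=$(grep -c '[[:digit:]]' "$sample")
# Lines whose first character is an upper case letter.
upper_start=$(grep -c '^[[:upper:]]' "$sample")
# Lines made only of whitespace (or empty).
blank_lines=$(grep -c '^[[:space:]]*$' "$sample")
# Lines starting with two or more upper case letters.
all_caps=$(grep -c '^[[:upper:]]\{2,\}' "$sample")

echo "$with_digit $upper_start $blank_lines $all_caps"
rm -f "$sample"
```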

pattern {action}

AWK is line oriented

The default pattern is something that matches every line

awk '/Fred/ {print $3}' sample.txt

BEGIN, specify actions to be taken before any lines are read

END, specify actions to be taken after the last line is read

BEGIN { print "START" }
      { print         }
END   { print "STOP"  }

awk Variable	Meaning
$0	Whole line
$1	The first field of the input line
FILENAME	Name of current input file
RS	Input record separator character
OFS	Output field separator string
ORS	Output record separator string
NF	Number of fields in input record
NR	Number of input records read so far
OFMT	Output format of number
FS	Field separator character

awk '{print "# of field: " NF " # of records: " NR}' sample.txt
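
A sketch of how FS, OFS, NR, NF and $NF work together. The colon-separated records imitate /etc/passwd but are invented for the example.

```shell
# Hypothetical passwd-style records (assumption: invented data).
records=$(mktemp)
printf '%s\n' 'root:x:0:0:root:/root:/bin/sh' \
    'fred:x:1000:1000:Fred:/home/fred:/bin/bash' > "$records"

# FS splits the input on ":", OFS joins the printed fields,
# NR is the record number, NF the field count, $NF the last field.
summary=$(awk 'BEGIN { FS = ":"; OFS = " -> " }
               { print NR, $1, $NF, NF }' "$records")

printf '%s\n' "$summary"
rm -f "$records"
```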

Commands

if ( conditional ) statement [ else statement ]
while ( conditional ) statement
for ( expression ; conditional ; expression ) statement
for ( variable in array ) statement
break
continue
{ [ statement ] ...}
variable=expression
print [ expression-list ] [ > expression ]
printf format [ , expression-list ] [ > expression ]
next
exit

Arithmetic

awk '{print $3, $3*10}' sample.txt

awk '{a=$3; b=$3*10; print a, b}' sample.txt

awk '{a=$3; total=total+a; print "Total:", $3, total}' sample.txt

Regular expression

~, match

!~, not match

# f.awk
{
	if ($1 ~ /Fred/)
		print $1, $3
	else
		print $0
}

awk -f f.awk sample.txt

# a.awk
/Susy/ {print $1, $3}

awk -f a.awk sample.txt, run the awk commands stored in a script file
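
The match operators also work as patterns in front of an action. A sketch on hypothetical data, showing why anchoring the expression matters:

```shell
# Hypothetical sample data (assumption: not the real sample.txt).
sample=$(mktemp)
printf '%s\n' 'Fred apple 20' 'Susy pear 15' 'Mary melon 8' \
    'Fred grape 12' 'Frederick fig 5' > "$sample"

# Anchored: only records whose first field is exactly "Fred".
exact_sum=$(awk '$1 ~ /^Fred$/ { s += $3 } END { print s }' "$sample")
# Unanchored: "Frederick" matches as well.
loose_count=$(awk '$1 ~ /Fred/ { n++ } END { print n }' "$sample")
# !~ selects the records that do not match.
other_count=$(awk '$1 !~ /Fred/ { n++ } END { print n }' "$sample")

echo "$exact_sum $loose_count $other_count"
rm -f "$sample"
```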

# b.awk
BEGIN {
	print "--------------------------"
	print "-------Sample.txt---------"
	print "--------------------------"
}

{
	total = total + $3
}

END {
	printf "Total: %10d\n", total
}

awk -f b.awk sample.txt

Flow control

# c.awk
BEGIN {
	print "Input an arithmetic expression: "
}

{
	if ( $2 == "+")
		result = $1 + $3
	else if ( $2 == "*")
		result = $1 * $3
	else
	{
		print "Operator is illegal ..."
		failed = 1	# exit still runs END, so remember the failure
		exit 1
	}
}

END {
	if (!failed)
		printf "Result: %10d\n", result
}

awk -f c.awk
1 + 2
Ctrl + D
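
Instead of typing the expression and pressing Ctrl + D, the input can be piped in. A sketch with a hypothetical shell function "calc" wrapping the same if/else logic:

```shell
# calc is a hypothetical helper name, not part of awk.
calc() {
    printf '%s\n' "$1" | awk '
    {
        if ($2 == "+")      result = $1 + $3
        else if ($2 == "*") result = $1 * $3
        else { print "Operator is illegal ..."; exit 1 }
        print "Result:", result
    }'
}

sum=$(calc '1 + 2')
product=$(calc '4 * 5')
# The exit code of awk becomes the exit code of the function.
status=0
calc '4 / 2' > /dev/null || status=$?

echo "$sum"
echo "$product"
echo "exit status for an illegal operator: $status"
```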

Loop

# d.awk
BEGIN {
	print "==========Loop==========="
}

{
	sum = 0
	for( i = 0; i < 10; i++)
	{
		sum += i
	}

	printf "Total: %10d\n", sum
	exit	# stop after the first input line; plain exit reports success
}

# e.awk
BEGIN {
	print "==================="
}

{
	for(j = 1; j <= NF; j++)
		printf "%10s", $j
	printf "\n"
}

Associative array

# g.awk
BEGIN {
	print "===========User List==========="
	idx = 0
}

{
	userName[idx] = $1
	idx++
}

END {
	for(i = 0; i < idx; i++)
		print userName[i];
}

# h.awk
BEGIN {
	print "===========User List==========="
}

{
	userName[$1] = $3
}

END {
	for(n in userName)
		print n, userName[n];
}
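
h.awk keeps only the last value seen for each name. A common variation, sketched here on hypothetical data, accumulates a total per key instead. The order of "for (name in array)" is unspecified, so the output is piped through sort.

```shell
# Hypothetical sample data (assumption: not the real sample.txt).
sample=$(mktemp)
printf '%s\n' 'Fred apple 20' 'Susy pear 15' 'Fred grape 12' \
    'Mary melon 8' > "$sample"

# total[name] starts out empty (treated as 0) and grows per record.
totals=$(awk '{ total[$1] += $3 }
              END { for (name in total) print name, total[name] }' \
         "$sample" | sort)

printf '%s\n' "$totals"
rm -f "$sample"
```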

Numerical Functions

# i.awk
BEGIN {
	print "Arithmetic functions"
	print "===================="
}

{
	printf "%10s%10f\n", $1, cos($3)
}

# j.awk
BEGIN {
	print "Random Number"
	print "===================="
}

{
	printf "%10s%10f\n", $1, rand()
}
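
j.awk calls rand() without seeding it. Whether that repeats the same sequence on every run depends on the awk implementation, so the sketch below seeds explicitly with srand(). The seed 42 is an arbitrary choice.

```shell
# The same seed gives the same first number, which helps reproducibility.
run1=$(awk 'BEGIN { srand(42); print rand() }')
run2=$(awk 'BEGIN { srand(42); print rand() }')

# srand() without an argument seeds from the time of day.
# rand() is in [0, 1), so this yields an integer from 1 to 6.
die=$(awk 'BEGIN { srand(); print int(rand() * 6) + 1 }')

echo "$run1 $run2 $die"
```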

String Functions

index(string,search)

length(string)

split(string,array,separator)

{
	n = split($0, array, " ")
	for (i = 1; i <= n; i++)
		printf "%10s", array[i]
	printf "\n"
}

substr(string,position)

sub(regex,replacement,string), substitute the first match

gsub(regex,replacement,string), substitute every match (like the "g" flag in sed)

{
	if(gsub("[aeiou]", "-", $0))
		print $0
}

match(string,regex)

{
	if (match($1, /Fred/))
		printf "%10s%10f\n", $1, rand()
}
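
The string functions above, applied to one hypothetical string inside a BEGIN block so that no input file is needed:

```shell
out=$(awk 'BEGIN {
    s = "Fred apple 20"                # hypothetical sample record
    print index(s, "apple")            # position of the substring
    print length(s)                    # number of characters
    print substr(s, 6, 5)              # 5 characters starting at position 6
    n = split(s, parts, " ")           # parts[1..n] hold the fields
    print n, parts[3]
    t = s; sub(/p/, "P", t); print t   # only the first "p" changes
    u = s; gsub(/p/, "P", u); print u  # every "p" changes
    pos = match(s, /[0-9]+/)           # also sets RSTART and RLENGTH
    print pos, RSTART, RLENGTH
}')
printf '%s\n' "$out"
```

sub() and gsub() modify their third argument in place and return the number of substitutions, which is why the gsub() example earlier in this section can use it as a condition.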

system(command), run a shell command and return its exit status

{
	if(system("cat n.awk") != 0)
		print "Command does not work ..."
}

/g, global replacement

/p, print

/w, write to a file

/I, ignore case

/d, delete

/!, negate the address (apply the command to lines that do not match)

-n, print nothing unless an explicit request to print is found

Substitution

sed 's/Fred/Lin/g' sample.txt > temp.txt, replace "Fred" by "Lin"
sed 's/\(Susy\)\{1,\}/Lin/g' sample.txt, substitute one or more "Susy" with one "Lin"
sed 's/Susy/(&)/g' sample.txt, use & to represent the found string
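sed -E 's/[0-9]+/(& &)/g' sample.txt, use extended regular expressions with "-E" (macOS and recent GNU sed) or "-r" (GNU sed on Linux)
sed 's/^\([a-zA-Z]\{4\}\) .*\([0-9][0-9]*\)/\2 \1/g' sample.txt, remember patterns 1 and 2 and replace the match with pattern 2 followed by pattern 1 (the greedy ".*" leaves only the last digit for pattern 2)
sed 's/fred/lin/Ig' sample.txt, substitute "Fred", "FRED", etc. with "lin" (the "I" flag is not part of POSIX sed)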
sed -e 's/a/A/' -e 's/b/B/' sample.txt, multiple commands in one line

sed '2,8 s/Susy/Lin/g' sample.txt, substitute "Susy" with "Lin" from line 2 to line 8
sed '/Fred/s/20/10/g' sample.txt, substitute "20" with "10" in lines containing "Fred"
sed '/Fred/s//Lin/g' sample.txt, substitute "Fred" with "Lin" in lines containing "Fred" (an empty pattern reuses the last regular expression)
sed '/^[a-zA-Z]\{4\}/s//Lin/g' sample.txt, replace the first four letters of each line that starts with four letters with "Lin"
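
A runnable sketch of the substitutions above on hypothetical data. In the swap, the number is anchored with a leading space and "$" so that the greedy ".*" cannot swallow its first digits.

```shell
# Hypothetical sample data (assumption: not the real sample.txt).
sample=$(mktemp)
printf '%s\n' 'Fred apple 20' 'Susy pear 15' 'SusySusy plum 30' \
    'Fred grape 20' > "$sample"

# & stands for the matched text; line 3 contains "Susy" twice.
wrapped=$(sed -n '3s/Susy/(&)/gp' "$sample")
# One or more "Susy" in a row collapse into a single "Lin".
collapsed=$(sed -n '3s/\(Susy\)\{1,\}/Lin/gp' "$sample")
# The address /Fred/ limits the substitution to matching lines.
discounted=$(sed '/Fred/s/20/10/' "$sample" | grep -c ' 10$')
# Remember the name and the number, then print them swapped.
swapped=$(sed -n '1s/^\([A-Za-z]*\) .* \([0-9][0-9]*\)$/\2 \1/p' "$sample")

echo "$wrapped"
echo "$collapsed"
echo "$discounted"
echo "$swapped"
rm -f "$sample"
```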

sed '/^$/d', delete blank line
who | sed -n '/lchen/p', search 'lchen' in the output of command who
sed -n '/Susy/p', search the lines containing "Susy" and print them out
sed -n '/Fred/!p' sample.txt, print the line which does not contain "Fred"
sed '10q' sample.txt, quit after printing line 10
sed '/Susy/ i\ Add this line before every line with Susy' sample.txt, insert a line before the lines containing "Susy" (one-line form accepted by GNU sed)
sed -n "/Susy/=", print the line numbers of the lines containing "Susy"
sed 'y/abcd/ABCD/' sample.txt, transliterate "a" to "A", "b" to "B", etc.
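
The delete, print, quit, line-number and transliterate commands can be checked quickly on a small file. A sketch on hypothetical data that includes one blank line:

```shell
# Hypothetical sample data with an empty second line (assumption).
sample=$(mktemp)
printf '%s\n' 'Fred apple 20' '' 'Susy pear 15' 'Mary melon 8' > "$sample"

# d deletes the blank line, leaving three lines.
no_blank=$(sed '/^$/d' "$sample" | wc -l | tr -d ' ')
# !p prints every line that does not contain "Fred".
without_fred=$(sed -n '/Fred/!p' "$sample" | wc -l | tr -d ' ')
# q quits after line 2 has been printed.
first_two=$(sed '2q' "$sample" | wc -l | tr -d ' ')
# = prints the line number of the matching line.
susy_line=$(sed -n '/Susy/=' "$sample")
# y maps characters one to one; only the first line is shown.
upper=$(sed 'y/abcd/ABCD/' "$sample" | head -n 1)

echo "$no_blank $without_fred $first_two $susy_line"
echo "$upper"
rm -f "$sample"
```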

sed -f s.sed sample.txt, run the sed commands stored in a script file

# s.sed
1i\
Substitute the price in the line containing "Fred"
/Fred/s/20/10/g

/[pattern], search words matching a specific pattern

/Fred, find "Fred"
/\<Susy\>, search the single word Susy, not "SusySusy"
/\s\d$, search a single digit at the end of the line
/[aeiou]\{2\}, search for two consecutive vowels
/1.\{1,\}, search for a "1" followed by one or more characters (use /\<1\d\> for a two-digit number starting with "1")
/".\{-\}", non-greedy search for the content between two double quotation marks

:range s[ubstitute]/pattern/string/cgiI

range

%, the whole file
number, an absolute line number
., the current line
$, the last line in the file
't, position of mark "t"
/pattern/, the next line where text "pattern" matches
?pattern?, the previous line where text "pattern" matches

cgiI

c, confirm each substitution
g, replace all occurrences in the line
i, ignore case for the pattern
I, don't ignore case for the pattern

:/me/ s/me/lin/g, substitute "me" by "lin" in the next line where the pattern matches
:10,15 s/me/lin/g, substitute "me" from line 10 to line 15
:10+1,15 s/me/lin/g, substitute "me" from line 11 to line 15
:/me/ y, find the next line where the pattern matches and yank (copy) it into a register
:// normal p, find the next line matching the last search pattern and put (paste) the yanked text after it
:%s/me/lin/g, substitute "me" in the whole file by "lin"

:%s/[aeiou]\{2\}/VOWEL/g, replace every pair of consecutive vowels with "VOWEL"
:%s/\<Susy\>/TEMP/g, substitute the single word Susy with "TEMP"
:%s/\d\{2,\}$/100/g, replace a number of two or more digits at the end of a line with 100
:%s/\(Susy\)\{2,\}/Susy/g, replace repeated "Susy" with a single "Susy"

grep -n "mellon" sample.txt, match "mellon" in sample.txt and display the line numbers
grep -c "mellon" sample.txt, display how many lines match the pattern
grep -i "fred" sample.txt, make the search case insensitive
grep -v "mellon" sample.txt, invert the match and print the lines that do not match
grep -l "mellon" *, print the names of the files with lines that match the expression
grep --color=auto "^[A-K]" sample.txt, highlight the matched text
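
A sketch of the options above on two throwaway files. The file names a.txt and b.txt and their contents are hypothetical.

```shell
# Hypothetical files in a temporary directory (assumption: invented data).
dir=$(mktemp -d)
printf '%s\n' 'Fred melon 20' 'water mellon 5' 'FRED pear 3' > "$dir/a.txt"
printf '%s\n' 'nothing here' > "$dir/b.txt"

numbered=$(grep -n 'mellon' "$dir/a.txt")       # line number, colon, line
count=$(grep -c 'mellon' "$dir/a.txt")          # number of matching lines
insensitive=$(grep -i -c 'fred' "$dir/a.txt")   # "Fred" and "FRED" both count
inverted=$(grep -v -c 'mellon' "$dir/a.txt")    # lines without a match
files=$(cd "$dir" && grep -l 'mellon' *)        # only the file names

echo "$numbered"
echo "$count $insensitive $inverted $files"
rm -rf "$dir"
```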

grep '[aeiou]\{2,\}' sample.txt, search for two or more consecutive vowels
grep "\<Susy\>" sample.txt, search the single word Susy, not "SusySusy"
grep "2.\{1,\}" sample.txt, search for a "2" followed by one or more characters (use "\<2[0-9]\>" for a two-digit number starting with "2")
grep "\(Susy\)\{2,\}" sample.txt, search for two or more consecutive "Susy"
grep "^[a-zA-Z]\{4\}\>" sample.txt, search for lines starting with a four-letter word
grep "\s[[:digit:]]\{1\}$" sample.txt --color=auto, search for a single digit at the end of the line

cat /etc/passwd | grep root
dmesg | grep -n --color=auto 'eth'
grep -r 'energywise' *, search for the pattern in the current directory and its subdirectories

egrep -n "mellon" sample.txt, match "mellon" in sample.txt and display the line numbers
egrep -c "mellon" sample.txt, display how many lines match the pattern
egrep -i "fred" sample.txt, make the search case insensitive
egrep -v "mellon" sample.txt, invert the match and print the lines that do not match
egrep -l "mellon" *, print the names of the files with lines that match the expression
egrep --color=auto "^[A-K]" sample.txt, highlight the matched text

egrep '[aeiou]{2,}' sample.txt, search for two or more consecutive vowels
egrep "\<Susy\>" sample.txt, search the single word Susy, not "SusySusy"
egrep "2.+" sample.txt, search for a "2" followed by one or more characters (use "\<2[0-9]\>" for a two-digit number starting with "2")
egrep "(Susy){2,}" sample.txt, search for two or more consecutive "Susy"
egrep "^[a-zA-Z]{4}\>" sample.txt, search for lines starting with a four-letter word
egrep "(or|is|go)" sample.txt, search for lines containing "or", "is", or "go"
egrep "2$" sample.txt, search for lines ending with "2"
egrep '^[A-K]' sample.txt, search for lines starting with a letter from "A" to "K"
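
The grep and egrep lists differ only in how the operators are escaped. A sketch comparing both forms on hypothetical data, using "grep -E", which accepts the same expressions as egrep (recent GNU grep releases mark the egrep command as obsolescent). It also shows why "2.+" is looser than "a two-digit number starting with 2".

```shell
# Hypothetical sample data (assumption: not the real sample.txt).
sample=$(mktemp)
printf '%s\n' 'Fred apple 20' 'Susy pear 15' 'SusySusy plum 30' \
    'Mary kiwi 2' 'Anna fig 2b' > "$sample"

# Basic syntax escapes the group and the interval; extended does not.
bre_repeat=$(grep -c '\(Susy\)\{2,\}' "$sample")
ere_repeat=$(grep -E -c '(Susy){2,}' "$sample")
bre_vowels=$(grep -c '[aeiou]\{2,\}' "$sample")
ere_vowels=$(grep -E -c '[aeiou]{2,}' "$sample")

# "2.+" accepts any character after the 2, so "2b" matches too.
loose=$(grep -E -c '2.+' "$sample")
# Portable stricter form: a 2, one digit, and no digit on either side.
strict=$(grep -E -c '(^|[^0-9])2[0-9]($|[^0-9])' "$sample")

echo "$bre_repeat $ere_repeat $bre_vowels $ere_vowels $loose $strict"
rm -f "$sample"
```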