CS 497C – Introduction to UNIX
Lecture 29: Filters Using Regular Expressions – grep and sed
Chin-Chih Chang

Regular Expressions
• egrep's extended set includes two special characters, + and ?. They are often used in place of * to restrict the matching scope.
• + matches one or more occurrences of the previous character.
• ? matches zero or one occurrence of the previous character. The pattern below matches both truman and trueman.
$ egrep "true?man" emp.lst
• The |, ( and ) characters can be used to search for multiple patterns.
$ egrep 'wood(house|cock)' emp.lst

sed: The Stream Editor
• sed is a multipurpose tool which combines the work of several filters.
• Designed by Lee McMahon, it is derived from the ed line editor.
• sed is used to perform noninteractive operations.
• sed has numerous features, almost bordering on a programming language, but its functions have been taken over by perl.
• Everything in sed is an instruction. An instruction combines an address for selecting lines with an action to be taken on them:
sed options 'address action' file(s)
• The address and action are enclosed within single quotes.
• The components of a sed instruction are shown below:
sed '1,$ s/^bold/BOLD/g' foo
address: 1,$    action: s/^bold/BOLD/g
• You can have multiple instructions in a single sed command, each with its own address and action components.
• Addressing in sed is done in two ways:
  – By line number (like 3,7p).
  – By specifying a pattern (like /From:/p).

Line Addressing
• In the first form, the address specifies either a single line number or a pair of line numbers (3,7) to select a group of contiguous lines.
• The second form uses one or two patterns.
• In either case, the action (p, the print command) is appended to the address.
• You can simulate head -3 with the 3q instruction, in which 3 is the address and q is the quit action.
$ sed '3q' emp.lst
• sed uses the p (print) command to print the addressed lines.
$ sed '1,2p' emp.lst
• By default, sed prints all lines on the standard output in addition to the lines affected by the action, so the addressed lines are printed twice.
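
The two line-addressing commands above can be tried on a small sample file. The sketch below is illustrative only: the contents of emp.lst are not shown in these slides, so it assumes a hypothetical five-line stand-in file, and it uses head -n 3, the current spelling of head -3.

```shell
# Hypothetical stand-in for emp.lst (the real file's contents are an assumption).
tmp=$(mktemp)
printf '%s\n' 'line one' 'line two' 'line three' 'line four' 'line five' > "$tmp"

# 3q: sed quits after line 3, so only the first three lines reach the output.
first_three=$(sed '3q' "$tmp")
head_three=$(head -n 3 "$tmp")

# 1,2p: lines 1 and 2 are printed once by p and once by sed's default output.
doubled=$(sed '1,2p' "$tmp")

printf '%s\n' "$first_three"
echo '---'
printf '%s\n' "$doubled"

rm -f "$tmp"
```

With this sample file, the first command prints three lines and the second prints seven: lines one and two each appear twice, followed by lines three to five once.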