Drew's grep tutorial

Introduction

In the simplest terms, grep (global regular expression print) is a small family of commands that search input files for a search string and print the lines that match it. Although this may not seem like a terribly useful command at first, grep is considered one of the most useful commands on any Unix system. The family is made up of three separate yet connected commands: grep, egrep, and fgrep, a sort of holy trinity of Unix commands. All three work the same way. Beginning at the first line in the file, grep copies a line into a buffer, compares it against the search string, and, if the comparison passes, prints the line to the screen. Grep repeats this process until the file runs out of lines. Notice that nowhere in this process does grep store lines, change lines, or search only a part of a line.
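
The read, compare, print cycle described above can be imitated in a few lines of shell. This is only a rough sketch of the idea, not how grep is actually implemented: simple_grep is a hypothetical helper that matches a literal substring rather than a regular expression, and the three-line sample file is made up for the demonstration.

```shell
# simple_grep NEEDLE FILE: hypothetical helper that mimics grep's loop.
# It only understands literal substrings, not regular expressions.
simple_grep() {
    needle=$1
    file=$2
    # Read one line at a time into a buffer ($line) until the file runs out.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            *"$needle"*) printf '%s\n' "$line" ;;  # comparison passed: print the line
        esac                                        # otherwise move on; nothing is stored
    done < "$file"
}

# Made-up sample data, kept in a temporary file.
sample=$(mktemp)
printf '%s\n' boot machine boots > "$sample"

result=$(simple_grep "boo" "$sample")
printf '%s\n' "$result"    # prints "boot" and "boots"
rm -f "$sample"
```

Each line is looked at once and then forgotten, which is why the real grep can work through very large files without keeping them in memory.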



A Simple Example

The simplest possible example of grep is simply:

grep "boot" a_file

In this example, grep would loop through every line of the file "a_file" and print out every line that contains the text "boot". To see this command in action, you will need to provide a file for grep to process. You may either create your own or, if you wish to follow along with this tutorial, fetch my example file from this website by typing the following command at the command prompt:

wget http://www.uccs.edu/~ahitchco/grep/a_file

Alternatively, you may download the file with a web browser, or you could open your favorite editor and type it in manually. The file's contents are:

boot
book
booze
machine
boots
bungie
bark
aardvark
broken$tuff
robots
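
If the download is not available, a here-document is one way to type the file in without an editor. The scratch directory created with mktemp is an assumption made here so that nothing in your working directory gets overwritten; the quoted 'EOF' delimiter keeps the shell from trying to expand $tuff.

```shell
# Work in a scratch directory (an assumption, so no existing file is overwritten).
cd "$(mktemp -d)" || exit 1

# The quoted delimiter 'EOF' makes the shell copy the text literally,
# so "broken$tuff" is stored exactly as written.
cat > a_file <<'EOF'
boot
book
booze
machine
boots
bungie
bark
aardvark
broken$tuff
robots
EOF

# Sanity check: the file should have ten lines.
echo "lines in a_file: $(wc -l < a_file)"
```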

The file is not particularly interesting, but it gives us something to test our commands with. When you are ready to proceed, try the following command:

grep "boo" a_file

Grep will list all of the lines that contain the string "boo":

boot
book
booze
boots

Useful Options

This is nice, but if you were working with a large C file or something similar, it would probably be much more useful if each printed line identified its line number in the file. That way you could track down a particular string more easily if you needed to open the file in an editor to make some changes. This can be accomplished by adding the -n parameter:

grep -n "boo" a_file

This yields a much more useful result, which shows which lines matched the search string:

1:boot
2:book
3:booze
5:boots

Another interesting switch is -v, which will print the negative result. In other words, grep will print all of the lines that do not match the search string, rather than printing the lines that match it. In the following case, grep will print every line that does not contain the string "boo" and will display the line numbers, as in the last example:

grep -vn "boo" a_file

In this particular case, it will print:

4:machine
6:bungie
7:bark
8:aardvark
9:broken$tuff
10:robots

The -c option tells grep to suppress the printing of matching lines and only display the number of lines that match the query. For instance, the following will print the number 4, because there are 4 lines in a_file that contain "boo":

grep -c "boo" a_file
4
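
Because -c counts matching lines rather than individual matches, the two numbers differ as soon as a line contains the string more than once. The sketch below shows the difference on a made-up file (counts_demo is a hypothetical name). It assumes your grep has the -o option, which prints each match on its own line; GNU and BSD grep provide it, but it is not guaranteed everywhere.

```shell
# Scratch directory and demo file are assumptions for this illustration.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'boo boo' 'boot' 'bark' > counts_demo

# -c: number of lines that contain at least one match (2 here).
line_count=$(grep -c "boo" counts_demo)

# -o prints every match separately; wc -l then counts the matches (3 here).
match_count=$(grep -o "boo" counts_demo | wc -l)

echo "matching lines: $line_count"
echo "total matches: $match_count"
```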

The -l option prints only the names of the files in the query that have lines matching the search string. This is useful if you are searching through multiple files for the same string, like so:

grep -l "boo" *
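
To see -l at work without touching your own files, the following sketch builds three small files in a scratch directory and asks which of them contain "boo". The file names and their contents are invented for the example.

```shell
# Scratch directory with three made-up files.
cd "$(mktemp -d)" || exit 1
printf '%s\n' boot book > first.txt
printf '%s\n' bark machine > second.txt
printf '%s\n' robots booze > third.txt

# -l prints each matching file's name once, no matter how many lines match.
matching_files=$(grep -l "boo" *)
printf '%s\n' "$matching_files"    # first.txt and third.txt
```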

An option more useful for searching through non-code files is -i, ignore case. This option will treat upper and lower case as equivalent while matching the search string. In the following example, the lines containing "boo" will be printed out, even though the search string is uppercase:

grep -i "BOO" a_file

The -x option looks for eXact matches only: the whole line must match the pattern. In other words, the following command will print nothing, because there are no lines that contain only the pattern "boo":

grep -x "boo" a_file

Finally, -f allows you to specify a file containing the search string (or several search strings, one per line). One instance where this could be useful is a complex search string that you do not want to type over and over again:

echo "i want to search for this text" > search
grep -f search a_file
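
A pattern file may hold more than one search string, one per line, and grep prints a line if any of them matches. The sketch below assumes the tutorial's a_file (recreated in a scratch directory) and a hypothetical pattern file named patterns.

```shell
# Recreate the sample file in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' boot book booze machine boots bungie bark aardvark 'broken$tuff' robots > a_file

# Two patterns, one per line: the text "mach", and "ark" at the end of a line.
printf '%s\n' 'mach' 'ark$' > patterns

# A line is printed if it matches either pattern.
hits=$(grep -f patterns a_file)
printf '%s\n' "$hits"    # machine, bark, aardvark
```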

Regular Expressions

Since grep stands for "global regular expression print," it is not surprising that grep can search for regular expressions as well as plain strings. Regular expressions are searched for in the same way as a plain string; in fact, the strings we entered before were just very simple regular expressions. If you are unfamiliar with regular expressions, it is worth reading a dedicated regular expression tutorial before continuing. The following command will search the file for lines ending with the letter e:

grep "e$" a_file

This will, of course, print

booze
machine
bungie
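
A few more basic regular expressions, tried against the same sample file, may help make the idea concrete. This is a small sketch rather than a full tour: the patterns are my own picks, and single quotes are used so that the shell leaves the $ alone.

```shell
# Recreate the sample file in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' boot book booze machine boots bungie bark aardvark 'broken$tuff' robots > a_file

# ^ anchors the match to the start of the line: count the lines beginning with b.
starts_with_b=$(grep -c '^b' a_file)       # 7

# . matches any character and * repeats it: begins with b and ends with s.
b_to_s=$(grep '^b.*s$' a_file)             # boots

# [tk] matches a single t or k: "boo" followed by one of them at the end of a line.
boot_or_book=$(grep 'boo[tk]$' a_file)     # boot, book

echo "$starts_with_b"
printf '%s\n' "$b_to_s" "$boot_or_book"
```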

egrep

While grep supports a handful of regular expression operators, it does not support certain useful ones such as + and ?. If you would like to use these, you will have to use extended grep (egrep). Egrep is equivalent to grep -E, but as it is fairly common to want the extended functionality, egrep is also its own separate command. The following command illustrates the ? operator, which matches 1 or 0 occurrences of the previous character:

egrep "boots?" a_file

This query will return

boot
boots

One of the more powerful constructs that egrep supports and grep does not is the pipe (|), which functions as an "or." Another way to get the same result as above with a different query is:

egrep "boot|boots" a_file
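
The sketch below tries the + operator and a grouped alternation on the sample file. It uses grep -E, which the text above notes is equivalent to egrep; newer GNU grep releases mark the egrep and fgrep commands as obsolescent, so the -E spelling is the safer assumption in scripts. The patterns themselves are my own examples.

```shell
# Recreate the sample file in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' boot book booze machine boots bungie bark aardvark 'broken$tuff' robots > a_file

# + means "one or more of the previous character": one or more o followed by t.
one_or_more=$(grep -E 'o+t' a_file)         # boot, boots, robots

# Parentheses group the alternatives so that ^ and $ apply to both of them.
either=$(grep -E '^(boot|bark)$' a_file)    # boot, bark

# Without -E the ? is an ordinary character, so nothing in a_file matches.
# (|| true keeps the script going, since grep exits non-zero on no match.)
basic_count=$(grep -c 'boots?' a_file || true)   # 0

printf '%s\n' "$one_or_more" "$either" "$basic_count"
```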

fgrep

Fgrep is the third member of the grep family. It stands for "fast grep" (the name is also commonly read as "fixed-string grep"), and for good reason: fgrep can be faster than the other grep commands because it does not interpret regular expressions; it only searches for strings of literal characters. Fgrep is equivalent to grep -F. If one fgrepped for boot|boots, fgrep would not interpret that as a search for either the word boot or the word boots, but would simply search for the literal string "boot|boots" in the file. Similarly, normal grep would read the pattern "broken$" as a search for lines ending with the word "broken," whereas fgrep treats the dollar sign as just another character:

fgrep "broken$" a_file

As we can see, fgrep returns the line "broken$tuff" because it does not interpret the dollar sign; it treats the entire string as literal characters. It is good practice to use fgrep instead of grep in situations like these.
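
Running the same search string through both modes side by side shows the difference. The sketch uses grep -F, which the text above gives as the equivalent of fgrep, and recreates the sample file in a scratch directory.

```shell
# Recreate the sample file in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' boot book booze machine boots bungie bark aardvark 'broken$tuff' robots > a_file

# As a regular expression, $ means "end of line": no line ends with "broken".
# (|| true keeps the script going, since grep exits non-zero on no match.)
as_regex=$(grep 'broken$' a_file || true)

# As a fixed string, $ is just a character: the line broken$tuff matches.
as_literal=$(grep -F 'broken$' a_file)

# The pipe is literal too, so no line contains the text "boot|boots".
pipe_count=$(grep -F -c 'boot|boots' a_file || true)

echo "regex:   [$as_regex]"
echo "literal: [$as_literal]"
echo "literal pipe matches: $pipe_count"
```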

Examples

Now that we have skimmed over the basic functions of the commands in the grep family, we can look at a few examples of more advanced functionality. The following is an example of grepping through the output of another program rather than a file. This particular example prints the paths returned by find whose names contain the text "hello" (although this could be done without using grep at all):

find | grep "hello"

Normally, grep has no way to search only a portion of a file, but this becomes possible when the file is first processed by another program. This example performs a grep on the last 8 lines of a_file:

tail -n8 a_file | grep "boo"

By using the -exec switch with the find command, we can run grep on the files that find locates. The following will search for the string "boo" in everything in and below the current directory, printing each line that matches:

find . -exec grep "boo" {} \;
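
One possible refinement of the find command above, offered as a sketch rather than the only way to do it: -type f keeps find from handing directories to grep, -l makes grep report file names instead of matching lines, and ending -exec with + passes many files to a single grep invocation. The directory tree below is invented for the demonstration.

```shell
# Build a small made-up directory tree in a scratch directory.
cd "$(mktemp -d)" || exit 1
mkdir -p src/lib
printf '%s\n' boot > src/lib/one.txt
printf '%s\n' bark > src/two.txt

# Only regular files are searched; grep -l prints the name of each file that matches.
found=$(find . -type f -exec grep -l "boo" {} +)
printf '%s\n' "$found"    # ./src/lib/one.txt
```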

Of the three, grep is traditionally the only command that supports backreferences and saved subexpressions (some modern implementations, such as GNU egrep, accept them as well). The following uses a backreference to find lines that contain the same lowercase letter twice in succession:

grep "\([a-z]\)\1" a_file
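
Run against the sample file, the backreference search can be checked as follows. The sketch recreates a_file in a scratch directory and uses single quotes, which pass the backslashes through to grep unchanged just as the double quotes above do.

```shell
# Recreate the sample file in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' boot book booze machine boots bungie bark aardvark 'broken$tuff' robots > a_file

# \( \) saves whatever [a-z] matched; \1 requires that same letter again.
doubled=$(grep '\([a-z]\)\1' a_file)
printf '%s\n' "$doubled"
# Lines with a doubled letter: boot, book, booze, boots (oo),
# aardvark (aa) and broken$tuff (ff).
```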