Grep

From Wikibooks, open books for an open world

Grep is a Unix utility that searches its input, either data piped to it or files named on the command line, for lines that match a pattern. An example should help clarify things.

Suppose we want to find all the files in a directory that have the string "crontab" in their names. You could issue the 'ls' command in a shell to list the directory's contents:

$ ls
DumpSite.sh  crontab.txt  nagios-3.0.6  xmpppy  xymon-4.3.0-beta2

and look through everything manually, or you could use the 'ls' command and pipe the output of ls to grep:

$ ls |grep crontab
crontab.txt

Conversely, to list every entry except those that match the pattern, add the -v option:

$ ls |grep -v crontab
DumpSite.sh
nagios-3.0.6
xmpppy
xymon-4.3.0-beta2

The '|' character is the shell's pipe: it directs the output of the 'ls' command to grep as input. Without -v you get a (possibly empty) list of all the files that have "crontab" in their names; with -v you get all the others.
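The same filtering can be tried without depending on the contents of a real directory. The following is a minimal sketch, assuming a POSIX shell; printf stands in for the output of ls, and the variable names are made up for illustration:

```shell
# Stand-in for the output of ls: one file name per line.
names=$(printf '%s\n' DumpSite.sh crontab.txt nagios-3.0.6 xmpppy)

# Keep only the lines that contain "crontab".
matching=$(printf '%s\n' "$names" | grep crontab)

# Keep only the lines that do NOT contain "crontab".
others=$(printf '%s\n' "$names" | grep -v crontab)

echo "$matching"
echo "$others"
```

Because grep reads whatever arrives on its standard input, any command that prints one item per line can take the place of ls here.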

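As its search term, grep takes a regular expression rather than just a plain string. A simple example is looking for all .txt or .jpg files in a directory:

$ ls | grep '\.\(txt\|jpg\)$'

The regex consists of \. which matches a literal dot, \(txt\|jpg\) which matches either txt or jpg, and $ which anchors the match to the end of the name. Without the $, names that merely contain txt or jpg anywhere would also match. Note that \| is a GNU extension to basic regular expressions; with other versions of grep, use grep -E '\.(txt|jpg)$' instead.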
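A pattern meant to match file endings needs an escaped dot and a $ anchor; otherwise any name that merely contains txt or jpg matches as well. The sketch below compares both forms on made-up file names (the names and variables are hypothetical). It uses -E so that it does not rely on the GNU-specific \| operator:

```shell
# Hypothetical file names; only two of them really end in .txt or .jpg.
names=$(printf '%s\n' notes.txt photo.jpg txt2pdf.sh jpg-tools report.txt.bak)

# Unanchored: matches any name that contains "txt" or "jpg" anywhere.
loose=$(printf '%s\n' "$names" | grep -E 'txt|jpg')

# Anchored: a literal dot, then txt or jpg, then the end of the line.
strict=$(printf '%s\n' "$names" | grep -E '\.(txt|jpg)$')

echo "$strict"
```

Here the unanchored pattern lets all five names through, while the anchored one keeps only notes.txt and photo.jpg.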

Options

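Command-line options (switches) of grep:

  • -e pattern: Use pattern as the search pattern; useful for giving several patterns or a pattern that starts with a hyphen.
  • -i: Ignore uppercase vs. lowercase.
  • -v: Invert match: output the lines that do not match.
  • -c: Output count of matching lines only.
  • -l: Output the names of matching files only.
  • -n: Precede each matching line with its line number.
  • -b: A historical curiosity: precede each matching line with a block number (GNU grep outputs a byte offset instead).
  • -h: Output matching lines without preceding them by file names.
  • -s: Suppress error messages about nonexistent or unreadable files. (To get the exit status only, with no output, use -q.)
  • -x: Match whole lines only.
  • -f file: Take regexes from a file, one per line.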
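A few of these options can be seen in action with the following sketch. It assumes a POSIX shell with mktemp available; the scratch files a.txt and b.txt and the variable names are invented for the illustration:

```shell
# Work in a scratch directory so that no real files are touched.
dir=$(mktemp -d)
printf '%s\n' 'Hello world' 'hello' 'goodbye' > "$dir/a.txt"
printf '%s\n' 'nothing here' > "$dir/b.txt"

# -i with -c: count the matching lines, ignoring case.
count=$(grep -ic hello "$dir/a.txt")
# -n: prefix each matching line with its line number.
numbered=$(grep -n goodbye "$dir/a.txt")
# -x: the pattern must match the whole line.
whole=$(grep -x hello "$dir/a.txt")
# -l: output only the names of the files that contain a match.
files=$(cd "$dir" && grep -l hello a.txt b.txt)

echo "$count"; echo "$numbered"; echo "$whole"; echo "$files"
rm -r "$dir"
```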

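Command-line options (switches) of GNU grep, beyond those of the bare-bones grep:

  • --help
  • -V, --version
  • --regexp=pattern, in addition to -e pattern
  • --invert-match, in addition to -v
  • -w, --word-regexp: Match whole words only.
  • --line-regexp, in addition to -x
  • -A num, --after-context=num: Also output num lines after each matching line.
  • -B num, --before-context=num: Also output num lines before each matching line.
  • -C num, -num, --context=num: Also output num lines before and after each matching line.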
  • and more ...
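The context options are easiest to understand on a small input. A minimal sketch, assuming a grep that supports -A, -B and -C (GNU grep does); the input lines and variable names are made up:

```shell
# Seven lines with a marker in the middle.
text=$(printf '%s\n' one two three TARGET five six seven)

# The matching line plus 1 line after it.
after=$(printf '%s\n' "$text" | grep -A 1 TARGET)
# 2 lines before the matching line, plus the line itself.
before=$(printf '%s\n' "$text" | grep -B 2 TARGET)
# 1 line on each side of the matching line.
around=$(printf '%s\n' "$text" | grep -C 1 TARGET)

echo "$around"
```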


Regular expressions

By default, grep uses POSIX basic regular expressions (see also Regular Expressions/Posix Basic Regular Expressions), the same dialect that sed uses by default. Perl's regular expressions differ from it.

Regular expression features available in grep include *, ., ^, $, [ ], [^ ], \( \), \n (a back-reference to the n-th group, where n is a digit from 1 to 9), \{i\}, \{i,j\}, \{i,\}.

Regular expression features available in GNU grep as a GNU extension include \?, \+, \|, \b, \B, \<, \>, \w, \W, \s, \S.

Regular expression features available in grep with -E switch include ?, +, |, ( ), {i}, {i,j}, {i,}.
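The difference between the basic and the extended syntax can be checked directly. The following sketch assumes a POSIX shell; the test strings and variable names are made up for illustration:

```shell
# In a basic regex, + is an ordinary character, so 'a+b' matches the text "a+b".
bre_plain=$(echo 'a+b' | grep 'a+b')
# For the same reason "aab" does not match; grep prints nothing and exits non-zero.
bre_nomatch=$(echo 'aab' | grep 'a+b' || true)
# With -E, + means "one or more occurrences of the previous item".
ere_plus=$(echo 'aab' | grep -E 'a+b')
# Interval expressions: \{2\} in a basic regex corresponds to {2} with -E.
bre_interval=$(echo 'aab' | grep 'a\{2\}b')
ere_interval=$(echo 'aab' | grep -E 'a{2}b')

echo "$ere_plus"
```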

Predefined character classes supported by grep include [:alnum:], [:alpha:], [:blank:], [:cntrl:], [:digit:], [:graph:], [:lower:], [:print:], [:punct:], [:space:], [:upper:], and [:xdigit:]. They are used inside a bracket expression, for example [[:digit:]].
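A common stumbling block is that a class name only works inside a bracket expression, which leads to doubled brackets. A short sketch with made-up input lines and variable names:

```shell
# Lines that contain at least one digit. Note [[:digit:]], not [:digit:].
digits=$(printf '%s\n' abc a1c 42 | grep '[[:digit:]]')
# Lines that consist of one or more upper-case letters and nothing else.
shout=$(printf '%s\n' HELLO Hello hello | grep '^[[:upper:]][[:upper:]]*$')
# ^ right after the opening bracket negates: lines with a non-alphabetic character.
nonalpha=$(printf '%s\n' abc 'a c' a-c | grep '[^[:alpha:]]')

echo "$digits"; echo "$shout"; echo "$nonalpha"
```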

Regular expression features unavailable in grep without the -P switch include Perl's \d, \D, \A and \Z.


Examples

Examples of grep use:

  • echo file.txt | grep ".*\(txt\|doc\)"
    • Matches. "\(" and "\)" create a group, while "\|" separates items in the group. The group matches if at least one of its items matches.
  • echo a456 | grep "[a-zA-Z][0-9][0-9]*"
    • Matches. "[" and "]" delimit a bracket expression, which matches any one of the characters listed in it. "*" stands for zero or more occurrences of the previous item.
  • echo a456 | grep -i "[A-Z][0-9]\+"
    • Matches. "\+" stands for one or more occurrences of the previous. Unlike "*", "+" has to be preceded by "\". "-i" makes the search case-insensitive.
  • echo file.txt | grep -E ".*(txt|doc)"
    • Matches. "-E" stands for extended regular expressions. In extended regex, "(" and "|" do not need "\" to act as special characters; they need "\" to act as literals, that is, stand for themselves.
  • echo abbc | grep -E "abb?c"
    • In extended regular expressions enabled by -E switch, the question mark matches zero or one occurrences of the previous.
  • echo abbc | grep "abb\?c"
    • In GNU Grep, \? (question mark preceded by a backslash) matches zero or one occurrences of the previous.
  • echo a4c | grep -P "a\dc"
    • In GNU Grep of some versions, matches. "-P" stands for Perl regular expressions; "\d" in the regex stands for a digit.
  • grep -P "\x22hello\x22" file.txt
    • In GNU Grep of some versions, searches for the string consisting of a quotation mark, followed by "hello", followed by another quotation mark. Makes use of "-P", which turns on Perl regex. In Perl regex, "\x22" stands for the character with the hexadecimal ASCII code 22, which is the quotation mark.
  • grep -P "a\t+b" file.txt
    • In GNU Grep of some versions, refers to the tab character (tabulator) by "\t". Enabled by -P.
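  • grep -r --include="*.java" "soughtPattern" .
    • In GNU Grep of some versions, searches files recursively, looking only into files whose names end in .java. Notice the period standing for the current directory. The file name pattern is quoted so that the shell does not expand it before grep sees it.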
  • grep -Fxv -f file2.txt file1.txt
    • Outputs set difference: file1.txt - file2.txt. Uses -F to interpret search term literally aka non-regex, -x to match whole lines only, -v to invert match, and -f to take the search terms from a file.
  • grep -Fx -f file1.txt file2.txt
    • Outputs set intersection: those lines of file2.txt that are also in file1.txt.
  • grep -P "Sch\xc3\xb6nheit" *
    • Searches Unicode UTF-8 encoded files for the German word "Schönheit". Takes advantage of Perl regex via -P; uses \x followed by hexadecimal digits to search for the UTF-8 encoding of ö, which is C3 B6. To find out the hexadecimal code of a UTF-8 text, use a UTF-8 enabled plain text editor to create a file containing the text, and then use a hex-showing program (hexdump on multiple operating systems) to find the hex code of the text. The UTF-8 encoding is not to be confused with the code point; the code point of ö is U+00F6, while its UTF-8 encoding is C3 B6.
  • perl -ne "print if /\x22hello\x22/" file.txt
    • Not really a grep example but a Perl oneliner that you can use if Perl is available and grep is not.
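The set difference and set intersection examples above can be tried on two small files. This is a minimal sketch, assuming a POSIX shell with mktemp; the file contents and variable names are made up:

```shell
# Two small line-oriented "sets" in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$dir/file1.txt"
printf '%s\n' banana cherry date  > "$dir/file2.txt"

# Set difference: lines of file1.txt that do not appear in file2.txt.
difference=$(grep -Fxv -f "$dir/file2.txt" "$dir/file1.txt")
# Set intersection: lines of file2.txt that also appear in file1.txt.
intersection=$(grep -Fx -f "$dir/file1.txt" "$dir/file2.txt")

echo "$difference"
echo "$intersection"
rm -r "$dir"
```

The -F and -x switches matter here: without them, a line such as "a.c" in the pattern file would be treated as a regex and could match unrelated lines, and partial-line matches would count as members of the set.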

Versions

Old versions of GNU grep can be obtained from the GNU FTP server.

Release announcements of GNU grep are posted at a Savannah group.

A changelog of GNU grep is available from git.savannah.gnu.org.

A version of GNU grep for MS Windows is available from the GnuWin32 project, as well as from Cygwin.
