LPI Linux Certification/Process Text Streams Using Filters

From Wikibooks, open books for an open world

Detailed Objective

Weight: 3

Description:
Candidates should be able to apply filters to text streams.

  • Key knowledge area(s):
    • Send text files and output streams through text utility filters to modify the output using standard UNIX commands found in the GNU textutils package.
  • The following is a partial list of the used files, terms and utilities:
    • cat
    • cut
    • expand
    • fmt
    • head
    • od
    • join
    • nl
    • paste
    • pr
    • sed
    • sort
    • split
    • tail
    • tr
    • unexpand
    • uniq
    • wc

Pattern matching and wildcards

Wildcards are pattern-matching characters commonly used to find file names or text within a file. Common uses of wildcards are: locating file names that you don't fully remember, locating files that have something in common, or performing operations on multiple files rather than one at a time.

The shell interprets these special characters:

! @ # $ % ^ & * ( ) { } [ ] | \ ; ~ ' " ` ?

The characters used as wildcards are:

?  *  [  ]  ~

If you use wildcard characters unquoted, the shell replaces the pattern with the list of matching file names. Try the following:

echo all files *

Special wildcard characters

? Any one character
* Any string, including the empty string
[abcfghz] One character from the set
[a-z] One character in the range
[!x-z] One character not in the range
~ Home directory
~user Home directory of user

Examples:

? One-character file names only
[aA]??? Four characters, starting with a or A
~toto Path name of toto's home directory
[!0-9]* All names not starting with a digit
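
These patterns can be tried safely in a scratch directory. The following is a minimal sketch, assuming a POSIX shell with mktemp available; the file names are invented for the illustration.

```shell
# Work in a throwaway directory so the patterns only see known names.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
touch a abcd Axyz b1 9lives notes.txt   # hypothetical file names

echo ?          # one-character names: a
echo [aA]???    # four characters starting with a or A: abcd and Axyz
echo [!0-9]*    # every name except 9lives
echo ~          # your home directory (tilde expansion, not a file match)
```

The order in which the shell lists several matches depends on the locale.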

What about these commands?

ls [a-z][A-Z]??.[uk]
ls big*
ls a???a
ls ??*

Shell and wildcards

A shell command line can be a single command or a more complex construct such as a pipeline.

ls -l [fF]*
ls *.c | more
ls -l [a-s]* | mail `users`

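Before running a command, the shell expands any unquoted wildcards on the command line into the matching file names. Only the shell interprets unquoted wildcards: the command itself never sees the pattern, only the resulting list of names.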
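
One way to see that the expansion happens in the shell is to count the arguments a command receives. This is a sketch only: count_args is a hypothetical helper defined here, and the file names are invented.

```shell
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
touch foo.c bar.c

# Hypothetical helper: print the number of arguments it was given.
count_args() { echo "$#"; }

count_args *.c      # the shell expands *.c to two names first: prints 2
count_args '*.c'    # quoted, so the pattern is passed unchanged: prints 1
```

If no file matches, bash and sh with default settings leave the pattern as it is and pass it to the command as one argument.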

Quoting and Comments

Quoting

Use quoting to prevent the shell from interpreting special characters and to turn several words into a single shell word.

  • 'string' - Everything within the quotes is literal; a single quote itself cannot appear inside:
echo 'He did it, "Why?"'
echo 'Because "#@&^:-)"'
echo '$VAR='Me
  • "string" - Like 'string', except that the shell still interprets $, ` and \ (and ! in an interactive shell with history expansion):
echo "What's happening?"
echo "I don't know but check this $ANSWER"
  • The backslash (\) treats the following character as literal:
echo \$VAR=Me
echo What\'s happening\?
  • How could we display the backslash? With the following line:
echo \\
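
The three quoting mechanisms can be compared side by side. A minimal sketch, assuming a POSIX shell; VAR is a variable invented for the illustration.

```shell
VAR=Me              # hypothetical variable

echo '$VAR'         # single quotes: prints $VAR
echo "$VAR"         # double quotes: prints Me
echo \$VAR          # backslash: prints $VAR
echo "\$VAR"        # backslash inside double quotes: prints $VAR

# Quoting also turns several words into one shell word.
set -- two words;   echo "$#"   # prints 2
set -- "two words"; echo "$#"   # prints 1
```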

Comments

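You can add comments to a command line or a script. A comment starts with the character # and runs to the end of the line. The # must begin a word, that is, it must be at the start of the line or be immediately preceded by white space.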

Examples:

echo $HOME # Print my Home directory
echo "### PASSED ###" # Only this part is a comment
echo The key h#, not g was pressed.

Commands

  • cat, tac: Concatenate files and print on the standard output, from beginning to end or end to beginning, respectively.
  • head, tail: Output the first and last part of files.
  • nl: Number lines of files.
  • wc: Print the number of lines, words, and bytes (in that order) in files.
  • cut: Print selected sections of each line of files.
  • tr: Translate or delete characters.
  • expand, unexpand: Convert tabs to spaces and spaces to tabs.
  • paste: Merge lines of files.
  • join: Join lines of two files on a common field.
  • uniq: Remove duplicate lines from a sorted file.
  • split: Split a file into pieces.
  • fmt: Simple optimal text formatter.
  • pr: Convert text files for printing.
  • sort: Sort lines of text files.
  • od: Dump files in octal and other formats.

Concatenate files

To concatenate files, use cat.

cat [options] [files...]
tac [options] [files...]

The results are displayed to the standard output.

Common options:

-s: squeeze repeated blank lines into a single blank line.
-n: number all output lines.

Examples:

cat file # Display file on the standard output.
cat chapter* # Display all chapters on the standard output.
cat -n -s file # Display file with line numbers, squeezing repeated blank lines into one.

To print the lines of each file in reverse order (last line first), use tac.
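
The difference between cat and tac is easiest to see on small files. A sketch, assuming GNU coreutils (tac and the -s and -n options of cat are not part of POSIX); part1 and part2 are invented file names.

```shell
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf 'one\n\n\n\ntwo\n' > part1   # "one", three blank lines, "two"
printf 'three\n' > part2

cat part1 part2          # part1 followed by part2
cat -s part1             # the three blank lines collapse into one
cat -n part2             # "three" preceded by line number 1
tac part1                # lines of part1, last line first
cat part1 part2 | tac    # reverse the whole concatenation: "three" comes first
```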

View the beginning and the end of a file

To view only a few lines at the beginning or at the end of a file, use head or tail.

head [options] [files...]
tail [options] [files...]

The results are displayed to the standard output.

Common options:

-n #: number of lines to display. (head and tail)
-c #: number of bytes to display. (head and tail)
-f: keep the file open and output new data as it is appended. (tail)
-s #: with -f, check for new data every # seconds. (tail)

Examples:

head file # Display the first 10 lines of file.
head -n 2 file # Display the first 2 lines of file.
tail -c 10 file # Display the last 10 bytes of file.
tail -f -s 1 /var/log/messages # Display the last 10 lines of messages, then wait and check for new data every second.
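
head and tail can also be combined in a pipeline to pick lines out of the middle of a file. A sketch, assuming seq from GNU coreutils is available; numbers is an invented file name.

```shell
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
seq 1 20 > numbers               # one number per line, 1 to 20

head -n 3 numbers                # lines 1, 2, 3
tail -n 2 numbers                # lines 19, 20
tail -c 3 numbers                # last 3 bytes: "20" and the final newline
head -n 12 numbers | tail -n 3   # lines 10, 11, 12
```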

Numbering file lines

To number the lines of a file, use nl.

nl [options] [files...]

The results are displayed to the standard output.

Common options:

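-i #: increment line numbers by #.
-b: body numbering style:
   a: number all lines
   t: number only non-empty lines (default)
   n: number no lines
-n: numbering format:
   ln: left justified, no leading zeros
   rn: right justified, no leading zeros (default)
   rz: right justified, with leading zeros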

Examples:

nl file # Number each non-empty line of file.
nl -b t -n rz file # Number each non-empty line, padding the numbers with leading zeros.
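
The numbering styles can be compared on a file that contains an empty line. A sketch, assuming GNU nl; lines.txt is an invented file name, and -w (number width) is an additional option not listed above.

```shell
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf 'alpha\n\nbeta\n' > lines.txt

nl lines.txt                    # alpha is 1, beta is 2; the empty line gets no number
nl -b a lines.txt               # all three lines are numbered, beta is 3
nl -b t -n rz -w 3 lines.txt    # numbers shown as 001 and 002
nl -i 10 lines.txt              # alpha is 1, beta is 11
```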

Counting items in a file

To print the number of lines, words and bytes of a file, use wc.

wc [options] [files...]

The results are displayed to the standard output.

Common options:

-c: print the size in bytes.
-m: print the number of characters.
-w: print the number of words.
-l: print the number of lines.
-L: print the length of the longest line.

Examples:

wc *.[ch] # Display the number of lines, words, and bytes for all .c and .h files.
wc -L file # Display the length of the longest line.
wc -w file # Display the number of words.
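
The counts are easy to check by hand on a small file. A sketch; sample.txt is an invented file name, and reading from standard input keeps the file name out of the output.

```shell
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf 'one two\nthree\n' > sample.txt

wc sample.txt         # 2 lines, 3 words, 14 bytes, then the file name
wc -l < sample.txt    # 2
wc -w < sample.txt    # 3
wc -c < sample.txt    # 14 (8 bytes in the first line, 6 in the second)
wc -L < sample.txt    # 7, the length of "one two" (GNU extension)
```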

Cutting fields in files

To extract selected bytes or fields from each line of a file, use cut.

cut [options] [files...]

The results are displayed to the standard output.

Common options:

-b #: Extract the bytes at the positions listed in #.
-f #: Extract the fields numbered in #.
-d c: Use the character c as the field delimiter (with -f).

Examples:

cut -b 4 file # Extract and display the 4th byte of each line of file.
cut -b 4,7 file # Extract and display the 4th and 7th bytes of each line.
cut -b -2,4-6,20- file # Extract bytes 1 to 2, 4 to 6, and 20 to the end of the line, for each line of file.
cut -f 1,3 -d: /etc/passwd # Extract the user name and user ID from each line of /etc/passwd.

The default field delimiter is TAB; another delimiter can be specified with -d.
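
The following sketch applies cut to a small file laid out like /etc/passwd. The file passwd.sample and its two entries are invented for the illustration.

```shell
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
# Hypothetical sample data in /etc/passwd format.
printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' \
              'alice:x:1000:1000:Alice:/home/alice:/bin/sh' > passwd.sample

cut -d: -f1,3 passwd.sample   # root:0 and alice:1000
cut -d: -f7 passwd.sample     # /bin/bash and /bin/sh
cut -b 1-4 passwd.sample      # root and alic
```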

Character conversion

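To translate or delete characters read from the standard input (stdin) and write the result to the standard output, use tr.

tr [options] SET1 [SET2]

Common options:

-d: delete the characters in SET1.
-s: replace each run of a repeated character in SET1 by a single occurrence.

Examples:

tr 'a' 'A' # Replace each lowercase a with A.
tr 'A-Z' 'a-z' # Convert uppercase to lowercase.
tr -d ' ' # Delete all spaces.

tr reads only from the standard input, so use a redirection (tr ... < file) or a pipe to process a file.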

To convert tabs to spaces, use expand; to convert spaces to tabs, use unexpand.

expand  file
unexpand file
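
Since these filters read the standard input, they are easy to try with echo or printf. A sketch; the -t option of expand (tab stop width) is an additional option not listed above.

```shell
echo 'Hello World' | tr 'a-z' 'A-Z'   # HELLO WORLD
echo 'Hello World' | tr -d 'lo'       # He Wrd
echo 'a    b' | tr -s ' '             # a b

printf 'a\tb\n' | expand -t 4         # the tab becomes three spaces: "a   b"
printf '        b\n' | unexpand       # eight leading spaces become one tab
```

By default unexpand converts only the blanks at the beginning of each line.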

Line manipulation

To merge the corresponding lines of several files side by side, use paste.

paste [options] [files...]

Common options:

-d #: delimiter: use # as the delimiter instead of TAB.
-s: serial: paste the lines of one file at a time instead of in parallel.

Examples:

paste f1 f2 # Display each line of f1 followed by the corresponding line of f2, separated by a TAB.
paste -d: file1 file2 # Use ':' for the delimiter.

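To join the lines of two files that share a common field, use join. Both files must be sorted on the join field, which is the first field by default.

join file1 file2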
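
The difference between paste and join shows up clearly on two small files that share a key in the first field. A sketch; the files names and colors and their contents are invented.

```shell
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf '1 apple\n2 banana\n' > names
printf '1 red\n2 yellow\n' > colors

paste names colors       # lines side by side, separated by a TAB
paste -d: names colors   # "1 apple:1 red" and "2 banana:2 yellow"
paste -s names           # all lines of names on one line, separated by a TAB
join names colors        # "1 apple red" and "2 banana yellow"
```

paste simply glues lines together by position, while join matches them on the common field and prints that field only once.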

To remove adjacent duplicate lines, use uniq. Because only adjacent lines are compared, the input is usually sorted first.

uniq [options] [input [output]]

Common options:

-c: prefix each line with the number of times it occurs.
-d: only print duplicated lines, one copy of each.
-u: only print lines that are not repeated.

Examples:

uniq -cd file # Display each duplicated line once, preceded by the number of times it occurs.
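
Because uniq only compares neighbouring lines, it is normally used after sort. A sketch; letters is an invented file name.

```shell
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf 'b\na\nb\nb\nc\n' > letters

uniq letters             # b a b c: only the adjacent pair of b lines is merged
sort letters | uniq      # a b c
sort letters | uniq -d   # b
sort letters | uniq -u   # a c
sort letters | uniq -c   # counts: 1 a, 3 b, 1 c
```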

Splitting files

To split big files, use split.

split [options] file

Common options:

-l #: split every # lines.
-b #: split every # bytes; # can take a suffix: b for 512-byte blocks, k for kilobytes (1024 bytes), m for megabytes (1024 kilobytes).

Examples:

split -l 25 file  # Split file into 25-line files.
split -b 512 file # Split file into 512-byte files.
split -b 2b file  # Split file into 2*512-byte files.
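
The pieces can be put back together with cat, because split names them so that they sort in the right order. A sketch, assuming seq and cmp are available; the file name big and the prefix part. are invented.

```shell
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
seq 1 100 > big                # 100 lines

split -l 25 big part.          # creates part.aa, part.ab, part.ac, part.ad
ls part.*
cat part.* > rebuilt           # the shell lists the pieces in sorted order
cmp big rebuilt && echo "files are identical"
```

Without a prefix argument, split names the pieces xaa, xab, and so on.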

Formatting for printing

To reformat the paragraphs of a file to a given line width, use fmt.

fmt [options] [files...]

Common options:

-w #: maximum line width.

Examples:

fmt -w 35 file # Display lines with a maximum width of 35 characters.

To format a file for a printer, use pr.

pr [options] [files...]

Common options:

-d: double space.

Examples:

pr -d file # Format file with double-spacing.
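
Both formatters also work as filters in a pipeline. A sketch, assuming GNU fmt and pr; the sample sentence is invented, and -t (omit the page header and trailer) is an additional pr option not listed above.

```shell
text='The quick brown fox jumps over the lazy dog while the cat watches from the fence.'

echo "$text" | fmt -w 35      # re-wrapped so that no line is longer than 35 characters
printf 'a\nb\n' | pr -t -d    # a blank line follows each input line; no page header
```

Without -t, pr adds a header with the date, the file name and the page number to each page.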

Sort lines of text files

To sort the lines of the named files, use sort.

sort [options] [files...]

The results are displayed to the standard output.

Common options:

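-r : Reverse the sort order
-f : Ignore case
-n : Sort numerically
-o file: Write the output to file instead of the standard output
-u : Output only the first of several lines that compare equal
-t ';' : Use ';' as the field delimiter, rather than the transition from non-blank to blank. The semicolon must be quoted so that the shell does not treat it as a command separator.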

Examples:

sort -r file # Sort file in reverse order.
sort -r -o result file # Sort file in reverse order and write the output to result.
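
The effect of -n is clearest on numbers of different lengths. A sketch; the sample data is invented, and -k (select the sort key field) is an additional option not listed above.

```shell
printf '10\n9\n100\n' | sort        # character by character: 10, 100, 9
printf '10\n9\n100\n' | sort -n     # numeric: 9, 10, 100
printf '10\n9\n100\n' | sort -nr    # numeric, reversed: 100, 10, 9
printf 'b\na\nb\n' | sort -u        # a, b

# Sort numerically on the second ';'-separated field.
printf 'b;2\na;10\nc;1\n' | sort -t ';' -k 2,2n   # c;1, b;2, a;10
```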

Binary file dump

To dump a binary file, use od.

od [options] file

The results are displayed to the standard output and start with an offset address in octal format.

Common options:

-c: each byte as a character or backslash escape
-x: 2-byte units in hexadecimal
-d: 2-byte units in unsigned decimal
-X: 4-byte units in hexadecimal
-D: 4-byte units in unsigned decimal

Examples:

$ od -cx /bin/ls
0000000 177   E   L   F 001 001 001  \0  \0  \0  \0  \0  \0  \0  \0  \0
       457f 464c 0101 0001 0000 0000 0000 0000
0000020 002  \0 003  \0 001  \0  \0  \0     224 004  \b   4  \0  \0  \0
       0002 0003 0001 0000 9420 0804 0034 0000
0000040   °   ²  \0  \0  \0  \0  \0  \0   4  \0      \0 006  \0   (  \0
       b2b0 0000 0000 0000 0034 0020 0006 0028
0000060 032  \0 031  \0 006  \0  \0  \0   4  \0  \0  \0   4 200 004  \b
       001a 0019 0006 0000 0034 0000 8034 0804
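
A dump of a few known bytes is easier to read than a dump of a binary. A sketch; -A n (no offset column) and -t x1 (one byte per unit, in hexadecimal) are additional standard od options not listed above.

```shell
printf 'AB\n' | od -c            # offset 0000000, then A, B and \n; a last line gives the end offset 0000003
printf 'AB\n' | od -A n -t x1    # 41 42 0a
printf 'AB\n' | od -x            # 2-byte units, shown in the byte order of the machine
```

The byte order explains why the dump of /bin/ls above starts with 457f although the first two bytes of the file are 7f (octal 177) and 45 (E).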

Exercises

  1. Use wildcard characters and list all filenames that contain any character followed by 'in' in the /etc directory.
  2. Use wildcard characters and list all filenames that start with any character between 'a' and 'e' that have at least two more characters and do not end with a number.
  3. Use wildcard characters and list all filenames of exactly 4 characters and all filenames starting with an uppercase letter. Do not descend into any directory found.
  4. Use wildcard characters and list all files that contain 'sh' in /bin.
  5. Display your environment variable HOME preceded by the string "$HOME value is:"
  6. Display the contents of $SHELL with two asterisk characters before and after it.
  7. How would you display the following string of characters as is with echo, using double quotes and \?
    • @ # $ % ^ & * ( ) ' " \
  8. Compose echo commands to display the following two strings:
    • That's what he said!
    • 'Never Again!' he replied.
  9. Display the number of words in all files that begin with the letter 'h' in the /etc directory.
  10. How would you transfer a 2 MB (megabyte) file using two 1.44 MB floppy disks? How would you put the split file back together?
  11. What is the command to translate the : delimiter in /etc/passwd to #?