A Quick Introduction to Unix
Unix is made up of three components:
- the kernel;
- the shell; and
- the programs.
The kernel of Unix is the heart of the operating system. It allocates time and memory to programs and handles the file structure and communication between the different parts of the computer system such as the keyboard and the screen.
The shell is an interface between the user and the Unix kernel. It resembles the ‘dos box’ that Windows displays if you run the command cmd. When a user logs in, Unix checks their username and password and then starts a program called the shell. The shell interprets the commands the user types and transmits them to the kernel to be executed. These commands are programs.
There are a variety of shells available for the various Unix systems. The expert user can customise their own shell and users can use different shells on the same machine.
The shell and kernel work together like this:
- a user types cat somefile to display a file;
- the shell finds the program cat;
- the shell instructs the kernel to run the program cat on somefile;
- when the program finishes, the kernel passes control back to the shell, which displays the Unix prompt again.
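You can watch the "finding" step for yourself. As a minimal sketch: the POSIX `command -v` builtin prints where the shell would find a command. For an ordinary program like cat, the shell searches the directories listed in the PATH variable, in order. The exact path printed (often /bin/cat or /usr/bin/cat) depends on your system.

```sh
# Ask the shell where it would find the cat program
command -v cat
# The colon-separated list of directories the shell searches, in order
echo "$PATH"
```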
There are a number of different shells for Unix, and people can become very attached to the one they prefer. Popular shells include:
- sh - the Bourne shell
- bash - the Bourne-again shell
- csh - the C shell
- ksh - the Korn shell (strangely, not named for the band)
- zsh - the Z shell
You can start any shell installed on your system by typing its name (one of the commands above) at the prompt. Notice that this means you start a new shell within a shell! We won't cover the differences between shells here. When you begin to write Unix shell scripts you will probably want to choose one shell and stick with it if you can. Each shell has its advocates, and the Bourne-again shell is popular with many script writers. Some in the Unix community have expressed doubts about the suitability of the C shell for scripting, but this is something you can address when you know more about Unix (see Csh Programming Considered Harmful).
Programs are not part of the operating system as such. They are sequences of instructions developed to carry out specific tasks, and they include the application software that users run.
Shells and subshells
When we open a terminal session a shell will be started for us. That is why we have a prompt visible in our terminal window.
Within a shell we can open another shell, a subshell, by invoking the shell's program file (its executable). Imagine, for example, that you have a shell open, the plain vanilla shell sh. You see its prompt, shown in this book as:
$
At the prompt in this shell, I can invoke a new subshell. I can invoke a bash shell with the command bash, and the new shell can be closed down with the command exit. If I invoke a bash subshell, the prompt changes to something like:
bash%
And if I close this shell with exit, I am back at the prompt of the original shell:
$
Shells contain processes and variables
A shell provides:
- a prompt for the user to communicate by starting, managing, interacting with and ending processes
- a container for processes and variables
When you invoke a subshell, any processes started in that shell are contained within it. Any data items that are created die with the shell unless they are committed to some persistent storage - perhaps written to a file on a hard drive or exported from the parent shell to new shells. Exporting is not covered here: we assume for now that variables die in the shell they are created in.
To make this more concrete, consider this sequence. We start from any shell in a terminal session and create a shell variable, like this:
$ MYVARIABLE="This is the original shell"
To check the contents we can use echo and we prefix the variable name with $, like this:
$ echo $MYVARIABLE
Now we invoke a subshell, using the bash shell, like this:
$ bash
This starts a new bash subshell, which we can see from the changed prompt. In the subshell we create a shell variable like this:
bash% MYVARIABLE="This is the subshell. When it closes, this variable is destroyed!"
To check the contents of this variable we use echo exactly as before:
bash% echo $MYVARIABLE
Before going further, make sure you know what this will display.
Now, if we type exit, the bash subshell closes and we return to our original shell. Work out what will be displayed when we then type
$ echo $MYVARIABLE
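Once you have worked out your answer, you can check it without typing the steps interactively. This sketch assumes bash is installed. `bash -c` runs the quoted commands in a new bash subshell, which exits as soon as they finish, much as if you had typed them and then exit.

```sh
MYVARIABLE="This is the original shell"
# Run two commands in a bash subshell; the subshell exits when they finish
bash -c 'MYVARIABLE="This is the subshell. When it closes, this variable is destroyed!"; echo "$MYVARIABLE"'
# Back in the original shell: the subshell's variable has gone with it
echo "$MYVARIABLE"
```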
All Unix files are integrated in a single directory structure. The file-system is arranged in a structure like an inverted tree. The top of this tree is the root and is written as a slash ‘/’.
In the diagram on the left, we see that the home directory of the user ccaajim contains two sub-directories (stats and pictures) and a file called train.doc.
The full path to the file train.doc is /nfs/fs-i/UM0098/ccaajim/train.doc
This is rather different from the view you get of a Windows file structure. Unix integrates all the files into one directory structure rather than listing different physical storage devices, each with its own root: the file structure is logical rather than physical. If the computer had a CD-ROM drive, it might appear as a directory under, say, nfs, called cd.
You will find some directories on all (or almost all) Unix systems and it may help to have an idea what they contain, even if most users will never go near some of them.
| Directory name | Typical contents |
|---|---|
| /bin | commands and programs used by all the users of the system |
| /boot | files required by the boot loader |
| /dev | device files for CD/DVD-ROM drives, floppy drives, USB devices, etc. |
| /etc | system configuration files |
| /home | users' home directories and data files |
Changing to different directories
cd (change directory)
The command cd somedirectory means change the current working directory to somedirectory. The current working directory is the directory that you are currently working in.
Listing Files and Directories
When you log in, you are in your home directory. This directory is associated with your userid, for example, ccaajim, and it is where your personal files and subdirectories are stored.
To find out what is in a directory you can type
% ls
The ls command lists the contents of your current working directory.
Some files will usually have been created by the System Administrator when your account was created. If there are no visible files in your home directory, ls prints nothing and you simply return to the Unix prompt.
ls does not list files whose names begin with a dot (.). Files beginning with a dot are hidden files and usually contain program configuration information. They are hidden because they are typically generated by other programs rather than edited directly by users. To list all the files in your home directory, including those whose names begin with a dot, type
% ls -a
The -a in the command is an example of an option. Options change the behaviour of commands. Test the output of ls -l and of ls -la.
Another useful option is -t, which sorts the directory contents by modification time, newest first.
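The sketch below tries these options in a scratch directory created with `mktemp -d`, so nothing in your home directory is touched. The file names are made up for illustration. `touch -t` (covered later) is used only to give older.txt an earlier timestamp, so the order shown by -t is predictable.

```sh
# Work in a new, empty scratch directory
cd "$(mktemp -d)" || exit 1
printf 'old\n' > older.txt
printf 'new\n' > newer.txt
printf 'setting=1\n' > .hiddenrc
touch -t 202001010000 older.txt   # make older.txt older than newer.txt
ls        # newer.txt and older.txt; the hidden file is not shown
ls -a     # also shows . .. and .hiddenrc
ls -l     # long listing: permissions, owner, size, date
ls -t     # newest first: newer.txt, then older.txt
```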
pwd (print working directory)
Pathnames tell you where you are in the whole file-system. So, to find out the pathname of your home directory, type cd ~ to get back to your home directory and then type
% pwd
The full pathname looks something like this:
/nfs/fs-i/UM0098/ccaajim
which means that ccaajim (your home directory) is in the sub-directory UM0098 (the group directory), which in turn is located in the fs-i sub-directory, which is in the nfs sub-directory, which is in the top-level root directory, called / (pronounced root).
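Here is a short sketch of cd and pwd working together. It assumes only that /etc exists, which is true on practically every Unix system. The last pwd prints your own home directory, not the example path above.

```sh
cd /          # go to the root directory
pwd           # prints /
cd /etc       # go to the system configuration directory
pwd           # prints /etc
cd ~          # back to your home directory
pwd           # prints the path of your home directory
```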
Home directories and pathnames
On Unix systems (including socrates on WTS) you prefix a pathname with ~ when you want to ensure that it starts from your home directory rather than from the current directory or the root.
First use cd to get back to your home-directory, then type
% ls mytraining
to list the contents of your mytraining directory.
Now type
% ls archive
You will get a message like this -
archive: No such file or directory
The reason is that archive is not in your current working directory. To use a command on a file (or directory) that is not in the current working directory, you must either cd to the correct directory, or specify its full pathname. To list the contents of your archive directory you must type
% ls mytraining/archive
Here the path is short and easy to type, so typing it out in full works well. Longer paths are harder to type correctly, however, and then you will find the ~ abbreviation useful.
~ (your home directory)
Your home directory can also be referred to by the tilde ~ character. It can be used to specify paths starting at your home directory. So typing
% ls ~/mytraining
will list the contents of your mytraining directory, no matter where you currently are in the file system.
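It is the shell, not ls, that replaces ~. Before the command runs, the shell expands the tilde into the path of your home directory (the value of the HOME variable). The sketch below uses echo, which simply prints its arguments, to make this visible. The mytraining directory need not exist for echo to show the expanded path.

```sh
echo ~              # the shell replaces ~ with your home directory's path
echo ~/mytraining   # ~/ at the start of a path means "inside my home directory"
echo "$HOME"        # the HOME variable holds the same path
```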
What do you think the following command will list?
% ls ~
Files and Processes
Everything in Unix is a file or a process. In Unix a file is just a destination for or a source of a stream of data. Thus a printer, for example, is a file and so is the screen.
A process is a program that is currently running, so every process is associated with a file: the program file that stores the instructions the process executes.
Another way to look at it is that a file is a collection of data that can be referred to by name. Files are created by users either directly (using text editors, running compilers etc.) or indirectly (by running some program - like processing a text input file to produce a formatted file for printing).
Examples of files include:
- a text document;
- a program written in a programming language such as C++ or Java;
- a jpeg image;
- a directory: directories can be thought of as the analogue of Windows’ folders. Directories are files that contain links to other files.
The standard input and output and the standard error stream
There are three files with somewhat opaque names: stdin, stdout and stderr. These names refer to the default sources of and destinations for data. Consider the process started by the command ls. Its output, a list of the files in the current working directory, is displayed on screen: this illustrates the standard output, stdout, which by default is the screen. The standard input, stdin, is by default the keyboard. The standard error stream, stderr, is where commands write their error messages; by default it is also the screen.
In shell programming, it is often useful to prevent error messages from Unix commands from being displayed on screen. Instead, they are either suppressed or sent to a file. This is done by redirecting the error messages to a file or to /dev/null, the null device, which discards everything written to it. To use these streams in the shell, we refer to them by their numeric file descriptors rather than by name: 0 for stdin, 1 for stdout and 2 for stderr.
To review: Commands effectively take their input from files and direct their output to files. By default, the output file is the screen, and the input file is the keyboard.
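Here is a minimal sketch of redirecting the two output streams separately, using descriptors 1 and 2. It runs in a scratch directory. The command group in braces is just a stand-in that writes one line to stdout and one to stderr (`>&2` sends echo's output to descriptor 2). The names out.txt and err.txt are made up.

```sh
cd "$(mktemp -d)" || exit 1
# Send stdout (1) to out.txt and stderr (2) to err.txt
{ echo "normal output"; echo "an error message" >&2; } 1> out.txt 2> err.txt
cat out.txt          # normal output
cat err.txt          # an error message
# Suppress error messages by sending descriptor 2 to the null device
ls no_such_file 2> /dev/null || echo "ls failed, but its error message was discarded"
```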
A wildcard is a character that can stand for all members of some class of characters. When you use a wildcard in a file name, the shell replaces it with the names of all the matching files. The examples below will make this clearer; we will use the command ls for illustration.
The * wildcard
The character * is a wildcard and matches zero or more character(s) in a file (or directory) name. For example, in your mytraining directory, you might type
% ls list*
This will list all files in the current directory starting with list.
You could type
% ls *list
This will list all files in the current directory ending with list.
The ? wildcard
The character ? will match exactly one character.
So ?ouse will match files like house and mouse, but not grouse.
An example use of this syntax is:
% ls ?list
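To see the wildcards at work without touching your own files, the sketch below creates some empty, made-up files in a scratch directory and lists them with each pattern. echo could replace ls here, because the shell expands the pattern before the command even runs.

```sh
cd "$(mktemp -d)" || exit 1
touch list1 list2 alist mylist house mouse grouse
ls list*     # list1 list2
ls *list     # alist mylist
ls ?list     # alist (mylist has two characters before list)
ls ?ouse     # house mouse (not grouse)
echo ?ouse   # the shell expands the pattern, so echo prints: house mouse
```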
Change to the bash shell.
Use the command pwd. Write down the full path to your home directory.
Look at the image of an example file structure tree here. Trace and write down the path from the root to the folder ccaastu.
Change your current directory to your home directory.
List all the files with the extension .bat.
How many files are there? (Hint: try ls -1 | wc -l. The option -1 lists one file per line, and wc -l counts the lines.)
mkdir (make directory)
You can make a subdirectory of your home directory for your own data files. To make a subdirectory called mytraining in your current working directory type
% mkdir mytraining
To see the directory you have just created, type
% ls
You can also make a hidden directory if you want to. Use a dot as the first character of the directory name for it to be hidden.
Creating down a path
Surprisingly often you want to create a directory somewhere other than your current position in the directory tree. Suppose I am in my home directory and I want to create directories to hold some files for two courses, but in ~/documents/2010/IT/training/Unix rather than in my home directory itself.
The first course is an introduction to Linux, the second an introduction to Solaris. I can create these directories like this:
% mkdir ~/documents/2010/IT/training/Unix/Linux
% mkdir ~/documents/2010/IT/training/Unix/Solaris
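Naturally, you don't have to specify the complete path from ~, but I have done so here for clarity. Note that these commands work only if ~/documents/2010/IT/training/Unix already exists: without an option, mkdir will not create missing parent directories.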
Creating a whole path
Perhaps less often you want to create not a single directory but rather a directory subtree - that is a directory and directories beneath it. Suppose I am in my home directory and I want to create directories to hold some files for two courses: Redhat for Beginners and Advanced Redhat. We can do this with the -p option on the mkdir command.
Here is the command to create the first directory:
% mkdir -p ~/documents/2010/IT/training/Unix/Linux/Redhat/beginners
This creates both Redhat and beginners (and any other missing directories along the path) in one go. We can then issue the command
% mkdir ~/documents/2010/IT/training/Unix/Linux/Redhat/advanced
Notice the difference between these two commands: the second needs no -p, because its parent directory, Redhat, now exists.
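The sketch below contrasts mkdir with and without -p. To leave your real ~/documents tree untouched, it works in a scratch directory and uses relative paths. Otherwise, the paths are the ones from the example.

```sh
cd "$(mktemp -d)" || exit 1
# Without -p, mkdir fails because documents/2010 does not exist yet
mkdir documents/2010/IT 2> /dev/null || echo "mkdir failed: missing parent directories"
# With -p, every missing directory along the path is created
mkdir -p documents/2010/IT/training/Unix/Linux/Redhat/beginners
# Redhat now exists, so plain mkdir is enough for its second subdirectory
mkdir documents/2010/IT/training/Unix/Linux/Redhat/advanced
ls documents/2010/IT/training/Unix/Linux/Redhat    # advanced beginners
```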
Files are essentially containers. You can fill a file with a variety of kinds of data, rather as you might fill a bucket with water or sand. There are occasions when the facility to create an empty bucket is useful.
The command touch
To create an empty file, type
% touch my_new_file
This creates an empty file called my_new_file. Another use of touch is to update a file's date and time stamp. This is done with the option -t. For example
% touch -t 201010112230 my_new_file
This sets the file's date and time stamp to 11 October 2010 at 22:30. The date and time format is yyyymmddhhmm.
You can check the effect of touch with ls -l
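Here are the two uses of touch together, as a sketch in a scratch directory. The exact layout of the ls -l line varies between systems. For a date this old, ls -l usually shows the year rather than the time of day.

```sh
cd "$(mktemp -d)" || exit 1
touch my_new_file                    # create an empty file (size 0)
ls -l my_new_file                    # stamped with the current date and time
touch -t 201010112230 my_new_file    # yyyymmddhhmm: 11 October 2010, 22:30
ls -l my_new_file                    # now shows the 2010 date
```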
Why use touch?
One very good use for touch is to help create directory and file structures. Suppose that in a university departmental office each year you must create various administrative files. You will need a file of the new students; a file of staff; a file of options offered that year and so on. From your home you might first of all create a directory to hold the student files:
% mkdir -p ~/degreeadmin/2010/students
and then, to be sure that you have the correct file structure, create an empty file in it:
% touch ~/degreeadmin/2010/students/student_list.txt
and similarly for any other files and directories you need.
The abbreviated directories . and ..
The directories named . and .. are relative names: they are interpreted relative to the current directory. While this takes a moment or two longer to grasp than ordinary absolute directory names, it is a very useful feature of Unix. In any directory you can type
% ls -a
As you will see, two directories called . (dot) and .. (dot dot) are listed. These appear in all Unix directories.
Current directory (.)
In Unix . means the current directory, so typing
% cd .
means that you stay where you are.
This may not seem very useful at first, but you will often need it. Remember that it is a relative directory name.
Parent directory (..)
.. means the parent of the current directory, so typing
% cd ..
will take you up one directory.
Home directory (~)
Typing cd alone or cd ~ always returns you to your home directory. This is very useful if you are lost in the file system. Typing cd / takes you to the root.
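Here is a sketch of moving around with these relative names. It uses a scratch directory with two made-up levels, a and b, so the results are predictable. basename prints just the last part of a path.

```sh
cd "$(mktemp -d)" || exit 1
mkdir -p a/b
cd a/b
cd .                 # stay where you are
basename "$(pwd)"    # b
cd ..                # up one level, to the parent directory
basename "$(pwd)"    # a
```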
What do you think the following command will list?
% ls ~/..
Use the commands cd, ls and pwd to explore the file system. (Remember, if you get lost, type cd by itself or cd ~ to return to your home-directory).
Make a directory in your home directory called unixstuff. Make another directory inside the unixstuff directory called archive.
To make a copy of a file, say file1 in the current working directory, with the name file2, you can use the command cp like this:
cp file1 file2
Sometimes you will wish to copy a file across directories. There are two ways you could do this. First you might use cd to get to your destination directory.
% cd ~/mydirectory
Then at the Unix prompt, type,
% cp ~/science.txt .
Don't forget to type the dot . at the end of this first command line. Remember, in Unix, the dot means the current directory.
Now you can if you wish make copies of this file in this directory in the usual way, for example
% cp science.txt science.bak
Another way to achieve the same result is to use the full, absolute pathname - that is start at the root (/) and specify all the directories in the path for the original and all the directories in the path for the destination. You might have a command looking like
cp /nfs/fs-i/UM0098/ccaajim/train.doc /nfs/fs-i/UM0098/ccaajim/myretiredfiles/train.doc.bak
Of course, it's easy to make mistakes typing long path names.
Copying a directory and files
You can use cp to copy a directory and the files it contains (including subdirectories and their files) to a new location. The command looks like
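cp -r ~/training/linux ~/training/backup/linux
This makes a copy of the directory linux and its complete contents as backup/linux. If backup/linux does not already exist, cp creates it for you (the parent directory ~/training/backup must already exist). If backup/linux does exist, cp instead puts the copy inside it, as backup/linux/linux.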
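Here is a sketch of the recursive copy in a scratch directory, using made-up files. The parent directory, training/backup, is created first. training/backup/linux does not exist until cp -r makes it.

```sh
cd "$(mktemp -d)" || exit 1
mkdir -p training/linux/notes training/backup
touch training/linux/intro.txt training/linux/notes/week1.txt
cp -r training/linux training/backup/linux   # copies linux and everything below it
ls -R training/backup/linux                  # lists the copy, subdirectories included
```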
Suppose that you want to rename a file. In Unix, we talk about moving the file and the command that does it is mv. For example
mv file1 file2
moves (or renames) file1 to file2.
This differs from copying the file: you end up with only one file rather than two.
You can move files across directories by specifying the pathname, just as you can with the cp (copy) command.
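Here is a sketch of both uses of mv, renaming and moving, in a scratch directory with made-up names:

```sh
cd "$(mktemp -d)" || exit 1
touch file1
mkdir archive
mv file1 file2       # rename: afterwards only file2 exists
mv file2 archive/    # move file2 into the archive directory
ls archive           # file2
```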
Deleting files and directories
To delete (remove) a file you use the command rm. So you might create a copy of a science.txt file and then delete it. You would type
% cp science.txt tempfile.txt
% ls
% rm tempfile.txt
% ls
To delete several files you may be able to use a wildcard. Consider the following command
% rm *.txt
This will delete all files in the current directory with names ending .txt. If you use extensions (the bit of the filename after the dot) systematically this can be useful.
rmdir (remove directory)
You can use the rmdir command to remove a directory. If you have an empty directory that you no longer require, for example, you can type
% rmdir someoldolddirectory
(Of course, your directory probably isn't called someoldolddirectory but I think you get the idea.)
If, however, you try to remove a directory that has files in it, rmdir will refuse, since Unix will not let you remove a non-empty directory that way. The solution is to use the option -r on the rm command, like this:
% rm -r directory
and this will remove a directory and all its subdirectories even if they contain files. The -r stands for recursive.
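Here is the difference between rmdir and rm -r, as a sketch in a scratch directory. The directory and file names are made up. The `|| echo` prints a message when rmdir fails.

```sh
cd "$(mktemp -d)" || exit 1
mkdir emptydir tempstuff
touch tempstuff/science.txt
rmdir emptydir       # works: the directory is empty
rmdir tempstuff 2> /dev/null || echo "rmdir refused: tempstuff is not empty"
rm -r tempstuff      # removes tempstuff and everything inside it
ls                   # nothing is left
```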
Move to the directory ~/unixstuff. Create a backup of your science.txt file by copying it to a file called science.bak and moving it to the archive directory you created above.
Create a directory called tempstuff using mkdir. Copy science.txt to tempstuff. Remove the directory tempstuff. What happens? Can you explain why?
When you issue a command that produces output, we say that by default it writes to the standard output, which is the screen. If a command needs input, we say it reads from the standard input, which is the keyboard. The ls command, which we have used a lot, produces a list of files and directories as its output and prints it on screen. We are going to use a new command, cat, to investigate how we can 'redirect' the standard input and output.
Displaying file contents
You can use the command cat to take input and write it to the standard output. Usually you use it like this:
% cat myfile.txt
and it will put the contents of myfile.txt on screen and then return you to the prompt. It scrolls the contents very fast!
Using cat to capture from the keyboard
When you use cat in the way described, the input is from a file and it writes to the standard output (you guessed it: the screen). But we can direct cat to take its input from elsewhere than a file and we can send its output to somewhere other than the screen.
You can use cat to capture what is typed at the keyboard and send it to some output. We can try it by typing cat without specifying a file name, like this:
% cat
You can now type at the keyboard, pressing [Return] at the end of each line. When you have finished typing altogether, type Control-d (written as ^D for short) to end the process. If you run cat like this without naming a file to read, it reads the standard input (the keyboard) and copies each line to the standard output (the screen) until it receives the end-of-file character (^D).
Redirecting the Output
As I have said, you can redirect both the input and the output from commands.
You use the > symbol to redirect the output to a file. For example, to create a file called colours containing a list of colour names, type
% cat > colours
Then type in the names of some colours. Press [Return] after each one and end with ^D.
pink
yellow
purple
^D (this means press [Ctrl] and [d] to stop)
The cat command reads the standard input (the keyboard) and the > redirects the output (which would normally go to the screen) into a file called colours.
To read the contents of the file, type
% cat colours
(This is not a particularly good way to create text files. Normally I would recommend an editor like vi, emacs or pico, but we're learning about redirection, not text editing.)
Appending to a file
If you use >> to redirect, then standard output will be redirected to the end of the file and appended to the existing contents. So to add more items to the file colours, type
% cat >> colours
Then type in the names of some more colours
red
^D (Control-d to stop)
This redirects the keyboard input to the end of the file colours.
To read the contents of the file, type
% cat colours
Now we are going to create a file colours2:
% cat > colours2
green
blue
^D (Control-d to stop)
You now have two files. One lists four colours, the other two.
Joining two files together
We will now use the cat command to join (concatenate) colours and colours2 into a new file called allcolours. This is in fact the original purpose of the command. You do it like this:
% cat colours colours2 > allcolours
What this is doing is reading the contents of colours and colours2 in turn and combining the results in the file allcolours. To read the contents of the new file, type
% cat allcolours
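The whole sequence can also be run without typing at the keyboard, which is handy for checking your results. In this sketch, printf writes the text you would otherwise have typed, and each \n plays the part of pressing [Return]. It runs in a scratch directory.

```sh
cd "$(mktemp -d)" || exit 1
printf 'pink\nyellow\npurple\n' > colours   # > creates colours (or overwrites it)
printf 'red\n' >> colours                   # >> appends to the end of colours
printf 'green\nblue\n' > colours2
cat colours colours2 > allcolours           # concatenate the two files
cat allcolours                              # pink yellow purple red green blue, one per line
```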
Redirecting the Input
We use the < symbol to redirect the input for a command.
For example, the command sort sorts a list alphabetically or numerically. Type
% sort
Then type in the names of some animals. Press [Return] after each one.
dog
cat
bird
ape
^D (Control-d to stop)
The output will be
ape
bird
cat
dog
Redirecting input from a file
Using < you can redirect the input to come from a file rather than the keyboard. For example, to sort the list of colours, type
% sort < allcolours
and the sorted list will be output to the screen.
To redirect the output of the sorted list to a file, type,
% sort < allcolours > sortedcolours
Use cat to display the contents of the file sortedcolours.
As well as redirecting the output of a command to a file, we can pass it to another process using a pipe, written |. For example
% sort < allcolours | wc -l
In this case, the output of the sort command is passed to the wc -l command, which counts the lines, so only the number of lines is displayed.
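The input and output redirections and the pipe can be tried together in one sketch. To run on its own, it recreates the allcolours file from the earlier steps in a scratch directory.

```sh
cd "$(mktemp -d)" || exit 1
printf 'pink\nyellow\npurple\nred\ngreen\nblue\n' > allcolours
sort < allcolours > sortedcolours   # input from allcolours, output to sortedcolours
cat sortedcolours                   # blue green pink purple red yellow, one per line
sort < allcolours | wc -l           # 6: the sorted lines are counted, not displayed
```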
Searching Text Files
Searching the contents of a file
Simple searching using less
Using less, you can search through a text file for a keyword (pattern). For example, to search through science.txt for the word science, type
% less science.txt
This writes the first screenful of the file to the screen. While still in less, you can type a forward slash [/] followed by the word to search for:
/science
As you can see, less finds and highlights the keyword. Type [n] to search for the next occurrence of the word, and [q] to quit less.
This is useful for a quick check of a file's contents (is this the essay where I talked about left recursion?), but it is relatively limited and inflexible, and we often want to do a bit more. That is where the very famous Unix utility grep comes in.
grep is one of many standard Unix utilities. It searches files for specified words or patterns. First clear the screen (type clear at the prompt), then type
% grep science science.txt
As you can see, grep prints each line containing the word science. The search is case sensitive, however. If we type
% grep Science science.txt
we see that it distinguishes between Science and science. To ignore upper/lower case distinctions, use the -i option, i.e. type
% grep -i science science.txt
To search for a phrase or pattern (i.e. a string of characters with a space in it) you must enclose it in single quotes. For example to search for current domain, type
% grep -i 'current domain' science.txt
Some of the other options for grep are:
| Option | Effect |
|---|---|
| -v | display those lines that do NOT match |
| -n | precede each matching line with its line number |
| -c | print only the total count of matched lines |
Try some of them and see the different results. Don't forget, you can use more than one option at a time. For example, the number of lines without the words science or Science is
% grep -ivc science science.txt
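If you don't have science.txt to hand, the sketch below creates a small stand-in with made-up lines in a scratch directory, so you can compare the results of the options.

```sh
cd "$(mktemp -d)" || exit 1
printf 'Science is a method\nthe science of computing\nthe current domain\nnothing here\n' > science.txt
grep science science.txt                 # line 2 only: the search is case sensitive
grep -i science science.txt              # lines 1 and 2
grep -n -i science science.txt           # the same lines, prefixed 1: and 2:
grep -i 'current domain' science.txt     # the phrase is quoted because it contains a space
grep -ivc science science.txt            # 2: lines 3 and 4 do not mention science
```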
grep is one of the most powerful Unix utilities, and there are extensions such as egrep as well. A good knowledge of the power of grep can make you a very productive Unix user. This is, however, a quick introduction, so this is all we are going to cover.
wc (word count)
A handy little utility is the wc command, short for word count. To do a word count on science.txt, type
% wc -w science.txt
To find out how many lines the file has, type
% wc -l science.txt
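Here is a sketch with a made-up two-line file, so the counts are easy to check by eye. It runs in a scratch directory.

```sh
cd "$(mktemp -d)" || exit 1
printf 'one two three\nfour five\n' > sample.txt
wc -w sample.txt     # 5 words
wc -l sample.txt     # 2 lines
wc sample.txt        # lines, words and bytes together
```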
More grep examples
Beginning of line
A search can be constrained to find the string at the beginning of the line with the symbol ^. Example:
grep '^A' filename
Finds the string A at the beginning of lines.
End of line
A search can be constrained to find the string at the end of the line with the symbol $. Example:
grep '5$' filename
Finds the string 5 at the end of lines.
Counting empty lines
The combination search string ^$ matches empty lines, so grep -c '^$' filename counts them.
To match any single character
The meta-character . matches any single character except the end of line character.
The input file contains these lines:
one
bone
throne
clone
We search with
grep '.one' filename
The results are
bone
throne
clone
The first line doesn't match, because there is no character before one.
To match zero or more characters
The meta-character * matches zero or more occurrences of the previous character.
The input file bells contains these lines:
bel
bell
belll
be
bet
We search with
grep 'bel*' bells
The results are
bel
bell
belll
be
bet
Every line matches, because l* can match zero l's, so be on its own is enough.
The input file is as in the previous example. This time a . after the * requires at least one more character after be and any l's.
We search with
grep 'bel*.' bells
The results are
bel
bell
belll
bet
Contrast this with the previous example. Here, we match everything except be.
The input file is as before.
We search with
grep 'bel.*' bells
The results are
bel
bell
belll
You can use a list of characters surrounded by [ and ] which will match on any single character in the list.
The input file lines contains:
This is the zero line
Here y
Crosses x
We search with
grep '[xyz]' lines
The result is
This is the zero line
Here y
Crosses x
The input file is as before.
We search with
grep '[xyb]' lines
The result is
Here y
Crosses x
The first line contains none of x, y or b, so it does not match.
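The examples above can be reproduced in one go. This sketch recreates the three input files (filename, bells and lines) in a scratch directory and runs each search. The patterns are quoted so the shell cannot mistake them for wildcards.

```sh
cd "$(mktemp -d)" || exit 1
printf 'one\nbone\nthrone\nclone\n' > filename
printf 'bel\nbell\nbelll\nbe\nbet\n' > bells
printf 'This is the zero line\nHere y\nCrosses x\n' > lines
grep '.one' filename    # bone throne clone
grep 'bel*' bells       # every line
grep 'bel*.' bells      # bel bell belll bet
grep 'bel.*' bells      # bel bell belll
grep '[xyz]' lines      # every line
grep '[xyb]' lines      # Here y, Crosses x
```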
File access rights
In your home directory, type
% ls -l
You will see that you now get lots of detail about the contents of your directory.
Each file (and directory) has access rights, which may be displayed by typing ls -l. On most modern systems ls -l also shows the group that owns the file (istrain in the following example); on some older Unix systems you needed ls -lg to see it:
-rwxrw-r-- 1 ccaajim istrain 3210 Aug 15 14:25 train.doc
In the left-hand column is a 10 symbol string consisting of the symbols d, r, w, x, -, and, occasionally, s or S. The important ones for you right now are r for read, w for write and x for execute. If d is present, it will be at the left hand end of the string, and indicates a directory: otherwise the string will start with -.
The 9 remaining symbols indicate the permissions, or access rights, and are taken as three groups of three.
The left group of three gives the permissions for the user who owns the file (or directory) (ccaajim in the above example). The middle group gives the permissions for the group that owns the file (istrain in the example). The rightmost group gives the permissions for all others (called world in Unix speak).
The symbols r, w, etc., have slightly different meanings depending on whether they refer to a simple file or to a directory.
Access rights on files
- r (or -), indicates read permission (or otherwise), that is, the presence or absence of permission to read and copy the file
- w (or -), indicates write permission (or otherwise), that is, the permission (or otherwise) to change a file
- x (or -), indicates execution permission (or otherwise), that is, the permission to execute a file, where appropriate
Access rights on directories
- r allows users to list files in the directory;
- w means that users may delete files from the directory or move files into it;
- x means the right to access files in the directory. This implies that you may read files in the directory provided you have read permission on the individual files.
So, in order to read a file, you must have execute permission on the directory containing that file, and hence on any directory containing that directory as a subdirectory, and so on, up the tree.
| Permissions | Meaning |
|---|---|
| -rwxrwxrwx | a file that everyone can read, write and execute |
| -rw------- | a file that only the owner can read and write: no-one else can read or write it and no-one has execution rights (e.g. your mailbox file) |
Changing access rights
chmod (changing file mode)
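Only the owner of a file (or the superuser) can use chmod to change the permissions of a file. The options of chmod are as follows:
| Symbol | Meaning |
|---|---|
| u | user |
| g | group |
| o | other |
| a | all (that is u and g and o) |
| r | read |
| w | write |
| x | execute (and access directory) |
| + | add permission |
| - | take away permission |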
For example, to remove read, write and execute permissions on the file allcolours for the group and others, type
% chmod go-rwx allcolours
This will leave the other permissions unaffected.
To give read and write permissions on the file allcolours to all, type
% chmod a+rw allcolours
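Here is a sketch of the two commands in a scratch directory. The permissions a new file starts with depend on your system's settings, so the sketch first opens the file up completely with chmod a+rwx, which makes each change easy to see. `cut -c1-10` keeps only the ten-symbol permission string.

```sh
cd "$(mktemp -d)" || exit 1
touch allcolours
chmod a+rwx allcolours            # start from -rwxrwxrwx
ls -l allcolours | cut -c1-10     # -rwxrwxrwx
chmod go-rwx allcolours           # remove all rights for group and others
ls -l allcolours | cut -c1-10     # -rwx------
chmod a+rw allcolours             # add read and write for everyone
ls -l allcolours | cut -c1-10     # -rwxrw-rw-
```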
Using integer parameters with chmod
As well as using the syntax outlined above you can also use chmod with a numeric parameter that represents the users and permissions intended. A common example is
% chmod 755 myscript.sh
This example is equivalent to chmod u=rwx,g=rx,o=rx myscript.sh
How does this work? Well, let's call the number a triple to remind us that it's a string of three digits. Each digit represents the permissions for one of u, g and o. We give each possible permission a numeric value like this:
| Value | Permission |
|---|---|
| 4 | read |
| 2 | write |
| 1 | execute |
| 0 | clear the permission |
Each digit is the sum of the values of the permissions it grants. In our example above, the number string is 755, and because the values are 4, 2 and 1, each digit can be made up in only one way:
| Class | Digit | Made up of |
|---|---|---|
| u | 7 | 4 + 2 + 1 |
| g | 5 | 4 + 0 + 1 |
| o | 5 | 4 + 0 + 1 |
Which means that chmod 755 filename means read, write and execute for the file owner and read and execute for group and others.
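The arithmetic can be written as a small shell function. perm_digit is a hypothetical helper made up for this sketch, not a standard command. It turns a three-symbol group such as rwx or r-x into its digit by adding 4 for r, 2 for w and 1 for x. The last lines check the result against the real chmod in a scratch directory.

```sh
# perm_digit (hypothetical helper): rwx -> 7, r-x -> 5, rw- -> 6, --- -> 0
perm_digit() {
    d=0
    case $1 in r??) d=$((d + 4)) ;; esac   # read
    case $1 in ?w?) d=$((d + 2)) ;; esac   # write
    case $1 in ??x) d=$((d + 1)) ;; esac   # execute
    echo "$d"
}
echo "$(perm_digit rwx)$(perm_digit r-x)$(perm_digit r-x)"   # 755

cd "$(mktemp -d)" || exit 1
touch myscript.sh
chmod 755 myscript.sh
ls -l myscript.sh | cut -c1-10    # -rwxr-xr-x
```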
Pico: the simplest editor
So far we have been using standard Unix commands issued directly at the shell prompt. To take things a little further we really need to be able to create and edit text files. There are many different editors available for Unix users, among them vi, emacs and pico.
We are going to use pico because it is very widely installed on Unix systems and is by far the easiest to use for the beginner.
Pico is a terminal-based program: when you start it, the terminal switches from the prompt to the Pico editing screen. So, if you type
% pico
you should see the Pico editing screen, with a list of its commands along the bottom.
You can start Pico with a filename as well, typing, for example
Notice that I gave the file the extension .sh to remind me that it is a script file (of which more later); Unix itself does not need the extension. The screen will look just the same if the file doesn't exist when you start, but if the file already exists its contents will be displayed on screen. You enter text directly at the Pico cursor and issue commands by holding down the control key and pressing a character, written as ^ followed by the character (so ^o means control-o); the character part is not case sensitive. If you issue a command and wish to cancel it, use ^c. You can move about your file with the arrow keys.
Some Pico behaviour will take a little getting used to.
|^^ (control-^, often typed as control-6)||begins to mark text as selected. Now move the cursor with the arrow keys and the text is highlighted. The selection ends at the current cursor position.|
|^k||cuts selected text|
|^u||pastes the last cut|
|^i||creates a tab/indent|
|^o||saves the file when you have finished editing|
|^x||exits Pico (offering to save any unsaved changes)|
Pico does have a little more functionality than this (although not much!) but this is enough for current purposes. Before you go on to creating scripts, use the Pico editor to create a text file of your own making sure to practise these few commands.
Using a text editor (or if you insist cat), create a file called list2 containing the following fruit: orange, plum, mango, grapefruit. Display the contents of list2.
Using pipes, display all lines of list2 containing the letter 'p', and sort the result. Search the file science.txt for the word ‘science’, sort the output and store it in a file (make up your own name for the file).
Use the editor Pico to create a text file and name it as you wish. Try changing access permissions on the file you have just created and on the directory archive. Use ls -l to check that the permissions have changed.
My First Shell Script
What is a shell script?
So far we have been issuing Unix commands at the shell prompt. This is a very straightforward way of working, but in some circumstances it isn't ideal. Suppose you have a file that you process in a particular, complex way, subjecting its contents to a string of different Unix processes. You can do this using pipes and redirects at the command line, but if you make a mistake you may have to start again. (Of course, when trying out a complex procedure for the first time you were working on a copy of the data file, weren't you?) It would also be irritating if you needed to do this often, maybe regularly, and had to go through the process a command at a time, with all the opportunities for typos that this would provide.
Fortunately, Unix provides a simple way to avoid these situations. You can create a text file containing Unix commands, give it a name with the extension .sh and then execute the whole bunch of commands by invoking this filename at the command prompt. Let's start with a very simple example.
Creating a script
In the pico editor, create a file containing the following text, exactly as it appears here.
#!/bin/bash
ls -l .*
Save this file as hid.sh in your home directory. We are going to use this script at the command line, just like a command. It will list all the files and directories (and the contents of those directories) whose names begin with a dot, which is to say the hidden files and directories. The first line of the script ensures that Unix can find the shell to execute this file. It begins with a hash (#) followed by an exclamation mark, which in this context is called a "bang"; together they are known as "hash bang" or "shebang".
Before we run it, we have to set the permissions on this file so that it can be executed. New files are not given execute permission by default (and this is a very good thing). The command to make the file executable is
% chmod 755 hid.sh
(I've used the numeric shorthand 755 here: it gives the owner read, write and execute permission, and the group and others read and execute permission.)
Now you can execute the commands in the file just by invoking the filename at the prompt:
% ./hid.sh
(I have to type ./ because the current directory isn't in my search path. For the moment I want to ignore this complication: it has nothing to do with shell scripting and everything to do with the Unix environment variable PATH.)
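If you would rather see the whole create, chmod and run cycle in one place, here is a sketch in bash. It writes hid.sh with a here-document instead of Pico, and it creates a hypothetical hidden file called .example_hidden so that the .* pattern is guaranteed to match something (recent versions of bash no longer match . and .. with .* by default).

```shell
#!/bin/bash
# .example_hidden is a made-up dot-file so the pattern has something to list.
touch .example_hidden

# Write hid.sh without an editor; quoting 'EOF' stops the shell expanding .* here.
cat > hid.sh <<'EOF'
#!/bin/bash
ls -l .*
EOF

chmod 755 hid.sh    # rwx for the owner, r-x for group and others
./hid.sh            # ./ because the current directory is not on PATH
```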
A more useful script
To illustrate a rather more interesting shell script, I'm going to process a file called science.txt. I created this file by stripping all the images and formatting from the Wikipedia article on science. You are of course welcome to try the same.
From the point of view of "real Unix scripting", what I do next is rather unnatural. Unix power users would not create a shell file of the kind I do below but would definitely use the piping and redirecting of grep output directly on the command line. But my aim here is to dazzle, hopefully inspire, and teach a little. So, learn and pass on.
Let us imagine that you are a historian of ideas. You want to know how Wikipedia presents the development of the idea of science. To begin with we'll just look at the lines of the Wikipedia article on science that actually contain the word science (as I mentioned above, I'm using a file that I created that only contains unformatted text from the article). How can we find these lines using what we know about Unix? The answer, as I'm sure you knew, is with grep. To find all the lines that contain the word science we would issue the command
% grep 'science' science.txt
So now create a text file with this command as the first line after the shebang line. You can call it scisearch.sh. When you have saved your file and changed its permissions, test it. Does it do what you expected? If it doesn't, correct it; if it does, carry on.
This could be quite interesting. But rather than just display the results on screen, it would be more useful if they were saved to a file. We can do this with a redirect. Open the file scisearch.sh and change it so that it reads
grep 'science' science.txt > scioutput.txt
When you have made this change, test the script and, if necessary, amend it again.
Now, this is already an interesting file and it illustrates something about shell scripting but we can improve it. At the moment the search is case sensitive so amend it to read
grep -i 'science' science.txt > scioutput.txt
so that it finds not just science but Science. As usual you should test. You probably could get away without testing this last change but in real life it really is a good idea to test a script after each change so that you fix problems quickly before they get too difficult to untangle - or debug as the jargon has it.
One final amendment suggests itself. Let's add line numbers so that if we want to check a reference to our search term in context we can easily find it.
grep -in 'science' science.txt > scioutput.txt
(To look up a match in context, you could keep a copy of the original with its lines numbered, made with cat -n and a redirect of the output.) Now check the file scioutput.txt, using less or by opening it in Pico, to see that the contents are what you expect.
This is a fairly simple shell script. Its only real purpose is to illustrate the general principle of creating script files. However, I do think it's worthwhile reflecting for a moment on how you might go about doing this in Microsoft Windows.
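Here is the finished scisearch.sh in one runnable sketch. Because your science.txt will differ, it creates a hypothetical three-line stand-in for that file; the script itself is exactly the last version above, with the shebang line added.

```shell
#!/bin/bash
# A made-up, three-line stand-in for science.txt.
cat > science.txt <<'EOF'
Science is a systematic enterprise.
The word comes from Latin.
Modern science relies on experiment.
EOF

# The finished script: case-insensitive (-i), numbered (-n), output saved to a file.
cat > scisearch.sh <<'EOF'
#!/bin/bash
grep -in 'science' science.txt > scioutput.txt
EOF

chmod 755 scisearch.sh
./scisearch.sh
cat scioutput.txt
# 1:Science is a systematic enterprise.
# 3:Modern science relies on experiment.
```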
Generalising the script with variables
Our script is fine as it is but it's rather specialised. Suppose that I wanted to carry out a similar process one day on a file about religion. One approach would be to create a new shell script with different files and search terms. This is not optimal though. A better approach would be to parametrize or generalise the existing script.
The shell provides you with a number of variable names to represent positional parameters. These are values that can be substituted into your script from the command line based on the order they are typed in. The variable $0 is reserved for the name of your script. We don't need that right now. Instead we are going to use three numbered variables to represent the search string, input file and output file for our data processing. They will be called by the names $1, $2 and $3 respectively in the script.
Amend your script file so that it reads
grep -in "$1" "$2" > "$3"
(The double quotes keep a search term that contains spaces together as a single argument.)
So how do we use this new version? At the command line we substitute the terms for the variable names. We might type
% ./scisearch.sh 'religion' religion.txt reloutput.txt
which assumes that we are searching for the string religion in the file religion.txt and sending the output to reloutput.txt. In many flavours of Unix you could now go on to make your script more interesting, for example by adding context to your output: capturing not just the matching lines but those that precede and follow them as well. We won't go into that here. It would be as well now to rename the script, since it no longer has anything particularly to do with science.
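One possible renamed and slightly hardened version is sketched below. The name textsearch.sh is only a suggestion, and the argument check is an addition of mine rather than part of the original script: it prints a usage message and exits with status 1 if the script is not given exactly three arguments. The religion.txt created here is likewise a made-up sample.

```shell
#!/bin/bash
cat > textsearch.sh <<'EOF'
#!/bin/bash
# Usage: textsearch.sh SEARCH-TERM INPUT-FILE OUTPUT-FILE
if [ "$#" -ne 3 ]; then
    echo "usage: $0 search-term input-file output-file" >&2
    exit 1
fi
# -- ends grep's options, so a search term starting with - is still a pattern.
grep -in -- "$1" "$2" > "$3"
EOF
chmod 755 textsearch.sh

printf 'Religion and science\nNothing here\nreligion again\n' > religion.txt
./textsearch.sh 'religion' religion.txt reloutput.txt
cat reloutput.txt
# 1:Religion and science
# 3:religion again
```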
This new script still only introduces the most basic scripting idea, but you are perhaps now in a position to look at a more detailed introduction to Unix.
You can learn more about scripting from the Bourne Again Shell Scripting book.
Controlling Jobs in Unix
A process in Unix - that is, something doing a particular job - can run either in the foreground or in the background. A foreground job occupies the shell: you cannot give the shell further commands until the job is finished or you interrupt it. A background job starts and returns you to the prompt at once, so you can enter further commands while the background process continues. A background job can still write to the current terminal window.
Jobs start by default in the foreground.
You can either start a job in background or send it to background after it has started.
To start a job in background
Append & to the command that starts the job. So, we can run the GNU C compiler to compile the program hello in the background like this:
% gcc hello.c -o hello &
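Since gcc and hello.c may not be to hand, here is a sketch of the same idea in bash with a stand-in job: a subshell that sleeps for two seconds and then writes a file called bgjob.txt (both are made up for the example). The special variable $! holds the process ID of the job just started, and wait is used at the end only so the script can show the job's result.

```shell
#!/bin/bash
rm -f bgjob.txt
# The trailing & starts the stand-in "slow job" in the background.
( sleep 2; echo "done" > bgjob.txt ) &
bg_pid=$!       # process ID of the most recent background job

echo "Background job started with PID $bg_pid; the shell is free again."
jobs            # lists it as job [1], marked Running

wait "$bg_pid"  # deliberately wait for it to finish
cat bgjob.txt   # prints: done
```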
To move a started job to background
If you start a job in the foreground, you can move it to background. First, you stop the job with control-z, then use the command bg to send the stopped job to background. So, we might start the web browser Lynx, stop it, send it to the background like this:
% lynx
^z
% bg
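The job stopped by control-z is sent to the background and resumes running. (An interactive program such as Lynx will stop again as soon as it needs to read from the terminal, until you bring it back to the foreground.)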
To Foreground a Job
You can bring a job to the foreground using the command fg. Used on its own, it resumes the current job, which is usually the one most recently stopped or sent to the background. If we have just sent lynx to the background as above, then
% fg
will move the job into the foreground.
What Jobs are Running?
Either of two commands can be used to find out what jobs are running in background.
The command jobs is available in many shells and reports the jobs running, their job numbers and their command names (add the option -l to see their process IDs as well). You can use it like this:
% jobs
In the output the numbers in square brackets ('[' and ']') are the job numbers that are used by the process control commands such as fg and bg.
The command ps will print information about processes currently running. Actually, it is a little more complicated than that. The command prints information about processes controlled by the current terminal and current effective user id. The process identification number needed by some commands for controlling the process will be given in this case in the first column of output.
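To see job numbers and process IDs side by side, here is a small bash sketch. It starts a harmless sleep 30 in the background, lists it with jobs -l, asks ps about just that process ID (the -p and -o options used here are part of the POSIX ps specification), and then tidies up with kill.

```shell
#!/bin/bash
sleep 30 &                       # a harmless background job to inspect
pid=$!                           # $! is the PID of the most recent background job

jobs -l                          # job number in [ ], followed by the PID
ps -p "$pid" -o pid= -o comm=    # just that process: its PID and the name "sleep"

kill "$pid"                      # tidy up; plain kill sends SIGTERM
wait "$pid" 2>/dev/null || true  # reap it; wait returns non-zero for a killed job
```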
Using job numbers
Now that you can identify a job's number using jobs (or its process ID using ps), you can use it to control the job. If you have several jobs running in the background, you can choose one to bring to the foreground by its job number. Instead of using bare fg, add the job number, prefixed with %, like this:
% fg %2
This would take the job with the number two (identified by jobs) and bring it to the foreground.
Killing a Job
You can terminate Unix jobs in different ways. A simple way is to bring the job to foreground and terminate it, with control-c for example.
The Unix command kill can be used to terminate a background process. You can follow kill either with the job number (prefixed with %) or with the process identification number (PID). A common misconception is that the -9 (SIGKILL) signal is required to terminate a process. Relying on it is bad practice: SIGKILL cannot be caught, so the process is stopped immediately with no chance to clean up after itself (for example, to remove temporary files or finish writing data). A gentler way is to send a signal the process can catch, such as -2 (SIGINT, the same signal control-c sends), which allows the process to perform cleanup:
% kill -2 %2
The %2 specifies job number 2. To kill using the process ID (PID) instead, omit the percent sign:
% kill -2 1367
If the -2 signal does not work, the process may be blocked or may be executing improperly. In this case, try -1 (SIGHUP), then -15 (SIGTERM), and only as a last resort -9 (SIGKILL).
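The point about cleanup can be seen in a small bash sketch. The worker function below is hypothetical: it loops until it is signalled, and its trap writes cleanup.log before exiting. The sketch sends -15 (SIGTERM) rather than -2, because a non-interactive bash script starts background jobs with SIGINT ignored; at an interactive prompt, kill -2 against a job with an INT trap behaves the same way.

```shell
#!/bin/bash
# A stand-in job: loops until signalled; its trap does the "cleanup".
worker() {
    trap 'echo "cleaned up" > cleanup.log; exit 0' TERM
    while true; do
        sleep 1
    done
}

rm -f cleanup.log
worker &            # start the worker as a background job
pid=$!              # remember its process ID (PID)
sleep 2             # give it time to start and install its trap

kill -15 "$pid"     # a catchable signal: the trap runs before the worker exits
wait "$pid"         # collect the job once it has exited

cat cleanup.log     # prints: cleaned up
```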
There are times when it may be helpful to have two different names that refer to the same file. For example, a file may have a very long name that cannot be changed because it is needed for identification purposes. For ease of use, you could also refer to the file using a shorter name. These different references to the same file are called links, and there are two different kinds of links: hard links and soft links.
A hard link is a name that directly references a file on the file system. Most filenames are hard links: they refer to a file system location on a storage medium. More than one filename can refer to that file system location. If file1 and file2 are both hard links that refer to a file that contains text, any editing saved in file1 will show up when you look at file2. Hard links are independent, so if you remove file1, file2 still refers to the file on the filesystem.
A soft link (also called a symbolic link) is a name that references a filename (not a file). If file1 is the name of a file in your home directory (a hard link to the file on the file system) and file2 is a soft link to it, file2 refers not to the file on the file system, but to the name file1. Anything saved in file1 will still show up in file2, but if file1 is removed, file2 will no longer refer to a valid filename.
The Unix command to create links is ln. By default, the ln command creates hard links. The option -s tells the command to create a soft link instead.
Let's create a file and then create a hard link and a soft link.
% touch file.txt
% ls -l
% ln file.txt hardlink.txt
% ln -s file.txt softlink.txt
% ls -l
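The difference between the two kinds of link shows up as soon as the original name is removed. This bash sketch extends the commands above: it writes some text into file.txt (instead of using touch, so there is something to read back), edits the file through one name, deletes file.txt and then checks both links. The -e test follows a soft link to its target, so it is false for a link whose target has gone.

```shell
#!/bin/bash
rm -f file.txt hardlink.txt softlink.txt    # start clean so the sketch can be re-run

echo "original text" > file.txt
ln file.txt hardlink.txt            # hard link: a second name for the same file
ln -s file.txt softlink.txt         # soft link: points at the name file.txt
ls -li file.txt hardlink.txt softlink.txt   # -i: file.txt and hardlink.txt share an inode

echo "more text" >> file.txt        # edit through one name...
cat hardlink.txt                    # ...and both lines appear through the other

rm file.txt                         # remove the original name
cat hardlink.txt                    # the data survives under the hard link
if [ -e softlink.txt ]; then
    echo "softlink.txt still works"
else
    echo "softlink.txt is now dangling"
fi
```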
To get all the system environment variables:
To get one by its name:
To modify it:
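The three tasks above can be sketched in bash as follows. env is the POSIX way to list the environment; printenv is widely available on Linux and macOS but is not in POSIX. GREETING is a made-up variable used only to show that an exported value is passed on to child processes, and changes made this way last only for the current shell and the programs it starts.

```shell
#!/bin/bash
env                          # every environment variable, one NAME=value per line

printenv HOME                # one variable by name...
echo "$HOME"                 # ...or let the shell expand it

export PATH="$HOME/bin:$PATH"   # modify PATH for this shell and its children
echo "$PATH"

export GREETING="hello"      # made-up example variable
bash -c 'echo "$GREETING"'   # a child process inherits it: prints hello
```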