Distributed Systems

Memory model

Every process in Unix has an address space that looks like this:

global vars—text segment—heap—stack

Remember the low memory is at the top, and the high memory is at the bottom. So when items are added to the stack, and it grows upwards, each element has a lower memory address.

Unix processes

fork() is your friend

You should know what this does:

main()
{
    int i, cpid;
    cpid = fork();
    if (cpid < 0) exit(1);
    else if (cpid == 0) {
        /* this is the child process. */
        for (i=0; i<1000; i++) printf("child: %d\n", i);
        exit(0);
    }
    else {
        /* this is parent process */
        for (i=0; i<1000; i++) printf("parent: %d\n", i);
        waitpid(cpid);
    }
return 0;
}

The call to fork creates a new process with an entire new address space. That means a copy is made of everything: new global vars, new text code, new heap, and new stack. Even object dynamically allocated with malloc are copied. Remember that the pointers to those dynamically-allocated objects are set based on relative addressing, so even the new copied memory is allocated somewhere else, the pointers into the heap still work.

process states

The four most popular UNIX process states are ready, waiting, running, and zombie. The first three are easy. Zombie processes are child processes that have completed and are waiting for their parents to clean them up. Usually parents clean up completed (zombie) child processes by calling wait or waitpid.

If the parent process dies before the child process finishes, eventually the root process will come around and clean them up.

Pipes

simple example

So, after fork(), if each process has its own memory space, how to set up communication? With pipes! Each process may have its own file descriptor table, but they still reference the same files.

Example pipe code, where child reads and parent writes:

NOTE: feel free to add in all that error-checking goodness, but try to keep the point obvious

int p[2];
pipe(p); /* call by address. */
cpid = fork();
if (cpid < 0) exit(1);

/* child process */
if (cpid == 0) {
    close(p[1]); /* child won't need to write, so close. */
    read(p[0], buf, len);
    close(p[0]);
    _exit(0); /* read man 3 exit and man _exit to understand the difference. */
}

/* parent */
else {
    close(p[0]); /* parents can be so close-minded! */
    write(p[1], buf, len);
    close(p[1]);
    waitpid(cpid);
}

Study and understand the above code.

Implementing a very simple shell with pipes

We've all done stuff like this at the command line:

$ ls -l | sort | head

But how does that work? In MS-DOS, this functionality was implemented sort of like this:

ls -l > /tmpfile1
sort /tmpfile1 > /tmpfile2
head /tmpfile2

The three tasks were run sequentially, rather than concurrently. This is bad if the output of the first program is enormous, and this setup certainly can't handle an infinite stream of data.

You can implement the ls -l | sort | head stuff in C. To keep it manageable, the following code implements "ls | wc -l" instead:

int pipes[2];
pipe(pipes);

int pid = fork();
if(pid < 0) /* Check for error */
{
    fprintf(stderr, "fork unsuccessful.");
    exit(1);
}
if(0 == pid) /* We are the child */
{
    /*
     * We don't need the writing end of the pipe.  This closes it.
     */
    close(pipes[1]);
    
    /*
     * Duplicate our output pipe to file descriptor 0.
     * This means that anything that would normally be read from stdin
     * is now read from the pipe.
     */
    dup2(pipes[0], 0);
    
    /*
     * We've duplicated the pipe, so there is no need to leave this copy
     * hanging around.
     */
    close(pipes[0]);
    
    /*
     * Execute the "wc -l" command.  This replaces our process.
     */
    execlp("wc", "wc", "-l", (char*)0);
}
else /* We must be the parent */
{
    /*
     * We don't need the reading end of the pipe.  This closes it.
     */
    close(pipes[0]);
    
    /*
     * This duplicates the pipe into stdout.  This means that standard
     * output is redirected into the pipe.
     */
    dup2(pipes[1], 1);
    
    /*
     * Since we've duplicated this end of the pipe, we won't need it
     * anymore.
     */
    close(pipes[1]);
    
    /*
     * We're done setting up.  This replaces us with the "ls" part of the
     * command.
     */
    execlp("ls", "ls", (char*)0);
}

TODO: talk more about how execv works.

Implementing I/O redirection with pipes

Another classic:

ls -l > out.txt

How can you do?

TODO: finish

err?

Threads

wikipedia thread article

Threads share memory, processes don't. They both allow programs to do multiple things simultaneously (concurrently). Most of the time on Linux, when people talk about threads, they're really talking about POSIX threads, aka pthreads.

Simple pthread example

Example pthread code:

#include <stdio.h>
#include <pthread.h>
#include <unistd.h>

int i; /* i is global, so it is visible to all functions. */

static void *foo()
{
    int k;
    for (k=0; k<10; k++) {
        sleep(1);
        printf("thread foo: %d.\n", i++);
    }
    return NULL;
}

static void *bar()
{
    int k;
    for (k=0; k<10; k++) {
        sleep(1);
        printf("thread bar: %d.\n", i++);
    }
    return NULL;
}

int main() {
   int x;
   pthread_t t1, t2;
   x = pthread_create(&t1, NULL, foo, NULL);
   if (x != 0) printf("pthread foo failed.\n");
   x = pthread_create(&t2, NULL, bar, NULL);
   if (x != 0) printf("pthread bar failed.\n");
   pthread_join(t1, NULL);
   pthread_join(t2, NULL);
   printf("all pthreads finished.\n");
   return 0;
}

This is a slice of the output:

$ ./pthreadfun 
thread foo: 0.
thread bar: 1.
thread foo: 2.
...
thread foo: 18.
thread bar: 19.
all pthreads finished.

Hopefully the main point is jumping out at you. Foo and bar alternately increment up the global integer i.

Exercise

Rewrite the above code to use fork() and create two processes rather than two threads. Then find what is the maximum value of i and explain the difference with this threaded version.

Kernel threads vs. user threads

pthreads are kernel-space threads, meaning that the OS scheduler is aware of the multiple threads. Each thread has an entry in the OS schedule table.

There are also user-level threads, where the kernel is unaware of the threads. The process must handle scheduling itself. User-level threads require two-level scheduling:

OS scheduler handles all processes
Process handles all threads.

One user-level thread can block all the other threads for the process if the programmer isn't careful. If a user-level thread makes some blocking system call, then the OS will suspend that process until that system call returns.

User threads have the advantage of allowing switching between threads in a process much more quickly than kernel-level threads. The process does the context-switching itself.

Solaris hybrid threads

Sun Solaris version 2.3 offered hybrid threads which attempt to combine the speed of user-level threads with the advantages of kernel-level threads. The hybrid threads minimize the OS scheduling cost of switching between threads at the kernel level, but they also allow a thread to make a blocking system call without blocking the entire process.

The system allowed for threads and lightweight processes. The OS scheduler only saw the lightweight processes. Each thread is attached to a lightweight process. Threads can share a process, so one application with multiple threads may appear to the OS scheduler as a single process.

In a program with two threads both bound to a single process, if thread 1 makes a blocking system call, then the OS will block the process. Meanwhile, in order to allow thread 2 to still run, the OS will create new process, detach thread 2 from the first process, and then attach it to the new process.

critical sections and mutexes

If threads are going to share vars, you need to be careful they don't both try to alter memory simultaneously.

Consider this classic C code:

i++;

This looks like one operation (add one to i), but it translates to three operations in assembly code:

lda i;  load the value stored in var i into the eax register.
inc;    increment up the eax register.
sta i;  store the value in the eax register into the var i.

Read the C programming appendix in this book for more information on how to convert a C program to assembly using gcc.

Since threads share memory, it is possible for one thread to start to do something, then get suspended by the OS scheduler.

Imagine if two threads exist and i is a global integer, initially set to 99. Each thread wants to increment up i by one, so each thread will do the three assembly steps above. After we are finished, we expect to find i set to 101.

First thread 1 starts:

lda i;    eax register holds 99.
inc i;    now, eax holds 100.

Now, imagine at this point the OS scheduler suspends thread 1 and starts thread 2:

lda i;    load 99 from i into eax.  This is bad!!!
inc i;    increment eax to 100.
sta i;    store 100 into i.

And then, the OS allows thread 1 to finish:

sta i;    store eax into i.  but eax only has 100!

So now you see how two threads accessing common variables can be risky. The worst part about this bug is that it will be inconsistent and hard to reproduce. The i++ statement is called a critical section because we need to makes sure that only one thread has access to the variable in that section.

One solution is to use some kind of lock, like this, in each thread:

    lock(ilock);
    i++;
    unlock(ilock);

Then if thread1 gets suspended, and thread2 starts, it will block on the lock(ilock) call. To do this for real, we can use mutexes. A mutex is like a semaphore that only has values of 1 or 0. The threads in this code will lock and unlock a mutex before trying to increment up i, so you avoid the above problem.

#include <stdio.h>
#include <pthread.h>

pthread_mutex_t ilock = PTHREAD_MUTEX_INITIALIZER;
int i;

static void *f1()
{
    pthread_mutex_lock(&ilock);
    i++;
    pthread_mutex_unlock(&ilock);
    return NULL;
}

static void *f2()
{
    pthread_mutex_lock(&ilock);
    i++;
    pthread_mutex_unlock(&ilock);
    return NULL;
}

int main()
{
    pthread_t t1, t2;
    int x;
    i = 99;
    x = pthread_create(&t1, NULL, f1, NULL);
    if (x != 0) printf("pthread foo failed.\n");
    x = pthread_create(&t2, NULL, f2, NULL);
    if (x != 0) printf("pthread bar failed.\n");
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("all pthreads finished.\n");
    printf("i set to %d.\n", i);
    return 0;
}

That's it!

Another solution to this problem is to use a "increment" instruction that really is atomic. (The "lock" and "unlock" functions use just such an atomic instruction). Such atomic functions (and how they can be used to avoid data corruption and deadlocks) are discussed at Wikipedia:Lock-free and wait-free algorithms.

Producer and Consumer model

This hobgoblin shows up all the time in the real world. The chef makes bowls of soup for the diner. The chef has to wait for the diner to ask for soup, and the diner has to wait for the chef to announce the soup is ready. If you ain't careful, you can end up with a deadlock which is just as nasty as it sounds.

A deadlock involves two processes (p1 and p2) and two resources (r1 and r2). Each process needs exclusive access to both resources to finish. The deadlock happens when p1 holds r1 and waits for r2 to be released, and p2 holds r2 and waits for r1. As you can see, neither process can continue.

This pseudocode illustrates how to set up a producer/consumer system and not fear deadlocks:

/*  

static soupbuffer buff;
semaphore moreplease, soupisready;

producerthread()
{
    lock(moreplease);       wait until the kids ask for more.
    cooksoup(buff);         refill the soup pot.
    unlock(soupisready);    announce that soup is ready.
}

consumerthread()
{
    lock(soupisready);      wait until the soup is ready.
    eatsoup(buff);          empty the soup pot.
    unlock(moreplease);     ask for more soup.
}
                                                                                                             
*/

Shared memory

System V provides shared memory. One process can allocate some memory as shared memory, then other processes can attach to that memory, and these processes can communicate.

one process can use shmget to allocate memory. then that process can use shmat to aim a pointer at that memory.

another process can use shmat to attach a pointer to that same memory.

shmdt will detach the pointer from shared memory.

shmctl can be used to free the allocated memory.

You can use ipcs to check on any allocated shared memory, which is a nice way to check if your program freed all allocated memory. ipcrm will delete allocated shared memory.

Signals

Signals are neat. When you hit Ctrl+C to kill a process, you're really sending SIGINT to that process. Similarly, Ctrl+Z sends SIGTSTP and `kill -9 pid` sends the 9th signal, SIGKILL. The full listing of available signals is in the seventh section of the manual pages (`man -s 7 signal`).

signal handlers

When a process receives a signal, the corresponding function is invoked. In the case of SIGKILL, the function causes the code to exit. Many of these signals can be trapped and reassigned to a custom function as illustrated in the following code:

#include <stdio.h>
#include <signal.h>
#include <unistd.h>

void sighandler(void)
{
    printf("i am un-SIGINT-able!\n");
}

int main()
{
    signal(SIGINT, sighandler);
    perror("signal");
    while (1) {
        printf("sleeping...\n");
        sleep(1);
    }
    return 0;
}

Compile and run the program above and then try to kill it with Ctrl+c. Now, try to kill the program with `kill -s SIGKILL pid`.

When Ctrl+c is pressed, the process is sent the SIGINT signal; however instead of terminating, the custom function sighandler is invoked. Since the SIGKILL signal is not trapped, the code exited normally.

The signal() function will not let you write a handler for SIGKILL.

In HP-UX, after the signal arrives, the signal handler is forgotten for that signal. So, often inside the signal handler, it re-registers it self as the handler:

void sighandler(void)
{
    printf("i am un-SIGINT-able!\n");
    signal(SIGINT, sighandler);
}

In Linux, this is not necessary.

sigalarm

The SIGALRM signal can be raised by the alarm() function, so it's a good way to generate a signal inside your program. The below code illustrates:

signal(SIGALRM, onintr);
alarm(3);
while (1) ; /* run forever. */

void onintr()
{
    printf("3 second violation.\n");
    alarm(3);
}

This program will run until you kill it.

Signals and setjmp/longjmp

This code calls the SIGALARM handler when the process receives the SIGALARM signal, and then inside the handler, jumps out of the handler without returning.

jmp_buf env;
main()
{
    setjmp(env);
    signal(SIGALRM, onintr);
}
void onintr()
{
    printf("SIGALRM handler...\n");
    longjmp(env);
}

There's an issue that might not be obvious. After a process gets a signal, then it calls the signal handler. If another signal comes in while the first signal is still being processed, the process will ignore that second signal. The OS "blocks" the signal while the handler is running.

When the handler finishes, as part of its return, the OS allows the process to get new signals.

If we jump out of a function with longjmp, then the signal handler will never return, and the process will never hear any new signals of that type coming in. However, don't worry! You can manually unblock the signal and then jump out.

This alternate handler shows how:

void handler()
{
    sigset_t set;
    sigempty(&set);
    sigaddset(&set, SIGALRM);
    sigprocmask(SIG_UNBLOCK, &set, NULL);
    printf("3 second\n");
    longjmp(env);
}

You can use the functions sigsetjmp and siglongjmp to jump out of a signal handler and optionally reset the signal mask. The below code shows an example. You'll have to use Ctrl-C to end the program:

#include <stdio.h>
#include <signal.h>
#include <setjmp.h>
#include <unistd.h>

sigjmp_buf env;

void sh()
{
    printf("handler received signal.\n");
    siglongjmp(env, 99);
}

int main ()
{
    int k;
    k = sigsetjmp(env, 1);
    if (k == 0) {
        printf("setting alarm for 4 seconds in the future.\n");
        signal(SIGALRM, sh);
        alarm(4);
        while (1) {
            printf("sleeping...\n");
            sleep(1);
        }
    }
    else {
        printf("k: %d.\n", k);
        alarm(3);
        printf("set alarm for another 3 seconds in the future.\n");
        while (1) {
            printf("sleeping...\n");
            sleep(1);
        }
    }
    return 0;
}

Signals and pause

If you want to wait for an event, but don't want to resort to some kind of spin lock, you can use the pause function to suspend a running process until a signal occurs.

Example code:

#include <signal.h>
#include <unistd.h>
#include <stdio.h>

void sh()
{
    printf("Caught SIGALRM\n");
}

int main()
{
    signal(SIGALRM, sh);
    printf("Registered sh as a SIGALRM signal handler.\n"
           "BTW, sh is %d and &sh is %d.\n" , (int) sh, (int) &sh);
    alarm(3);
    printf("Alarm scheduled in three seconds.  Calling pause()...\n");
    printf("RAHUL IS IN WIKIPEDIA WEBSITE AT CHITKARA UNIVERSITY NEAR CHANDIGARH");
    pause();

    printf("pause() must have returned.\n");
    return 0;
}

Also notice that the second print statement casts the function sh to an integer, and also casts the address of sh as an integer. Here's the output of the program:

$ ./pausefun
Registered sh as a SIGALRM signal handler.
BTW, sh is 134513700 and &sh is 134513700.
Alarm scheduled in three seconds.  Calling pause()...
Caught SIGALRM
pause() must have returned.

Notice that sh after being cast to an integer is the same as the address of sh. That 134513700 is the memory address in the text section of the sh function. Although it may seem a little strange at first, from the OS point of view, functions are really just another kind of data.

Message passing

message passing is distinct from communication via pipes because the receiver can select a particular message from the message queue. With pipes, the receiver just grabs some number of bytes off the front.

The type arg allows priority scheduling. If the 4th arg of msgrcv is > 0, then system will find the first message with the type equal to that specified 4th arg.

If type is < 0, then system will find the first message with the lowest type where type is below the absolute value of type.

If we have these messages:

Name:Type
A:400
B:200
C:300
D:200
E:100

Then type -250 would return B.

The following two programs show message passing.

mpi1.c:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define KEY1 9870
#define KEY2 7890
#define PERM 0666

typedef struct mymsgbuf
{
    int type;
    char msg[64];
} mtype;

int main()
{
    int msg1, msg2, i;
    mtype m1, m2;

    msg1 = msgget(KEY1, PERM | IPC_CREAT);
    msg2 = msgget(KEY2, PERM | IPC_CREAT);

    msgrcv(msg2, &m2, 64, 0, 0);

    printf("mpi1 received msg: \"%s\" from message queue %d.\n", m2.msg, msg2);

    m1.type = 20;
    strcpy(m1.msg, "I doubt");
    printf("mpi1 about to send \"%s\" to message queue %d.\n", m1.msg, msg1);
    i = msgsnd(msg1, &m1, 64, 0);
    printf("mpi1 sent \"%s\" to message queue %d with return code %d.\n", m1.msg, msg1, i);

    sleep(1);

    msgctl(msg1, IPC_RMID, (struct msqid_ds *) 0);
    msgctl(msg2, IPC_RMID, (struct msqid_ds *) 0);

    return 0;
}

mpi2.c:

#define KEY1 9870
#define KEY2 7890
#define PERM 0666

typedef struct mymsgbuf
{
    int type;
    char msg[64];
} mtype;

int main()
{
    int msg1, msg2, i;
    mtype m1, m2;

    m1.type = 10;
    strcpy(m1.msg, "Indians will win...");

    /* attach to message queues. */
    msg1 = msgget(KEY1, PERM);
    msg2 = msgget(KEY2, PERM);
    if (msg1 == -1 || msg2 == -1) perror("mpi2 msgget");

    /* send message to mpi1. */
    printf("mpi2 about to send message \"%s\" to message queue %d.\n", m1.msg, msg2);
    msgsnd(msg2, &m1, 64, 0);

    /* send message to mpi2. */
    strcpy(m2.msg, "...");
    /*
    msgrcv(msg2, &m2, 64, 0, 0);
    */
    i = msgrcv(msg1, &m2, 64, 0, 0);
    if (i == -1) perror("mpi2 msgrcv");
    printf("mpi2 received msg: \"%s\" from message queue %d with return code %d.\n", m2.msg, msg1, i);

    /* cleanup. */
    msgctl(msg1, IPC_RMID, (struct msqid_ds *) 0);
    msgctl(msg2, IPC_RMID, (struct msqid_ds *) 0);

    return 0;
}

Now compile mpi1.c into mpi1 and compile mpi2.c as mpi2:

$ gcc -o mpi1 mpi1.c
$ gcc -o mpi2 mpi2.c

Now run mpi1 in background and then run mpi2:

$ ./mpi1 &
[1] 29151
$ ./mpi2
mpi2 about to send message "Indians will win..." to message queue 2195457.
mpi1 received msg: "Indians will win..." from message queue 2195457.
mpi1 about to send "I doubt" to message queue 2162688.
mpi1 sent "I doubt" to message queue 2162688 with return code 0.
mpi2 received msg: "I doubt" from message queue 2162688 with return code 64.
[1]+  Done                    ./mpi1

All about the stack

This section describes what happens in the stack and registers when one function calls another function and passes a few parameters.

Imagine we have this C code:

void foo(x, y, z)
{
    int i, j;
}
int a, b, c;
a = 4;
b = 5;
c = 6;
foo(a, b, c);

Behind the scenes, first c gets pushed, then b, then a, then the return address of the calling function. This is what the stack looks like (the stuff at the top was the most recently pushed object):

 -8    j
 -4    i
  0  ebp
 +4   return address
 +8    a
+12    b
+16    c

segfaults revealed!

We've all seen this lovely error:

$ ./try
Segmentation fault

Most of the time, you find out something like is the problem:

char ss[3];
strcpy(ss, "this is a way too long string!\n");

After this command: char ss[3]; the stack can fit three characters into one 4-byte word.

 -4          ; this is the space (4 bytes, 1 word) allocated for ss[3].
  0     ebp
 +4     r.a.

Now watch what happens when the strcpy() command fills in the stack:

-4      't', 'h', 'i' ,'s'   ; we can fit the first 4 chars here, but the rest spill over.
 0      ' ', 'i', 's', ' a'  ; ACK!  we just overwrote the old base pointer!
+4      ' ', 'w', 'a', 'y',  ; now we're doomed; when this function calls it will try to hop to
                             ; the return address stored here, but that has been overwritten.

So hopefully it is more clear what a segfault is. Your program is trying to hop to a memory address outside the process memory, and the OS puts the smack down.

Networking

OSI reference

7 layers:

application
presentation
session
transport
network
data link
physical

In Unix, the first three (application, presentation, and session) are often bundled together.

Imagine that a network looks like this:

A---Router1---Router2---B.

If a process (application) on A wants to communicate with a process on B, this is sort of what happens:

The transport layer adds A's port and B's RPC server port.

The network layer adds A's IP address and B's IP address.

Any communication from A must go through the two routers first.

The data link layer adds A's mac address and router 1's mac address.

The physical layer converts the frame into the physical signal.

When the message is received by router 1, the frame is read in and sent to the bottom of the OSI model. Router 1's mac address is removed and replaced with Router 2's mac address.

The UNIX file system

Inodes

Each directory is actually a file. That file lists the names of files and subdirectories in that directory. It associates names to inodes.

Each inode has the following data:

userID
groupID
mode
timestamp (accessed, created, modified all tracked separately)
link count
file size
direct pointers to 10 data blocks that contain file contents
single indirect pointer to a data block that points to data blocks that contain the file contents
double indirect pointer to a data block ...

Addressing

Since each inode has 10 direct address pointers, if a file is small enough to fit on 10 data blocks or less, then the inode can use direct addressing.

If we assume that we use 4 bytes to specify the disk address, then we can have 2³² different addresses. If we set the data block size to 4 kb, then one data block can hold 1000 addresses. So, with single indirect addressing, we can point to a file that uses 1000 data blocks of data.

Therefore, the maximum allowed file size for single indirect addressing (assuming 4kb data blocks and 4byte addresses) is 4 megabytes (1000 data blocks * 4 kb per data block).

Easy formula: data block size / disk address = max file size.

Of course, in double or triple (or quadruple, quintuple, etc.) addressing, you just gotta take the analysis out further.

File Size	Blocks Used
1 block	1 block used via direct addressing.
11 blocks	12 blocks used: 10 blocks of data via direct addressing, 1 block for indirect addressing, and 1 for data.
150 blocks	10 blocks via direct addressing, 1 + 128 blocks through single addressing, 1 + 1 + 12 1 double indirect + indirect + 12 data blocks

There are advantages and disadvantages of small vs. large data blocks. Lots of small files and big data blocks causes low utilization. For example, your .vimrc file may only have 300 bytes of data, but it has to use at least one data block, so if your data block is 4kb in size, well, that's a lot of wasted space.

However, small blocks lead to big files being scattered out across lots of data blocks, and so the hard drive reader has to skip all over the place gathering up the data. Large data blocks reduce fragmentation.

As the number of blocks per file increases, disk IO cost increases.

Linking

Each directory has a table that maps file names to inode numbers. A file can have any number of names associated with it. Each inode keeps track of its link count. The command rm foo.txt edits the directory and removes the entry that maps foo.txt to a particular inode. If that inode has zero links, then it is deleted.

You might want to read the man page for ln at this point to understand the difference between soft and hard links.

RAID

Compared to a monolithic system, a distributed system can have a much lower chance that the system will fail today—even though the distributed system has a much higher chance that one part will fail today. High-availability distributed systems use techniques inspired by RAID (redundant array of inexpensive disks) to tolerate a failure in any one part, without loss of data or functionality.

Wikipedia has related information at RAID

For more details, see

The set-user-ID bit

The set-user-ID bit (sometimes shortened to setuid or simply suid) alters how the OS handles permissions. For example, the chsh command allows a user to change her login shell, like from bash to tcsh. The login shell for each user is stored in /etc/passwd, but users don't have write access. So how can a user run chsh and change that file? The answer is with suid bits. Look at the permissions on chsh:

$ ls -l `which chsh`
-rwsr-xr-x    1 root     root        28088 2004-11-02 16:51 /usr/bin/chsh

So, what is that s in there for? That, my friend, is the set-user-ID bit, which means that this program runs as if it were executed by root.

A program with the setuid bit set (or the similar setgid bit) will be run with the privileges of the program's owning user or group respectively.

Remote Procedure Calls

Remote procedure calls (RPC) are a way hiding all the details of running a process on a far away remote machine.

If A wants to make a remote procedure call on host B, then A needs B's RPC server port.

An RPC frame includes:

function number
parameters

The command rpcinfo will list what RPC services are running on the machine.

Generally developing RPC program involves eight steps:

Build and test conventional application.
Divide the program by choosing a set of procedures to move to a remote machine. Place the selected procedures in a separate file.
Write a rpcgen specification file for the remote program. Choose a program number and version number. The specification file should be named with a .x suffix. For example, a remote database program could have a spec file of rdb.x.
Run rpcgen command to check the specification file. If the file is valid, rpcgen will generate source code that will be used to build the client and server. Example:

rpcgen -C rdb.x

will generate 4 output files: rdb.h rdb_clnt.c rdb_svc.c rdb_xdr.c

Write stub interface routines for the client and server side. You can use the code developed in step 1. For example, you might write a rdb.c file that handles the client interface, and a rdb_svc_proc.c to handle the work on the server side.
Compile and link the client program.

gcc -o rdb rdb.c rdb_clnt.c rdb_xdr.c

Compile and link the server program.

gcc -o rdb_svc rdb_svc_proc.c rdb_svc.c rdb_xdr.c

Start server and invoke client.

Distributed Algorithms

Properties of distributed algorithms

Information can be scattered among machines.
Each process makes decisions based on local information.
Machines can share information.
A single point of failure should not cause the entire system to crash.
Machines can't rely on a single common clock or global time.

A simple example of how non-global clocks cause problems:

you compile foo.c (via make) to foo.o on machine A.
you log in to machine B, which has a clock that is 15 minutes behind B.
You edit foo.c again, then run make. make compares the timestamp on foo.o to foo.c and foo.o is still newer, so make will not recompile foo.o.

Synchronization

Physical clocks agree with "real time" like Universal Coordinated Time.

Logical clocks don't worry about being accurate to some global standard. Instead, all machines in a system try to agree with each other.

Keeping a local physical clock synchronized is challenging because even if the machine can contact the authoritative time server, there is some propagation delay.

How often does a machine need to request an update from a time server? Assume that the machine has a clock $C$ that increments at rate $dC$ .

The UTC clock $t$ increments at rate $dt$ .

If $dC/dt=1$ then the local physical clock C is perfect and doesn't need to be updated.

If $dC/dt>1$ then the local physical clock C runs at a faster rate.

If $dC/dt<1$ then the local physical clock C runs at a slower rate then the time server.

$\rho$ represents the maximum drift rate specified by a manufacturer.

$1-\rho <=dC/dt<=1+\rho$

$\delta$ represents the maximum tolerable error. every $\delta /\rho$ period, a machine must request an update.

As the error tolerance $\delta$ increases, update frequency falls. As drift rate $\rho$ goes up, the update frequency goes up also.

Appendix A: C language overview

setjmp and longjump

Use these to do what C++, python, Java, etc. does with throwing exceptions. First call setjmp to set a location to jump to. Then other functions can call longjump to exit the current function and how to the location specified in setjmp.

functions with variable-length arguments

In other words, how does printf(...) work?

conditional compilation

One example:

main()
{
    #ifdef DEBUG
        printf("this is a debug statement!\n");
    #endif
}

$ gcc -DDEBUG prog.c; ./a.out
this is a debug statement!

$gcc prog.c; ./a.out
$

In the second compilation, the printf block ain't included.

Another similar trick is used to prevent the same header file from being included repeatedly. It is common for one C program to have multiple .c files which include multiple .h files. You can put this code at the top of your .h file to prevent the same .h file from being included redundantly:

#ifndef _FOO_H
#define _FOO_H 1

/* function foo is a worn-out cliche. */
int foo();

#endif

make and makefiles

Four type of statements are allowed in Makefiles: macros, dependency rules, commands, and comments. Here's a trivial example makefile:

CC = /usr/bin/gcc
CFLAGS = -Wall -pedantic -g

#saves me the trouble of typing all those flags.

proxy: proxy.c proxy.h
   $(CC) $(CFLAGS) proxy proxy.c

debug-proxy: proxy.c proxy.h
   $(CC) $(CFLAGS) -DDEBUG -o proxy proxy.c
 
clean:
   /bin/rm -f *~ httpd a.out httpd.log proxy proxy.log CACHE.*

I can run any statement by typing make and the name. Or by default, make will run the first rule it hits if it gets no arg.

$ make           # runs make proxy
$ make clean     #runs /bin/rm -f ....

Macros are accessible after being declared by either $(CFLAGS) or ${CFLAGS}. $CFLAGS is not valid. If the macro is one letter, like

A = a.out

Then $A is valid. But you should stay away from that anyway, because lots of the single-char macros already have meaning.

Each tab underneath a dependency rule starts a new shell, so if you want to change shell stuff, you gotta do it all on one line:

bogus:
    echo $$PWD; cd ..; echo $$PWD;
    echo $$PWD;

The output:

$ make bogus 
echo $PWD; cd ..; echo $PWD
/home/student/wiwilson/cis620
/home/student/wiwilson
echo $pwd

BTW, if you want to suppress the default echoing of the commands, put an @ at the beginning:

bored:
    echo "meh"
    @echo "snah"

And here's the output:

$ make bored
echo "meh"
meh
snah

How one make command can recursively call make in subdirectories:

SRCDIR = src1 src2
OBJ    = src1/f1.o src2/f2.o

snake: ${SRCDIR} ${OBJ}
    ${CC} -o snake ${OBJ}

${SRCDIR}: /tmp
    cd $@; make

This confused me at first. The $@ symbol refers to the target for the rule, in this case, the ${SRCDIR}.

You can use makedepend inside a makefile to resolve all that hairy .h stuff automatically:

CFILES = main.c foo.c bar.c

depend:
    makedepend $(CFILES)
# don't edit below here.

TODO: show example before and after Makefile.

If you have lots and lots of files and you want to compile them all into objects, this rule works:

.c.o:
    $(CC) -c $<

Converting C to assembly

gcc -S try.c

This will convert your program to assembly code, using GNU syntax. gcc will create a file called try.s

gcc -S -masm=intel try.c

This will convert your code to assembly code using intel syntax. This may only work on intel CPUs.

You can convert try.s to an executable like so:

gcc try.s

And you can even compile the executable for use with gdb:

gcc -g try.s

And then you can step through the underlying assembly code in gdb. Hurray!

Appendix B: Solutions to Exercises

4.1.1 POSIX Threads

The task for this exercise was to rewrite the example code to use fork() rather than threads. You should have something similar to this:

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>

int main()
{
    /*
     * The name of our process.  This lets us tell the difference when we
     * do output.
     */
    const char* name;

    /*
     * Our counter variable.  Because this is all in one function, it need not
     * be global.
     */
    int i;
    i = 0;
    
    pid_t pid = fork();
    if(pid == 0)
    {
        name = "Child";
    }
    else
    {
        name = "Parent";
    }

    /* Our output loop. */
    int j;
    for(j = 0; j < 10; j++)
    {
        sleep(1);
        printf("%s process: %d\n", name, i);
        i++;
    }
    
    return 0;
}

You were also asked in what way the output was different, and why. In the version that made use of POSIX Threads, the counter was incremented twice each second: once by each thread. When fork() was used, it was only incremented once. The reason for this is that when fork() is called, the child process writes to its own copy of the counter. When threads are used, however, both threads share their memory space. This means that they are both writing to the same copy of the counter, and it is incremented twice.

Contributors

Please, give yourself credit if you've made any contributions!

Matthew Wilson Started this textbook in Fall 2004, while enrolled in a graduate comparative operating systems interfaces course at Cleveland State University.

Lachlan Gunn