PROCESS MANAGEMENT
CS230 SYSTEMS PROGRAMMING I
ROD ULRICH
PROCESS MANAGEMENT
• Processes are, after files, the most fundamental abstraction in a UNIX
system.
• As object code in execution (active, alive, running programs), processes are
more than just assembly language; they consist of data, resources, state, and a
virtualized computer.
PROGRAMS, PROCESSES, AND THREADS
• A process is a running program.
• A process includes the binary image, loaded into memory, but also an
instance of virtualized memory, kernel resources such as open files, a
security context such as an associated user, and one or more threads.
PROGRAMS, PROCESSES, AND THREADS
• In a single-threaded process, the process is the thread.
• There is one instance of virtualized memory and one virtualized processor.
• In a multithreaded process, there are multiple threads.
• As the virtualization of memory is associated with the process, the threads all
share the same memory address space.
PROCESS ID
• Each process is represented by a unique identifier, the process ID (frequently shortened to pid).
• The idle process, which is the process that the kernel “runs” when there are no other runnable
processes, has the pid 0.
• The first process that the kernel executes after booting the system, called the init process, has the
pid 1.
• Normally, the init process on Linux is the init program.
• We use the term “init” to refer to both the initial process that the kernel runs, and the specific
program used for that purpose.
PROCESS ID
• Unless the user explicitly tells the kernel what process to run (through the init
kernel command-line parameter), the kernel has to identify a suitable init
process on its own, a rare example where the kernel dictates policy.
PROCESS ID
• The Linux kernel tries four executables, in the following order:
• /sbin/init: The preferred and most likely location for the init process.
• /etc/init: Another likely location for the init process.
• /bin/init: A fallback location for the init process.
• /bin/sh: The location of the Bourne shell, which the kernel tries to run if it fails to find
an init process.
PROCESS ID ALLOCATION
• The kernel allocates process IDs to processes in a strictly linear fashion.
• If pid 17 is the highest number currently allocated, pid 18 will be allocated
next, even if the process last assigned pid 17 is no longer running when the
new process starts.
• The kernel does not reuse process ID values until it wraps around from the
top—that is, earlier values will not be reused until the value in
/proc/sys/kernel/pid_max is allocated.
PROCESS HIERARCHY
• The process that spawns a new process is known as the parent; the new
process is known as the child.
• Every process is spawned from another process (except the init process).
• So, every child has a parent.
• This relationship is recorded in each process’s parent process ID (ppid),
which is the pid of the child’s parent.
PROCESS HIERARCHY
• Each process is owned by a user and a group.
• This ownership is used to assign rights to resources.
• To the kernel, users and groups are integer values.
• Through the files /etc/passwd and /etc/group, these integers are mapped to the human-
readable names with which UNIX users are familiar, such as the user root.
• Each child process inherits its parent’s user and group ownership.
• Each process is also part of a process group.
• The process ID is represented by the pid_t type, which is defined in the header file <sys/types.h>.
OBTAINING THE PROCESS ID & PARENT
PROCESS ID
• The getpid() system call returns the process ID of the invoking process:
#include <sys/types.h>
#include <unistd.h>
pid_t getpid (void);
• The getppid() system call returns the process ID of the invoking process’s
parent:
#include <sys/types.h>
#include <unistd.h>
pid_t getppid (void);
RUNNING A NEW PROCESS
• In UNIX, the act of loading into memory and executing a program image is separate
from the act of creating a new process.
• One system call loads a binary program into memory, replacing the previous
contents of the address space, and begins execution of the new program.
• This is called executing a new program, and the functionality is provided by the
exec family of calls.
• A different system call is used to create a new process, which initially is a near-
duplicate of its parent process.
EXEC FAMILY OF CALLS
• There is no single exec function; instead, there is a family of exec functions built on a single system
call.
#include <unistd.h>
int execl (const char *path,
const char *arg,
...);
• A call to execl() replaces the current process image with a new one by loading into memory the
program pointed at by path.
• The parameter arg is the first argument to this program.
• The ellipsis signifies a variable number of arguments: the execl() function is variadic, which means
that additional arguments may optionally follow, one by one. The list of arguments must be terminated with NULL.
REST OF EXECL()
• In addition to execl(), there are five other members of the exec family:
#include <unistd.h>
int execlp (const char *file,
const char *arg,
...);
int execle (const char *path,
const char *arg,
...,
char * const envp[]);
int execv (const char *path, char *const argv[]);
int execvp (const char *file, char *const argv[]);
int execve (const char *filename,
char *const argv[],
char *const envp[]);
REST OF EXECL()
• The syntax is simple.
• The l and v delineate whether the arguments are provided via a list or
an array (vector).
• The p denotes that the directories in the user’s PATH are searched for the given file.
• Commands using the p variants can specify just a filename, so long as
it is located in the user’s path.
• Finally, the e notes that a new environment is also supplied for the
new process.
FORK() SYSTEM CALL
• A new process running the same image as the current one
can be created via the fork() system call:
#include <sys/types.h>
#include <unistd.h>
pid_t fork (void);
FORK() SYSTEM CALL
• A successful call to fork() creates a new process, identical in almost
all aspects to the invoking process.
• Both processes continue to run, returning from fork() as if nothing
special had happened.
• The new process is called the “child” of the original process, which in
turn is called the “parent.”
• In the child, a successful invocation of fork() returns 0.
• In the parent, fork() returns the pid of the child.
FORK() SYSTEM CALL
• The child and the parent process are identical, except for a
few necessary differences:
• The pid of the child is, of course, newly allocated and different from
that of the parent.
• The child’s parent pid is set to the pid of its parent process.
• Resource statistics are reset to zero in the child.
• Any pending signals are cleared and not inherited by the child.
• Any acquired file locks are not inherited by the child.
FORK() SYSTEM CALL
• The most common usage of fork() is to create a new process
in which a new binary image is then loaded: think of a shell
running a new program for the user or a process spawning a
helper program.
• First the process forks a new process, and then the child
executes a new binary image.
• This “fork plus exec” combination is frequent and simple.
FORK() SYSTEM CALL
• The following example spawns a new process running the
binary /bin/windlass:
pid_t pid;

pid = fork ();
if (pid == -1)
        perror ("fork");

/* the child ... */
if (!pid) {
        /* execv() expects a NULL-terminated array of type char *const [] */
        char *const args[] = { "windlass", NULL };
        int ret;

        ret = execv ("/bin/windlass", args);
        if (ret == -1) {
                perror ("execv");
                exit (EXIT_FAILURE);
        }
}
FORK() SYSTEM CALL
• The parent process continues running with no change, other
than that it now has a new child.
• The call to execv() changes the child to running the
/bin/windlass program.
COPY ON WRITE
• Copy-on-write is a lazy optimization strategy designed to mitigate the
overhead of duplicating resources.
• If multiple consumers request read access to their own copies of a
resource, duplicate copies of the resource need not be made.
• Instead, each consumer can be handed a pointer to the same resource.
• So long as no consumer attempts to modify its “copy” of the resource,
the illusion of exclusive access to the resource remains, and the
overhead of a copy is avoided.
COPY ON WRITE
• If a consumer does attempt to modify its copy of the
resource, at that point, the resource is transparently
duplicated, and the copy is given to the modifying consumer.
• The consumer, never the wiser, can then modify its copy of
the resource while the other consumers continue to share the
original, unchanged version.
• Hence the name: the copy occurs only on write.
COPY ON WRITE
• The primary benefit is that if a consumer never modifies its
copy of the resource, a copy is never needed.
• The general advantage of lazy algorithms—that they defer
expensive actions until the last possible moment—also
applies.
• In the specific example of virtual memory, copy-on-write is
implemented on a per-page basis.
COPY ON WRITE
• The kernel implementation is simple.
• The pages are marked as read-only and as copy-on-write in the
kernel’s page-related data structures.
• If either process attempts to modify a page, a page fault occurs.
• The kernel then handles the page fault by transparently making
a copy of the page; at this point, the page’s copy-on-write
attribute is cleared, and it is no longer shared.
COPY ON WRITE
• Copy-on-write has an even bigger benefit in the case of forking.
• Because a large percentage of forks are followed by an exec,
copying the parent’s address space into the child’s address
space is often a complete waste of time: if the child
summarily executes a new binary image, its previous address
space is wiped out.
• Copy-on-write optimizes for this case.
TERMINATING A PROCESS
• POSIX defines a standard function for terminating the current process:
#include <stdlib.h>
void exit (int status);
• A call to exit() performs some basic shutdown steps, then instructs the
kernel to terminate the process.
• This function has no way of returning an error; in fact, it never returns at
all.
• It does not make sense for any instructions to follow the exit() call.
TERMINATING A PROCESS
• Before terminating the process, the C library performs the
following shutdown steps, in order:
1. Call any functions registered with atexit() or on_exit(), in the reverse
order of their registration.
2. Flush all open standard I/O streams (see Chapter 3).
3. Remove any temporary files created with the tmpfile() function.
• When a process exits, the kernel cleans up all of the resources
that it created on the process’s behalf that are no longer in use.
WAITING FOR TERMINATED CHILD
PROCESSES
• If a child process were to entirely disappear when terminated, no remnants would
remain for the parent to investigate.
• The original designers of UNIX decided that when a child dies before its parent, the
kernel should put the child into a special process state.
• A process in this state is known as a zombie. Only a minimal skeleton of what was once the
process (some basic kernel data structures containing potentially useful data) is retained.
• A process in this state waits for its parent to inquire about its status (a procedure known
as waiting on the zombie process).
• Only after the parent obtains the information preserved about the terminated child
does the process formally exit and cease to exist even as a zombie.
WAITING FOR TERMINATED CHILD
PROCESSES
• The Linux kernel provides several interfaces for obtaining
information about terminated children.
• The simplest such interface, defined by POSIX, is wait():
#include <sys/types.h>
#include <sys/wait.h>
pid_t wait (int *status);
WAITING FOR TERMINATED CHILD
PROCESSES
• A call to wait() returns the pid of a terminated child or −1 on error.
• If no child has terminated, the call blocks until a child terminates.
• If a child has already terminated, the call returns immediately.
• Consequently, a call to wait() in response to news of a child’s demise
—say, upon receipt of a SIGCHLD—will always return without
blocking.
WAITING FOR TERMINATED CHILD
PROCESSES
• Example program that uses wait() to figure out what happened to its child:
• Show wait.c example
• This program forks a child, which immediately exits.
• The parent process then executes the wait() system call to determine the status of its child.
• The process prints the child’s pid, and how it died.
• Because in this case the child terminated by returning from main(), we know that we will see
output similar to the following:
$ ./wait
pid=8529
Normal termination with exit status=1
WAITING FOR A SPECIFIC PROCESS
• Observing the behavior of child processes is important.
• However, a process often has multiple children and may not wish to wait for all of
them, but rather for a specific child process.
• One solution would be to make multiple invocations of wait(), each time
noting the return value.
• This is cumbersome, though: what if you later wanted to check the status of a
different terminated process?
• The parent would have to save all of the wait() output in case it needed it
later.
WAITING FOR A SPECIFIC PROCESS
• If you know the pid of the process you want to wait for, you can
use the waitpid() system call:
#include <sys/types.h>
#include <sys/wait.h>
pid_t waitpid (pid_t pid, int *status, int options);
• The waitpid() call is a more powerful version of wait().
• Its additional parameters allow for fine-tuning.
LAUNCHING AND WAITING FOR A NEW
PROCESS
• POSIX defines an interface that couples spawning a new process and waiting for its termination
—think of it as synchronous process creation.
• If a process is spawning a child only to immediately wait for its termination, it makes sense to
use this interface:
#define _XOPEN_SOURCE /* if we want WEXITSTATUS, etc. */
#include <stdlib.h>
int system (const char *command);
• The system() function is so named because the synchronous process invocation is called
shelling out to the system.
• It is common to use system() to run a simple utility or shell script, often with the explicit goal of
simply obtaining its return value.
LAUNCHING AND WAITING FOR A NEW
PROCESS
• A call to system() invokes the command provided by the command parameter,
including any additional arguments.
• The command parameter is suffixed to the arguments /bin/sh -c.
• In this sense, the parameter is passed wholesale to the shell.
• On success, the return value is the return status of the command as provided
by wait().
• The exit code of the executed command is obtained via WEXITSTATUS.
• If invoking /bin/sh itself failed, the value given by WEXITSTATUS is the same as
that returned by exit(127).
LAUNCHING AND WAITING FOR A NEW
PROCESS
• Because it is also possible for the invoked command to return 127, there is
no surefire method to check whether the shell itself returned that error.
• On error, the call returns −1.
• During execution of the command, SIGCHLD is blocked, and SIGINT and
SIGQUIT are ignored.
• Ignoring SIGINT and SIGQUIT has several implications, particularly if
system() is invoked inside a loop.
• If calling system() from within a loop, you should ensure that the program
properly checks the exit status of the child.
LAUNCHING AND WAITING FOR A NEW
PROCESS
• For example:
do {
int ret;
ret = system ("pidof rudderd");
if (WIFSIGNALED (ret) &&
(WTERMSIG (ret) == SIGINT ||
WTERMSIG (ret) == SIGQUIT))
break; /* or otherwise handle */
} while (1);
LAUNCHING AND WAITING FOR A NEW
PROCESS
• Implementing system() using fork(), a function from the exec family, and waitpid() is a useful
exercise.
• An example follows: fork.c example. See [Link] for more information.
• Note that this example does not block or disable any signals, unlike the official system().
• This behavior may be better or worse but leaving at least SIGINT unblocked is often smart
because it allows the invoked command to be interrupted in the way a user normally expects.
• A better implementation could add additional pointers as parameters that, when non-NULL,
report errors that are currently not distinguishable from each other.
• For example, one might add fork_failed and shell_failed.
ZOMBIES
• A process that has terminated but has not yet been waited upon by its
parent is called a “zombie.”
• Zombie processes continue to consume system resources, enough to
maintain a mere skeleton of what they once were.
• These resources remain so that parent processes that want to check up on
the status of their children can obtain information relating to the life and
termination of those processes.
• Once the parent does so, the kernel cleans up the process for good and
the zombie ceases to exist.
USERS AND GROUPS
• In UNIX, processes are associated with users and groups.
• The user and group identifiers are numeric values
represented by the C types uid_t and gid_t, respectively.
• The mapping between numeric values and human-readable
names (as in the root user having the uid 0) is performed in
user space using the files /etc/passwd and /etc/group.
• The kernel deals only with the numeric values.
USERS AND GROUPS
• In a Linux system, a process’s user and group IDs dictate the operations that the process may
undertake.
• Processes must run under the appropriate users and groups.
• Many processes run as the root user.
• Good software development encourages the doctrine of least-privileged rights, meaning that a
process should execute with the minimum level of rights possible.
• This requirement is dynamic: if a process requires root privileges to perform an operation early
in its life but does not require these extensive privileges thereafter, it should drop root
privileges as soon as possible.
• Many processes, particularly those that need root privileges to carry out certain operations,
often manipulate their user or group IDs.
SESSIONS AND PROCESS GROUPS
• Each process is a member of a process group, which is a
collection of one or more processes generally associated with
each other for the purposes of job control.
• The primary attribute of a process group is that signals may
be sent to all processes in the group: a single action can
terminate, stop, or continue all processes in the same
process group.
SESSIONS AND PROCESS GROUPS
• When a new user first logs into a machine, the login process creates a new
session that consists of a single process, the user’s login shell.
• The login shell functions as the session leader.
• The pid of the session leader is used as the session ID.
• A session is a collection of one or more process groups.
• Sessions arrange a logged-in user’s activities and associate that user with a
controlling terminal, which is a specific tty device that handles the user’s
terminal I/O.
• Sessions are largely the business of shells.
SESSIONS AND PROCESS GROUPS
• Each process group is identified by a process group ID (pgid) and has a process
group leader.
• The process group ID is equal to the pid of the process group leader.
• Process groups exist so long as they have one remaining member.
• Linux provides several interfaces for setting and retrieving the session and
process groups associated with a given process.
• These are primarily of use for shells, but can also be useful to processes such
as daemons that want to get out of the business of sessions and process groups
altogether.
SESSION SYSTEM CALLS
• Shells create new sessions on login.
• This is done via a special system call, which makes creating a
new session easy:
#include <unistd.h>
pid_t setsid (void);
SESSION SYSTEM CALLS
• A call to setsid() creates a new session, assuming that the process is
not already a process group leader.
• The calling process is made the session leader and sole member of
the new session, which has no controlling tty.
• The call also creates a new process group inside the session and
makes the calling process the process group leader and sole member.
• The new session’s and process group’s IDs are set to the calling
process’s pid.
SESSION SYSTEM CALLS
• In other words, setsid() creates a new process group inside of
a new session and makes the invoking process the leader of
both.
• This is useful for daemons, which do not want to be members
of existing sessions or to have controlling terminals, and for
shells, which want to create a new session for each user upon
login.
DAEMONS
• A daemon is a process that runs in the background, not connecting to any
controlling terminal.
• Daemons are normally started at boot time, are run as root or some other
special user (such as apache or postfix), and handle system-level tasks.
• As a convention, the name of a daemon often ends in d (as in crond and
sshd), but this is not required or even universal.
• A daemon has two general requirements: it must run as a child of init and
it must not be connected to a terminal.
DAEMONS
• In general, a program performs the following steps to become a
daemon:
• Call fork(). This creates a new process, which will become the daemon.
• In the parent, call exit(). This ensures that the original parent (the daemon’s
grandparent) is satisfied that its child terminated, that the daemon’s parent is
no longer running, and that the daemon is not a process group leader.
• Call setsid(), giving the daemon a new process group and session, both of which
have it as leader. This also ensures that the process has no associated
controlling terminal (as the process just created a new session and will not
assign one).
DAEMONS
• Change the working directory to the root directory via chdir(). This is
done because the inherited working directory can be anywhere on the
filesystem.
• Close all file descriptors. You do not want to inherit open file
descriptors, and, unaware, hold them open.
• Open file descriptors 0, 1, and 2 (standard in, standard out, and
standard error) and redirect them to /dev/null.
DAEMONS
• UNIX systems provide a daemon() function in their C library
that reduces these cumbersome manual steps to a single
call:
#include <unistd.h>
int daemon (int nochdir, int noclose);