Operating System
Ch 4
What is a Thread?
• Let us take the example of a human body.
• A human body has different parts with different functions that
work in parallel (e.g., eyes, ears, hands).
• Similarly, in a computer, a single process might have multiple
functions running in parallel, where each function can be
considered a thread.
What is Thread in OS?
• A thread is a sequential flow of tasks within a process.
• Threads in OS can be of the same or different types.
• Threads are used to increase the performance of the applications.
• Each thread has its own thread id, program counter, stack, and set of registers.
• However, the threads of a single process share the same code, data, and files.
• Threads are also called lightweight processes because they share common
resources.
• A thread shares its code section, data section, and other operating-system
resources, such as open files and signals, with other threads belonging to the
same process.
• E.g., while a movie plays on a device, the audio and video are handled by
different threads in the background.
• Each thread belongs to exactly one process and no thread can exist outside
a process.
• Each thread represents a separate flow of control.
• Threads have been successfully used in implementing network servers and
web servers.
• They also provide a suitable foundation for parallel execution of
applications on shared memory multiprocessors.
• A process can be split into many threads. For example, in a
browser, the tabs can be viewed as threads. MS Word uses many threads:
one thread formats text, another thread processes input, and so on.
• An application is typically implemented as a separate process with
several threads of control.
• For example
• A web browser might have one thread display images or text while another
thread retrieves data from the network.
• Process creation is time consuming and resource intensive. If the new
process will perform the same tasks as the existing process, it is
generally more efficient to use one process that contains multiple
threads.
• Most operating-system kernels are now multithreaded.
• Several threads operate in the kernel, and each thread performs a
specific task, such as
• managing devices,
• managing memory,
• or interrupt handling
Working of a single-threaded and a
multithreaded process.
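The idea that threads share data but keep their own id and stack can be sketched in Python. This is only an illustration, not part of the original slides; the names `shared_log` and `worker` are made up for the example.

```python
import threading

shared_log = []                      # data section: visible to every thread
log_lock = threading.Lock()
barrier = threading.Barrier(3)       # keeps all three threads alive at once


def worker(name):
    local_count = 0                  # local variable: lives on this thread's own stack
    for _ in range(1000):
        local_count += 1
    barrier.wait()                   # all threads exist now, so their ids are distinct
    with log_lock:
        shared_log.append((name, threading.get_ident(), local_count))


threads = [threading.Thread(target=worker, args=(f"t{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

for name, ident, count in sorted(shared_log):
    print(name, ident, count)
```

Every thread writes into the same list, yet each one reports its own id and its own private counter.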
Need for Threads
• Threads in the operating system provide multiple benefits and
improve the overall performance of the system. Some of the reasons
threads are needed in the operating system are:
• Since threads share the same data and code, the cost of
communication between threads is low.
• Creating and terminating a thread is faster compared to creating
or terminating a process.
• Context switching is faster in threads compared to processes.
Why Multithreading?
• In Multithreading, the idea is to divide a single process into multiple
threads instead of creating a whole new process. Multithreading is done to
achieve parallelism and to improve the performance of the applications as
it is faster in many ways which were discussed above. The other
advantages of multithreading are mentioned below.
• Resource Sharing: Threads of a single process share the same resources
such as code, data/file.
• Responsiveness: Multithreading allows a program to keep running even if
part of it is blocked or executing a lengthy operation, thus
increasing the responsiveness to the user.
• Economy: It is more economical to use threads as they share the resources
of a single process. On the other hand, creating processes is expensive.
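A minimal Python sketch of responsiveness, assuming a hypothetical `lengthy_operation` that stays blocked until it is released. The main thread keeps making progress in the meantime.

```python
import threading

release = threading.Event()       # stands in for "the slow operation may finish"
worker_done = threading.Event()
progress = []


def lengthy_operation():
    release.wait()                # blocked, e.g. waiting on slow I/O
    worker_done.set()


t = threading.Thread(target=lengthy_operation)
t.start()

# The main thread keeps working while the other thread is blocked.
for step in range(3):
    progress.append((step, worker_done.is_set()))

release.set()                     # let the blocked thread continue
t.join()
print(progress)                   # [(0, False), (1, False), (2, False)]
print(worker_done.is_set())       # True
```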
Difference between process and thread
Process | Thread
Process is heavyweight, or resource intensive. | Thread is lightweight, taking fewer resources than a process.
Process switching needs interaction with the operating system. | Switching between user-level threads does not need to interact with the operating system.
In multiple processing environments, each process executes the same code but has its own memory and file resources. | All threads can share the same set of open files and child processes.
If a single-threaded process is blocked, no other work in that process can execute until it is unblocked. | While one thread is blocked and waiting, a second thread in the same task can run.
Multiple processes without using threads use more resources. | Multiple threaded processes use fewer resources.
In multiple processes, each process operates independently of the others. | One thread can read, write or change another thread's data.
Types of thread
User Level Threads
• In this case, thread management is done by the application, and the
kernel is not aware of the existence of threads.
• The thread library contains code for creating and destroying threads,
for passing messages and data between threads, for scheduling thread
execution, and for saving and restoring thread contexts.
• The application starts with a single thread.
Advantages of User-level threads
• User-level threads are easier to implement than kernel-level threads.
• User-level threads can be used on operating systems that
do not support threads at the kernel level.
• They are fast and efficient.
• Context switch time is shorter than for kernel-level threads.
• They do not require modifications to the operating system.
• The representation of user-level threads is very simple. The registers, PC, stack,
and mini thread control blocks are stored in the address space of the
user-level process.
• It is simple to create, switch, and synchronize threads without the
intervention of the kernel.
Disadvantages of User-level threads
• User-level threads lack coordination between the thread and the
kernel.
• If a thread causes a page fault, the entire process is blocked.
• In a typical operating system, most system calls are blocking, so a
blocking call made by one thread blocks every thread in the process.
• A multithreaded application cannot take advantage of multiprocessing.
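As a rough analogy for user-level threading, the sketch below uses Python generators as cooperative "threads" and a tiny round-robin scheduler that lives entirely in user space. The names `task` and `run_round_robin` are hypothetical; a real thread library also saves registers and stacks, which generators hide.

```python
from collections import deque


def task(name, steps, trace):
    for i in range(steps):
        trace.append(f"{name}:{i}")
        yield                      # voluntarily hand control back to the scheduler


def run_round_robin(tasks):
    ready = deque(tasks)           # ready queue kept in user space
    while ready:
        current = ready.popleft()
        try:
            next(current)          # "context switch" into the task
            ready.append(current)  # still runnable: back of the queue
        except StopIteration:
            pass                   # task finished, drop it


trace = []
run_round_robin([task("A", 2, trace), task("B", 3, trace)])
print(trace)                       # ['A:0', 'B:0', 'A:1', 'B:1', 'B:2']
```

The kernel sees one ordinary thread here. If any task made a blocking call instead of yielding, every other task would stall with it, which mirrors the disadvantage listed above.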
Kernel level thread
• Kernel-level threads are recognized by the operating system.
• The system keeps a thread control block for each thread and a process
control block for each process.
• Kernel-level threads are implemented by the operating system.
• The kernel knows about all the threads and manages them.
• The kernel offers system calls to create and manage
threads from user space.
• Kernel-level threads are more difficult to implement than user-level
threads.
• Context switch time is longer for kernel-level threads.
• If one thread in a process is blocked, the kernel can schedule another
thread of the same process.
• Examples: Windows, Solaris.
Advantages of Kernel-level threads
• The kernel is fully aware of all threads.
• The scheduler may decide to give more CPU time to a process that has a
large number of threads.
• Kernel-level threads are good for applications that block
frequently.
Disadvantages of Kernel-level threads
• The kernel must manage and schedule all threads, which adds overhead.
• Kernel-level threads are more difficult to implement than user-level threads.
• Kernel-level threads are slower than user-level threads.
Multithreading Models
• Some operating systems provide a combined user-level thread and
kernel-level thread facility. Solaris is a good example of this combined
approach. In a combined system, multiple threads within the same
application can run in parallel on multiple processors, and a blocking
system call need not block the entire process. There are three types
of multithreading models:
• Many to many relationship.
• Many to one relationship.
• One to one relationship.
Many to Many Model
• The many-to-many model multiplexes any number of user threads
onto an equal or smaller number of kernel threads.
• In this model, developers can create as many user threads as
necessary and the corresponding Kernel threads can run in parallel on
a multiprocessor machine.
• This model provides the best level of concurrency: when a
thread performs a blocking system call, the kernel can schedule
another thread for execution.
The following diagram shows the many-to-many threading model, where 6 user-level
threads are multiplexed onto 6 kernel-level threads.
Many to One Model
• Many-to-one model maps many user level threads to one Kernel-level
thread.
• Thread management is done in user space by the thread library.
• When a thread makes a blocking system call, the entire process will be
blocked.
• Only one thread can access the kernel at a time, so multiple threads
are unable to run in parallel on multiprocessors.
• If user-level thread libraries are implemented on an operating
system that does not support kernel threads, the many-to-one
model is used.
One to One Model
• There is one-to-one relationship of user-level thread to the kernel-
level thread.
• This model provides more concurrency than the many-to-one model.
It also allows another thread to run when a thread makes a blocking
system call.
• It allows multiple threads to execute in parallel on multiprocessors.
• The disadvantage of this model is that creating a user thread requires
creating the corresponding kernel thread.
• OS/2, Windows NT, and Windows 2000 use the one-to-one
model.
Difference between User-Level & Kernel-Level Thread
User-Level Threads | Kernel-Level Threads
User-level threads are faster to create and manage. | Kernel-level threads are slower to create and manage.
Implementation is by a thread library at the user level. | Operating system supports creation of kernel threads.
User-level thread is generic and can run on any operating system. | Kernel-level thread is specific to the operating system.
Multi-threaded applications cannot take advantage of multiprocessing. | Kernel routines themselves can be multithreaded.
Benefits of Threads
• Enhanced throughput of the system: When the process is split into
many threads, and each thread is treated as a job, the number of jobs
done in the unit time increases. That is why the throughput of the
system also increases.
• Effective Utilization of Multiprocessor system: When you have more
than one thread in one process, you can schedule more than one
thread in more than one processor.
• Faster context switch: The context switching period between threads
is less than the process context switching. The process context switch
means more overhead for the CPU.
• Responsiveness: When the process is split into several threads, the
process can respond as soon as one thread completes its work, without
waiting for the other threads.
• Communication: Communication between threads is simple because
the threads share the same address space, whereas processes must use
dedicated inter-process communication mechanisms to exchange data.
• Resource sharing: Resources such as code, data, and files can be shared
between all threads within a process. Note: The stack and registers
cannot be shared between threads. Each thread has its own stack and
registers.
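Because threads share one address space, they can communicate through an ordinary shared object. A minimal sketch using Python's `queue.Queue` follows; the `producer` and `consumer` names and the `None` sentinel are choices made for this example.

```python
import queue
import threading

messages = queue.Queue()       # shared object living in the process's address space
received = []


def producer():
    for i in range(5):
        messages.put(i * i)
    messages.put(None)         # sentinel: no more items


def consumer():
    while True:
        item = messages.get()  # blocks until the producer has put something
        if item is None:
            break
        received.append(item)


threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(received)                # [0, 1, 4, 9, 16]
```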
Multicore Programming
• On a system with a single computing core, concurrency merely
means that the execution of the threads will be interleaved over time,
because the processing core is capable of executing only one thread
at a time.
• On a system with multiple cores, however, concurrency means that
the threads can run in parallel, because the system can assign a
separate thread to each core
• A system is parallel if it can perform more than one task
simultaneously.
• In contrast, a concurrent system supports more than one task by
allowing all the tasks to make progress.
• Thus, it is possible to have concurrency without parallelism.
• CPU schedulers were designed to provide the illusion of parallelism by
rapidly switching between processes in the system, thereby allowing each
process to make progress.
• Such processes were running concurrently, but not in parallel.
• As systems have grown from tens of threads to thousands of threads,
CPU designers have improved system performance by adding
hardware to improve thread performance.
Multicore Programming
• In general, five areas present challenges in programming for multicore
systems:
• Identifying tasks. This involves examining applications to find areas that can
be divided into separate, concurrent tasks.
• Balance. While identifying tasks that can run in parallel, programmers must
also ensure that the tasks perform equal work of equal value.
• Data splitting. Just as applications are divided into separate tasks, the data
accessed and manipulated by the tasks must be divided to run on separate
cores.
• Data dependency. The data accessed by the tasks must be examined for
dependencies between two or more tasks. When one task depends on data
from another, programmers must ensure that the execution of the tasks is
synchronized to accommodate the data dependency
• Testing and debugging. When a program is running in parallel on multiple
cores, many different execution paths are possible. Testing and debugging
such concurrent programs is inherently more difficult than testing and
debugging single-threaded applications.
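One common form of the synchronization mentioned under data dependency is protecting shared data with a lock. The sketch below assumes four threads updating a single hypothetical `counter`; without the lock, the read-modify-write steps could interleave and lose updates.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def add_many(n):
    global counter
    for _ in range(n):
        with counter_lock:     # serialize the read-modify-write on shared data
            counter += 1


threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)                 # 40000
```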
Types of Parallelism
• In general, there are two types of parallelism: data parallelism and task
parallelism.
• Data parallelism focuses on distributing subsets of the same data across
multiple computing cores and performing the same operation on each
core.
• Task parallelism involves distributing not data but tasks (threads) across
multiple computing cores. Each thread is performing a unique operation.
• Different threads may be operating on the same data, or they may be operating on
different data.
• In summary, data parallelism distributes data across multiple cores, while
task parallelism distributes tasks across multiple cores.
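The two kinds of parallelism can be contrasted in a short Python sketch. It only shows the structure of each approach: in CPython, threads running pure Python code may not actually execute on several cores at once. The chunk size and pool size are arbitrary choices for the example.

```python
from concurrent.futures import ThreadPoolExecutor

data = list(range(1, 101))

with ThreadPoolExecutor(max_workers=4) as pool:
    # Data parallelism: the same operation (sum) on different slices of the data.
    chunks = [data[i:i + 25] for i in range(0, len(data), 25)]
    partial_sums = list(pool.map(sum, chunks))
    total = sum(partial_sums)

    # Task parallelism: different operations, here on the same data.
    f_min = pool.submit(min, data)
    f_max = pool.submit(max, data)
    f_len = pool.submit(len, data)
    stats = (f_min.result(), f_max.result(), f_len.result())

print(partial_sums, total)     # [325, 950, 1575, 2200] 5050
print(stats)                   # (1, 100, 100)
```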
Amdahl's law
• The observation is that the performance improvement that can be
gained through parallel processing is limited by the part of a system
that's inherently sequential -- that is, the set of operations that must
be run in series.
• Example: If a program is composed of a set of instructions that take
10 hours to run in series, and a one-hour portion of that program
can't be parallelized, the absolute best you can expect is to approach
the limit of 10x improvement in execution time. That's because no
matter how much you parallelize the nine hours that can run in
parallel, you will never be able to do away with the one hour that
must be run in sequence.
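Amdahl's law is usually written as speedup <= 1 / (S + (1 - S) / N), where S is the serial fraction and N is the number of cores. The sketch below applies it to the 10-hour example, where S = 1/10; the function name `amdahl_speedup` is made up for the illustration.

```python
def amdahl_speedup(serial_fraction, n_cores):
    """Upper bound on speedup: 1 / (S + (1 - S) / N)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)


serial_fraction = 1 / 10       # one hour out of ten cannot be parallelized
for n in (1, 2, 9, 1000):
    print(n, round(amdahl_speedup(serial_fraction, n), 2))
```

With 9 cores the bound is 5x, and however many cores are added, the value approaches but never reaches 10x.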
Thread Libraries
• A thread library provides the programmer with an API for creating and
managing threads.
• There are two primary ways of implementing a thread library.
• The first approach is to provide a library entirely in user space with no
kernel support.
• All code and data structures for the library exist in user space.
• This means that invoking a function in the library results in a local function call in
user space and not a system call.
• The second approach is to implement a kernel-level library supported
directly by the operating system.
• In this case, code and data structures for the library exist in kernel space.
• Invoking a function in the API for the library typically results in a system call to the
kernel.
There are three main thread libraries in use today:
• POSIX Pthreads - may be provided as either a user or kernel library, as
an extension to the POSIX standard.
• Win32 threads - provided as a kernel-level library on Windows
systems.
• Java threads - Since Java generally runs on a Java Virtual Machine, the
implementation of threads is based upon whatever OS and hardware
the JVM is running on, i.e. either Pthreads or Win32 threads
depending on the system.
Pthreads
• May be provided as either a user-level or a kernel-level library.
• Pthreads is an execution model that exists independently of a programming
language, as well as a parallel execution model.
• It allows a program to control multiple different flows of work that overlap in
time.
• A POSIX standard API for thread creation and synchronization
• The API specifies the behavior of the thread library; the implementation is up to
the developers of the library.
• Common in UNIX operating systems (Solaris, Linux, Mac OS X)
• Global variables are shared amongst all threads.
• One thread can wait for the others to rejoin before continuing.
• Pthreads begin execution in a specified function.
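The create-and-join pattern described above can be sketched with Python's `threading` module, used here only as an analogy for the Pthreads calls `pthread_create` and `pthread_join`. The function name `runner` and the global `total` are assumptions of the example.

```python
import threading

total = 0                      # global data, shared by all threads


def runner(upper):
    """The new thread begins execution in this function."""
    global total
    total = sum(range(1, upper + 1))


worker = threading.Thread(target=runner, args=(10,))   # analogous to pthread_create
worker.start()
worker.join()                  # analogous to pthread_join: wait for the thread to finish
print(total)                   # 55
```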
Windows Threads
• Technique is similar to the Pthreads technique in several ways
• A thread is created using the CreateThread() function.
• Just as in Pthreads, a set of attributes for the thread is passed to this function.
• These attributes include security information, the size of the stack,
and a flag that can be set to indicate if the thread is to start in a
suspended state.
Java Threads
• Java threads are managed by the JVM
• Typically implemented using the threads model provided by underlying OS
• Java threads may be created by:
• Extending Thread class
• Implementing the Runnable interface
• ALL Java programs use Threads - even "common" single-threaded ones.
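The two ways of creating a thread have close counterparts in Python's `threading` module, which can serve as a rough illustration. This is Python, not Java: subclassing `threading.Thread` and overriding `run()` resembles extending the Thread class, and passing a callable as `target` resembles supplying a Runnable.

```python
import threading

results = []


class Greeter(threading.Thread):          # similar in spirit to extending Thread
    def run(self):
        results.append("from subclass")


def greet():                              # similar in spirit to a Runnable
    results.append("from callable")


for t in (Greeter(), threading.Thread(target=greet)):
    t.start()
    t.join()                              # wait for each thread before starting the next
print(results)                            # ['from subclass', 'from callable']
```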
Thread Issues
• Semantics of fork() and exec() system calls
• Thread cancellation of target thread
• Asynchronous or deferred
• Signal handling
• Thread pools
• Thread-specific data
• Scheduler activations
fork() & exec()
• Does fork() duplicate only the calling thread or all threads?
• Some UNIX systems have chosen to have two versions of fork().
• If exec() is called immediately after fork(), then duplicating all threads
is unnecessary, because exec() will replace the entire process, including
all of its threads, with the new program.
• If the process doesn’t call exec() after fork(), then all threads could be
duplicated.
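On POSIX systems, the standard fork() duplicates only the calling thread. The sketch below checks this from Python on a Unix-like system, under the assumption that `os.fork` is available. The pipe is just a way for the child to report back, and the warning filter is there because newer Python versions warn about forking a multithreaded process.

```python
import os
import threading
import warnings

stop = threading.Event()
background = threading.Thread(target=stop.wait)   # a second thread in the parent
background.start()
threads_in_parent = threading.active_count()

read_fd, write_fd = os.pipe()
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    pid = os.fork()

if pid == 0:
    # Child: report how many threads exist after the fork, then exit at once.
    os.write(write_fd, str(threading.active_count()).encode())
    os._exit(0)

os.close(write_fd)
threads_in_child = int(os.read(read_fd, 16).decode())
os.close(read_fd)
os.waitpid(pid, 0)
stop.set()                     # let the background thread finish
background.join()
print(threads_in_parent, threads_in_child)
```

The parent reports at least two threads, while the child reports one: only the thread that called fork() exists in the child.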
Thread Cancellation
• Terminating a thread before it has finished
• Two general approaches:
• Asynchronous cancellation terminates the target thread immediately.
• Deferred cancellation allows the target thread to periodically check if it
should be cancelled (Opportunity to terminate itself in an orderly fashion).
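• Cancelling a thread asynchronously can cause issues, e.g., if the thread is
cancelled while holding resources or in the middle of updating shared data.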
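Deferred cancellation can be sketched with a shared flag that the target thread checks periodically. The names `cancel_requested` and `worker` are hypothetical, and the short wait stands in for one unit of real work.

```python
import threading

cancel_requested = threading.Event()
started = threading.Event()
log = []


def worker():
    started.set()
    while not cancel_requested.is_set():   # cancellation point, checked each iteration
        cancel_requested.wait(0.01)        # stands in for one unit of real work
    log.append("cleanup done")             # orderly shutdown before exiting


t = threading.Thread(target=worker)
t.start()
started.wait()
cancel_requested.set()                     # request cancellation; the thread is not killed
t.join()
print(log, t.is_alive())                   # ['cleanup done'] False
```

The target thread decides when it is safe to stop, so it can release resources first.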
Signal Handler
• Signals are used in UNIX systems to notify a process that a particular event has occurred.
• A signal handler is used to process signals
1. Signal is generated by particular event
2. Signal is delivered to a process
3. Signal is handled
• Options:
• Deliver the signal to the thread to which the signal applies
• Deliver the signal to every thread in the process
• Deliver the signal to certain threads in the process
• Assign a specific thread to receive all signals for the process
Thread Pools
• Create a number of threads in a pool where they await work (Web Server)
• Advantages:
• Usually slightly faster to service a request with an existing thread than create a new thread
• Allows the number of threads in the application(s) to be bounded by the size of the pool
• After completing its task, a thread returns to the pool and waits for more work
• The number of threads in the pool can be set based on factors such as the number of CPUs, the amount of physical memory, and the expected number of client requests.
• Dynamically adjust the No. of threads according to usage pattern
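A thread pool can be sketched with Python's `concurrent.futures.ThreadPoolExecutor`. The pool size of 3 and the `handle_request` function are assumptions made for the illustration: ten requests are served, but never by more than three worker threads.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 3
workers_seen = set()
seen_lock = threading.Lock()


def handle_request(request_id):
    with seen_lock:
        workers_seen.add(threading.current_thread().name)
    return request_id * 2


with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
    responses = list(pool.map(handle_request, range(10)))

print(responses)                           # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
print(len(workers_seen) <= POOL_SIZE)      # True
```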
Thread Specific Data
• Allows each thread to have its own copy of data
• Useful when you do not have control over the thread creation process (e.g., when
using a thread pool)
• Almost all thread libraries provide support for this
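In Python, thread-specific data is available through `threading.local()`. In the sketch below, both threads write to the same attribute name, yet each one reads back only its own value. The names `context` and `handle` are made up for the example.

```python
import threading

context = threading.local()        # each thread sees its own attributes
barrier = threading.Barrier(2)
results = {}


def handle(request_id):
    context.request_id = request_id        # private to the current thread
    barrier.wait()                         # both threads have written by now
    results[request_id] = context.request_id


threads = [threading.Thread(target=handle, args=(rid,)) for rid in ("a", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results.items()))             # [('a', 'a'), ('b', 'b')]
```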
Scheduler Activation
• Both M:M and Two-level models require communication to maintain the
appropriate number of kernel threads allocated to the application
• Scheduler activations provide upcalls - a communication mechanism from the
kernel to the thread library
• This communication allows an application to maintain the correct number of kernel
threads
Windows XP Threads
• Implements the one-to-one mapping, kernel-level
• Each thread contains
• A thread id
• Register set
• Separate user and kernel stacks
• Private data storage area
• The register set, stacks, and private storage area are known as the context of the thread
• The primary data structures of a thread include:
• ETHREAD (executive thread block)
• KTHREAD (kernel thread block)
• TEB (thread environment block)
Linux Thread
• Linux refers to them as tasks rather than threads
• Thread creation is done through clone() system call
• clone() allows a child task to share the address space of the parent
task (process)