Chapter 4
Threads
Spring 2020 © 2020 by Greg Ozbirn, UTD, for 1
use with Stalling's 9th Ed. OS book
Processes and Threads
A process has two main characteristics:
• Resource ownership
• Scheduling/execution
However, these could be split apart and managed separately.
2
Threads
If splitting into two parts, we have:
• Process: owns resources
• Thread: executes
3
Multithreading
• Multithreading refers to multiple threads of execution in a
single process.
• A process that is not multithreaded is called single-threaded.
4
OS’s and Threads
• MS-DOS supports a single thread.
• Traditional UNIX supports multiple user processes but
only supports one thread per process.
• Modern OS’s such as Windows and Solaris support
multiple threads per process.
5
Java
DOS Run-time
environment
Some Unix versions Windows, Solaris
6
p.155
Multithreading
Process:
• Virtual address space for the process image.
• Resource ownership.
Thread:
• Thread execution state (running, etc.)
• Thread context
• Stack
• Local variables
• Shared access to memory and resources of its process
7
8
Thread Benefits
• Far less time to create a new thread within a process than
to spawn a new process (maybe factor of 10 on some
systems).
• Less time to terminate a thread than a process.
• Less time to switch between threads than processes.
• Communication between threads is more efficient than
between processes (since they share memory).
9
Thread Example
• Suppose a file server receives requests from various
clients.
• The file server could create a new thread for each request.
• On a multiprocessor machine the threads could even run
simultaneously.
• Thread efficiency seen especially if threads must
communicate or coordinate.
• Threads can make program organization easier.
10
Other Thread Examples
• Foreground and background work: example: spreadsheet,
one thread for user interaction and one for executing
commands.
• Asynchronous processing: word processor, thread to
periodically save to disk. (Not synchronized with the main
processing in the application.)
• Speed-up execution: overlapping activity (read and
process) especially on a multiprocessor.
• Modularity: structuring a program logically into threads
may aid with program design.
11
Thread States
• As with processes, threads may be in different states.
• States associated with a thread are running, ready and
blocked.
• Process state changes can affect the threads of the process:
– Suspending a process involves suspending all threads
of the process since all threads share the same address
space.
– Termination of a process would terminate all threads
within the process.
12
Parallelism in Threads
• A key benefit of threads is to achieve a speed-up in
execution through parallelism.
• For example, a program might have two activities that
could be done in parallel.
• If these activities are done in sequence, it will take a
certain amount of time to complete.
• But if the activities are done in parallel, by using a separate
thread for each, then the activities could complete twice as
fast.
13
Example: RPC using Threads
• RPC is remote procedure call, it means to call a procedure
of another process, usually on another machine.
• The next two slides illustrate how threading may be used
to improve the efficiency of RPC calls.
• With a single thread, each RPC must run in series (one
after another).
• With multiple threads, even on a uniprocessor, the two
calls can proceed and wait in two different threads, thus
allowing a degree of parallelism.
14
15
16
Thread Interleaving
• On a uniprocessor with multithreading, threads among the
same process or differing processes are interleaved.
• This is similar to how processes are interleaved without
threads.
• See example on next slide.
17
18
Thread Synchronization
• Threads share the resources of their process.
• Because of this their activities may need to be coordinated.
• For example, if two threads of a process are operating on
the same linked-list, it is important that they not interfere
with each other or the list could be corrupted.
• The next chapter addresses issues of thread
synchronization.
19
Thread Blocking
• Does blocking a thread block its process?
• If so, then the threading benefit is reduced since we would
like one thread to run in our application while another is
blocked.
• The answer to this depends on the type of threading in use.
• The next slide discusses the two major kinds of thread
support: User-level and Kernel-level.
20
User-Level and Kernel-Level Threads
There are two ways to implement thread support in an OS.
• User-Level Threads: the application supports threading
through a thread library and the OS isn’t aware of threads.
• Kernel-Level Threads: the operating system supports
threads itself.
21
User-Level Threads
• A process may use a thread library to create separate threads
within itself.
• These threads are application-level threads, the OS isn’t aware
of them, it just schedules the process as usual.
• The application’s thread library determines which of the
application’s threads to run whenever the process is scheduled.
• The application can spawn new threads which are managed by
the thread library.
• So, the OS runs the process, and the process uses a library to
create and manage its threads.
22
Thread 2 does
blocking call,
blocks process
Thread 2 needs
something
Process has a
from thread 1,
timeout, goes
it enters a
to ready state.
blocked state.
23
User-Level Threads
• Advantages
Disadvantages
– Most
Threadsystem
switching
calls does
are blocking
not involve
for processes.
the kernel: So
no all
modethreads
switching
within
– aScheduling
process will
canbebeblocked
application specific: choose the best algorithm.
– The
Can kernel
run on can
any only assignneeds
OS. Only processes to processors.
a thread library Two threads
within the same process cannot run simultaneously on two
processors
24
Kernel-Level Threads
• The OS may be designed to support threads itself.
• The application creates threads by calling operating system
functions (an API).
• The OS schedules and manages individual threads rather
than entire processes.
25
Kernel-Level Threads
• Advantages • Drawbacks
– the kernel can – thread switching within
simultaneously schedule the same process
many threads of the same requires a mode switch
process on many processors to the kernel, which
– blocking is done on a thread may result in a
level, so one thread in a significant slow down
process can block while (factor of 10 on some
another runs (doesn’t block systems).
the entire process)
– kernel routines can be
multithreaded
26
Combined Approaches
• Solaris used a combined approach to threads prior to
version 9.
• This lets applications create threads in user space that are
then mapped onto an available pool of threads maintained
by the kernel.
• This approach gets much of the efficiency of ULTs without
the problem of blocking all threads of a process.
• The number of KLTs for an application may be specified
by the programmer for best performance.
• Solaris 9 and 10 now use KLTs.
27
28
Multicore and Multithreading
• From Amdahl’s law:
execution time on a single processor
Speedup = -----------------------------------------------
execution time on N parallel processors
1
= ---------------------
(1-f) + (f/N)
where f is parallelizable portion
29
Speedup
• If serial portion is 10%, on an 8-core system:
f = 0.9
Speedup = 1/[(1-0.9) + (0.9 / 8)]
= 1/[ (0.1) + (0.1125) ]
= 1/(0.2125)
= 4.70
• So, with only 10% of processing being serial, speedup is
only 4.7.
30
31
Speedup
• Typically, there is an increased overhead cost as the
number of processors involved increases.
• This can result in degraded performance as the number of
processes increases.
• So the actually speedup might be more like the graph on
the following slide:
32
33
Speedup
• However, when great attention is paid to reducing the
serial fraction, good results can be obtained.
• Many kinds of servers, such as database servers that must
handle numerous independent tasks can be made to run
effectively on a multicore system.
• This is demonstrated for a few example systems on the
next slide:
34
35
Speedup
• Other examples of software that may benefit with multiple
cores:
• Multithreaded applications
• Multiprocess applications
• Java applications (JVM can benefit as well)
• Multi-instance applications (running more than one
instance in parallel)
36
Application Example
• Valve Game Software.
• Valve has created a number of popular games, as well as
the Source engine, one of the most widely played game
engines available.
• In recent years, Valve reprogrammed Source to use
multithreading to exploit multicore chips.
• The revised code better supports games such as Half-Life
2.
Threading Options
Valve considered these threading options:
• Coarse threading:
– Subsystems are assigned to individual processor cores.
– For example, rendering on one, AI on another, physics
on another, etc.
• Fine-grained threading:
– Many small similar tasks are spread across multiple
cores.
– For example, iterations of a loop could be distributed
across cores.
• Hybrid threading:
– Combination of coarse threading for some systems and
fine-grained threading for other systems.
Coarse Threading
• Valve found 2X speed up across two processors versus a
single processor, but only in contrived cases.
• For actual gameplay, the improvement was about 1.2X.
Fine-grained Threading
• Valve also found the effective use of fine-grained
threading was difficult.
• The time per work unit can be variable, and managing the
timeline of outcomes and consequences involved complex
programming.
Hybrid Threading
• Valve found that a hybrid threading approach was the most
promising and would scale the best, as multicore
processors with 8 or 16 cores became available.
• Valve identified systems which could operate effectively
on a single processor.
• One example is sound mixing, which requires little user
interaction, and works on its own set of data.
• Other modules, such as scene rendering, can be organized
into a number of threads so the module can execute on a
single processor but achieve greater performance if spread
across many processors.
Rendering Example
• In the rendering module, higher-level threads spawn lower
level threads as needed.
• The rendering module uses a world list, which is a
database representation of the game’s world.
• First the areas of the world to be rendered are determined,
then the objects in these areas are identified.
• The renderer must determine the rendering of each object
from multiple points of view.
Shared DB Access
• Valve found that locking the database for a thread was too
inefficient.
• With further investigation, they found that 95% of the time
a thread is reading the database, and only 5% of the time
writing to it.
• A concurrency approach known as single-writer-multiple-
readers model was used.
End of Slides
based on Stalling's official slides 44