0% found this document useful (0 votes)
13 views59 pages

Multithreading and Thread Libraries Overview

Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
13 views59 pages

Multithreading and Thread Libraries Overview

Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

 Overview

 Multicore Programming
 Multithreading Models
 Thread Libraries
 Implicit Threading
 Threading Issues
 Operating System Examples
 To introduce the notion of a thread—a fundamental unit of CPU
utilization that forms the basis of multithreaded computer
systems
 To discuss the APIs for the Pthreads, Windows, and Java thread
libraries
 To explore several strategies that provide implicit threading
 To examine issues related to multithreaded programming
 To cover operating system support for threads in Windows and
Linux
 Breaking task into smaller task can finish job quickly
 Parallelization can improve performance of applications:
 Each computing entity performs small part of job
 Can compute in parallel if supported by hardware
 GPUs can perform thousands of tasks in parallel

 An application can be implemented with several threads


 Threads are processing entities, similar to processes
 Light weight processes
 Basic unit of CPU utilization
 Help in achieving parallelization

 A traditional process has single thread of control


 Light weight processes
 Basic unit of CPU utilization
 Help in achieving parallelization

 If a process has multiple threads of control, it can perform more than


one task at a time.
 Example
 Web browser: One thread for display, another for retrieving data
 Word processor: a thread for display, another thread to respond to user
input, another to perform spell check
 Shares with other threads belonging to same process
 Code section
 Data section
 Other OS resources, such as Open files and signals

 Thread comprises of
 Thread ID
 Program counter
 Register set
 Stack
 Responsiveness – may allow continued execution if part of
process is blocked, especially important for user interfaces
 Resource Sharing – threads share resources of process, easier
than shared memory or message passing
 Economy – Allocating memory and resources for process
creation is costly. Because threads share resources of the process
to which they belong, it is more economical to create and context
switch threads.
 Scalability – process can take advantage of multiprocessor
architectures. Threads are running parallel in different
processors. A single threaded process can only run on one CPU,
no matter how many processors are available. Multithreading on
a multi-CPU machine increases concurrency
 Multicore or multiprocessor systems putting pressure on
programmers, challenges include:
 Dividing activities
 Balance
 Data splitting
 Data dependency
 Testing and debugging

 Parallelism implies a system can perform more than one


task simultaneously
 Concurrency supports more than one task making progress
 Single processor / core, scheduler providing concurrency
 Types of parallelism
 Data parallelism – distributes subsets of the same data across multiple
cores, same operation on each

 Task parallelism – distributing threads across cores, each thread


performing unique operation

 As # of threads grows, so does architectural support for threading


 CPUs have cores as well as hardware threads
 Consider Oracle SPARC T4 with 8 cores, and 8 hardware threads per core
 Concurrent execution on single-core system:

 Parallelism on a multi-core system:


 Identifies performance gains from adding additional cores to an application that
has both serial and parallel components
 S is serial portion

 N processing cores

 That is, if application is 75% parallel / 25% serial, moving from 1 to 2 cores
results in speedup of 1.6 times
 As N approaches infinity, speedup approaches 1 / S

Serial portion of an application has disproportionate effect on


performance gained by adding additional cores

 But does the law take into account contemporary multicore systems?
 User threads – Supported above the kernel and management done by
user-level threads library
 Three primary thread libraries:
 POSIX Pthreads
 Windows threads
 Java threads

 Kernel threads – Supported and managed directly by the Kernel


 Ultimately, there must be a relationship between user and kernel
threads to make the system to function
 Examples – virtually all general purpose operating systems, including:
 Windows
 Solaris
 Linux
 Tru64 UNIX
 Mac OS X
Three common ways to establishing the relainship:
 Many-to-One

 One-to-One

 Many-to-Many
 Many user-level threads mapped to single
kernel thread
 Thread management is done by the thread
library in user space, so it is efficient.
Limitations:
 One thread blocking causes all to block
 Because only one thread can access the
kernel at a time, Multiple threads may not
run in parallel on multicore or
multiprocessor system
 Few systems currently use this model
 Examples:
 Solaris Green Threads
 GNU Portable Threads
 Each user-level thread maps to kernel thread
 More concurrency than many-to-one model by allowing another
thread to run when a thread makes a blocking system call
 Also allows multiple threads to run in parallel on multiprocessors
 Creating a user-level thread creates a kernel thread
 Because the overhead of creating kernel threads can burden the
performance of an application, most implementations of this model
restricts the number of kernel threads supported by the system. (5
threads in 4 processors)
 Examples
 Windows
 Linux
 Solaris 9 and later
 Multiplexes many user-level threads to smaller or equal number of
kernel threads.
 The number of kernel threads may be specific to either a particular
application or a particular machine
 Developers can create as many user threads as necessary and the
corresponding kernel threads can run in parallel on a multiprocessor.
 Also when a thread performs a blocking system call the kernel can
schedule another thread for execution.
 Solaris prior to version 9
 Windows with the ThreadFiber package
 Similar to M:M, except that it allows a user thread to be bound to
kernel thread
 Examples
 IRIX
 HP-UX
 Tru64 UNIX
 Solaris 8 and earlier
 Hyperthreaded systems allow their processor core resources to become
multiple logical processor for performance.
 It enables the processor to execute two threads or sets if instructions, at
the same time. Since hyper-threading allows two streams to be executed
in parallel, it is almost like having two separate processors working
together
 Thread library provides programmer with API for creating and
managing threads
 Two primary ways of implementing
 Library entirely in user space
 Kernel-level library supported by the OS
 May be provided either as user-level or kernel-level
 A POSIX (Portable OS Interface) standard (IEEE 1003.1c) API for thread creation and
synchronization
 It is an execution model that exists independently from a language and a parallel
execution model. It allows a program to control multiple different workflows that
overlap in time. Each flow of work is referred to as a thread.
 Defines a standard interface and environment that can be used by an operating system
(OS) to provide access to POSIX-compliant applications. The standard also defines a
command interpreter (shell) and common utility programs. POSIX supports
application portability at the source code level so applications can be built to run
on any POSIX-compliant OS.
 If the code is written using the POSIX API, it need not be rewritten for another POSIX
compliant system.
 Even if the code is POSIX compliant, the CPU instructions in the compiled executable
are different across different CPU architectures.
 API specifies behavior of the thread library, implementation is up to development of
the library
 Common in UNIX operating systems (Solaris, Linux, Mac OS X)
 Java threads are managed by the JVM
 Typically implemented using the threads model provided by
underlying OS
 Java threads may be created by:

 Extending Thread class


 Implementing the Runnable interface
 Specific for Linux based System
 Fork() : The fork() system call is used to create a separate
duplicate process. New process will have different process ID
 Exec(): When an exec() system call is invoked, the program
specified in the parameter (Which will be another program) to
exec() will replace the entire process – including all threads
(process ID remains same)
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
int main()
{
printf(“Hello!\n PID=%d \n”,getpid());
return 0;
}

Compile: gcc 1.c


Run: ./[Link]

Hello!
PID:5523
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
int main()
{
fork();
printf(“Hello!\n PID=%d \n”,getpid());
return 0;
}

Compile: gcc 1.c


Run: ./[Link]

Hello!
PID:5523

Hello!
PID:5524
First fork()
#include <stdio.h>
P1 -> P2
#include <sys/types.h>
#include <unistd.h>
Second fork()
int main()
P1->P3
{
P2->P4
fork();
fork();
Third fork()
fork();
P1 -> P5
printf(“Hello!\n PID=%d \n”,getpid());
P2->P6
return 0;
P4->P7
}
P3->P8

Compile: gcc 1.c


Run: ./[Link]

Executed 8 times with different PID


#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
int main()
{
printf(“ PID=%d \n”,getpid());
char *args[]={“Hello”, “Welcome”,NULL};
execv(“./3”,args);
printf(“Back to 2.c”);
return 0;
}
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
int main(int argc, char *argv[])
{
printf(“We are in 3.c \n”);
printf(“PID of 3.c = %d\n”,getpid());
return 0;
}
Compile: gcc 2.c -o out1
gcc 3.c –o out2
Run: ./out1
PID = 5523
We are in 3.c
PID of 3.c =5523 # i.e.. Content of 3.c is replaced in 2.c, process ID remains Same
# Back to 2.c statement from 2.c is not printed.
 Semantics of fork() and exec() system calls change in a
multithreaded program
 Thread cancellation of target thread
 Asynchronous or deferred

 Signal handling
 Synchronous and asynchronous

 Thread pools
 Scheduler Activations
 Semantics of fork() and exec() system calls change in a
multithreaded program

 Issues:
 If one thread in a program calls fork(), does the new process duplicate all
threads, or is the new process single-threaded

 Solution:
 Some UNIX systems have chosen to have two versions of fork(), one that
duplicates all threads in the process and another that duplicates only the
thread that invoked the fork() system call

But which version of fork() to use and when?


 Depends on the application.
 Then duplicating all threads is unnecessary, as the program
specified in the parameters to exec() will replace the process

 Inthis instance, duplicating only the calling thread is


appropriate.
 Then the separate process should duplicate all threads
 Terminating or killing a thread before it has completed
 If multiple threads are concurrently searching for the database and
one thread returns the result, the remaining threads might be
canceled
 When a user presses a button on a web browser that stops a web page
from loading any further(all threads images, text, etc.) all threads
loading the page are canceled
 Thread to be canceled is target thread
 Cancellation of target threads may occur in two different scenarios:
 Asynchronous cancellation: One thread immediately terminates the
target thread. Target thread has no power.
 Deferred cancellation: The target thread periodically checks whether it
should terminate, allowing it an opportunity to terminate itself in an
ordinary fashion. Target thread has power
 Pthread code to create
and cancel a thread:
 In situations where
 Resources have been allocated to a canceled thread
 Often the OS will reclaim system resources from a canceled thread but some
cases will not reclaim all resources. Therefore, cancelling a thread
asynchronously may not free a necessary system-wide resource.
 A thread is canceled while in the midst of updating data it is sharing with
other threads

Good way:
Differed cancellation allows thread to check whether it should be canceled
at a point when it can be canceled safely.
 Invoking thread cancellation requests cancellation, but actual
cancellation depends on thread state

 If thread has cancellation disabled, cancellation remains


pending until thread enables it
 Default type is deferred
 Cancellation only occurs when thread reaches cancellation point
 I.e. pthread_testcancel()
 Then cleanup handler is invoked

 On Linux systems, thread cancellation is handled through


signals
n Signals are used in UNIX systems to notify a process that a
particular event has occurred.
n A signal handler is used to process signals
1. Signal is generated by particular event
2. Signal is delivered to a process
3. Signal is handled by one of two signal handlers:
1. default
2. user-defined

n Examples of synchronous signals include illegal memory access or


1 divide by 0. A signal is generated to the process on these event
n Every signal has default handler that kernel runs when handling
signal
l User-defined signal handler can override default
l For single-threaded, signal delivered to process
n Handling signal in single threaded program is straight
forward. Signals are always delivered to the process.
n Where should a signal be delivered for multi-threaded?
l Deliver the signal to the thread to which the signal applies
l Deliver the signal to every thread in the process
l Deliver the signal to certain threads in the process
l Assign a specific thread to receive all signals for the process
 Growing in popularity as numbers of threads increase, program
correctness more difficult with explicit threads
 Creation and management of threads done by compilers and run-
time libraries rather than programmers
 Three methods explored
 Thread Pools
 OpenMP
 Grand Central Dispatch

 Other methods include Microsoft Threading Building Blocks (TBB),


[Link] package
 Create a number of threads in a pool where they await work
 Advantages:
 Usually slightly faster to service a request with an existing thread than
create a new thread
 Allows the number of threads in the application(s) to be bound to the
size of the pool
 Separating task to be performed from mechanics of creating task
allows different strategies for running task
 [Link] could be scheduled to run periodically

 Windows API supports thread pools:


 Set of compiler directives and an API for C,
C++, FORTRAN
 Provides support for parallel programming
in shared-memory environments
 Identifies parallel regions – blocks of code
that can run in parallel
#pragma omp parallel
Create as many threads as there are cores
#pragma omp parallel for
for(i=0;i<N;i++) {
c[i] = a[i] + b[i];
}
Run for loop in parallel
 Apple technology for Mac OS X and iOS operating systems
 Extensions to C, C++ languages, API, and run-time library
 Allows identification of parallel sections
 Manages most of the details of threading
 Block is in “^{ }” - ˆ{ printf("I am a block"); }

 Blocks placed in dispatch queue


 Assigned to available thread in thread pool when removed from
queue
 Two types of dispatch queues:
 serial – blocks removed in FIFO order, queue is per process, called
main queue
 Programmers can create additional serial queues within program
 concurrent – removed in FIFO order but several may be removed
at a time
 Three system wide queues with priorities low, default, high
 Thread-local storage (TLS) allows each thread to have its own
copy of data
 Useful when you do not have control over the thread creation
process (i.e., when using a thread pool)
 Different from local variables
 Local variables visible only during single function invocation
 TLS visible across function invocations

 Similar to static data


 TLS is unique to each thread
 Many systems implementing many to many or
two level model place an intermediate data
structure between user and kernel threads
 To the user thread library, the LWP appears to be
a virtual processor on which the application can
schedule a user thread to run
 Each LWP is attached to a kernel thread and it is
kernel threads that the OS schedules to run on
the physical processor.
 If the kernel thread blocks(waiting for an I/O
operation to complete), the LWP blocks as well.
Up the chain the user level thread attached to the
LWP also blocks
 One scheme for communication between user
thread library and kernel is known as scheduler
activation.
 The kernel provides an application with a set of virtual processors
(LWPs), and the application can schedule user threads onto an
available virtual processor.
 Moreover, the kernel must inform an application about certain events.
This procedure is known as an upcall.
 Upcalls are handled by the thread library with an upcall handler, and
upcall handlers must run on a virtual processor.
 One event that triggers an upcall occurs when an application thread
is about to block. In this scenario, the kernel makes an upcall to the
application informing it that a thread is about to block and identifying
the specific thread. Then the kernel allocates a new virtual processor
to the application.
 The application runs an upcall handler on this new virtual processor,
that saves the state of the blocking thread and relinquishes / call off
the virtual processor on which the blocking thread is running.
 Another thread that is eligible to run on the new virtual processor is
scheduled then by the upcall handler.
 Whenever the event that the blocking thread was waiting for occurs,
the kernel makes another upcall to the thread library informing it that
the previously blocked thread is now eligible to run.
 A virtual processor is also required for The upcall handler for this
event, and the kernel may allocate a new virtual processor or
preempt one of the user threads and then run the upcall handler on
its virtual processor.
 The application schedules an eligible thread to run on an available
virtual processor, after marking the unblocked thread as eligible to
run.
 Windows Threads
 Linux Threads
 Windows implements the Windows API – primary API for Win
98, Win NT,Win 2000, Win XP, and Win 7
 Implements the one-to-one mapping, kernel-level
 Each thread contains
 A thread id
 Register set representing state of processor
 Separate user and kernel stacks for when thread runs in user mode
or kernel mode
 Private data storage area used by run-time libraries and dynamic
link libraries (DLLs)
 The register set, stacks, and private storage area are known as
the context of the thread
 The primary data structures of a thread include:
 ETHREAD (executive thread block) – includes pointer to process to
which thread belongs and to KTHREAD, in kernel space
 KTHREAD (kernel thread block) – scheduling and synchronization
info, kernel-mode stack, pointer to TEB, in kernel space
 TEB (thread environment block) – thread id, user-mode stack, thread-
local storage, in user space
 Linux refers to them as tasks rather than threads
 Thread creation is done through clone() system call
 clone() allows a child task to share the address space of the
parent task (process)
 Flags control behavior

 struct task_struct points to process data structures


(shared or unique)

You might also like