Parallel and Distributed Computing Course

The course CSPD-407 on Parallel and Distributed Computing aims to teach students the principles, algorithms, and applications of parallel and distributed systems. It covers topics such as shared memory programming with OpenMP, distributed memory programming, and advanced concepts in parallel computing. The course includes a detailed weekly schedule and recommended textbooks for further study.

Uploaded by

Huzaifa Awan
Copyright
© All Rights Reserved

Parallel and Distributed Computing

Semester: BSCS-8
Credit Hours: 3+0
Prerequisite: CSOS-347
Course Code: CSPD-407

Course Description
The goal of this course is to introduce students to the principles and paradigms of parallel and distributed
systems, along with their algorithms and applications.

Course Learning Outcomes (CLOs)


The course learning outcomes, along with their domains and BT levels, are listed below

S. #    CLO STATEMENT                                             DOMAIN  BT LEVEL            PLO
CLO-1   Learn about the different parallel architectures,         C       C2 (Comprehension)  2
        profiling and parallelization of code
CLO-2   Analytical modeling and performance of parallel programs  C       C4 (Analysis)       4
CLO-3   Analyze complex problems with shared memory programming   C       C3 (Application)    3
        with OpenMP
* BT= Bloom’s Taxonomy, C=Cognitive Domain, P=Psychomotor Domain, A= Affective
Domain

Course Materials
This course introduces the following topics to students:
• Introduction to parallel and distributed systems, tools, languages, and architectural support
  from the application side
• Analysis and profiling of applications
• Shared memory concepts such as threads and OpenMP
• Distributed memory: point-to-point and collective communication; parallel and distributed
  programming paradigms
• Parallel and distributed algorithms
• Applications of parallel and distributed computing: multi-core, client-server, GPU
• Heterogeneous computing
• Advanced topics in parallel and distributed systems

Course Weekly Schedule


The course schedule for 16 weeks is detailed below
Week Topic
1 Introduction to parallel and distributed computing, Flynn’s Classical Taxonomy and
general parallelism terminologies
2 Platforms for parallel programming and types of parallelism, Amdahl’s Law and
profiling
3 Parallel Memory Architectures, Parallel programming models
4 Dependence Analysis
5 Designing parallel programs
6 Inter-process Communication, Message Passing System
7 Introduction to Multithreading, C++ Threads and Design Patterns
8 Shared Memory Parallel Programming: OpenMP
9 Programming with OpenMP
10 Distributed memory parallel programming, Heterogeneous distributed systems
11 Message Passing Interface (MPI)
12 GPU based Computing, Introduction to CUDA
13 Concurrency Control
14 Fault Tolerance
15 Asynchronous/synchronous computation/communication
16 Advanced topics in parallel and distributed computing

Recommended Textbooks

1. Principles of Parallel Programming, Lin, C. & Snyder, L., 1st Edition (2008), Addison-Wesley
2. Distributed Systems: Concepts and Design, Coulouris, G., Dollimore, J. & Kindberg, T.,
5th Edition, Addison-Wesley
3. Parallel Programming: For Multicore and Cluster Systems, Rauber, T. & Rünger, G. (2013),
Springer

Common questions

In shared memory systems, inter-process communication occurs through variables that are accessible to multiple processes, with constructs such as locks and semaphores providing synchronization and preventing race conditions. This model promotes direct data sharing and often results in more efficient data transfer. In contrast, message-passing systems use explicit communication: processes exchange data by sending messages, typically over a network. This model scales naturally and suits distributed systems, but it often introduces latency and makes programs more complex, because communication links must be established and message protocols handled.
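
The contrast can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: threads stand in for processes, a `queue.Queue` stands in for the network, and the function names `shared_memory_sum` and `message_passing_sum` are made up for this example.

```python
import queue
import threading


def shared_memory_sum(values, workers=4):
    """Workers add into one shared total; a lock serializes each update."""
    total = 0
    lock = threading.Lock()

    def work(chunk):
        nonlocal total
        for v in chunk:
            with lock:  # without the lock, the read-modify-write could race
                total += v

    threads = [threading.Thread(target=work, args=(values[i::workers],))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def message_passing_sum(values, workers=4):
    """Workers keep private partial sums and send them as messages."""
    inbox = queue.Queue()  # assumed stand-in for a network channel

    def work(chunk):
        inbox.put(sum(chunk))  # the only communication is this one message

    threads = [threading.Thread(target=work, args=(values[i::workers],))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The receiver collects exactly one message per worker.
    return sum(inbox.get() for _ in range(workers))
```

Both functions compute the same sum; the difference is where coordination happens: on every update to the shared variable in the first, and once per worker message in the second.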

Flynn's Classical Taxonomy is significant because it classifies computer architectures by the number of concurrent instruction streams and data streams. The taxonomy has four classes: Single Instruction Single Data (SISD), Single Instruction Multiple Data (SIMD), Multiple Instruction Single Data (MISD), and Multiple Instruction Multiple Data (MIMD). This classification helps in understanding the processing capabilities and the nature of parallelism in different computer architectures, which is crucial for designing efficient parallel and distributed systems.
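
Because the taxonomy depends only on two counts, it can be written as a small lookup. The sketch below is illustrative; the function name `flynn_class` is an assumption of this example.

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Return the Flynn class for the given numbers of concurrent streams."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("each stream count must be at least 1")
    instr = "S" if instruction_streams == 1 else "M"  # Single or Multiple
    data = "S" if data_streams == 1 else "M"
    return f"{instr}I{data}D"
```

For instance, one instruction stream applied to eight data elements at once gives `flynn_class(1, 8) == "SIMD"`, while several independent cores working on separate data give `"MIMD"`.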

In a heterogeneous computing environment, multi-core, client-server, and GPU approaches complement each other by contributing different strengths. Multi-core architectures run tasks in parallel on several cores that sit close together, which suits work requiring low-latency synchronization. Client-server designs delegate tasks to distributed servers, providing robustness and scalability across networks. GPU architectures excel at highly parallel tasks over large data sets because they have a very large number of cores optimized for vector operations. Together, they form a versatile ecosystem in which each task is assigned to the most suitable architecture, improving resource usage and performance.

Amdahl's Law gives a formula for the theoretical maximum speedup of a task when a portion of its workload is parallelized. The law shows that the potential speedup is limited by the sequential portion of the task. If P is the proportion of the program that can be parallelized and N is the number of processors, the theoretical speedup is 1/((1-P) + P/N). As N grows, the speedup approaches 1/(1-P), so the portion of the task that cannot be parallelized bounds the benefit of adding processors.
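
A short Python sketch makes the formula concrete. The function names `amdahl_speedup` and `amdahl_limit` are chosen for this example and do not come from the course material.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Theoretical speedup 1 / ((1 - P) + P / N) for parallel fraction p on n processors."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - p) + p / n)


def amdahl_limit(p: float) -> float:
    """Upper bound on speedup as the processor count grows without limit."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return float("inf") if p == 1.0 else 1.0 / (1.0 - p)
```

For example, with P = 0.9 and N = 10 the speedup is 1/(0.1 + 0.09), about 5.26, and no number of processors can push it past 1/0.1 = 10.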

A detailed week-by-week topic outline provides a structured learning path by introducing concepts and techniques in a logical sequence, so that knowledge builds cumulatively. In parallel and distributed computing, this approach ensures that foundational concepts such as basic architectures and terminology are understood before moving on to more complex topics like specific programming models, concurrency control, or fault tolerance. This scaffolding helps students develop the comprehensive understanding needed to master an intricate subject.

Shared memory programming models allow multiple processors to access the same memory locations, so inter-process communication happens through shared variables. This model simplifies the development of parallel applications by providing a unified memory space, and it is commonly implemented with threads, for example through OpenMP. Conversely, distributed memory models give each processor its own private memory and require processors to communicate by passing messages across a network. This approach, commonly used with MPI, scales well and suits distributed systems, but it makes application development more complex because message passing must be explicit.
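
The distributed memory style can be imitated in plain Python to show its shape. The sketch below is not MPI: the `Mailboxes` class, its `send` and `recv` methods, and the rank functions are hypothetical stand-ins, with threads playing the role of separate processes and one queue per rank playing the role of the network.

```python
import queue
import threading


class Mailboxes:
    """Hypothetical stand-in for a network: one inbox per rank."""

    def __init__(self, size):
        self.size = size
        self._inbox = [queue.Queue() for _ in range(size)]

    def send(self, dest, msg):
        self._inbox[dest].put(msg)

    def recv(self, rank):
        return self._inbox[rank].get()  # blocks until a message arrives


def rank_main(rank, comm, data, result):
    """Every rank runs this same function and branches on its rank."""
    workers = comm.size - 1
    if rank == 0:
        # Scatter: each worker receives its own slice (a private copy).
        for w in range(1, comm.size):
            comm.send(w, data[w - 1::workers])
        # Gather and reduce the partial results sent back by the workers.
        result.append(sum(comm.recv(0) for _ in range(workers)))
    else:
        chunk = comm.recv(rank)
        comm.send(0, sum(x * x for x in chunk))


def distributed_sum_of_squares(data, size=4):
    if size < 2:
        raise ValueError("need rank 0 plus at least one worker")
    comm = Mailboxes(size)
    result = []
    threads = [threading.Thread(target=rank_main,
                                args=(r, comm, data if r == 0 else None, result))
               for r in range(size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]
```

Only rank 0 ever sees the full input; the workers see nothing but the messages addressed to them, which is the property that makes the model scale across machines and also what makes it more work to program.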

OpenMP is instrumental in parallel programming because it provides a set of compiler directives, runtime library routines, and environment variables that simplify implementing parallelism on shared memory architectures. It supports the shared memory model by letting developers parallelize parts of a program with pragma directives, enabling multi-threading within a single address space. OpenMP handles the creation, synchronization, and management of threads, which hides the complex underlying operations from the programmer and improves productivity.

Concurrency control is significant in parallel and distributed computing because it ensures a correct execution order among concurrent processes, preventing conflicts and data races and preserving data consistency. It is implemented with techniques such as locking, transactional memory, timestamp ordering, and optimistic concurrency control. These techniques coordinate access to shared resources so that parallel execution does not lead to incorrect program behavior or system instability.
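
Of the techniques listed, optimistic concurrency control is perhaps the least obvious, so here is a minimal sketch of it. It assumes a single in-memory value with a version counter; the names `VersionedCell`, `try_commit`, and `optimistic_update` are invented for this example. A writer reads without blocking others, computes its update, and commits only if the version is unchanged, retrying otherwise.

```python
import threading


class VersionedCell:
    """A value plus a version number that increases on every commit."""

    def __init__(self, value):
        self._value = value
        self._version = 0
        # The lock guards only the short read and validate-and-write steps.
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._value, self._version

    def try_commit(self, new_value, expected_version):
        """Write only if nobody committed since expected_version was read."""
        with self._lock:
            if self._version != expected_version:
                return False  # conflict: caller must re-read and retry
            self._value = new_value
            self._version += 1
            return True


def optimistic_update(cell, fn):
    """Apply fn to the cell's value, retrying on conflict; return attempts used."""
    attempts = 0
    while True:
        attempts += 1
        value, version = cell.read()
        if cell.try_commit(fn(value), version):
            return attempts


def run_demo(threads=4, increments=500):
    cell = VersionedCell(0)

    def work():
        for _ in range(increments):
            optimistic_update(cell, lambda v: v + 1)

    pool = [threading.Thread(target=work) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return cell.read()
```

No increment is lost even under contention, because a stale update fails validation and is recomputed from the fresh value. The cost is wasted work when conflicts are frequent, which is why pessimistic locking can be the better choice for heavily contended data.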

CUDA (Compute Unified Device Architecture) is a parallel computing platform and application programming interface (API) model created by NVIDIA for general-purpose computing on its own GPUs. It provides extensions to programming languages such as C and C++, so that complex computations can run on GPUs, whose architecture is well suited to parallel tasks. CUDA supports the execution of very large numbers of concurrent threads, which can significantly accelerate parallel computations.

Designing fault-tolerant distributed systems presents challenges such as ensuring consistency across nodes, maintaining data integrity, and recovering from partial system failures without significant impact on the service. Fault tolerance often requires redundancy, additional computational overhead for monitoring, and elaborate error-handling mechanisms, all of which can affect system performance by introducing latency and increasing resource consumption. Balancing performance and fault tolerance is critical to ensuring reliability without significantly degrading system efficiency.
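
One simple form of redundancy is failover across replicas, sketched below under strong assumptions: replicas are modeled as plain Python callables, failure is signaled by a made-up `ReplicaDown` exception, and the helper names `call_with_failover` and `make_replica` are hypothetical.

```python
class ReplicaDown(Exception):
    """Raised by a replica that cannot serve the request (assumed failure signal)."""


def call_with_failover(replicas, request):
    """Try each replica in order; return (index of the replica that answered, response)."""
    failures = 0
    for index, replica in enumerate(replicas):
        try:
            return index, replica(request)
        except ReplicaDown:
            failures += 1  # each failed attempt adds latency before the next try
    raise ReplicaDown(f"all {failures} replicas failed")


def make_replica(name, healthy=True):
    """Build a toy replica that either answers or raises ReplicaDown."""
    def handle(request):
        if not healthy:
            raise ReplicaDown(name)
        return f"{name} handled {request}"
    return handle
```

The trade-off described above is visible even here: the request survives the loss of a replica, but only by paying for extra replicas and for the time spent on attempts that fail.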
