Saylor Academy

Unit 8: Parallel Processing
This unit will address several advanced topics in computer architecture, focusing on the reasons for and the consequences of the recent switch from sequential processing to parallel processing by hardware producers. You will learn that parallel programming is not easy and that parallel processing imposes certain limitations in performance gains, as seen in the well-known Amdahl's law. You will also look into the concepts of shared memory multiprocessing and cluster processing as two common means of improving performance with parallelism. The unit will conclude with a look at some of the programming techniques used in the context of parallel machines.

Completing this unit should take you approximately 16 hours.
- Unit 8 Learning Outcomes Page
- 8.1: The Reason for the Switch to Parallel Processing
  - Eijkhout, Chow, and van de Geijn's "Introduction to High-Performance Scientific Computing, Chapter 1: Sequential Computer Architecture" URL
    Read section 1.3 of Chapter 1 on pages 23 and 24 to learn about multi-core chips. These two pages give a summary of processor and chip trends to overcome the challenge of increasing performance and addressing the heat problem of a single core.
- 8.2: Limitations in Parallel Processing: Amdahl's Law
  - Eijkhout, Chow, and van de Geijn's "Introduction to High-Performance Scientific Computing, Chapter 2: Parallel Computer Architecture" URL
    Read sections 2.1 and 2.2 of Chapter 2 on pages 41-45 to learn about parallel computer architectures. There are different types of parallelism: there is instruction-level parallelism, where a stream of instructions is simultaneously in partial stages of execution by a single processor; there are multiple streams of instructions, which are simultaneously executed by multiple processors. The former was addressed in Unit 5. The latter is addressed in this reading. A quote from the beginning of the chapter states the key ideas: "In this chapter, we will analyze this more explicit type of parallelism, the hardware that supports it, the programming that enables it, and the concepts that analyze it." This reading begins with a simple scientific computation example, followed by a description of SISD, SIMD, MISD, and MIMD architectures.
  - Ariel Ortiz Ramírez's "Parallelism and Performance" URL
    Study the section titled "Amdahl's Law." Amdahl's law explains the limitations to performance gains achievable through parallelism. Over the last several decades or so, increases in computer performance have largely come from improvements to current hardware technologies and less from software technologies. Now, however, the limits to these improvements may be near. For significant continued performance improvement, either new physical technology needs to be discovered and/or transitioned to practice, or software techniques will have to be developed to get significant gains in computing performance.
    In the equation for Amdahl's law, P is the fraction of code that can be parallelized, i.e., that must be executed serially; S is the fraction of code that cannot be parallelized; and n is the number of processors. Note P + S is 1. If there are n processors, then P + S can be executed in the same time that P/n + S can be executed. Thus, the ratio of the time using 1 processor to the time of using n processors is 1/(P/n + S). This is the speedup in going from 1 processor to n processors.
    Note that the speedup is limited, even for large n. If n is 1, the speedup is 1. If n is very large, then the speedup is 1/S. If P is very small, then P/n is even smaller, and P/n + S is approximately S, i.e., the speedup is 1/S, but S is approximately S + P, which is 1. Therefore, the speed of execution of this code using 1 processor is about the same as using n processors.
    Another way of writing Amdahl's law is 1/(P/n + [1 - P]). Thus, if P is close to 1, the speedup is 1/(P/n) or n/P, which is approximately n.
    Apply Amdahl's law to better understand how it works by substituting a variety of numeric values into this equation and sketching the graph of the equation.
  - Blaise Barney's "Introduction to Parallel Computing, Chapter 6, Section 10: Limits and Costs of Parallel Programming" URL
    In section 10 of Chapter 6, study the section titled "Amdahl's Law" up to the section titled "Complexity."
- 8.3: Shared Memory and Distributed Memory Multiprocessing
  - Multiprocessing URL
    Study these slides. This reading focuses on the problem of parallel software. It discusses scaling, uses a single example to explain shared memory and message passing, and identifies problems related to cache and memory consistency.
- 8.4: Multicore Processors and Programming with OpenMP and MPI
  - Eijkhout, Chow, and van de Geijn's "Introduction to High-Performance Scientific Computing, Chapter 2: Parallel Computer Architecture" URL
    Read section 2.5 of Chapter 2 on pages 52-68. The reading covers two extreme approaches to parallel programming. First, parallelism is handled by the lower software and hardware layers. OpenMP is applicable in this first case. Secondly, parallelism is handled by the programmer. MPI is applicable in the second case.
  - Norm Matloff's "Programming on Parallel Machines" URL
    Read Chapter 1 on pages 1-20. If you go to the table of contents, selecting the section will jump you to the desired page to avoid scrolling through the text. Chapter 1 uses a matrix times (multiplication) vector example in section 1.3.1. This chapter goes on to describe parallel approaches for computing a solution: section 1.3.2 describes a shared-memory and threads approach; section 1.3.3 describes a message passing approach; section 1.3.4 describes the MPI and R language approach. Study these sections to get an overview of the idea of software approaches to parallelism.
    Read Chapter 2 on pages 21 - 30. This chapter presents issues that slow the performance of parallel programs.
    Read Chapter 3 on pages 31 - 66 to learn about shared memory parallelism. Parallel programming and parallel software are extensive topics and our intent is to give you an overview of them; more in depth study is provided by the following chapters.
    Read Chapter 4 on pages 67 - 100. This chapter discusses MP directives and presents a variety of examples.
    Read Chapter 5 on pages 101 - 136. This chapter presents GPUs (Graphic Processing Units) and the CUDA language. This chapter also discusses parallel programming issues in the context of GPUs and CUDA and illustrates them with various examples.
    Read Chapter 7 on pages 161 - 166. This chapter illustrates the message passing approach using various examples.
    Read Chapter 8 on pages 167 - 169 for a description of MPI (Message Passage Interface), which applies to networks of workstations (NOWs). The rest of the chapter illustrates this approach with various examples.
    Read Chapter 9 on pages 193 - 206 for an overview of cloud computing and the hadoop platform, which are interesting topics for today not just for parallel computing.
    Lastly, read section 10.1 of Chapter 10 on pages 207 and 208, which explains what R is.
    The rest of the chapters of the text and the four appendices cover other interesting topics. These chapters and the appendices are optional.
- Unit 8 Assessment
  - Unit 8 Assessment Quiz
    Please take this assessment to check your understanding of the materials presented in this unit.
    
    Notes:
    
    There is no minimum required score to pass this assessment, and your score on this assessment will not factor into your overall course grade.
    
    This assessment is designed to prepare you for the Final Exam that will determine your course grade. Upon submission of your assessment you will be provided with the correct answers and/or other feedback meant to help in your understanding of the topics being assessed.
    
    You may attempt this assessment as many times as needed, whenever you would like.

CS301: Computer Architecture

Unit 8: Parallel Processing

8.1: The Reason for the Switch to Parallel Processing

8.2: Limitations in Parallel Processing: Amdahl's Law

8.3: Shared Memory and Distributed Memory Multiprocessing

8.4: Multicore Processors and Programming with OpenMP and MPI

Unit 8 Assessment