Chapter 1 - Introduction to Parallel Processing
Parallel machines provide a wonderful opportunity for applications with large computational requirements. Effective use of these machines, though, requires a keen understanding of how they work. This chapter provides an overview of both the software and hardware.
1.1 Platforms
In parallel computing, we must always keep in mind the hardware and software environments in which we will be working. Our hardware platforms here will be multicore machines, GPUs and clusters. For software we will use C/C++, OpenMP, MPI, CUDA and R.
1.1.1 Why R?
Many algorithms are just too complex to understand or express easily in C/C++. So, a scripting language will be very handy, and R has good parallelization features (and is a language I use a lot).
Appendix C presents a quick introduction to R.
1.2 Why Use Parallel Systems?
1.2.1 Execution Speed
There is an ever-increasing appetite among some types of computer users for faster and faster machines. This was epitomized in a statement by the late Steve Jobs, founder/CEO of Apple and Pixar. He noted that when he was at Apple in the 1980s, he was always worried that some other company would come out with a faster machine than his. But later at Pixar, whose graphics work requires extremely fast computers, he was always hoping someone would produce faster machines, so that he could use them!
A major source of speedup is the parallelizing of operations. Parallel operations can be either within-processor, such as pipelining or having several ALUs within a processor, or between-processor, in which many processors work on different parts of a problem in parallel. Our focus here is on between-processor operations.
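To make the distinction concrete, here is a minimal sketch of between-processor parallelism, written in C with OpenMP (to be covered in detail later). The array contents and the final print statement are just placeholders; the point is that the loop iterations are divided among the processors, each computing a partial sum.

   #include <stdio.h>
   #include <omp.h>

   #define N 1000000

   static double x[N];

   int main(void)
   {
      for (int i = 0; i < N; i++) x[i] = i % 7;  /* dummy data */
      double sum = 0;
      /* OpenMP splits the iterations among threads, typically one
         per core; reduction(+:sum) safely combines the per-thread
         partial sums at the end */
      #pragma omp parallel for reduction(+:sum)
      for (int i = 0; i < N; i++)
         sum += x[i];
      printf("sum = %f\n", sum);
      return 0;
   }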
For example, the Registrar’s Office at UC Davis uses shared-memory multiprocessors for processing its online registration work. Online registration involves an enormous amount of database computation. In order to handle this computation reasonably quickly, the program partitions the work to be done, assigning different portions of the database to different processors. The database field has contributed greatly to the commercial success of large shared-memory machines.
As the Pixar example shows, highly computation-intensive applications like computer graphics also have a need for these fast parallel computers. No one wants to wait hours just to generate a single image, and the use of parallel processing machines can speed things up considerably. For example, consider ray tracing operations. Here our code follows the path of a ray of light in a scene, accounting for reflection and absorption of the light by various objects. Suppose the image is to consist of 1,000 rows of pixels, with 1,000 pixels per row. In order to attack this problem with, say, 25 processors, we could divide the image into 25 squares of size 200×200 and have each processor do the computations for its square.
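Here is a rough sketch of that decomposition, again in C with OpenMP. The function trace_pixel() is hypothetical, a stand-in for the real per-pixel ray computation; the block indexing is the real point.

   #include <stdio.h>
   #include <math.h>
   #include <omp.h>

   #define N 1000  /* image is N x N pixels */
   #define B 200   /* block size, giving a 5 x 5 grid of 25 blocks */

   static float image[N][N];

   /* hypothetical stand-in for the real per-pixel ray computation */
   float trace_pixel(int r, int c)
   {
      return sinf(r * 0.01f) * cosf(c * 0.01f);
   }

   int main(void)
   {
      /* each iteration renders one 200 x 200 block; with 25
         processors available, each processor handles one block */
      #pragma omp parallel for
      for (int blk = 0; blk < 25; blk++) {
         int r0 = (blk / 5) * B, c0 = (blk % 5) * B;
         for (int r = r0; r < r0 + B; r++)
            for (int c = c0; c < c0 + B; c++)
               image[r][c] = trace_pixel(r, c);
      }
      printf("%f\n", image[0][0]);
      return 0;
   }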
Note, though, that the task may be much more challenging than this implies. First of all, the computation will need some communication between the processors, which hinders performance if it is not done carefully. Second, if one really wants good speedup, one may need to take into account the fact that some squares require more computational work than others. More on this below.
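For instance, in the sketch above, a common remedy for such imbalance is dynamic scheduling: rather than pre-assigning one square per processor, each processor grabs the next unclaimed square as soon as it finishes its current one. In OpenMP this amounts to changing the pragma to

   #pragma omp parallel for schedule(dynamic)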
We are now in the era of Big Data, which requires Big Computation, thus again generating a major need for parallel processing.