[Graph: Amdahl's Law — speedup vs. number of processors for several values of P]

Java Parallel Programming Part 1: Introduction

The first question you have is probably: What is parallel programming and why? Parallel programming or computing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved at the same time. In short, its main aims are to:

  • Increase speed
  • Process huge amounts of data
  • Solve problems in real time
  • Solve problems in due time

Why now?

Moore’s Law

People argue whether Moore’s Law still holds. Moore’s Law is the observation that the number of transistors in a dense integrated circuit doubles approximately every two years (some say 18 months). It is named after Gordon E. Moore, the co-founder of Intel and Fairchild Semiconductor. It is this continuous advancement of integrated circuit technology that has brought us from the original 4.77 MHz PC to today's multi-GHz processors. The processor architecture has also changed a lot, with multiple execution pipelines, out-of-order execution, etc.

Assuming Moore’s Law still applies, we are still faced with big problems in improving our single CPU performance:

  • The Power Wall

Power = C × Vdd² × Frequency

We cannot scale transistor count and frequency without reducing Vdd (supply voltage)

Voltage scaling has already stalled

  • The Complexity Wall

Debugging and verifying large OOO (Out-Of-Order) cores is expensive (100s of engineers for 3-5 years)

Caches are easier to design, but they can only help so much…

As an example of the power (frequency) wall, it has been reported that:

  • E5640 Xeon (4 cores @ 2.66 GHz) has a power envelope of 95 watts
  • L5630 Xeon (4 cores @ 2.13 GHz) has a power envelope of 40 watts

That is roughly 137% more electrical power (95 W vs. 40 W) for about 24% more clock speed (2.66 GHz vs. 2.13 GHz). This is not going to scale.

Enter multi-core design. A multi-core processor implements multi-processing in a single physical package. Instead of cranking up the frequency to achieve higher performance, more cores are put in a processor so that programs can be executed in parallel to gain performance. These days all Intel processors are multi-core. Even the processors used in mobile phones are multi-core.

How much improvement can I expect my application to gain from running on a multi-core processor? The answer is: it depends. Your application may see no performance gain at all if it has not been designed to take advantage of multiple cores. Even if it has, the gain still depends on the nature of your program and the algorithm it uses.
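Before worrying about speedups, it helps to know how many cores you actually have to work with. A minimal sketch (class name is my own) querying the core count the JVM sees:

```java
// Query how many logical processors the JVM can use.
// The number printed depends entirely on the machine this runs on.
public class CoreCount {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("Available processors: " + cores);
    }
}
```

Note that `availableProcessors()` reports logical processors, so on a hyper-threaded CPU it may return twice the number of physical cores.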

Amdahl’s Law

Amdahl’s law states that if P is the proportion of a program that can be made parallel, and (1 − P) is the proportion that cannot be parallelised, then the maximum speedup that can be achieved by using N processors is:

Speedup(N) = 1 / [(1 − P) + P/N]
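The formula is easy to play with in code. A small sketch (class and method names are my own) that evaluates the speedup for a given P and N:

```java
// Amdahl's Law: maximum speedup on N processors when a fraction P
// of the program can be executed in parallel.
public class Amdahl {
    static double speedup(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    public static void main(String[] args) {
        // With P = 0.9, four cores give only about 3.08x...
        System.out.printf("P=0.9, N=4    -> %.2f%n", speedup(0.9, 4));
        // ...and even 1000 cores cannot exceed the 10x ceiling (1 / (1 - P)).
        System.out.printf("P=0.9, N=1000 -> %.2f%n", speedup(0.9, 1000));
    }
}
```

Notice how quickly the serial fraction dominates: beyond a point, adding cores barely moves the needle.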

The speedup in relation to the number of cores or processors at specific values of P is shown in the graph at the top of this post. The following table gives the best-case improvement (as N approaches infinity) at specific values of P.

  Proportion of code that can be made parallel (P)    Maximum speedup (times)
  0.1                                                 1.11
  0.2                                                 1.25
  0.3                                                 1.43
  0.4                                                 1.67
  0.5                                                 2.00
  0.6                                                 2.50
  0.7                                                 3.33
  0.8                                                 5.00
  0.9                                                 10.00
  0.95                                                20.00

This should give you some perspective on how much performance you can gain by writing your program to take advantage of parallelism, instead of having unrealistic expectations.

Why Parallel Programming in Java?

I prefer to write parallel programs in Java for the following reasons:

  • Write once, run anywhere
  • Large pool of software developers
  • OO programming abstractions
  • Compile time and runtime checking of code
  • Automatic garbage collection
  • Supports multi-threading in language
  • Rich collection of libraries

Java has supported multi-threading since version 1, so what is new? Java multi-threading uses the Shared Memory Model, meaning that on its own it cannot scale out to multiple machines, which requires the Distributed Memory Model.

Distributed Memory refers to a multiprocessor computer system in which each processor has its own private memory. Computational tasks can only operate on local data, and if remote data is required, the computational task must communicate with one or more remote processors. In contrast, a Shared Memory multiprocessor offers a single memory space used by all processors. Processors do not have to be aware of where data resides, except that there may be performance penalties, and that race conditions are to be avoided.
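To make the Shared Memory Model concrete, here is a minimal sketch (class name and structure are my own) in which several threads sum slices of a single array that lives in one heap visible to all of them — no data is copied or sent anywhere:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Shared-memory parallelism: worker threads read the same array directly.
public class SharedMemorySum {
    public static void main(String[] args) throws Exception {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;

        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        int chunk = data.length / threads;

        List<Future<Long>> futures = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int from = t * chunk;
            final int to = (t == threads - 1) ? data.length : from + chunk;
            futures.add(pool.submit(() -> {
                long sum = 0;
                for (int i = from; i < to; i++) sum += data[i]; // shared array, local slice
                return sum;
            }));
        }

        long total = 0;
        for (Future<Long> f : futures) total += f.get();
        pool.shutdown();
        System.out.println(total); // sum of 1..1,000,000 = 500000500000
    }
}
```

In a distributed-memory version of the same computation, each node would hold only its own slice, and the partial sums would have to be communicated over the network instead of collected through shared `Future` results.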

Future Posts on the Subject

In the forthcoming posts on the subject, I shall delve into the programming side, using different mechanisms and techniques to implement a solution to the same problem under both the Shared Memory and Distributed Memory models. The sub-topics for each category are listed below.

Shared Memory Model

  • Java SE Tools
  • Java EE Tools
  • 3rd Party Libraries

Distributed Memory Model

  • Building an HPC cluster
  • 3rd Party Libraries
  • JBoss Middleware

So Stay Tuned!