Mandelbrot #2

Java Parallel Programming Part 2: Fork/Join

This is the first of three posts on Java Parallel Programming using a Shared Memory Model after my introductory post on the subject . In this article, I shall show you how to use Java 7’s Fork/Join framework for parallel programming. As you are aware, multi-threading has been available since Java was first introduced in the last millennium. What’s the fuss regarding Fork/Join then? According to Herbert Schildt in his book “Java The Complete Reference Eighth Edition” (ISBN: 978-0-07-160631-8) on page 1077:


...multithreading is used to allow two or more tasks to share the CPU. This type of multithreading is typically supported by an object of type Thread (as described in Chapter 11). Although this type of multithreading will always remain quite useful, it was not optimized for situations in which two or more CPUs are available (multicore computers)...

In contrast to the usual multithreading model, the Fork/Join Framework introduced in Java 7 automatically scales to make use of multiple processors available. In addition, it simplifies the creation and use of multiple threads as you will see in the code segments later on.

Java 7 Fork/Join Framework

During execution of a parallel program written using the Fork/Join framework, execution starts out sequentially, branches off in parallel at designated points in the program and join at a subsequent point and resume sequential execution. Parallel sections of a program forks recursively until a certain task size is reached. The Fork/Join framework uses recursive decomposition: dividing a problem into subproblems, solving a subproblem sequentially to produce a partial result, and combining the partial results into a complete solution.

One interesting fact is that Fork/Join uses work-stealing technique for parallel execution according to the Java documentation:


“...An ExecutorService (is used) for running ForkJoinTasks. A ForkJoinPool provides the entry point for submissions from non-ForkJoinTask clients, as well as management and monitoring operations.

A ForkJoinPool differs from other kinds of ExecutorService mainly by virtue of employing work-stealing: all threads in the pool attempt to find and execute subtasks created by other active tasks (eventually blocking waiting for work if none exist). This enables efficient processing when most tasks spawn other subtasks (as do most ForkJoinTasks). When setting asyncMode to true in constructors, ForkJoinPools may also be appropriate for use with event-style tasks that are never joined...”

Mandelbrot Generation

The problem we are solving using different Java parallel programming approaches in this series is Mandelbrot generation. First the theoretical. The Mandelbrot set is the set of values of c in the complex plane for which the orbit of 0 under iteration of the complex quadratic polynomial:

remains bounded. That is, a complex number c is part of the Mandelbrot set if, when starting with z0 = 0 and applying the iteration repeatedly, the absolute value of zn remains bounded however large n gets.

Mandelbrot set images are made by sampling complex numbers and determining for each number whether the result tends towards infinity when the iteration of a particular mathematical operation is performed. The real and imaginary parts of each number are converted into image coordinates for a pixel colored according to how rapidly the sequence diverges, if at all. The Java parallel program assigns each available core to compute one vertical slice of the Mandelbrot set image at a time. Consequently, the more cores are available, the more work can be performed in parallel, limited only by Amdahl’s Law.

Writing Fork/Join Code

To use Fork/Join, you need to create a subclass based on either RecursiveAction and RecursiveTask. The difference between the two is that the former does not return a result while the latter does. The subclass must implement the abstract method compute(). This is shown is the code snippet below:


package com.kardinia.forkjoin;


import java.awt.Color;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

import com.kardinia.imaging.Picture;
import com.kardinia.math.Complex;
import com.kardinia.math.MandelbrotHelper;

/**
 * Mandelbrot uses the fork/join parallel processing capability of Java to utilise the power of  multicore processors
 *
 */
public class MandelbrotFJ extends RecursiveAction {

    ...

    public MandelbrotFJ(double x, double y, double size, int width, int height, int iterations, int threshold,
            int column, int interval, int[][] buffer) {
        this.buffer = buffer;
        this.width = width;
        this.height = height;
        this.iterations = iterations;
        this.pointX = x;
        this.pointY = y;
        this.stepSize = size;
        this.column = column;
        this.interval = interval;
        this.threshold = threshold;
        
        System.out.println("column=" + column + " interval=" + interval);
    }

    @Override
    // split computation into segments
    protected void compute() {
        if (interval <= threshold) {
            computeNow();
            return;
        }

        // split problem into two to be executed in parallel
        int newInterval = interval / 2;
        invokeAll(new MandelbrotFJ(pointX, pointY, stepSize, width, height, iterations, threshold, column, newInterval, buffer),
                new MandelbrotFJ(pointX, pointY, stepSize, width, height, iterations, threshold,
                        column + newInterval, interval - newInterval, buffer));
    }

    ...


}

The subclass is named MandelbrotFJ and compute() is the place where Fork/Join uses recursive decomposition to execute code in parallel. In this Mandelbrot generation implementation, the image is divided into vertical stripes for each task to compute. You will notice that when the interval (size of the stripe) is less than or equal a particular size (threshold), it will compute that stripe immediately. Otherwise, it will create 2 instances of MandelbrotFJ and let each instance handle one-half of the remaining stripe. ComputeNow() is the method for calculating and determining if a point is inside the Mandelbrot set and the result saved in the buffer array. Since there is no overlap in calculation (each MandelbrotFJ is assigned to compute non-intersecting area of the image), no synchronisation is needed. The buffer array is passed from the calling program.



    // compute Mandelbrot
    protected void computeNow() {
        for (int i = column; i < interval + column; i++) {
            for (int j = 0; j < height; j++) {
                
                double x = pointX - stepSize / 2 + stepSize * i / width;
                double y = pointY - stepSize / 2 + stepSize * j / height;
                Complex z0 = new Complex(x, y);
                // colour is mapped to number of iterations
                buffer[i][j] = MandelbrotHelper.mandelbrotPoint(z0, iterations);
            }
        }
    }

To execute your Fork/Join class, you need to create a ForkJoinPool and then use it to invoke your MandelbrotFJ instance as shown below in the static method generate(…).


    // generate the Mandelbrot at point and step size specified for a specified screen size
    public static void generate(double x, double y, double stepSize, Color[] colours, int width, int height, int iterations,
            int threshold) {
        // print image size
        System.out.println("Image size: " + width + "-by-" + height);
        
        // buffer to hold Mandelbrot computation results for image generation later
        int[][] buffer = new int[width][height];

        // invoke Mandelbrot
        MandelbrotFJ mandelbrot = new MandelbrotFJ(x, y, stepSize, width, height, iterations, threshold, 0, width, buffer);

        ForkJoinPool pool = new ForkJoinPool();

        // print execution time
        long startTime = System.currentTimeMillis();
        pool.invoke(mandelbrot);
        long endTime = System.currentTimeMillis();

        System.out.println("Generation of Mandelbrot took " + (endTime - startTime) + 
                " milliseconds.");

        // create mandelbrot display
        MandelbrotHelper.display(width, height, colours, buffer);

    }

main() handles the house-keeping tasks by reading from the command line parameters for the execution including the starting x and y position, step size, number of threads to be used. It also reads in the colour map which has a different colour for different number of iterations that a point calculation in computeNow() has taken to decide whether that point is in the Mandelbrot set.


    // main program to invoke the Mandelbrot
    public static void main(String[] args) throws Exception {
        final int WIDTH = MandelbrotHelper.DEFAULT_WIDTH;
        final int HEIGHT = MandelbrotHelper.DEFAULT_HEIGHT;
   
        // read in mandelbrot parameters and colour map
        double xc   = Double.parseDouble(args[0]);
        double yc   = Double.parseDouble(args[1]);
        double size = Double.parseDouble(args[2]);
        int threads = Integer.parseInt(args[4]);
        
        // read in colour map
        Color[] colours = MandelbrotHelper.readColourMap(args[3]);
        final int ITERS = colours.length;
        System.out.println("Colour array size: " + ITERS);
        
        // print out environment and setting
        System.out.println("Starting point: (" + xc + ", " + yc + "); step size: " + size);
        System.out.println("Colour map \"" + args[3] + "\" read in successfully");


        int processors = Runtime.getRuntime().availableProcessors();
        System.out.println(Integer.toString(processors) + " processor"
                + (processors != 1 ? "s are " : " is ")
                + "available");

        // generate Mandelbrot - no need splitting work into more chunks than number of processors
        generate(xc, yc, size, colours, WIDTH, HEIGHT, ITERS, WIDTH / threads );

        
    }

Performance

The parallel program written using Java 7 Fork/Join was executed mutliple times for 1, 2, 4 and 8 cores. The results are shown i the graph below:

Performance Graph
Performance Graph

 

The performance gain (compared to using 1 core) is not a straight line with constant slope. There may be multiple reasons for this:

  1. sampling size – may be the number of samples is not big enough
  2. background tasks – the computer is running background tasks
  3. Amdahl’s Law – there is an upper limit to parallel program performance which varies based on the portion of code that can be parallelised and the portion of code that must be run serially as described in my previous post
  4. big.LITTLE architecture – my machine is an ARM octa-core machine with 4 X A15 and 4 X A7 cores. The A15 cores are much faster than the A7 cores. Since the load is spread evenly over all 8 cores, the slower A7 cores take longer to complete their tasks and slow down the overall performance

Conclusion

Using the Join/Form framework to write parallel programs is much easier than using the legacy Java threads by simplifying the creation and use of multiple threads as shown in the code segments. And the performance increase using multiple cores in parallel computation is noticeable. In the next post, I shall use a Java 8 lambdas and streams implementation for comparison.