My Home Compute Cluster

Home Compute Cluster Revamp – Part 1

Introduction

High Performance Computing, or HPC, refers to the practice of aggregating computing power to deliver much higher performance than a single machine can provide, in order to solve large problems in science, engineering, or business. In the past, building an HPC cluster was quite costly, but with the ever-diminishing cost of hardware, I built one some months ago for experimentation. It is now due for a revamp. Before I describe the revamp details, here is a recap of the existing HPC cluster, which I shall refer to as my home Compute Cluster.

Hardware

My home Compute Cluster is built from four credit-card-sized Linux computers called the ODROID-U3, made by Hardkernel, a Korean company. Each board is based on Samsung's energy-efficient 1.7 GHz quad-core ARM Cortex-A9 Exynos 4412 SoC (System on a Chip) and carries 2 GB of RAM. The Samsung Exynos 4412 SoC can be found in mobile devices including the Samsung Galaxy Note 10.1, the Samsung Galaxy Light, and the Hyundai T7 tablet. The bill of materials includes:

  • 4 x ODROID-U3 computers
  • 1 x Ethernet switch
  • 1 x 5 V, 10 A power supply
  • 4 x short Ethernet cables

The components are shown in the photo below.

Bill of Materials

My 4 ODROID-U3 computers are named:

  • odroid-m
  • odroid-s1
  • odroid-s2
  • odroid-s3

Each computer is assigned a static IP address. odroid-m is the master, and it also functions as a compute node. odroid-s1 through odroid-s3 are compute nodes; all they do is listen for requests from the master node to start computations.

I custom-built a tiny plywood stand to hold the computers, the switch, and the power supply. The U3 cluster is shown in the photo at the top of this post.

Test Application and Cluster Performance

I am a Java person; although I could program in C/C++ with MPI (Message Passing Interface) implementations such as MPICH or Open MPI, I would rather not. Instead, I use MPJ Express, an open source Java message-passing library that lets Java developers write and execute parallel applications on multicore processors and/or HPC clusters. I installed MPJ Express on one computer and duplicated the SD card for the other three. The one requirement for MPJ Express to work in cluster mode is that the master (head) node can SSH to all slave (compute) nodes without entering a password, which is easily set up using OpenSSH with SSH keys. A minimal MPJ Express program looks like the sketch below.
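To give a feel for the API, here is a minimal MPJ Express program. This is my own illustration based on standard MPJ Express usage, not code from the cluster; the class name HelloCluster is made up, and the launch lines in the comments use the stock mpjrun.sh launcher.

    // Minimal MPJ Express sketch (illustrative; HelloCluster is a made-up name).
    // Example launches with the standard MPJ Express tooling:
    //   shared memory: mpjrun.sh -np 4 -dev multicore HelloCluster
    //   cluster:       mpjrun.sh -np 16 -dev niodev HelloCluster
    // Cluster mode assumes the MPJ daemons were started first (e.g. with
    // mpjboot and a machines file listing the node hostnames).
    import mpi.MPI;

    public class HelloCluster {
        public static void main(String[] args) throws Exception {
            MPI.Init(args);                   // start the MPJ Express runtime
            int rank = MPI.COMM_WORLD.Rank(); // this process's id, 0..size-1
            int size = MPI.COMM_WORLD.Size(); // total number of processes
            System.out.println("Hello from rank " + rank + " of " + size);
            MPI.Finalize();                   // shut down cleanly
        }
    }

The same jar runs unchanged in both modes; only the launch flags differ, which is what makes moving between the shared-memory and cluster configurations painless.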

To test the cluster, I wrote an MPJ Express parallel program in Java that generates a 1024 x 512-pixel Mandelbrot set image at coordinate (0.1015, -0.633), with a maximum of 256 iterations and a step size of 0.01. Mandelbrot set images are made by sampling complex numbers and determining, for each one, whether the result tends toward infinity when a particular mathematical operation is iterated. The real and imaginary parts of each number are converted into image coordinates, and each pixel is colored according to how rapidly its sequence diverges, if at all. My MPJ Express parallel program assigns each available core one vertical slice of the image to compute at a time, so the more cores are available, the more work can be performed in parallel; a sketch of the idea follows. Mandelbrot images at the mentioned and other coordinates are shown in the following images.
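As a sketch of the decomposition, here is one plausible way to write the escape-time kernel and a column split in Java. This is my reconstruction for illustration, not the post's actual source; the constants mirror the parameters above, but the pixel-to-plane mapping and the static round-robin scheduling (the real program hands out slices one at a time) are assumptions.

    // Illustrative Mandelbrot kernel and column split (not the original source).
    public final class MandelbrotSlice {
        static final int WIDTH = 1024, HEIGHT = 512, MAX_ITER = 256;
        static final double STEP = 0.01, CENTER_RE = 0.1015, CENTER_IM = -0.633;

        // Iterate z = z*z + c; return the step count at which |z| exceeds 2.
        static int escapeTime(double cRe, double cIm) {
            double zRe = 0.0, zIm = 0.0;
            for (int i = 0; i < MAX_ITER; i++) {
                double re2 = zRe * zRe, im2 = zIm * zIm;
                if (re2 + im2 > 4.0) return i;  // diverged: color pixel by i
                zIm = 2.0 * zRe * zIm + cIm;    // uses the old zRe, saved in re2/im2
                zRe = re2 - im2 + cRe;
            }
            return MAX_ITER;                    // treated as inside the set
        }

        // Process `rank` of `size` computes columns rank, rank+size, rank+2*size, ...
        static void computeColumns(int rank, int size, int[][] iters) {
            for (int x = rank; x < WIDTH; x += size) {
                double cRe = CENTER_RE + (x - WIDTH / 2) * STEP;
                for (int y = 0; y < HEIGHT; y++) {
                    double cIm = CENTER_IM + (y - HEIGHT / 2) * STEP;
                    iters[x][y] = escapeTime(cRe, cIm);
                }
            }
        }
    }

Interleaving columns helps balance the load: columns that cut through the set boundary need far more iterations than columns whose points escape quickly, so handing each core one contiguous block would leave some cores idle.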

To assess the performance and behaviour of the cluster, I first ran the program in the MPJ Express Multicore Configuration (shared-memory mode) on odroid-m using 1, 2, 3, and 4 cores. Then I ran it in the Hybrid Configuration (a cluster of multicore machines) using up to 4 computers and all 4 cores on each computer.

Here is a brief description of the two configurations used in the tests. The Multicore Configuration starts a thread for each available core and uses an efficient inter-thread mechanism for communication. The Hybrid Configuration uses the multicore configuration for intra-node communication and the cluster configuration (distributed-memory mode), based on Java NIO, for inter-node communication.

The performance was recorded and graphed below.

Throughput as # of cores increases

For this particular parallel program, the cluster's throughput increases roughly linearly as more computers/cores are added, until around 12 cores, where the gains plateau; a likely reason is that communication and coordination overhead starts to offset the extra compute.
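As a rough frame for the plateau (my addition, not an analysis from the post), Amdahl's law bounds the speedup on n cores when a fraction p of the runtime parallelizes:

    S(n) = 1 / ((1 - p) + p / n)

As n grows, S(n) approaches 1 / (1 - p), so even a small serial fraction, such as collecting the finished slices at the master, eventually flattens the throughput curve.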

Areas for Improvement

Building and programming an HPC cluster with MPJ Express, and assessing its performance and behaviour, has been a valuable learning experience. After running it for a while, I found room for improvement in the current setup, including:

  • Having to copy the MPJ Express program (.jar) file to every compute node before running the parallel program – running an MPJ Express program in multicore mode is simple, but running it in cluster mode is more cumbersome: I have to copy the jar to each of the compute nodes. Although I could write a script to do that, it is still an extra step. Now that I have an ODROID-XU4/CloudShell, I can use it as an NFS server and let all the compute nodes read the MPJ Express program from the NFS share instead of copying it to each node.
  • The head node doubles as a compute node – a head node is a computing system configured to act as the intermediary between the cluster proper and the outside network; when you are on a head node, you are not actually running work on the cluster itself. In the current setup, odroid-m serves as both head node and compute node. Although there is no hard rule against this, it is still best to keep a separation of concerns between the roles each node plays.
  • Limited space on the compute nodes' SD cards for installing additional applications – I am in the process of writing a multi-part post on “Java Parallel Programming” that covers developing and running parallel programs on both shared-memory and distributed-memory systems, using open source middleware in the examples. Letting the compute nodes access storage on the NFS server gives me room to install additional open source frameworks and middleware solutions on them for experimentation and demonstration purposes.

All of these improvements depend on the use of an NFS server. In Part 2 of this article, I shall describe each of the improvements listed above.

STAY TUNED FOR THE NEXT INSTALLMENT!!!

 

2 thoughts on “Home Compute Cluster Revamp – Part 1”

  1. Hi Andy,

    I am thinking about building an Odroid-based HPC cluster.
    Your post is quite interesting, and I have 2 specific questions:

    1) Power Supply: which one do you use, any specific requirement, and what is the price?
    2) Which OS do you use?

    By the way, this is the first picture I have seen of such a cluster with the boards in a vertical orientation, which I find quite clever, as it optimises heat dissipation by convection.

    Bruno

  2. Bruno,

    Here are the answers to your questions:

    1) I bought the power supply on eBay. I just checked; the seller does not carry it any more. For your information, the description read: “Single Output Switch Power Supply Driver AC 220V 5V 50W 10A for LED Illuminated”. As long as the power supply can deliver 5 V at 10 A (50 W), it is fine for 4 x ODROID-U3s. It should work for ODROID-XU4s too, if you don't connect anything to their USB ports.

    2) I use the Ubuntu that comes with the ODROID-U3. For me, any Linux distribution or version is fine as long as it supports Java 7 or 8, since MPJ Express, which I use to write parallel programs, only requires Java to work.

    Hope this helps. Let me know if you have further questions.

    Andy
