How to build a supercomputer – part 1

For the past few years an incredible movement has been underway. In sheds, basements and labs, millions of amateurs have put together creations ranging from simple entry-level gadgets to things which previously belonged only in textbooks, university labs and government-funded programs. The rule now seems to be ‘with enough effort, you can make pretty much anything’. This has been hailed by some as the ‘maker’, or ‘hacker’, revolution. It has come about due to:

 

Some hobbyists’ projects are so remarkable that I assume they are hoaxes when I first see the titles! A microscope that can clearly resolve individual atoms, satellites performing real, valid experiments in orbit as I type this, free eye-tracking software that gives paralysed people a voice again, UAVs capable of searching for lost hikers, a fully functioning radar…

All those and many more which are equally impressive have been built by people like you, multiple times.

With that in mind, I wanted to have a go at something I have a great interest in but limited knowledge of. I wanted to know how to build a supercomputer.

Don’t get too excited

Don’t worry, I’m not planning on building an all-powerful system. The objective of this project was to learn more about how computers use ‘parallel processing’ to speed up calculations.

Here I am attempting to build a small, cheap cluster of computers and run a suitable program on them to demonstrate the parallel processing capability. This ‘cluster’ may also be used as a test bed for programs which are intended to run on larger networks.

Hence the title ‘how to build a supercomputer’. I am not a Bond villain.

What is parallel processing?

I’ll just give an overview here and go into more detail in part 2.

Parallel processing is what a supercomputer relies on to perform calculations at high speed. It’s simple in principle: the ‘main computer’ (or header, as we’ll call it) splits the task up into smaller ‘chunks’ and sends each of these to the individual computers (or nodes). The nodes then send their results back to the header, which puts them together to get the overall result.

The beauty of parallel processing is that it can often be advantageous to use multiple less powerful processors rather than a single high-powered one. I like to think of it like this:

  • Think of the task as a bunch of boxes which need delivering from A to B.
  • You have two options:
    • Use a single fighter jet to deliver the boxes one by one from A to B. This is analogous to a single high-speed CPU (central processing unit – the main processor in your computer).
    • Split the boxes into groups of 10 and use a fleet of 10 vans to deliver them. This is analogous to a program running in parallel on a cluster of less powerful CPUs, or on a GPU (graphics processing unit – the processor responsible for graphics; graphics processing is essentially done in parallel).
Analogy for parallel processing. Is it quicker to deliver boxes one by one at high speed, or to deliver them in groups of ten (or 100, or 1000)?

 

Of course, the choice of method depends on the actual problem. Using the ‘boxes’ analogy above, if the receiver at B needs to inspect the contents of the first box they receive before requesting that the second box be sent, the multiple-van (parallel) method is useless. Likewise, if the boxes can be received in any order, and the time to deliver 10 by van is less than the time to complete 10 round trips by jet, the parallel option looks better.

The time taken to split the boxes into groups of 10 at one end, and to sort them at the other, may also be disproportionately large.

Problems where splitting a task into smaller sub-tasks to be completed by nodes takes very little time are (hilariously) called ’embarrassingly parallel’ and are well suited to parallel processing. A classic case is a brute-force search. If we are searching 1 billion candidates, it is easy to see that splitting the problem into groups of 100 million and sending each group to one of 10 processors is trivial.
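To make that concrete, here’s a minimal sketch of an embarrassingly parallel brute-force search using Python’s standard multiprocessing module. The target value and the size of the search range are made up for illustration (and scaled well down from 1 billion so it finishes quickly on a single machine):

```python
# Minimal sketch: split a brute-force search across 10 worker processes.
# TARGET and the search range are illustrative values, not from the project.
from multiprocessing import Pool

TARGET = 7_654_321  # hypothetical value we are hunting for

def search_chunk(bounds):
    """Scan one chunk of the candidate range and return any matches."""
    start, end = bounds
    return [n for n in range(start, end) if n == TARGET]

if __name__ == "__main__":
    total, workers = 10_000_000, 10
    step = total // workers
    chunks = [(i * step, (i + 1) * step) for i in range(workers)]

    with Pool(workers) as pool:
        results = pool.map(search_chunk, chunks)  # one chunk per process

    print("Found:", [n for sub in results for n in sub])
```

Each chunk is completely independent of the others, which is exactly what makes the problem ’embarrassing’ – no chunk ever needs to wait on another.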

Parallel processing for this project

It is also worth noting that in the analogy above, we don’t necessarily have to use vans as our parallel nodes. We could use 10 fighter jets (i.e. 10 high-powered CPUs). The issue here is obviously the cost, which can sometimes be hugely disadvantageous. If we use a single GPU for parallel processing, it is more analogous to having a thousand vans. This is why people used to mine Bitcoin with their GPUs.

There is another way though. Most processors nowadays are multicore, meaning each CPU is actually a ‘collection’ of more than one individual processing core. Multicore processors are great but tend to be woefully under-utilised.

I’ll be using quad-core processors for this project.

If we could split the program between the separate cores of a CPU and have the multiple cores run in parallel, that’d be like using four fighter jets in parallel in the above analogy.

If we then do that across multiple computers (say 4 computers), we would have 16 cores running in parallel…or 16 fighter jets:

Splitting a process into tasks to run across 16 CPU cores. This is what my cluster will do when fed the appropriate program.
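A common way to organise that kind of 16-way split on a Pi cluster is MPI (Message Passing Interface). The sketch below uses the mpi4py library – that library choice is my assumption purely for illustration, not necessarily what part 2 will use – with one MPI process (rank) per core, and rank 0 playing the part of the header:

```python
# Hedged sketch: spread a toy calculation across every core in the cluster
# using MPI via mpi4py (an assumed library choice, for illustration only).
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # this process's ID: 0..15 on a 4 x quad-core cluster
size = comm.Get_size()   # total number of processes

# Rank 0 acts as the header: split the job into one chunk per rank.
chunks = [list(range(i, 1000, size)) for i in range(size)] if rank == 0 else None

my_chunk = comm.scatter(chunks, root=0)            # each rank receives its chunk
partial = sum(x * x for x in my_chunk)             # each rank does its share of work
total = comm.reduce(partial, op=MPI.SUM, root=0)   # header combines the results

if rank == 0:
    print("Sum of squares 0..999 =", total)
```

It would be launched with something like `mpiexec -n 16 --hostfile <file listing the four Pis> python3 sketch.py` – the exact flags and hostfile contents depend on your MPI installation and network setup.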

 

I’ll have a think about doing GPU processing (i.e. using thousands of vans in the above analogy) in a future post.

Parallel processing on a budget

Hardware

Remember the maker revolution I was talking about at the start? One of its real icons is the Raspberry Pi. For anyone unfamiliar with the Pi, it is a single-board computer capable of pretty much anything you can throw at it, and they cost about £25 each. I’m going to network four of them together to build my ‘baby supercomputer’. The Pi community’s term for a cluster like this is a ‘bramble’, by the way.

At the time of typing, the Raspberry Pi 3 has just come out. I am using three of these as ‘worker nodes’ and one Raspberry Pi 2 as the ‘header node’.

Rather than diving in and making a total mess of cables, I decided to actually put the thing together first. The Pis are simply stacked using PCB spacers. They are powered with a cheap USB hub I had lying around.

Obviously they need to be able to talk to each other, which required a bit of networking. From the hardware side it’s simple: each Pi is connected via an Ethernet patch cable (any good engineer should be able to make a patch cable!) to an Ethernet switch. Prior to this post I had never really been involved with networking, but it’s dead easy to pick up. The Ethernet switch is simply a box that passes the signal to and from each Pi as required. The setup is basically this:

How the cluster is hooked up to form a network

For now I just glued the stack of Pis onto the switch. I’ll tap some proper screw holes in later. Here’s how the setup looks in reality while it’s up and running.

The Pi cluster running away nicely. Note: the USB attachments are a flash drive and a keyboard dongle. The latter was used to set up the Pis before I enabled SSH access.

 

 Software

I won’t go into too much detail here. There is a fantastically clear guide in the links at the bottom of the page with all the details. I suggest anyone unfamiliar with SSH and networking go there for a better understanding.

I logged into each of the Pis in turn and, through ‘raspi-config’, renamed them Pi1 to Pi4. I gave each a password and expanded the filesystem.

To talk to the cluster, and to allow the nodes to talk to each other, I used the SSH protocol to access each node in turn. SSH is just a way of accessing a computer remotely from another one on the network.

I needed the Pis to be able to communicate with each other without a password so they can pass tasks between themselves. To do that, I generated RSA keys for each Pi and passed the public key of each to all the other Pis. Again, instructions for this can be found at the Raspberry Pi site in the links. The keys were passed via SCP (secure copy protocol – more details at the Raspberry Pi website).

In order to SSH into any node in the cluster from outside, a password is still required. But, once in, it is possible to SSH between nodes without a password, since each node has the public keys of the other nodes saved in its ‘authorized_keys’ file. Keys saved in this file are automatically allowed access to the node.

To remotely access the cluster, I just use ‘PuTTY’ from my laptop. PuTTY is a small piece of software which lets you enter a device’s IP address and SSH into it, provided you have the password. To make this reliable, I configured my router to give each of the Pis a static IP address. That means they will always have the same IP address, so the address I enter in PuTTY is always correct.
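As a quick sanity check that the static addresses and keys are behaving, something like the following can loop over the nodes from the laptop and run a command on each. It uses the third-party paramiko SSH library, and the IP addresses, username and key path shown are placeholders for whatever your own router and Pis are set to:

```python
# Hedged sketch: SSH into each node and print its hostname to confirm the
# cluster is reachable. Addresses and credentials below are placeholders.
import paramiko

NODES = ["192.168.1.101", "192.168.1.102", "192.168.1.103", "192.168.1.104"]

for ip in NODES:
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())  # accept new hosts
    client.connect(ip, username="pi", key_filename="/home/me/.ssh/id_rsa")
    _, stdout, _ = client.exec_command("hostname")
    print(ip, "->", stdout.read().decode().strip())
    client.close()
```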

Summary

What we have now is a network of four individual computers which can pass information (tasks) to one another freely over SSH. They are accessed remotely via PuTTY on my laptop.

It’s not a ‘supercomputer’ yet. In the next, Pi-themed, post (coming up on Pi Day!) we’ll write a program specifically designed for parallel processing and run it across the cluster. If everything works, I expect to see the program run significantly faster across the cluster than on a single node.
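The speed comparison itself is straightforward to make: time the same job run serially and in parallel, then divide one by the other. A rough single-machine sketch of that measurement (the work function is just a stand-in, not the program the cluster will actually run):

```python
# Rough sketch of measuring parallel speed-up: same job, serial vs. parallel.
# work() is a placeholder task, not the real cluster program.
import time
from multiprocessing import Pool

def work(n):
    return sum(i * i for i in range(n))  # stand-in for a chunk of real work

if __name__ == "__main__":
    jobs = [200_000] * 16  # sixteen equal chunks, one per core in the cluster

    t0 = time.perf_counter()
    serial = [work(n) for n in jobs]
    t1 = time.perf_counter()

    with Pool() as pool:                 # uses every core available on this machine
        parallel = pool.map(work, jobs)
    t2 = time.perf_counter()

    print(f"serial:   {t1 - t0:.2f} s")
    print(f"parallel: {t2 - t1:.2f} s")
    print(f"speed-up: {(t1 - t0) / (t2 - t1):.1f}x")
```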

As a final thought, congratulations to the DeepMind team who, at the time of writing, have just beaten Lee Sedol, one of the world’s strongest Go players – the first time a computer has done so. I hope to eventually use my cluster to run some (far, far simpler) neural network programs.

 

Further reading

http://makezine.com/projects/build-a-compact-4-node-raspberry-pi-cluster/ – Excellent article by Alasdair Allan doing pretty much the same thing, but in a lot more detail. I followed this guide for much of the build.

https://www.raspberrypi.org/documentation/remote-access/ssh/passwordless.md – Raspberry Pi guide to generating RSA keys to allow passwordless SSH access between nodes.

http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html – Download PuTTY from here.

 
