This is the most difficult post I’ve written so far. Explaining a neural network isn’t too difficult. Building one is relatively easy too. Explaining it in a way that isn’t cripplingly boring is a nightmare! Also, this project involved quite a lot more faff than I anticipated.
In my previous post on this topic, I wrote a Python script to take visual/infrared spectrographs produced by SETI and extract the useful information from them. At the end of that post, I stated I would write a neural network to analyse this information and look for signs of artificial signals.
I’m pleased to say, I have managed this. Let’s go through my system step by step. Don’t worry if you’ve never heard of a neural network, we’ll go through that in detail.
What are we looking for?
The code presented in my last post broke the spectrograph up into 20X20 pixel blocks. Each block contained a short segment of the captured spectrum, normalised for brightness, like this:
We are looking for signs of narrowband (AKA Laser) light in each of these blocks. Refer to the previous post for an explanation of why narrowband light is likely to be of artificial origin.
A back of an envelope calculation convinced me that the type of signal used for interstellar communication would probably only be a pixel or two in width. So, we are looking for blocks with a ‘stripe’ down the middle. The stripe would fade towards the edges, where the detector picks up less light. Like this:
A bright source would also be likely to partially saturate some of the adjacent pixels, so I reckon an artificial signal may have some ‘graduation’ either side, like this:
So that’s two examples of the type of thing we are looking for. However, it is a little more complicated than that.
The cosmic ray problem
This is the main difficulty with writing software to automatically detect artificial signals. At first sight, it appears easy to write something to look for bright, localised points on a spectrograph. However, any spectrograph contains hundreds of bright localised points. They are caused by ionised particles striking the detector. They usually cause several pixels to become fully saturated, and nearby ones to become partially saturated.
That description sounds awfully like the description I gave above of an artificial signal…
In fact, that summarises the issue nicely. It’s extremely difficult to give a quantitative description of the difference between an artificial signal and a natural one due to cosmic rays. Accordingly, it’s difficult to write an algorithm that separates the two.
However, to a human, the difference is obvious when just looking at an image. Despite minimal knowledge of spectrography, anyone can identify which of the signals below are natural and which are artificial:
We need to write an algorithm that thinks like a human…
A lot of prominent computer scientists believe neural networks are going to change everything in the near future. They are already capable of some pretty impressive tasks, but let’s not get into that now.
Like the genetic algorithms I have previously covered, neural networks are inspired by nature. They are loosely modelled on a brain, which is composed of billions of inter-connected neurons (let’s call them nodes). Intriguingly, like a brain, we can feed some info in and get a result, but we don’t really know the mechanics of how the system got to that result.
This sounds scary but it isn’t.
Let’s start off with our input. In our case, our input to the neural network is a 20X20 pixel grayscale image. This is stored in the computer memory as a list of 400 values between 0 and 255. 255 denotes a white pixel, and 0 denotes a black pixel.
For now, let’s forget the 20X20 image and just imagine a 3X3 image.
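As a rough illustration of that storage layout, here is a minimal sketch that flattens a 20X20 block into a list of 400 values. It assumes standard 8-bit greyscale (values from 0 to 255), and the name `flatten_block` and the example block are made up for this post rather than taken from the real script.

```python
def flatten_block(block):
    """Flatten a 2D grid of pixel values (a list of rows) into one flat list."""
    return [pixel for row in block for pixel in row]

# Hypothetical 20x20 block: black everywhere except a white centre column.
block = [[255 if col == 10 else 0 for col in range(20)] for row in range(20)]

inputs = flatten_block(block)
print(len(inputs))  # 400, one value per pixel
```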
Let’s assign 9 locations in the memory. Each of these can store a single pixel’s value. Each of these locations is called an ‘input node’.
How a simpler algorithm would approach this
A naive approach would take each of these nodes and compare them to the nodes of a sample image. If the value was close to that of the sample image, we may increase the output number by 1. If the input node’s value is far away from that on the sample image, it may decrease the output value by one.
A sufficiently high output value would be classified as a detection.
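To make the naive approach concrete, here is a minimal sketch for a 3X3 image. The `tolerance` of 30 and the detection threshold of 7 are arbitrary values picked for illustration, and `naive_score` is a hypothetical name.

```python
def naive_score(image, sample, tolerance=30):
    """Add 1 for each pixel close to the sample's pixel, subtract 1 otherwise."""
    score = 0
    for pixel, expected in zip(image, sample):
        if abs(pixel - expected) <= tolerance:
            score += 1
        else:
            score -= 1
    return score

# A 3x3 sample with a bright stripe down the middle, flattened row by row.
sample = [0, 255, 0,
          0, 255, 0,
          0, 255, 0]

# The same stripe, slightly dimmer, with one stray bright pixel.
candidate = [0, 240, 0,
             0, 250, 200,
             0, 245, 0]

score = naive_score(candidate, sample)
print(score)       # 7: eight pixels match (+8), one does not (-1)
print(score >= 7)  # True: classified as a detection with a threshold of 7
```

A single stray pixel already costs two points here, which hints at how brittle this kind of comparison is.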
Let’s draw this for a 3X3 image.
The problem here is pretty apparent. An actual image of an artificial signal can vary wildly. The overall brightness can differ, it can be speckled with cosmic rays, the detection stripe can vary somewhat.
The algorithm described above can only really look for a very specific image. It is likely to reject a true detection because other parts of the image vary. Or, if the output sensitivity is set lower, it will probably give an unmanageably high number of false positives.
That’s just for a 3X3 image. The difficulty increases exponentially with image size. We need something smarter.
A Hidden Layer
Rather than just taking inputs and feeding directly to the output, let’s add a layer of nodes between the input and output nodes:
For now, we can say the output of each input node is 1 if the node is close to the corresponding node’s value in the example image. It is 0 if the node value differs significantly from that of the corresponding node of the example image. We can define the output of each hidden node as the sum of all the inputs to it. Then, the sum of all the outputs from the hidden layer can be our output value. Feeding in an input and allowing it to propagate through the network to give an output is known as feeding forward.
Since each of the hidden nodes considers all the inputs, it is clear that this is more selective than the naive model above. However, it is still not very useful. We gave the criteria for the links between the input and hidden nodes to ‘activate’ only if the corresponding pixel was similar to that of the example image.
However, there are no such criteria for the links between the hidden and output node. We need to train the network.
This is the part I find difficult to explain.
First, let’s re-configure our network slightly. Now the output of each input node will be an array of that node’s input multiplied by a series of link weights:
It then follows that the input of each hidden node will be the sum of all the outputs from the input nodes that connect to it….that sentence is horrible to read – just look at this image:
Then, the output of each hidden node will be an array of its input multiplied by another series of link weights.
The link weights are the key to creating a working neural network.
At first, we just assign each of them a random value, say 0.2. If we then input a random image, we will get a random output value.
This is useless. However, if we can somehow change the values of all the link weights appropriately, the output will only be above a certain value if the correct type of input is received. We change the value of these link weights by training the network. We feed it a series of inputs which we know are ‘correct’. Then a process called backpropagation is used to alter the link weights accordingly.
I’m not going to go into full details of how backpropagation works; there are plenty of guides on YouTube for those wanting more detail. In summary:
- The example input is fed through the network.
- The output value is compared with the desired output value (for example we may define an output of 1 as a detection and -1 as a rejection).
- The link weights between the output and hidden layer are then adjusted to alter the actual overall output, making it as close to the desired output as possible.
- The degree to which the weights are altered is limited by a ‘learning rate’.
- The link weights between the input and hidden layers are then altered so that each hidden node’s output moves as close as possible to the value needed to produce the desired overall output.
- The steps above are repeated numerous times with both example detections (with desired outputs of 1) and example rejections (with desired outputs of -1).
If you don’t understand this first time, don’t worry. It’s quite a lot to take in in one go. It’s sufficient to accept that backpropagation makes the network’s output similar for similar inputs. It trains the network.
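For those who find code easier to follow than bullet points, the steps above can be sketched roughly as below. This is a textbook-style gradient descent update for one hidden layer and a single output node, written for a toy 3X3 input; it is not the actual project code. It assumes tanh node outputs (so the output can reach the targets of 1 and -1), and the layer sizes, learning rate of 0.1, iteration count, and toy images are all arbitrary choices for illustration.

```python
import math
import random

def train_step(inputs, target, w_hidden, w_out, rate):
    """One feed-forward pass followed by one weight update (modifies the lists in place)."""
    # Feed the example input forward through the network.
    hidden = [math.tanh(sum(x * w for x, w in zip(inputs, ws))) for ws in w_hidden]
    output = math.tanh(sum(h * w for h, w in zip(hidden, w_out)))

    # Compare with the desired output; (1 - y*y) is the slope of tanh at y.
    delta_out = (target - output) * (1 - output * output)

    # Work out how much each hidden node contributed to the error.
    delta_hidden = [delta_out * w * (1 - h * h) for h, w in zip(hidden, w_out)]

    # Adjust every link weight, with the step size limited by the learning rate.
    for j, h in enumerate(hidden):
        w_out[j] += rate * delta_out * h
    for j, ws in enumerate(w_hidden):
        for i, x in enumerate(inputs):
            ws[i] += rate * delta_hidden[j] * x
    return output

random.seed(1)  # fixed seed so repeated runs start from the same weights
w_hidden = [[random.uniform(-0.5, 0.5) for _ in range(9)] for _ in range(4)]
w_out = [random.uniform(-0.5, 0.5) for _ in range(4)]

stripe = [0, 1, 0, 0, 1, 0, 0, 1, 0]  # toy 'detection', desired output 1
speckle = [1, 0, 0, 0, 0, 1, 0, 0, 0]  # toy 'rejection', desired output -1

for _ in range(2000):
    train_step(stripe, 1, w_hidden, w_out, rate=0.1)
    train_step(speckle, -1, w_hidden, w_out, rate=0.1)

# A rate of 0.0 leaves the weights untouched, so these calls only read the output.
stripe_out = train_step(stripe, 1, w_hidden, w_out, rate=0.0)
speckle_out = train_step(speckle, -1, w_hidden, w_out, rate=0.0)
print(stripe_out, speckle_out)  # should end up near 1 and near -1 respectively
```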
One last addition is necessary to create a working neural network. Until now, the output of each node has been equal to the sum of its inputs. This tends to make the network overly sensitive to small variations. This approach can lead to a small number of hidden nodes becoming dominant, and other nodes being ignored. What we really want is an output which increases less and less as the sum of the input values increases. Something like this:
That curve is the tanh curve. We just pass the weighted sum of the inputs through tanh to get the output of each node:
Hidden node output = tanh(sum of(incoming input nodes * link weights))
And that’s it. A working neural network!
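The tanh formula above translates into a feed-forward pass along these lines. This is a minimal sketch for the 3X3 example, not the real code: the hidden layer size of 4, the weight range, and the scaling of pixel values down to 0-1 are assumptions made for illustration, as is applying tanh at the output node as well as the hidden nodes.

```python
import math
import random

def feed_forward(inputs, hidden_weights, output_weights):
    """Propagate inputs through one hidden layer to a single output node.

    hidden_weights[j][i] is the weight of the link from input i to hidden node j.
    output_weights[j] is the weight of the link from hidden node j to the output.
    """
    hidden = [math.tanh(sum(x * w for x, w in zip(inputs, weights)))
              for weights in hidden_weights]
    output = math.tanh(sum(h * w for h, w in zip(hidden, output_weights)))
    return hidden, output

random.seed(0)  # fixed seed so the run is repeatable
n_inputs, n_hidden = 9, 4
hidden_weights = [[random.uniform(-0.2, 0.2) for _ in range(n_inputs)]
                  for _ in range(n_hidden)]
output_weights = [random.uniform(-0.2, 0.2) for _ in range(n_hidden)]

# A 3x3 stripe image, with pixel values scaled from 0-255 down to 0-1.
pixels = [0, 255, 0, 0, 255, 0, 0, 255, 0]
inputs = [p / 255 for p in pixels]

hidden, output = feed_forward(inputs, hidden_weights, output_weights)
print(output)  # somewhere between -1 and 1; meaningless until the network is trained
```

Because tanh flattens out towards -1 and 1, no single node can run away with an enormous output, however large its weighted sum gets.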
Applying it to my problem
Let’s go back to the signal detection problem.
The first thing I tried was scaling this neural network up. The number of input nodes was increased to 400 (one for each pixel of the input). The number of hidden nodes was scaled up to 200. This is a very simple neural network, but I saw no reason it wouldn’t give reasonable results.
I wrote a function to train the network using backpropagation, as described above.
Next I needed some sample images to train the network with. These were created as follows:
Sample ‘reject’ images
These are examples of a ‘non-detection’. These were simply created by using the script from my last post to extract images from a sample spectrograph. One sample spectrograph provides about 20000 images. The backpropagation function used each of these images in turn, feeding them into and then backpropagating them through the network, with a target output of -1 for each image.
Sample ‘accept’ images
These are simulated examples of a detection. I wrote four separate functions to create four different detection types:
- Detection type 1 is a narrowband signal with no ‘blurring’ of adjacent pixels.
- Detection type 2 is a narrowband signal with blurred adjacent pixels.
- Detection type 3 is a narrowband signal with no blurring and a simulated cosmic ray strike.
- Detection type 4 is a narrowband signal with blurring and a simulated cosmic ray strike.
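To give a feel for what such generators might look like, here is a hedged sketch that folds the four types into one hypothetical function, `make_detection`, with two switches. The exact brightness values, the shape of the fade, and the 2X2 cosmic ray blob are all invented for illustration; the real functions may well differ.

```python
import random

SIZE = 20

def make_detection(blur=False, cosmic_ray=False, rng=random):
    """Build a simulated 20x20 'accept' block as a flat list of 400 values (0-255)."""
    block = [[0] * SIZE for _ in range(SIZE)]
    centre = SIZE // 2
    for row in range(SIZE):
        # The stripe is brightest mid-block and fades towards the top and bottom edges.
        fade = 1 - abs(row - (SIZE - 1) / 2) / SIZE
        block[row][centre] = int(255 * fade)
        if blur:
            # Partially saturate the pixels either side of the stripe.
            block[row][centre - 1] = block[row][centre + 1] = int(128 * fade)
    if cosmic_ray:
        # A small saturated blob at a random position, standing in for a particle strike.
        r, c = rng.randrange(SIZE - 1), rng.randrange(SIZE - 1)
        for dr in (0, 1):
            for dc in (0, 1):
                block[r + dr][c + dc] = 255
    return [pixel for row in block for pixel in row]

# The four detection types from the list above, in order.
DETECTION_TYPES = [
    dict(blur=False, cosmic_ray=False),
    dict(blur=True, cosmic_ray=False),
    dict(blur=False, cosmic_ray=True),
    dict(blur=True, cosmic_ray=True),
]

samples = [make_detection(**kwargs) for kwargs in DETECTION_TYPES]
print([len(s) for s in samples])  # [400, 400, 400, 400]
```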
Initially the backpropagation function fed forward and backpropagated 5000 of each of these samples through the network with a target output value of 1.
I copied all the code across to one of the Raspberry Pis on my cluster and trained it with the 20000 simulated detection and 20000 ‘reject’ images with a learning rate of 0.5. This took about 8 hours because the Raspberry Pi is not particularly powerful.
With a fully trained network, I fed in a sample ‘accept’ image (generated with one of the functions described above). If correctly trained, the output for this image should have been 1…and it was.
Then I fed through a sample ‘reject’ image, picked at random from a spectrograph. If correctly configured, the output would have been -1. It wasn’t – it was +1.
This did not surprise me in the slightest. Not only was this the first neural network I had ever built for real-world use, it was also a very simple one.
As I mentioned earlier, properly configured neural networks tend to give the correct answer, but it is not possible to see the mechanics of how they arrive at that answer. This means things can get a little ‘trial and error’; a major disadvantage of using a neural network.
I could change the following to reconfigure the network without adding too much complexity:
- Change the number of hidden nodes.
- Use 2 output nodes instead of 1 (ie have an ‘accept’ node and a ‘reject’ node).
- Add more hidden layers. Most practical networks use more than one layer.
- Change the learning rate.
- Change how example images are generated and input into the network.
Since I didn’t really know which of these would give a better answer, and it took at least 8 hours to implement each one, I just tried them in turn.
This time I re-ran the backpropagation with 400 hidden nodes. I wasn’t surprised that this made no difference to the output result. It was worth trying since increasing the number of nodes is a trivial task.
I decided it was important to make sure the data used to train the network was fed through properly.
Rather than feeding through and backpropagating all the ‘accept’ examples and then all the ‘reject’ samples, I alternately fed through a reject sample followed by an accept sample. The backpropagation algorithm was altered to first provide a ‘reject’ sample and then pick one of the four ‘accept’ sample types at random and provide it.
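The alternating order can be sketched with a small generator like the one below. The name `interleaved_samples` and the dummy images are hypothetical; `accept_makers` stands in for the four detection-generating functions.

```python
import random

def interleaved_samples(rejects, accept_makers, rng=random):
    """Yield (image, target) pairs: a reject, then a randomly chosen accept type, repeated."""
    for reject in rejects:
        yield reject, -1
        make_accept = rng.choice(accept_makers)
        yield make_accept(), 1

# Dummy data: three blank 'reject' blocks and two stand-in 'accept' generators.
rejects = [[0] * 400 for _ in range(3)]
accept_makers = [lambda: [255] * 400, lambda: [128] * 400]

schedule = list(interleaved_samples(rejects, accept_makers))
print([target for _, target in schedule])  # [-1, 1, -1, 1, -1, 1]
```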
I wasn’t too surprised that this didn’t work either. I think the backpropagation process is commutative (meaning it doesn’t matter what order we provide the inputs in). Still, it was worth a try.
Still trying out the simpler things first, I changed the number of output nodes to 2. Now, there were 2 output nodes, one which would only give a value of ‘1’ for a detection. The other would only give a value of ‘1’ for a rejection.
In other words, the output for a detection would be [1,-1]
The output for a rejection would be [-1,1]
This still didn’t work: every sample I fed in just gave an output of [1,1].
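One way to read a two-node output is sketched below. The function name `interpret` and the threshold of 0 are assumptions for illustration. Note how an output of [1,1] falls into neither category, which is why it is useless as a result.

```python
DETECTION = [1, -1]  # target as [accept node, reject node]
REJECTION = [-1, 1]

def interpret(outputs, threshold=0.0):
    """Turn the two output node values into a label."""
    accept, reject = outputs
    if accept > threshold and reject <= threshold:
        return "detection"
    if reject > threshold and accept <= threshold:
        return "rejection"
    return "ambiguous"  # both nodes agree, so the network has told us nothing

print(interpret(DETECTION))  # detection
print(interpret(REJECTION))  # rejection
print(interpret([1, 1]))     # ambiguous
```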
I now decided to make things more complex by including 2 more hidden layers. This wasn’t too difficult to accomplish. I made the number of nodes in each layer half of the number in the previous layer.
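As a quick sketch of the halving rule, the layer sizes could be worked out as below. It assumes 400 inputs, three hidden layers in total (the original one plus the two new ones), the first hidden layer being half the size of the input layer, and 2 output nodes; the function names are made up for illustration.

```python
def layer_sizes(n_inputs, n_hidden_layers, n_outputs):
    """Input layer, then hidden layers each half the size of the previous layer, then outputs."""
    sizes = [n_inputs]
    for _ in range(n_hidden_layers):
        sizes.append(sizes[-1] // 2)
    sizes.append(n_outputs)
    return sizes

def count_links(sizes):
    """Every node links to every node in the next layer, so multiply adjacent sizes."""
    return sum(a * b for a, b in zip(sizes, sizes[1:]))

sizes = layer_sizes(400, 3, 2)
print(sizes)               # [400, 200, 100, 50, 2]
print(count_links(sizes))  # 105100 link weights to train
```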
Increasing the number of layers increases the sensitivity of the network but increases runtime. It now took more than 12 hours to train.
And it still didn’t work…
At this point, I realised I’d missed out one of the simpler options. The learning rate had been set at 0.5 throughout. I was feeding 40,000 samples through the network and this seemed intuitively high.
So, I decreased the learning rate to 0.005. That way each sample would have a far smaller effect on the network as a whole.
This time it worked. The network distinguished between all the samples I fed into it.
Finalising the code
Before doing anything else, I saved the list of values for the link weights to the disk. Collectively, these quite literally are the neural network. Just a list of numbers. There is a function in the code to ‘restore’ the network. This just takes these numbers and builds the network ready for running.
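Saving and restoring can be as simple as the sketch below. It assumes the weights are held as nested Python lists and uses JSON as the file format; the function names, the file name `seti_weights.json`, and the tiny example weights are all hypothetical.

```python
import json
import os
import tempfile

def save_network(weights, path):
    """Write the link weights to disk as JSON."""
    with open(path, "w") as f:
        json.dump(weights, f)

def restore_network(path):
    """Read the link weights back, ready to rebuild the network."""
    with open(path) as f:
        return json.load(f)

# A tiny made-up network: two hidden nodes with two inputs each, one output node.
weights = {"hidden": [[0.1, -0.2], [0.3, 0.4]], "output": [[0.5, -0.6]]}

path = os.path.join(tempfile.gettempdir(), "seti_weights.json")
save_network(weights, path)
restored = restore_network(path)
print(restored == weights)  # True
```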
I wrote a function into the code to ‘run’ the network automatically. This allows me to load in a spectrograph image and leave the network to extract each 20X20 sample and feed it through the network. If the value of the ‘accept’ output node is greater than 0.99, the coordinates of the respective input sample are printed so I can inspect it manually.
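The ‘run’ step can be sketched as below. It assumes the spectrograph is a 2D list of pixel values cut into non-overlapping 20X20 blocks, and that `network` is any callable returning the [accept, reject] pair. Here `fake_network` is a stand-in so the example runs without a trained network.

```python
def scan(image, network, block=20, threshold=0.99):
    """Yield the (row, column) of the top-left corner of every block the network flags."""
    height, width = len(image), len(image[0])
    for top in range(0, height - block + 1, block):
        for left in range(0, width - block + 1, block):
            pixels = [image[r][c]
                      for r in range(top, top + block)
                      for c in range(left, left + block)]
            accept, _reject = network(pixels)
            if accept > threshold:
                yield top, left

def fake_network(pixels):
    """Stand-in for the trained network: flags any block containing a saturated pixel."""
    return (1.0, -1.0) if max(pixels) == 255 else (-1.0, 1.0)

# A blank 40x60 'spectrograph' with a single saturated pixel.
image = [[0] * 60 for _ in range(40)]
image[25][45] = 255

print(list(scan(image, fake_network)))  # [(20, 40)]
```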
At the time of writing I have to load in each image manually to be analysed – it takes around a day to analyse an image. I’ll write a function to automatically download and load in images from the Breakthrough website at some point.
Then I can leave the Raspberry Pi to automatically run the neural network, constantly searching for an artificial signal. It hasn’t found one yet, and it’s overwhelmingly unlikely it ever will…