Picture looking for a needle in a haystack. But the haystack keeps getting bigger and bigger. And you don’t know how many needles are in there, or what they look like. Some people reckon there are millions of needles in there, while a lot of people think there are none at all. Other people say you wouldn’t even recognize the needle if you found it. Some say the needle is made of gold and some people tell you the needle is laced with poison and will kill you.
That is what SETI (the Search for Extraterrestrial Intelligence) is about. The haystack is all the electromagnetic radiation which falls on Earth. The needle is an artificial signal in that radiation.
In the last few years, their methods for finding needles have improved drastically. And, their haystack just got a whole lot bigger…
Let’s clear something up right away
You can think of SETI as the biggest search ever. It’s there to answer perhaps the biggest question ever: ‘are we alone in the universe?’. They are a fantastic organisation and employ some of the most ingenious people on the planet.
Unfortunately, thanks in part to terrible media coverage and a prevalence of awful ‘documentaries’, if you mention extraterrestrial intelligence, people think you are this guy:
When it comes to SETI, it’s best to just go straight to their website, or check out some of their excellent videos on YouTube. In short, this isn’t daft quack science: SETI is one of the longest running and most exciting scientific collaborations in history.
With that cleared up:
What exactly is the haystack and how are we searching in it?
A lot of readers are probably familiar with the electromagnetic spectrum. For those who aren’t, here is a quick summary.
Let’s start with visible light. To clarify, light is made of waves, and these waves can have different lengths, or wavelengths. Everyone is familiar with sending a beam of light through a prism – it splits up into its constituent wavelengths. Light with a shorter wavelength is violet; as the wavelength increases, the colour changes until we get to red light.
But it doesn’t stop at red and violet. The spectrum goes way, way further in each direction as the wavelength increases or decreases. About a century or so ago, we named parts of the spectrum according to the wavelengths they represent:
So, radio waves, X-rays, visible light, etc are all really the same thing – the wavelength just differs.
At the moment, if you want to send a message wirelessly through a vacuum (and you don’t have access to a couple of black holes or a particle accelerator), you have to use electromagnetic waves. Humans generally use radio and microwaves. We’re pretty good at it too – we have spacecraft more than 20 billion kilometers away which talk to us with radio waves…using less power than a lightbulb.
We sometimes use lasers (visible and infrared) for this too, but more on that later.
What if Extraterrestrials are doing this too?
If our galaxy is indeed teeming with intelligent life, maybe they also communicate with electromagnetic waves. I say ‘maybe’ because, although this seems like a fair assumption, it is far from guaranteed. Who knows how other civilisations would communicate? I am not going into the what ifs in this post – plenty of other websites cover those questions. It’s easiest to just assume that when searching for extraterrestrial signals absolutely nothing is guaranteed.
But if they do communicate remotely like us, how could we see it?
Let’s plug in a software defined radio receiver and take a look at a very narrow band of the radio spectrum on Earth:
That peak is a local FM radio station. It’s pretty narrow, on the order of 0.1MHz. Just thinking about things logically, it is far less energy intensive to transmit a signal in a very narrow range of frequencies than across a broad band of them.
When we transmit to and from spacecraft, we use a bandwidth far narrower than that of an FM radio station. The transmissions we receive from Voyager 1 look like this:
When you’re transmitting for billions of miles with a 20Watt transmitter (you read that right!), you’d better make sure your bandwidth is as narrow as possible!
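One way to put a number on why narrow bandwidth matters: the thermal noise power a receiver collects grows in proportion to the bandwidth it listens over (P = kTB). Here is a rough sketch of that relationship. The 25 K system temperature and the 1 Hz signal width are my own illustrative assumptions, not figures for any real telescope or spacecraft.

```python
# Thermal noise power collected by a receiver: P_noise = k * T * B
BOLTZMANN = 1.380649e-23  # J/K (exact SI value)

def noise_power_watts(system_temp_k, bandwidth_hz):
    """Noise power in watts for a given system temperature and bandwidth."""
    return BOLTZMANN * system_temp_k * bandwidth_hz

SYSTEM_TEMP_K = 25.0  # assumed value, purely illustrative

fm_noise = noise_power_watts(SYSTEM_TEMP_K, 0.1e6)    # FM-station-like width (0.1 MHz)
narrow_noise = noise_power_watts(SYSTEM_TEMP_K, 1.0)  # hypothetical 1 Hz wide signal

# For the same received signal power, squeezing the signal into the narrower
# band improves the signal-to-noise ratio by this factor.
snr_gain = fm_noise / narrow_noise

print(f"Noise in 0.1 MHz: {fm_noise:.3e} W")
print(f"Noise in 1 Hz:    {narrow_noise:.3e} W")
print(f"SNR improvement from narrowing the band: {snr_gain:.0f}x")
```

The point is only the scaling: whatever the real receiver temperature is, halving the bandwidth halves the noise you have to shout over.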
Let’s zoom out and see what that looks like in relation to the spectrum as a whole:
So, if another civilization communicates in a similar way to us, we’d have to find a tiny peak like that except:
- We have no idea what frequency they’d be transmitting at. We can make some good guesses, but they will always just be guesses.
- We’d have to be able to detect the peak above background noise. In the image above, the peak was way above the background. As the source gets weaker, and the distance further, that peak would drop off until it becomes indistinguishable.
- We’d have to hope we were listening while they were transmitting.
- We’d have to distinguish that peak from our own sources on Earth. There are LOADS of sources on Earth!
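To make the second point a bit more concrete, here is a minimal sketch of picking a narrow peak out of background noise. It runs on a synthetic spectrum that I generate myself, and the threshold of 6 standard deviations is an arbitrary choice of mine. This is not SETI's detection method, just the simplest version of the idea.

```python
import numpy as np

def find_narrow_peaks(power, threshold_sigma=6.0):
    """Return indices of bins that stand well above the background noise."""
    median = np.median(power)
    # Median absolute deviation: a noise estimate that a few strong peaks barely affect
    mad = np.median(np.abs(power - median))
    sigma = 1.4826 * mad  # converts MAD to a standard deviation for Gaussian noise
    return np.flatnonzero(power > median + threshold_sigma * sigma)

rng = np.random.default_rng(seed=0)
spectrum = rng.normal(loc=10.0, scale=1.0, size=4096)  # synthetic background noise
spectrum[1234] += 25.0                                  # injected narrowband "signal"

peaks = find_narrow_peaks(spectrum)
print(peaks)
```

As the injected peak gets weaker it sinks towards the threshold, and at some point you can no longer tell it from a random noise spike. That is the detection problem in miniature.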
Let’s address those problems in order:
1) What’s the frequency (Kenneth)?
This is always going to be a guess but there are some pretty strong candidates. One frequency that has been searched for years is the Hydrogen line. This is the name for the 21cm wavelength. It propagates well through interstellar dust and emissions at this frequency are produced naturally by Hydrogen, so it’d be a good ‘Hello, we’re here’ frequency.
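Since radio astronomers usually talk in frequencies rather than wavelengths, it helps to convert between the two with f = c / λ. A quick sketch (the function name is mine):

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s (exact by definition)

def wavelength_to_frequency_hz(wavelength_m):
    """Convert a wavelength in metres to a frequency in hertz."""
    return SPEED_OF_LIGHT / wavelength_m

# Using the slightly more precise wavelength of about 21.106 cm
hydrogen_line_hz = wavelength_to_frequency_hz(0.21106)
print(f"Hydrogen line: {hydrogen_line_hz / 1e6:.1f} MHz")
```

This lands at roughly 1420 MHz, which puts the Hydrogen line in the microwave part of the radio spectrum.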
Another band which hasn’t been explored that much, until now, is the visible spectrum. Basically, the shorter the wavelength, the more info you can pack into a transmission. Lasers can carry a tremendous amount of information compared to radio waves. Lasers using wavelengths in the near infrared propagate extremely well through interstellar dust. After a bit of reading around, it becomes easy to convince yourself that any extraterrestrial civilisation would definitely use this method!
Regardless, it’s still always going to be a guess. SETI search a range of frequencies, chosen using information such as that above. Interestingly, within a few years, we should have the computing power available to search the entire radio spectrum, so this may not be an issue for too much longer.
2) Our detection capability over background noise
I found this fascinating.
On one hand, our current search instruments are more sensitive than most people would think. If a civilisation on one of the 1000 nearest stars was using something like the GRAVES Radar, we’d be able to spot it with the Green Bank Telescope. When it comes to lasers, the case sounds even more convincing. We regularly fire off lasers into the sky that would outshine the entire sun to someone in the path of them. We have the capability to detect a 100W laser shone from the nearest star. That is amazing.
On the other hand, the nearest 1000 stars is nothing in the grand scheme of things. And the next point kind of puts a damper on things…
3) Would we be in the path of a transmission?
Planets are tiny. If you want to get an idea of what I’m talking about, just download Celestia and have a play around. If another civilization like us randomly fired off laser beams and directional radio waves into the sky, what would be the chance of our planet passing through the beam? Even then, what would be the chance we would have a telescope pointed at their star at exactly the right moment?
Doesn’t seem likely, does it? That said, we just have no idea how often, or how, other civilizations would transmit data. Maybe they use some futuristic technology which produces huge amounts of radio energy in every direction continuously. This would need a huge amount of energy…but maybe they have a huge amount of energy freely available. Perhaps a setup like this wouldn’t even be intended to communicate. Sodium vapour street lamps look a lot like laser light on a spectrograph. Who knows, maybe some other civilisation uses something similar for some other purpose and it’s sending huge amounts of narrow band electromagnetic radiation in every direction.
Or perhaps they have laser beacons trained continuously on thousands of nearby stars. If someone wanted to make contact with us this wouldn’t be unfeasible at all. We could signal thousands of nearby stars continuously with laser beacons that our technology could easily detect with the power from a small gas power station.
So the answer to this question is, once again, ‘we have no idea’. SETI do use some clever tricks to maximise our chances though, for example:
SETI will aim their receivers at systems when two planets align relative to us. This aims to detect transmissions sent between two planets of a civilisation more advanced than us. If you aren’t aware of the rate we are finding planets around other stars, get over to the Kepler website and prepare to be blown away!
4) Could we separate an extraterrestrial source from our own?
For radio sources, this is pretty difficult but we certainly have the computing power to do it. I’ll leave a link to a page on the SETI website at the end of this article explaining how this is done.
For visible light sources, this is actually kind of easier. There is no terrestrial interference since lasers are directional. However, the detectors still get a lot of noise from cosmic rays. My project focuses largely on filtering these out and part two will discuss this extensively.
Couple all of the above with the sheer number of stars to search and the task becomes pretty daunting. However, there is a part of me that says ‘how hard can it be to find a little peak in that spectrum?’ So, I’ve written some code to do just that.
I said SETI’s haystack just got a lot bigger. That is thanks to Russian billionaire Yuri Milner. You may have heard of him recently – he just donated $100 million to kick start a project to send a probe to Alpha Centauri. Needless to say, that makes me very excited. I wanted to do a post on that but I couldn’t crowbar a project of my own into it.
However, a year ago, he started a similar project: Breakthrough Listen. It didn’t get as much media coverage but he basically donated $100 million to drastically increase the amount of data (read, telescope time) and resources SETI have access to. By the end of this year, SETI will be producing more data in a day than they previously did in an entire year.
Best of all, all this data is publicly available. And the first batch went online last month.
Of course, this huge amount of data needs an absolutely colossal amount of computing power to process. Fortunately, not only do SETI have access to this through SETI@home, but machine learning methods have improved dramatically in the last few years. We are getting better at looking for the needle every day.
Anyone can analyse this data using SETI’s free software called SETI@home – I’ve been running it for about 10 years. If you want to get involved just install it and it runs in the background when you aren’t using your computer.
But I wanted to write my own code to analyze their data. This project isn’t about finding aliens, or improving SETI’s search techniques (the folk there are way, WAY smarter than me!). Instead it’s an exercise in data science and will be the first time I’ve written a neural network that actually does something useful.
The Breakthrough Listen project is searching both in the microwave section of the radio spectrum (from 1 – 10GHz, which is ten times the bandwidth of previous SETI studies), and the entire visible light spectrum.
Initially I planned to process the radio data but things didn’t exactly go to plan.
The data is posted online in a format called FITS – ‘Flexible Image Transport System’. It is widely used by astronomers for data and images and is clearly excellent at what it does. Unfortunately, it’s extremely difficult to process for those who are unfamiliar. Here’s what a tiny section of a raw FITS file looks like:
Fortunately, there is a Python library called Astropy which can deal with FITS files. The SETI team have also posted some guides on GitHub to get people started. These guides aren’t very comprehensive, but they are a start. I’m definitely not complaining – they are busy enough as it is!
The radio data FITS files are in raw format – otherwise known as ‘baseband’ format. This needs converting to ‘filterbank’ format. Unfortunately, the Python class they provide doesn’t seem to work. Since I barely understand what these formats are, and exactly what data is embedded in them, re-writing this class would take me ages and I probably wouldn’t learn much that is useful. I would like to have a try in the future though.
So, I decided to have a shot at the optical band data instead. The datasets for optical data are smaller and definitely easier to process.
The optical data is collected by the Automated Planet Finder (APF) which is simply a telescope with a spectrograph fitted.
Recall how I spoke above about passing light through a prism to split it into a spectrum. That’s essentially what the APF does. However, rather than using a prism, it uses an arrangement called an Echelle spectrograph to produce a stretched out spectrum so we can get a high level of detail. If the spectrum was just a single line, it would be a very long thin line, so the arrangement essentially ‘chops it up’ into a series of shorter lines. These are then ‘stacked’ on top of one another to form a spectrum which can be photographed with a high resolution digital camera:
Let’s load a spectrum from the Breakthrough Listen data into Python and open it in Astropy. I’ve plotted it as an image using Matplotlib (an excellent plotting library in Python). Here’s what the raw Echelle spectrum looks like:
A black rectangle? No, look closely. There are extremely faint horizontal lines in the image. Although the APF produces hours-long exposures of each star it examines, the amount of light which falls on the camera is tiny. So, the spectrum is extremely faint.
Astropy saves the spectrum as an array of numbers, so let’s take the logarithm of each element to make the differences in brightness more pronounced:
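The load and log-scale steps can be sketched roughly as follows. `fits.getdata` is a real Astropy function, but everything else here is my own assumption: the helper names, the synthetic demo array, and the shift applied before the logarithm (added so that zero or negative pixel values don't break it). It is a sketch, not the exact script used for the plots.

```python
import numpy as np

def load_spectrum(path):
    """Read the image array from a FITS file (needs Astropy installed)."""
    from astropy.io import fits  # imported here so the rest runs without Astropy
    return fits.getdata(path).astype(float)

def log_scale(image):
    """Log-scale an image so faint features become visible."""
    # Shift so the minimum becomes 1: avoids log(0) and logs of negative values
    shifted = image - image.min() + 1.0
    return np.log10(shifted)

# Synthetic stand-in for a faint spectrum: a dark background, one faint line,
# and one very bright pixel that would dominate a linear display
demo = np.full((6, 8), 100.0)
demo[2, :] = 130.0     # faint horizontal line
demo[4, 3] = 50_000.0  # single bright pixel
scaled = log_scale(demo)

print(scaled[2, 0], scaled[4, 3])
```

On a linear scale the faint line is about 0.06% of the way from the background to the bright pixel; after log scaling it sits roughly a third of the way up, which is why the lines suddenly appear. With a real file you would pass `log_scale(load_spectrum("your_file.fits"))` to Matplotlib's `imshow` (the filename is a placeholder).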
That’s better. Now we need to extract the bright lines from the image so we can inspect them for features which look like narrowband signals (we’ll get to what a signal looks like in part 2).
It’s worth noting that the spectrum is from 374-970nm for APF data. Refer to the above diagram of the electromagnetic spectrum for some perspective on where that range lies.
Taking samples from the image
The conventional method to extract each spectrum line seems to be to plot a line of best fit across each spectrum line and extract the data along this line.
I wasn’t too keen on that method because of how I plan to process the data later on with a neural network. In order to use a neural network, I will need to split each line into a series of small segments. Each of these segments will be 20X20 pixels in size.
To extract the spectral lines I started off with a 20X20 square example image. This example image is a typical example of what we’d expect a segment of a spectral line to look like. It is normalised so the brightest pixel has a value of 1 and the dimmest has a value of zero:
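The normalisation described here is a simple min-max rescale. Below is a minimal sketch of it, along with a made-up example image. The Gaussian band shape and its width are my assumptions for illustration, not the actual example image used in this project.

```python
import numpy as np

def normalise(segment):
    """Rescale so the dimmest pixel is 0 and the brightest is 1."""
    segment = np.asarray(segment, dtype=float)
    lo, hi = segment.min(), segment.max()
    if hi == lo:
        return np.zeros_like(segment)  # flat segment: nothing to rescale
    return (segment - lo) / (hi - lo)

def make_example(size=20, width=2.0):
    """Hypothetical example image: a horizontal band, brightest in the middle rows."""
    rows = np.arange(size)
    profile = np.exp(-0.5 * ((rows - (size - 1) / 2) / width) ** 2)
    # Repeat the same vertical profile across every column
    return normalise(np.tile(profile[:, None], (1, size)))

example = make_example()
print(example.shape, example.min(), example.max())
```

Normalising both the example and every sample in the same way means the comparison depends on the shape of the line and not on how bright that particular part of the spectrum happens to be.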
I then took a 20X20 pixel segment of the overall Echelle spectrum, starting in the top left.
I normalised it and then summed across each row and compared the sum of rows to that from the example image. I took the sum of the square of the difference of each row from the example and sample image and saved it.
Then I moved the sample image down one pixel and repeated the above, saving the new sum of the squares of the differences.
I repeated this, moving the sample image down one pixel at a time for 20 pixels. The position which corresponded to the lowest sum of the square of the differences was determined to be the sample with the closest resemblance to the example image. This is otherwise known as the least squares method and is a standard method to find a best fit.
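The vertical search described above can be sketched like this. It runs on a synthetic image I generate myself, and the function and variable names are mine, so treat it as an illustration of the idea and not as the project's actual script.

```python
import numpy as np

SIZE = 20  # the sample window is SIZE x SIZE pixels

def normalise(segment):
    """Rescale so the dimmest pixel is 0 and the brightest is 1."""
    segment = np.asarray(segment, dtype=float)
    lo, hi = segment.min(), segment.max()
    if hi == lo:
        return np.zeros_like(segment)
    return (segment - lo) / (hi - lo)

def row_profile(segment):
    """Normalise a segment, then sum across each row."""
    return normalise(segment).sum(axis=1)

def best_vertical_offset(image, example, x, y_start, search=SIZE):
    """Slide the window down one pixel at a time and keep the best match."""
    target = row_profile(example)
    scores = []
    for dy in range(search):
        sample = image[y_start + dy:y_start + dy + SIZE, x:x + SIZE]
        # Sum of squared differences between the two row profiles
        scores.append(np.sum((row_profile(sample) - target) ** 2))
    return y_start + int(np.argmin(scores))

def band(height, width, centre, sigma=2.0):
    """Synthetic horizontal band with a Gaussian vertical profile (illustrative)."""
    rows = np.arange(height)[:, None]
    return np.exp(-0.5 * ((rows - centre) / sigma) ** 2) * np.ones((1, width))

example = band(SIZE, SIZE, centre=9.5)  # band centred in the 20x20 example
image = band(60, 40, centre=22.5)       # built so the matching window starts at y = 13

best_y = best_vertical_offset(image, example, x=0, y_start=0)
print(best_y)
```

Comparing row sums instead of individual pixels keeps the match cheap and makes it insensitive to where along the line any small features sit, which is what we want when all we are trying to do is find the line's vertical position.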
This sample was the starting point of the first row and was assumed to fall on a spectral line. I then moved the sample array across one pixel at a time. Each move to the right was accompanied by the least squares fitting method for one pixel above, below and inline with the previous y value. This was necessary because the spectra are not straight lines. Constantly re-fitting the sample in the y direction ensures we follow the spectrum along and extract all the data.
When we reach the end of a row, the code moves the sample array back to x=0 and down by 20 pixels, then sweeps another 20 pixels down to search for the next row.
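Following a single row along its curve can be sketched as below. Again this uses a synthetic, gently sloping line that I made up for the demo (drifting 5 pixels over 100 columns), and all names are my own. It covers only the tracking of one row, not the jump to the next row.

```python
import numpy as np

SIZE = 20  # the sample window is SIZE x SIZE pixels

def normalise(segment):
    """Rescale so the dimmest pixel is 0 and the brightest is 1."""
    segment = np.asarray(segment, dtype=float)
    lo, hi = segment.min(), segment.max()
    if hi == lo:
        return np.zeros_like(segment)
    return (segment - lo) / (hi - lo)

def mismatch(sample, target_profile):
    """Sum of squared differences between row sums of the sample and the example."""
    return float(np.sum((normalise(sample).sum(axis=1) - target_profile) ** 2))

def trace_row(image, example, y_start):
    """Step right one pixel at a time, re-fitting y among (y - 1, y, y + 1)."""
    target_profile = normalise(example).sum(axis=1)
    height, width = image.shape
    y, coords = y_start, []
    for x in range(width - SIZE + 1):
        # Only keep candidate positions where the window stays inside the image
        candidates = [c for c in (y - 1, y, y + 1) if 0 <= c <= height - SIZE]
        y = min(candidates,
                key=lambda c: mismatch(image[c:c + SIZE, x:x + SIZE], target_profile))
        coords.append((x, y))
    return coords

# Synthetic spectral line whose centre drifts from row 30 to about row 35
rows = np.arange(60)[:, None]
cols = np.arange(100)[None, :]
image = np.exp(-0.5 * ((rows - (30.0 + 0.05 * cols)) / 2.0) ** 2)
example = np.exp(-0.5 * ((np.arange(SIZE)[:, None] - 9.5) / 2.0) ** 2) * np.ones((1, SIZE))

coords = trace_row(image, example, y_start=21)
print(coords[0], coords[-1])
```

Restricting each step to one pixel up or down keeps the tracker from jumping onto a neighbouring row, at the cost of assuming the line never curves by more than one pixel per column.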
For now, the Python script simply prints out the coordinates of each sample, but in part two we’ll process each sample to determine whether it contains anything which resembles an artificial signal.
Plotting the samples from some random coordinates returned by the script gave the following images:
So it’s clearly working. The code is able to repeatedly traverse across the raw Echelle spectrum and extract segments of the spectral lines for processing.
Progress so far
For this project, half of the challenge is understanding what we are actually processing, and handling the data. In this post, we have got to grips with the problem faced by SETI. We have learned to read their optical data in Python and extract the useful information from it.
In part two, we’ll learn what a neural network is and set one running on a Raspberry Pi to process this data to look for patterns which may be artificial.
https://www.youtube.com/c/setiinstitute SETI’s YouTube channel. Lots of really interesting talks here.
http://www.shatters.net/celestia/ Celestia – an actual size ‘universe simulator’. Don’t download this if you have things to do.
http://breakthroughinitiatives.org/OpenDataSearch The data page from the Breakthrough Initiatives website. All the SETI data for Breakthrough Listen can be downloaded from here.
https://seti.berkeley.edu/listen/ SETI’s introduction to how to process the data
https://github.com/UCBerkeleySETI/breakthrough Further info and tutorials for processing the data. If anyone can figure out how to open the GBT data in Astropy, just give me a shout.
http://waitbutwhy.com/2014/05/fermi-paradox.html No post about extraterrestrial intelligence would be complete without the fantastic waitbutwhy post about the Fermi Paradox.