How the SETI@Home Project Works - by Ricky Leon Murphy:
Introduction
What is SETI@Home and Why Use It?
SETI@Home - The large supercomputer
Who is involved
How is the search performed?
How does the program work?
Data Collection
Finding Candidates
Testing Data Integrity
Removing Radio Interference
Identify Final Candidates
Verification - What is Next?
Summary
References
Introduction
On a clear night, you can see literally
hundreds of stars. That number increases greatly when looking through a
telescope. I recall a statement made by the late Dr. Carl Sagan: the
number of stars in the known Universe exceeds the number of grains of
sand on every beach on Earth. That is an enormous number of stars! If a
small fraction of those stars are capable of supporting a system of
planets, and if a fraction of those planets are capable of supporting
life, and if a fraction of the life-bearing planets are capable of
supporting intelligent life, there will still be an enormous number of
civilizations within the Universe (if that phrase sounds somewhat
familiar, a variation of this statement is from the movie Contact).
The problem is whether any of those civilizations will attempt to send
out a signal to alert other civilizations of its existence. This is the
foundation of the Search for Extra-Terrestrial Intelligence, or SETI.
The founding father of SETI is Frank Drake. As a newly graduated student
of Astronomy, Drake worked at the National Radio Astronomy Observatory
in Green Bank, West Virginia. During telescope quiet time, he was
allowed to use the telescope to search for a signal of extra-terrestrial
origin (Shostak, pages 153-154). Using home-made equipment, Drake
scanned the frequencies above and below the 1420MHz radiation emitted
by the Hydrogen atom (Shostak, pages 151, 154). This first
SETI effort was named Project Ozma, and while no signal was detected, it
demonstrated that such a search can be carried out. Since Project
Ozma, there have been several SETI efforts by various organizations and
universities. While most of these searches are performed by professional
astronomers and those with the means to search on their own, there is
one in which anyone with a computer can participate:
SETI@Home.
So what exactly is SETI@Home and why should I use it?
SETI@Home is software that is designed to
operate as a screen saver for a personal computer. While it runs, it
processes work units issued by the University of California at Berkeley,
making the computer a participant in a very large network of computers
that behaves like a supercomputer. However, before explaining what
SETI@Home is, the first important question is why the program should be
used in the first place.
Without knowing why, the rest of the questions are meaningless. If you
are considering using SETI@Home, you have already answered the question!
If you are like me, you want to know what and who is out there. The
likelihood of life elsewhere was first illustrated by the famous Drake
Equation. Authored by Frank Drake, the Drake equation is not meant to be
solved for an exact answer; it is an illustration of the probability
that life exists elsewhere. This is the equation:
$$N = R_* \, f_p \, n_e \, f_l \, f_i \, f_c \, L$$

$N$ = the number of intelligent, communicating civilizations in our galaxy
$R_*$ = the birthrate of suitable, long-lived stars in our galaxy (between 1 and 10 per year)
$f_p$ = the fraction of those stars that have planets (about 50%)
$n_e$ = the number of planets per planetary system where life can be sustained (at least 1)
$f_l$ = the fraction of those planets (from $n_e$) where life actually arises (optimistically 1)
$f_i$ = the fraction of $f_l$ where intelligent life evolves (optimistically 1)
$f_c$ = the fraction of $f_i$ that communicates, or is willing to communicate (optimistically 1)
$L$ = the length of time, in years, over which such a civilization communicates (can be any number)
(Equation and parameters borrowed from Shostak, pages 180 to 181)
The above values can be any number, and
the results can certainly be argued; there is a host of variables to
consider – such as the presence of water, what type of star a planet
orbits, and what is truly required for life. One thing to remember is
Earth counts as part of the equation.
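To see how the terms combine, the equation can be worked through with the values listed above. This is only a rough sketch: the function name is my own, and the 10,000-year communicating lifetime is an assumed placeholder, not a figure from Shostak.

```python
def drake(r_star, f_p, n_e, f_l, f_i, f_c, lifetime_years):
    """Multiply the seven Drake factors together to estimate N."""
    return r_star * f_p * n_e * f_l * f_i * f_c * lifetime_years


# Values from the list above, with every "can be any number" term set
# to an assumed value purely for illustration.
ASSUMED_LIFETIME_YEARS = 10_000  # hypothetical; the text gives no value for L

n_low = drake(1, 0.5, 1, 1, 1, 1, ASSUMED_LIFETIME_YEARS)    # R* = 1 star per year
n_high = drake(10, 0.5, 1, 1, 1, 1, ASSUMED_LIFETIME_YEARS)  # R* = 10 stars per year
print(n_low, n_high)
```

With these optimistic inputs the result is simply the star birthrate times one half times the lifetime, which shows how strongly N depends on the value chosen for L.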
Even if you do not believe there
may be life beyond Earth, being a part of a worldwide
network of computer users shows that such a network can be used to help
solve other problems requiring intensive computation. A good
example is the protein folding project run out of Stanford University,
called Folding@Home (http://www.stanford.edu/group/pandegroup/folding/).
The mission statement on the SETI
Institute website (www.seti.org)
says it best: “The mission of the SETI Institute is to explore,
understand and explain the origin, nature and prevalence of life in the
universe.”
SETI@Home is a large network of computers acting like a large supercomputer.
A standard supercomputer is a device used
to process large amounts of data and to solve problems that are too
difficult and time intensive for a single user or computer. Many
supercomputers are built as a network of several computers controlled by
a single server using special server software (Microsoft makes a version
of this called Advanced Server). Generally the other computers are not
accessible by a user, but are given instructions by the server. A very good
example is the supercomputer at Swinburne University. This $700,000
machine boasts 1080 Gigaflops (http://supercomputing.swin.edu.au).
A FLOP is a floating point operation, and a gigaflop rating counts
billions of them per second, so the more the better. Floating point is a
number format built into a processor, like your garden variety Pentium
processor, that lets it perform mathematical calculations in an accurate
yet efficient way. Each number is stored as a set of significant digits
plus an exponent that says where the decimal point belongs. Because the
decimal point can move, or float, the same hardware can handle very
large numbers, very small numbers, and numbers with differing amounts of
digits after the decimal point.
Because of the large amount of processing power of the Swinburne
supercomputer, this computer is used to simulate galaxy formation and
collisions by mapping out the motions of millions of simulated stars
individually - although a program can be written to analyze SETI work
units or any other project deemed fit by the programmer and the computer
owner. While the Swinburne supercomputer has 160 computers in its
arsenal, there are over four million personal computers
processing SETI@Home work units (as of November 3, 2003). This comes to
about 50,000 gigaflops! A dedicated supercomputer with this processing
power would cost about $35,000,000 (R1, slide 26). Such a network is
capable of processing large amounts of data and saves a tremendous
amount of money.
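The figures in this section can be checked with a little arithmetic. The sketch below assumes that cost scales linearly with gigaflops, which is my own simplification and not necessarily how the $35,000,000 estimate in R1 was reached.

```python
# Figures quoted in the text above.
swinburne_cost_usd = 700_000
swinburne_gflops = 1080
seti_gflops = 50_000
seti_computers = 4_000_000

# Assumption: a bigger machine costs the same per gigaflop as Swinburne's.
cost_per_gflop = swinburne_cost_usd / swinburne_gflops
equivalent_cost = cost_per_gflop * seti_gflops

# Average contribution of one volunteer PC, in megaflops.
mflops_per_pc = seti_gflops * 1000 / seti_computers

print(f"Cost per gigaflop: ${cost_per_gflop:,.0f}")
print(f"Equivalent dedicated machine: ${equivalent_cost:,.0f}")
print(f"Average per PC: {mflops_per_pc} megaflops")
```

The scaled cost comes out a little over $32 million, the same ballpark as the quoted figure, and each PC only needs to contribute a modest 12.5 megaflops on average.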
Who is involved with SETI@Home?
While it is easy to associate SETI@Home
with the SETI Institute, the SETI@Home project is an extension of the
SERENDIP (Search for Extraterrestrial Radio Emissions from Nearby
Developed Intelligent Populations) project designed and performed by the
University of California at Berkeley. As the name suggests, SERENDIP is
looking for radio emissions that are not natural, but produced either
deliberately via a radio beacon, or as leakage from a civilization’s
technology, like our own TV and radio emissions. SERENDIP is an ongoing
project, and is already on its fourth version, called SERENDIP IV. All
of its listening equipment is housed at the Arecibo Radio Observatory,
and is continuously recording and analyzing data (unless the telescope
is closed for routine repair). While the University of California at
Berkeley is the brains behind this project, everyone who downloads and
uses the SETI@Home client software is a part of the SETI@Home project.
What are we looking for exactly, and how is the search performed?
This is the fun part. A civilization of
intelligence at least equal to our own will either deliberately send out
some type of beacon, or emanate radio noise as a result of its technology.
These signals can be sent optically, through radio waves, or by some
other method; however, our own atmosphere limits us to only optical or
radio wave detection (Universe, page 140). An optical pulse detector can
be used, but visible light suffers from extinction (Universe, page 457),
meaning the signal weakens as it travels through the interstellar
medium because it is absorbed by interstellar dust and debris. While
searching for an optical pulse (called Optical SETI, or OSETI) is
gaining favor, a telescope and other equipment are required to perform
such a search, and the cost will put off just about anyone who wants to
join the search (more on OSETI here:
http://www.coseti.org/radobs31.htm). A radio telescope can be tuned
to a specific frequency, such as just above and below the emission line
of Hydrogen. An example of what radio can do is our ability to map the
structure of our own galaxy using a radio telescope, something that is
not possible using an optical telescope (Universe, page 568). By scanning
above and below the frequency emitted by Hydrogen, we may have luck
detecting a deliberate signal. Hydrogen is the most abundant element in
the Universe, and any intelligent civilization would know this.
Intelligent civilizations would also know that Hydrogen emits radio
waves at a wavelength of 21cm (see figure 1).
Figure 1. A 21cm photon is released
when the hydrogen atom goes from a higher energy state to a lower energy
state. Think of this frequency as the interstellar dial-tone. A radio
telescope is tuned to 1420MHz to listen to this frequency
(Image from:
http://instruct1.cit.cornell.edu/courses/astro101/lec08.htm).
By using the Hydrogen frequency as an
interstellar dial-tone, an intelligent civilization may send a signal
using a frequency above or below the frequency of Hydrogen. This gives
us a place to start looking – or listening (Shostak, page 151).
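The 21cm wavelength and the 1420MHz frequency are two descriptions of the same radio wave, linked by the speed of light. A quick check, using the rounded 1420MHz value from the text (the function name is just for illustration):

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def wavelength_cm(frequency_hz):
    """Wavelength = speed of light / frequency, converted to centimeters."""
    return SPEED_OF_LIGHT_M_PER_S / frequency_hz * 100.0


hydrogen_wavelength_cm = wavelength_cm(1420e6)  # 1420 MHz, as quoted in the text
print(f"{hydrogen_wavelength_cm:.1f} cm")
```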
To give an idea of how we listen for this,
let’s examine the pieces required to perform this task. First we need
something to gather the signal, so we will need a radio telescope. Radio
is a portion of the electro-magnetic spectrum, but we cannot see radio
since the wavelengths are much longer than those of visible light. A
radio dish is used to collect the longer wavelengths, which is why a
radio dish is so large in diameter. Like all things in astronomy, the
larger the diameter, the more sensitive the dish becomes. The choice for
such a dish is the largest radio dish currently available: the Arecibo
Radio Observatory. The diameter of this telescope is a whopping one
thousand feet.
Figure 2. (Image
borrowed from:
http://www.naic.edu/aisr/sas/sashomeframe.html)
Radio signals are received by the dish at
the bottom of the photo (Figure 2) and are reflected to the feed horn
hanging above the dish. The feed horn gathers the radio signals and
sends them through wires to an amplifier (because sometimes the signal
is very weak and needs to be amplified). The amplifier sends the signal
to a special instrument called a spectrum analyzer (sometimes the spectrum
analyzer can operate without an amplifier; such decisions are left to
the engineer setting up the equipment). This examines frequencies around
the Hydrogen line, including the range from 1418.5 MHz to 1421.5 MHz
(http://setiathome.ssl.berkeley.edu/newsletters/newsletter7.html),
in channels 0.6Hz wide. SERENDIP IV has 168 million such channels, which
together span a band about 100 MHz wide. The SERENDIP IV project uses
this spectrum analyzer
along with a supercomputer to examine the frequencies in real time.
Since data analysis is in real time, only strong signals are looked for,
and each frequency is only analyzed for 1.7 seconds. Regardless of the
amount of processing performed in real-time, it is still not enough.
This is the reason for the initial concept of SETI@Home. There is so
much information that the supercomputer evaluating the signal in real
time cannot possibly process the excess data. This immense data stream
is a direct result of SERENDIP IV operating constantly, using a
technique called “piggyback” (R1, slide 4). This means that no matter
what type of scientific research is performed at Arecibo, the SERENDIP
IV spectrum analyzer is receiving information with no effect on the
research project in progress. The result is 35 gigabytes of data
every day. To put this size into perspective, I have my entire CD
collection totaling 850 titles in MP3 format on my computer. For over
6,000 songs, only 26 gigabytes is used. Just as I cannot possibly
listen to every song in one day, there is no way for the computers at
SERENDIP to evaluate all of this data.
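The numbers in this section are easier to appreciate with a little arithmetic. The sketch below only uses figures quoted above; the variable names are my own. It shows that 168 million channels of 0.6Hz each span roughly 100 MHz, while the 1418.5 MHz to 1421.5 MHz range on its own would need about 5 million channels of that width.

```python
channel_width_hz = 0.6
channel_count = 168_000_000

# Total band covered by all the channels, in MHz.
total_bandwidth_mhz = channel_count * channel_width_hz / 1e6

# Channels needed to cover only the 1418.5 to 1421.5 MHz range.
narrow_channels = (1421.5 - 1418.5) * 1e6 / channel_width_hz

# 35 gigabytes per day expressed as a steady stream (1 GB taken as 1e9 bytes).
kilobytes_per_second = 35e9 / (24 * 60 * 60) / 1e3

print(f"{total_bandwidth_mhz:.1f} MHz covered by all channels")
print(f"{narrow_channels:,.0f} channels in a 3 MHz range")
print(f"{kilobytes_per_second:.0f} kilobytes recorded every second")
```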
How does the program work?
Because so much more processing power is
required, there had to be an efficient yet cost effective way to create
a virtual supercomputer. It was suggested that a screen saver program be
designed that would analyze this data and anyone who wanted to download
this program could do so for free. This idea was a huge success. Within
three months, there were 1,000,000 users worldwide (R1, slide 10). As of
today, the number exceeds four million (http://setiathome.ssl.berkeley.edu/totals.html).
Interested in signing up? The first thing
that must be done is to download the program. The download is available
here:
http://setiathome.ssl.berkeley.edu/download.html. During
installation, you will be asked to create an account if you wish. Unlike
much of the free software on the Internet, SETI@Home will not send you
junk e-mail, so feel free to provide your data. The good news is that
you will be given credit for locating a signal, so go ahead and give
your real name. Once installation is complete, the program is
pre-configured to run as a screen saver after 10 minutes of idle time.
Since computers today are very fast, I
suggest changing the program settings to analyze data all the time.
Unless you edit video on your PC, you will not notice a dramatic
decrease in performance.
Once installed, the program is ready to
analyze. There are five steps to the entire SETI@Home process (http://setiathome.ssl.berkeley.edu/process_page).
Step One: Data Collection
The SERENDIP computers record data from
the Arecibo Observatory on digital linear tapes (these are like really
big cassettes, but store digital or binary information). The tapes are
sent through the mail to the SETI@Home center at the University of
California at Berkeley. The tapes are checked quickly for recording
problems, gaps and other errors, and the affected data is removed. The
remaining data is broken into numerous 348-kilobyte pieces called work
units. These work units are
sent to the SETI@Home client software for analysis. It is important to
know that one work unit can be sent to several computers to be analyzed.
This is important for step three, which will be discussed later.
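To get a feel for the scale, here is a minimal sketch of cutting a recording into fixed-size pieces. It is only an illustration: the function name is hypothetical, a kilobyte is assumed to be 1024 bytes, and the real splitter at Berkeley divides the data by frequency and time rather than by simple byte count.

```python
WORK_UNIT_BYTES = 348 * 1024  # assumption: 1 kilobyte = 1024 bytes


def split_into_work_units(data, unit_size=WORK_UNIT_BYTES):
    """Cut a byte string into consecutive chunks; the last one may be shorter."""
    return [data[i:i + unit_size] for i in range(0, len(data), unit_size)]


# A stand-in "recording" of one million zero bytes.
units = split_into_work_units(bytes(1_000_000))
print(len(units), "work units; last one is", len(units[-1]), "bytes")

# Rough number of work units in one day of data (35 gigabytes, 1 GB = 1e9 bytes).
units_per_day = 35e9 / WORK_UNIT_BYTES
print(f"About {units_per_day:,.0f} work units per day")
```

Close to a hundred thousand new work units a day, each of which may be sent out several times, is why so many volunteers are needed.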
Step Two: Finding Candidate Signals
Before embarking on a guided tour of the
SETI@Home program, it is important to help define some of the key words
used in this section. Most of the terms here are used primarily in radio
astronomy, so even the most adept amateur astronomer may not know these
terms.
· Fast Fourier Transform, or FFT – a mathematical algorithm that translates signals based on time into signals based on frequency.
· Baseline Smoothing – for the SETI@Home screen saver to delve deeply into a single signal, the broadband signal received by SERENDIP needs to be narrowed down to a narrow band. This is the first step of Baseline Smoothing. The second step removes any obvious noise and ensures each frequency is at the same level in volume (like the volume knob on your stereo).
· Chirping – the added Doppler effect on a signal from the rotating Earth.
· De-Chirping – the removal of the effects of the Earth’s rotation from a Doppler shifted signal.
· Doppler Shift – the shifting of a spectrum toward lower frequencies if a source is moving away from us, or toward higher frequencies if a source is moving toward us. Think of the noise an automobile makes as it speeds past your ear.
· Gaussian – the effect of a signal traveling through the beam of a radio telescope, gradually increasing as it enters the beam and gradually decreasing as it leaves the beam. Celestial objects take about 12 seconds to cross the beam of the Arecibo dish.
· Gaussian fit – the length of time it takes a signal to enter and leave the telescope beam.
· Gaussian power – the strength of the Gaussian signal as it enters and leaves the telescope beam.
· Pulses – a signal that repeats at a particular interval.
· Triplet – three equally spaced pulses.
· Radio Frequency Interference, or RFI – interference from the Earth or from a source near Earth.
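Of the terms above, the FFT does the most work, so a small demonstration may help. The sketch below is a textbook recursive FFT written in plain Python, not the code used by the SETI@Home client, and the sample rate and tone frequency are made-up values. It turns one second of a pure tone (a signal based on time) into a spectrum (a signal based on frequency) and finds the peak.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 FFT; the number of samples must be a power of two."""
    n = len(samples)
    if n == 1:
        return list(samples)
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    spectrum = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        spectrum[k] = even[k] + twiddle
        spectrum[k + n // 2] = even[k] - twiddle
    return spectrum


SAMPLE_RATE = 64  # samples per second (hypothetical)
N = 64            # one second of data
TONE_HZ = 5       # frequency of the test tone (hypothetical)

signal = [math.sin(2 * math.pi * TONE_HZ * t / SAMPLE_RATE) for t in range(N)]
power = [abs(value) ** 2 for value in fft(signal)]

# Only the first half of the spectrum is unique for a real-valued signal.
peak_bin = max(range(N // 2), key=lambda k: power[k])
peak_hz = peak_bin * SAMPLE_RATE / N
print("Strongest frequency:", peak_hz, "Hz")
```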
Now that we have identified some key
words, let’s tour the program!
This is the SETI@Home screen saver
program.
It can be divided into the following
sections: Data Analysis, Data Info, User Info, and the pretty spectrum
on the bottom.
First of all, let’s examine the pretty
spectrum at the bottom of this screen:
This actually serves no scientific
purpose. What it does show is a graphical representation of the Fast
Fourier Transform, or FFT, currently in progress. It also shows
how the signal strength changes over time. We’ll discuss the FFT function
under the Data Analysis header.
This box is the Data Info box:
This gives descriptive information about the
current work unit. It shows the exact location in the sky using the
Right Ascension and Declination coordinate system. It also shows the
date on which this signal was recorded, and the source of the signal,
usually the Arecibo Radio Observatory. A radio telescope is tuned to a
particular frequency to listen; in this case, the base frequency is
1.420859375 GHz (1420.859375MHz).
The User Info box, shown here,
gives the user’s total statistics since
first using SETI@Home. If getting credit is important to you, be sure to
give your name and e-mail address when creating your account.
The most important box is the Data
Analysis box. Everything happens in this portion of the SETI@Home screen.
The following information is fun to read, but you do not have
to remember any of the processes. The program does it for you. When an
account is created and the program is ready to receive its first work
unit, the work unit is downloaded with the progress shown here:
Once the data is downloaded, the program
performs a Baseline Smooth:
What this does is eliminate any broadband
interference and normalize the level of each signal. Sometimes when
data is collected, the level of intensity can vary. The Baseline
Smoothing adjusts the levels so they are all the same intensity. This
allows the FFTs to perform their work equally on each signal. This also
helps eliminate any ambient noise from interstellar Hydrogen
(http://www.computer.org/cise/articles/seti.htm).
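The idea of leveling the data can be sketched in a few lines. This example divides each sample by a running median of its neighbors, which is a stand-in I have chosen for illustration and not the method used in the SETI@Home client. The window size and the test data are also made up.

```python
import statistics


def baseline_smooth(power, window=9):
    """Divide each sample by the median of its neighborhood.

    A slowly changing background is flattened to about 1.0, while a
    narrow spike still stands out above it.
    """
    half = window // 2
    flattened = []
    for i, value in enumerate(power):
        start = max(0, i - half)
        stop = min(len(power), i + half + 1)
        flattened.append(value / statistics.median(power[start:stop]))
    return flattened


# A background that slowly rises from 100 to 119, with one strong narrow spike.
raw = [100.0 + i for i in range(20)]
raw[10] *= 5
flat = baseline_smooth(raw)
print([round(v, 2) for v in flat])
```

After smoothing, the rising background sits close to 1.0 everywhere and only the spike at position 10 stands well above it.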
Chirping, as the program labels this step, is a method of removing any
problems associated with the Doppler shift.
This is very important since the Earth
rotates on its axis and revolves around the Sun. In addition, the source
location, if it were a planet, is also rotating on its axis and
revolving around its star. This can add additional Doppler shift to an
already shifted signal. To understand the effects of a Doppler shift,
stand near a freeway and listen to the automobiles speed past. As an
auto approaches, the sound is slightly higher in pitch than the normal
sound when the auto is next to you. The pitch drops as the auto speeds
past. This is an example of a Doppler shift. In the case of SETI, the
sound is the signal sent by some intelligence. This “chirped” signal
changes frequency over time. The signals are “de-chirped” before the
FFTs through trial and error, at drift rates between plus and minus
50Hz per second, in an effort to remove the drift and leave a steady
signal.
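The trial and error can be illustrated with a toy example. The sketch below builds a made-up signal whose frequency drifts steadily, then tries a handful of drift rates across the plus and minus 50 range and keeps the one that gives the sharpest peak after an FFT. The sample rate, the step between trial rates and the test signal are all assumptions for illustration; the real client tests many more rates.

```python
import cmath


def fft(samples):
    """Recursive radix-2 FFT; the number of samples must be a power of two."""
    n = len(samples)
    if n == 1:
        return list(samples)
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    spectrum = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        spectrum[k] = even[k] + twiddle
        spectrum[k + n // 2] = even[k] - twiddle
    return spectrum


def dechirp(samples, drift_hz_per_s, sample_rate):
    """Cancel a steady frequency drift by applying the opposite drift."""
    corrected = []
    for n, sample in enumerate(samples):
        t = n / sample_rate
        corrected.append(sample * cmath.exp(-1j * cmath.pi * drift_hz_per_s * t * t))
    return corrected


def peak_power(samples):
    """Power of the strongest frequency bin; highest when the tone is steady."""
    return max(abs(value) ** 2 for value in fft(samples))


SAMPLE_RATE = 256.0  # hypothetical
N = 256              # one second of data
START_HZ = 40.0      # hypothetical starting frequency
TRUE_DRIFT = 20.0    # hypothetical drift, in Hz per second

signal = []
for n in range(N):
    t = n / SAMPLE_RATE
    signal.append(cmath.exp(2j * cmath.pi * (START_HZ * t + 0.5 * TRUE_DRIFT * t * t)))

trial_rates = range(-50, 51, 10)
best_rate = max(trial_rates, key=lambda r: peak_power(dechirp(signal, r, SAMPLE_RATE)))
print("Best drift rate:", best_rate, "Hz per second")
```

Only the correct trial rate turns the drifting signal back into a steady tone, so all of its power lands in a single frequency bin and it wins the comparison.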
Once the data has been “de-chirped,” it is
processed by the FFT at each frequency resolution,
converting the signal from a function of time to a function of
frequency. Notice the Doppler drift rate and the Resolution portions of
the Data Analysis window. The Doppler drift rate is the drift rate
currently being tested against the signal. The resolution is the
frequency resolution of the FFT currently in progress. The FFTs use
resolutions between 0.075 Hz and 1,221 Hz, and the program looks for
Gaussians, pulses, and triplets.
While the program is performing the FFT’s,
Gaussian changes are processed.
As the Earth rotates the Arecibo dish
across the sky, a signal creeps across the dish’s beam. A signal appears
on one side of the dish as a faint signal. As the signal reaches the
center, the level of the signal is increased only to begin decreasing
once again as the signal leaves the beam on the opposite side. The time
it takes the signal to pass through the beam is a Gaussian fit. The
intensity of the signal is called the Gaussian power. For a signal to be
marked as a candidate signal, the Gaussian fit must be a small number
(12 seconds or less), and the Gaussian power must be high (3.5 times the
normal background noise). The data analysis window will display the
current best Gaussian fit and Gaussian power. The significance of 12
seconds is simple: this is how long it takes for a star to pass through
the beam of the Arecibo dish. Anything longer than 12 seconds is
probably man-made, or something very close to Earth.
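A Gaussian search can be sketched as sliding a bell-shaped template along the data and measuring how well it matches. In the sketch below, the 12 seconds and the 3.5 threshold come from the text, but everything else is an assumption made for illustration: the 12 seconds is treated as the full width of the beam at half its peak strength, the misfit limit of 0.1 is hypothetical, and the data is a noise-free synthetic signal.

```python
import math

BEAM_FWHM_S = 12.0  # from the text; treating it as full width at half maximum is an assumption
SIGMA_S = BEAM_FWHM_S / (2 * math.sqrt(2 * math.log(2)))
POWER_THRESHOLD = 3.5  # times the background, from the text
MISFIT_LIMIT = 0.1     # hypothetical limit on the average squared error


def beam(t, center):
    """Relative strength of a source at time t if it crosses beam center at 'center'."""
    return math.exp(-0.5 * ((t - center) / SIGMA_S) ** 2)


def fit_gaussian(power):
    """Try every center time and return (misfit, center, peak power) of the best match.

    'power' holds one sample per second, scaled so the background averages 1.0.
    """
    best = None
    for center in range(len(power)):
        shape = [beam(t, center) for t in range(len(power))]
        # Least-squares height of the bell curve above the background.
        height = sum((p - 1.0) * g for p, g in zip(power, shape)) / sum(g * g for g in shape)
        misfit = sum((p - 1.0 - height * g) ** 2 for p, g in zip(power, shape)) / len(power)
        if best is None or misfit < best[0]:
            best = (misfit, center, 1.0 + height)
    return best


# Synthetic source, four times the background at its peak, crossing at t = 20 s.
observed = [1.0 + 3.0 * beam(t, 20) for t in range(40)]
misfit, center, peak = fit_gaussian(observed)
is_candidate = peak >= POWER_THRESHOLD and misfit < MISFIT_LIMIT
print(center, round(peak, 2), is_candidate)
```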
Other signal properties are also analyzed.
The signaling civilization may send a radio signal that is pulsed in
nature.
The program performs what is called a Fast
Folding Algorithm to look for weak, repeating pulses. Interference of
some localized variety can result in an artificial pulse so a limit has
been set. A pulse score greater than 1 is flagged.
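The reason folding helps with weak pulses can be shown with a simplified example. The sketch below uses plain folding at a few trial periods, which is a simpler stand-in for the Fast Folding Algorithm (the real algorithm reaches the same kind of result far more efficiently). The score here is a simple peak-to-average ratio of my own choosing, not the pulse score reported by the program.

```python
def fold(power, period):
    """Average together all samples that fall at the same point in the trial period."""
    sums = [0.0] * period
    counts = [0] * period
    for i, value in enumerate(power):
        sums[i % period] += value
        counts[i % period] += 1
    return [s / c for s, c in zip(sums, counts)]


def best_period(power, trial_periods):
    """Return the trial period whose folded profile has the strongest peak."""
    average = sum(power) / len(power)
    scores = {p: max(fold(power, p)) / average for p in trial_periods}
    return max(scores, key=scores.get), scores


# Synthetic data: background of 1.0 with a weak pulse every 7 samples.
data = [1.5 if i % 7 == 0 else 1.0 for i in range(120)]
period, scores = best_period(data, range(2, 13))
print("Best period:", period, "samples")
```

When the trial period matches the true spacing, every pulse lands on the same spot and adds up; at the wrong period the pulses are smeared across the whole profile.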
A triplet is three equally spaced pulses.
If the center pulse is found to be equidistant from the other two, the
results are flagged.
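A triplet check is simple enough to sketch directly. Given the times at which pulses were seen, the hypothetical function below looks for any three where the middle one sits exactly halfway between the other two. The tolerance parameter is my own addition to allow for small timing errors.

```python
def find_triplets(pulse_times, tolerance=0):
    """Return (first, middle, last) groups where the middle pulse is midway between the others."""
    times = sorted(pulse_times)
    triplets = []
    for i in range(len(times)):
        for k in range(i + 2, len(times)):
            midpoint = (times[i] + times[k]) / 2
            for j in range(i + 1, k):
                if abs(times[j] - midpoint) <= tolerance:
                    triplets.append((times[i], times[j], times[k]))
    return triplets


found = find_triplets([3, 10, 17, 40])
print(found)  # [(3, 10, 17)]
```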
The image above is of chirping data, but
notice the Best Pulse below the Doppler drift rate. With a score of
1.11, this work unit will be flagged when it is sent back to the
computers at Berkeley.
Processing one work unit, which includes removing Doppler
effects and searching for Gaussians on frequencies between 0.7Hz and
1200Hz, takes about 175,000,000,000 mathematical operations (R1,
slide 14).
Signals are found on a routine basis while
running the program. An example is the peak Pulse value found on the
images above. There are many sources of signals processed by the
program, and they are mostly terrestrial in origin. Other objects in the
Universe can also be responsible for a Gaussian signal or pulse, an
example of which is a pulsar. This rapidly spinning neutron star
can have a pulsed signal (R1, slide 39). Regardless of the source of a
signal or a pulse, all results must go through the verification process.
Step Three: Testing Data Integrity
A single work unit may be processed by
several different client computers. This is a very important tool for
data verification. If a signal is found by one client, that result is
compared to the results of the same work unit from other clients. If the
signal is present in the results posted by all of the clients, then the
signal is marked for step four. Because the properties of each client
machine are different, there is some leeway given to the comparison. The
differences can come from processing errors or different versions of the
client software. Either way, a signal is verified if the signal
properties reported by the clients match each other to within 70%.
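The comparison could look something like the sketch below. How the 70% figure is actually applied is not spelled out in my sources, so this is one possible reading: two values "match" if the smaller is at least 70% of the larger, and every pair of clients must match on every property. The property names and the sample results are invented.

```python
from itertools import combinations

MATCH_LEVEL = 0.70  # from the text; how it is applied here is an assumption
PROPERTIES = ("power", "fit")  # hypothetical property names


def similarity(a, b):
    """Ratio of the smaller to the larger value (positive values only); 1.0 means identical."""
    return min(a, b) / max(a, b)


def is_verified(results):
    """True if at least two clients reported and every pair agrees on every property."""
    if len(results) < 2:
        return False
    for first, second in combinations(results, 2):
        for name in PROPERTIES:
            if similarity(first[name], second[name]) < MATCH_LEVEL:
                return False
    return True


agreeing = [
    {"client": "A", "power": 4.0, "fit": 2.0},
    {"client": "B", "power": 3.8, "fit": 2.2},
    {"client": "C", "power": 4.1, "fit": 1.9},
]
disagreeing = agreeing + [{"client": "D", "power": 1.0, "fit": 2.0}]

print(is_verified(agreeing), is_verified(disagreeing))
```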
Additionally, the Arecibo Radio
Observatory scans over a particular area of sky two or three times. All
of these results are compared to the verified work units. This helps
rule out any equipment malfunction that might contribute to a false
signal.
If a signal does not match the
results from other clients, then the signal is not verified; that
particular signal does not make it to the next level. However, because
of Radio Frequency Interference, there are a large number of work units
that make it to step four.
Step Four: Removing Radio Interference
The verified signals are passed through
this fourth phase. Radio Frequency Interference, or RFI, is a reality
when dealing with radio astronomy or SETI. There are two common types of
interference: the “always on” interference that results from the system
hardware or software, and the short period interference. The short
period interference can come from a host of things, such as microwave
ovens, a car starting, cell phone transmissions, or even satellites.
Luckily, both of these types of interference can be removed. The “always
on” interference occurs at only 5 frequencies (1418.75, 1419.00,
1420.00, 1421.00 and 1421.25 MHz) and can therefore easily be removed
(SETI@Home:
http://setiathome.ssl.berkeley.edu/process_page/removing_rfi.html).
Notice the 1420.00MHz frequency. It is
essentially the same frequency as the emission of atomic Hydrogen (about
1420.4 MHz), the most abundant element in
the Universe. Short period interference is removed by comparing the
multiple instances of the same work unit. The Gaussian and pulse signal
may be identical, but ambient noise in one version of the work unit may
mask the results as compared to another version of the work unit. The
more samples of the same work unit, the better.
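Removing the “always on” interference amounts to throwing away anything that sits on one of the five known frequencies. A minimal sketch, where the function name, the tolerance and the sample candidates are all hypothetical:

```python
ALWAYS_ON_MHZ = (1418.75, 1419.00, 1420.00, 1421.00, 1421.25)  # from the text
TOLERANCE_MHZ = 0.001  # hypothetical: how close counts as "on" an RFI frequency


def remove_always_on(candidates_mhz):
    """Keep only the candidate frequencies that are clear of every known RFI frequency."""
    return [
        freq for freq in candidates_mhz
        if all(abs(freq - rfi) > TOLERANCE_MHZ for rfi in ALWAYS_ON_MHZ)
    ]


kept = remove_always_on([1419.0002, 1419.5, 1420.0, 1420.4058])
print(kept)  # [1419.5, 1420.4058]
```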
Step Five: Identifying final signal candidates
Once a candidate signal passes the data
integrity and radio interference removal steps, the signals are
re-verified. Further examination is given to rule out Earth bound RFI.
The signal undergoes a Persistency Check, which means a Gaussian, pulse
or triplet must be consistent in location and frequency across time. Its
purpose is to further eliminate any additional RFI. Once RFI has been
ruled out and a signal verified, a SETI@Home team will need to
re-observe the candidate signals. A team from the University of
California at Berkeley will travel to
Arecibo to examine each signal’s location. If the signal is
verified by this observation, the location of the signal is relayed to
other observatories for continued verification.
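A Persistency Check can be pictured as grouping detections that share a sky position and frequency, then keeping only those seen on more than one date. The sketch below is my own illustration: the tolerances, the minimum of two dates and the sample detections are invented, and the simple rounding used for grouping can split detections that fall near a bin edge.

```python
from collections import defaultdict


def persistent_candidates(detections, min_dates=2, pos_tol_deg=0.1, freq_tol_hz=50.0):
    """Return the first detection of each group seen on at least 'min_dates' dates.

    Each detection is (right ascension, declination, frequency in Hz, date).
    """
    dates_seen = defaultdict(set)
    first_seen = {}
    for ra, dec, freq_hz, date in detections:
        key = (round(ra / pos_tol_deg), round(dec / pos_tol_deg), round(freq_hz / freq_tol_hz))
        dates_seen[key].add(date)
        first_seen.setdefault(key, (ra, dec, freq_hz))
    return [first_seen[key] for key, dates in dates_seen.items() if len(dates) >= min_dates]


detections = [
    (83.60, 22.00, 1420000100.0, "2003-01-05"),
    (83.61, 22.01, 1420000110.0, "2003-03-17"),  # same spot and frequency, later date
    (150.20, 10.50, 1419500000.0, "2003-02-01"),  # seen only once
]
persistent = persistent_candidates(detections)
print(persistent)
```

Only the first source survives, because it turned up at the same place and frequency on two different dates.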
By having other observatories repeat the
signal verification, any system errors or localized interference are
ruled out. Additionally, more scientific weight is granted to the
verified signal if two or more other observatories are able to
duplicate the results.
A signal was or was not verified. What happens next?
Regardless of what signal is detected on
your computer, it is very important not to get excited. For a signal to
be a verified signal from an extra-terrestrial intelligence, it must
pass through all five steps of the data collection, data analysis and
verification processes. SETI@Home has made a declaration of their policy
on releasing candidate signal information. It is available here:
http://setiathome.ssl.berkeley.edu/declaration.html. It states
specifically that no candidate signal will be released as a signal from
an extra-terrestrial intelligence unless it has passed strict
verification processes. The five-step method described above meets this
requirement.
Once the signal is verified, every
computer running the SETI@Home software responsible for processing that
unit will receive an official telegram for the purpose of notification
(R1, slide 40). Those individuals will also receive credit by having
their names attached to the discovery.
Processing work units does not guarantee a
verified result, but processing work units is very important. One of the
many sponsors of the SETI@Home project is the Planetary Society. They
have made arrangements so that anyone who runs the SETI@Home software can
print out a certificate. You can get yours here:
http://www.planetary.org/html/UPDATES/seti/seti_certificate_instructions.html.
The SETI@Home website continues to update
information on a regular basis. To view the current overall status,
check here:
http://setiathome.ssl.berkeley.edu/process_page/
For the current map of signal candidates,
check here:
http://setiathome.ssl.berkeley.edu/candidates.html
Summary:
For decades now, we have been pointing our
radio dishes into space hoping to detect a signal from an
extra-terrestrial intelligence. Since Frank Drake demonstrated that such
a search can be performed, we have collected an enormous amount of data
to analyze. It may seem like the search is in vain, but that is far from
the truth. Even with such a large network of computers running the SETI@Home
software analyzing signal data collected at Arecibo, there is still
plenty of data left to examine. Most of all, work units still need to
be analyzed more than once, and areas of sky still need to be scanned
more than once, to help with the
verification process. There is still much work to be done. By creating
this world-wide virtual supercomputer, SETI@Home has demonstrated that
such a network can exist. Over four million people are willing to be a
part of science, and that speaks volumes about human nature. And most
importantly, one work unit can help make a difference in the
verification or detection of that one special signal: the one that could
be the first interstellar phone call. Even if a signal never makes it
through the entire process, a world-wide network of computers like this
one could be used to perform analysis for some other type of problem
such as other SETI experiments or even medical research, like the
Folding@Home project sponsored by Stanford University. So let’s keep
those computers running!
References:
Computer Society: http://www.computer.org/cise/articles/seti.htm
Folding@Home, Stanford University: http://www.stanford.edu/group/pandegroup/folding
Freedman, Roger A. Universe: 6th Edition. W.H. Freeman and Company, 2002.
The Nature of the Universe: http://instruct1.cit.cornell.edu/courses/astro101/lec08.htm
SETI@Home: http://setiathome.ssl.berkeley.edu/index.html
SERENDIP: http://seti.berkeley.edu/serendip/oldindex.html
Shostak, Seth. Sharing the Universe: Perspectives on Extraterrestrial Life. Berkeley Hills Books, Berkeley, California, 1998.
Swinburne Centre for Astrophysics and Supercomputing: http://supercomputing.swin.edu.au
[R1] Swinburne University of Technology. “Let’s Get Technical” HET608, Module 17, Activity 2. SAO, 2003.