What if SETI@home Gets Lucky?

by Brian McConnell, author of Beyond Contact

From March 18 to March 20, SETI@home researchers will be conducting follow-up observations of the 200 most promising signals detected by the SETI@home project (out of more than 5 billion radio sources identified). Since its launch in 1999, the SETI@home system has enlisted millions of computers worldwide to analyze data collected by the Arecibo telescope in the search for transmissions from extraterrestrial civilizations.

University of California at Berkeley scientists Dan Werthimer, David Anderson, and Eric Korpela led the effort to narrow more than 5 billion hits to a list of the 200 most promising candidate signals.

Phase II of the SETI@home project will focus on those candidates deemed most likely to be bona fide extraterrestrial transmissions. With billions of candidates to choose from, the team narrowed its search to focus on signals that met the following criteria:

The Planetary Society has published a more detailed description of the process used to identify the most promising candidates.

With three days of telescope time, the SETI@home team hopes to re-observe at least 100 to 150 of the best candidates in detail. The team leaders estimate that the odds of confirming a signal are low: perhaps 1 in 10,000 according to project leader Dan Werthimer, or less than 1 percent according to SETI@home developer David Anderson. The team expects the vast majority of the candidate signals to be the result of man-made interference.

A long shot, to be sure, but this is the first time SETI researchers have conducted a targeted search to evaluate known sources with a real chance of success. Phase I of the search was not targeted. The SETI@home project simply observed whatever the Arecibo telescope was looking at as the Earth rotated. This allowed the project to observe a large percentage of the northern sky several times over four years. Interesting signals were scored and cataloged for future analysis and re-observation.

Phase II is the first targeted SETI search following up on known signals that met the criteria above. Even if this search fails, it marks an important step forward in the search for evidence of extraterrestrial civilizations. SETI researchers are no longer blindly looking at randomly chosen targets in the hope of picking up a signal, but are now methodically working through many candidate signals that may not pass the final tests required to confirm an ET signal. If the search does fail, it is not a definitive failure, as 25 million candidates were discarded for each one that will be re-observed in this short run. Only when facilities dedicated to SETI come online, such as the Allen Telescope Array, will researchers be able to examine a significant number of these candidate signals.

Confirming an ET Signal

What will happen if the SETI@home team discovers a promising candidate this week? The team will be recording data for detailed offline analysis in the weeks following the re-observations. SERENDIP IV, another Berkeley-based SETI program, will provide real-time results as the team conducts its observations. The SERENDIP system will guide the team in its observations, and will provide an early indication if the team is on to something interesting. A signal may not be visible to SERENDIP as the SETI@home system can conduct a more sensitive search for certain types of signals. If a signal is picked up by SERENDIP, the confirmation may come quickly. If not, we will have to wait several weeks for the offline analysis to be completed.

If they detect a signal that matches prior observations, the next step will be to train other SETI-capable telescopes on the target. This step is necessary to provide third-party confirmation of the Arecibo observations, though the SETI@home/SERENDIP re-observation will be strong evidence since the candidate signal would have been observed at several different times. Because the media is watching this experiment closely, it is unlikely this step would remain a secret for very long. If the signal passes this test, we will know fairly soon that a radio signal from another solar system has been detected.

Although unofficial word will spread quickly, official confirmation of a signal will take time. There will be strenuous debate about the veracity of the claim, and about other explanations for the signal's origin. It is possible that instead of an ET origin, they will have discovered a rare and previously unknown astronomical phenomenon. It is interesting to recall that when the first pulsar was detected, scientists initially speculated that it was an extraterrestrial beacon because of the precise timing of the signal pulses. They later determined that the source was a neutron star whose rotation was slowing ever so gradually (this deceleration was the giveaway that it was a natural radiation source instead of an extraterrestrial beacon).

How quickly this debate is settled will depend on the type of signal intercepted. If the signal is obviously artificial, we will know fairly quickly. Such a signal would do something that clearly indicates an intelligent origin (for example, by cycling through a series of special numbers such as primes, or numbers with integer square roots). If, on the other hand, the signal appears to contain no such information, we will know nothing about its purpose or origin, except that it appears to be generated by technology instead of a natural process. In this scenario, it will take longer to rule out explanations besides extraterrestrial communication.

What Happens If SETI Confirms an Alien Signal?

Merely confirming the existence of such a signal would have profound implications, some good, some quite ominous.

Most people assume that if SETI succeeds in detecting an ET signal, this will be a benign event. We'll finally know that we're not alone in the universe, science books will be rewritten, and so on. This view naively ignores human nature, specifically the ability of charlatans to con gullible people into doing stupid things. Rubes have never been in short supply and are currently more plentiful than ever. Successful detection could be a very good thing, if the contents of the signal are comprehensible. But what happens if we discover a signal, only to find that its contents are completely unrecognizable? All we would know is that there is an alien signal at 1.42 GHz, but what it says would be a mystery.

Fairly quickly an industry will form around the business of claiming to interpret the contents of this signal. Many people will see an alien civilization as a god-like entity, and will be easily conned by people who offer convincing explanations of what the signal says. Even today, in the absence of real evidence of alien contact, millions of people believe that aliens have contacted us, or are even living among us today.

What will happen if a real ET signal is added to this mix? Will the cults that form around it be benign groups of people with a shared interest in New Age mysticism, polygamy, and bad fashion? Or will some of them assume a more malevolent form? If history is any guide, it is a bad idea to bet on good intentions. (In my view, the worst-case scenario is that SETI succeeds in detecting a signal, and then fails in deciphering its contents.)

Detecting an ET signal will merely be the first step in an ongoing process. Once a signal is verified, the focus will shift to determining whether the signal is conveying data, and if so, what that data represents. The best way to ensure that SETI's success is not hijacked by frauds is to determine the basic structure and contents of the signal as rapidly as possible.

This may not be possible depending on the type of transmission we intercept. Success will depend on whether the signal is an intentional attempt to communicate with other civilizations. If it is an intentional signal, it is possible that it will be coded so that it can be deciphered relatively easily (at least the basic parts of the message). If it is a randomly intercepted transmission, its purpose will probably remain a mystery. If "ET" were to intercept a digital cellular phone call, with no knowledge about our cellular networks or what a human voice sounds like, would they have any idea what they were looking at? Probably not.

If SETI succeeds, we should hope that they find an intentional transmission that is designed to be decoded relatively easily. Most people assume that meaningful communication between civilizations would be impossible, and that, even if it were, the long delays imposed by the speed of light would make interstellar communication impractical and pointless. This is not true. In principle, it is possible to construct messages that are based upon universal physical and mathematical concepts and that can interact with their recipients, without requiring two-way communication between sites.

Between Worlds, an upcoming book from the SETI Institute and MIT Press, discusses the challenge of interstellar message composition at length. While this book focuses on how we might compose a message to send to other civilizations, it is also informative about what we should look for in any message we receive.

Demodulating an ET Signal

The first step following detection will be to examine the signal closely to determine if it is modulated (changes over time in a structured manner). For example, if the signal hops between two nearby frequencies, this would enable it to transmit a sequence of binary numbers (frequency A = 0, frequency B = 1). There are many ways to modulate a radio carrier to transmit analog and digital information, each with relative merits and disadvantages.

One way to embed data within a signal is to modify its strength over time (high power = 1, low power = 0). This is known as amplitude modulation (AM). It can be used to transmit both digital and analog information. AM is simple to read, but is not ideal for interstellar communication, where the carrier may be weak and vary in strength due to atmospheric effects and interference. An AM-encoded carrier will appear to vary in strength or cycle on and off, probably at regularly timed intervals.
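As a rough illustration of reading such a carrier, the sketch below recovers bits from a series of power readings by averaging each bit period and comparing it with a threshold. The readings, the four-samples-per-bit timing, and the midpoint threshold are all assumptions made for the example; a real receiver would first have to discover the bit timing.

```python
def decode_ook(power_samples, samples_per_bit):
    """Recover bits from on-off keyed power readings (high power = 1, low power = 0)."""
    # Assumed threshold: midway between the weakest and strongest reading.
    threshold = (min(power_samples) + max(power_samples)) / 2
    bits = []
    for start in range(0, len(power_samples) - samples_per_bit + 1, samples_per_bit):
        chunk = power_samples[start:start + samples_per_bit]
        mean_power = sum(chunk) / len(chunk)   # averaging smooths out brief fades
        bits.append(1 if mean_power > threshold else 0)
    return bits


# Hypothetical readings: four samples per bit, with some fading and noise.
readings = [0.9, 1.1, 1.0, 0.8,
            0.1, 0.2, 0.0, 0.1,
            0.7, 0.9, 1.0, 0.8,
            1.0, 0.9, 1.1, 1.0]
print(decode_ook(readings, 4))   # [1, 0, 1, 1]
```

Averaging over the whole bit period is what makes the scheme tolerate a carrier whose strength wanders, at the cost of a lower bit rate.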

Frequency modulation (FM) works by modifying a carrier's frequency while maintaining constant power output. FM can also be used to transmit both digital and analog information. When used to transmit digital information, this is known as frequency shift keying. In principle, FM is a simple system, but it too poses problems for interstellar communication because shifting frequencies makes the carrier itself harder to detect.
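Frequency shift keying can be sketched in a few lines. The example below assumes two known tones (frequency A = 0, frequency B = 1), a known sample rate, and known bit timing, none of which would be given in practice; it simply measures which tone carries more power during each bit period.

```python
import math

def tone_power(samples, freq_hz, sample_rate_hz):
    """Power of `samples` at one frequency (a single-bin Fourier transform)."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * n / sample_rate_hz)
             for n, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
             for n, s in enumerate(samples))
    return re * re + im * im

def encode_fsk(bits, freq_a_hz, freq_b_hz, sample_rate_hz, samples_per_bit):
    """Generate a test signal: frequency A for each 0, frequency B for each 1."""
    samples = []
    for bit in bits:
        freq = freq_b_hz if bit else freq_a_hz
        samples.extend(math.sin(2 * math.pi * freq * n / sample_rate_hz)
                       for n in range(samples_per_bit))
    return samples

def decode_fsk(samples, freq_a_hz, freq_b_hz, sample_rate_hz, samples_per_bit):
    """For each bit period, pick whichever tone is stronger."""
    bits = []
    for start in range(0, len(samples) - samples_per_bit + 1, samples_per_bit):
        chunk = samples[start:start + samples_per_bit]
        power_a = tone_power(chunk, freq_a_hz, sample_rate_hz)
        power_b = tone_power(chunk, freq_b_hz, sample_rate_hz)
        bits.append(0 if power_a >= power_b else 1)
    return bits


# Hypothetical parameters chosen only for the demonstration.
params = dict(freq_a_hz=100.0, freq_b_hz=120.0, sample_rate_hz=1000.0, samples_per_bit=100)
message = [0, 1, 1, 0, 1, 0, 0, 1]
print(decode_fsk(encode_fsk(message, **params), **params))
```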

To compensate for weak signals and background noise, SETI searches filter radio signals into very narrow channels, usually 1 Hz or less per channel. A precisely tuned carrier will appear to be much stronger than background noise in such a narrow channel, even if the signal itself is weak. However, if the carrier jumps across many frequencies, its energy is no longer so concentrated by frequency, and it is more likely to be confused with background noise.
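The effect can be demonstrated with a small simulation. The sketch below is an illustration under assumed values (1,024 samples, a carrier weaker than the noise, and eight hop frequencies), not a model of any real SETI receiver. It buries a steady carrier in noise, splits the record into narrow channels with a discrete Fourier transform, and then repeats the measurement with the same carrier hopping across eight channels.

```python
import cmath
import math
import random

def power_spectrum(samples):
    """Power in each channel (DFT bin) from 0 up to half the sample rate."""
    n = len(samples)
    # Precompute the n complex roots of unity so each term is a table lookup.
    twiddle = [cmath.exp(-2j * math.pi * k / n) for k in range(n)]
    return [abs(sum(s * twiddle[(k * i) % n] for i, s in enumerate(samples))) ** 2
            for k in range(n // 2)]

N = 1024           # samples in the record (assumed)
CARRIER_BIN = 100  # channel holding the steady carrier (assumed)
AMPLITUDE = 0.8    # carrier power (0.32) is below the noise power (1.0)
HOPS = 8           # number of frequencies the hopping carrier visits (assumed)
HOP_SPACING = 8    # channels between neighboring hop frequencies (assumed)

rng = random.Random(0)   # fixed seed so the run is repeatable
noise = [rng.gauss(0.0, 1.0) for _ in range(N)]

def tone(bin_index, i):
    return AMPLITUDE * math.sin(2 * math.pi * bin_index * i / N)

steady = [tone(CARRIER_BIN, i) + noise[i] for i in range(N)]
hopping = [tone(CARRIER_BIN + HOP_SPACING * (i * HOPS // N), i) + noise[i]
           for i in range(N)]

steady_spec = power_spectrum(steady)
hopping_spec = power_spectrum(hopping)
peak_bin = max(range(len(steady_spec)), key=lambda k: steady_spec[k])

print("steady carrier found in channel:", peak_bin)
print("steady peak / hopping peak:", max(steady_spec) / max(hopping_spec))
```

The steady carrier piles all of its energy into one channel and towers over the noise there, while the hopping carrier spreads the same energy over eight channels and produces a far smaller peak in each.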

Phase modulation works by modifying the carrier's phase. A precisely tuned carrier is described mathematically by a sine wave. In phase modulation (or phase shift keying, when the information is digital), the carrier jumps among a small number of discrete phase states (for example, in phase or 180 degrees out of phase with a reference). The frequency and power output remain constant; only the phase of the sine wave varies over time.

This method is thought to be one of the more promising ways to embed information in an interstellar signal because it does not make the carrier itself harder to detect, even if information is transmitted at a rapid pace.
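A two-state version of phase shift keying can be sketched as follows. The example assumes the receiver already has a reference copy of the carrier and knows the bit timing (both are assumptions; recovering them is the hard part in practice). It multiplies each bit period by the reference sine wave and reads the sign of the sum.

```python
import math

def encode_bpsk(bits, cycles_per_bit=4, samples_per_cycle=16):
    """Constant-frequency, constant-power carrier; a 1 flips the phase by 180 degrees."""
    samples = []
    n = 0
    for bit in bits:
        phase = math.pi if bit else 0.0
        for _ in range(cycles_per_bit * samples_per_cycle):
            samples.append(math.sin(2 * math.pi * n / samples_per_cycle + phase))
            n += 1
    return samples

def decode_bpsk(samples, cycles_per_bit=4, samples_per_cycle=16):
    """Correlate each bit period with the reference carrier and read the sign."""
    samples_per_bit = cycles_per_bit * samples_per_cycle
    bits = []
    for start in range(0, len(samples) - samples_per_bit + 1, samples_per_bit):
        correlation = sum(
            samples[start + i] * math.sin(2 * math.pi * (start + i) / samples_per_cycle)
            for i in range(samples_per_bit)
        )
        # In phase gives a positive sum; 180 degrees out of phase gives a negative one.
        bits.append(0 if correlation >= 0 else 1)
    return bits


message = [1, 0, 0, 1, 1, 1, 0]
print(decode_bpsk(encode_bpsk(message)))
```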

Polarity modulation is another method that may be used to transmit information. Radio transmissions, although invisible, are electromagnetic waves of the same kind as light, and are composed of photons. The electric field of a polarized wave vibrates in a specific plane as the wave travels. A polarized receiver or filter will pass waves that vibrate in one plane while blocking waves that vibrate in the perpendicular plane.

To see this for yourself, look at a flat-panel LCD monitor with polarized glasses. As you tilt your head, you'll notice that the light from the display is almost completely blocked at some angles. The polarity of a signal can also be used to represent a state or digit, and thus to convey information.
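The dimming you see through the glasses follows Malus's law: an ideal polarizer passes a fraction cos²(θ) of linearly polarized power, where θ is the angle between the wave's plane and the filter's. The sketch below applies that law and then, as an assumed encoding (horizontal = 0, vertical = 1), reads a digit by comparing the power in a horizontal feed with the power in a vertical one.

```python
import math

def transmitted_fraction(angle_degrees):
    """Malus's law: fraction of linearly polarized power passed by an ideal polarizer."""
    return math.cos(math.radians(angle_degrees)) ** 2

def detect_bit(signal_angle_degrees):
    """Assumed encoding: horizontal polarization = 0, vertical polarization = 1."""
    horizontal_feed = transmitted_fraction(signal_angle_degrees - 0)
    vertical_feed = transmitted_fraction(signal_angle_degrees - 90)
    return 0 if horizontal_feed >= vertical_feed else 1


for angle in (0, 45, 90):
    print(angle, round(transmitted_fraction(angle), 3))   # 1.0, 0.5, 0.0

# The receiver does not need to be perfectly aligned with the sender.
print(detect_bit(10), detect_bit(80))   # 0 1
```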

How much information could an ET signal transmit? A phase or polarity modulated carrier could transmit information at a high rate without making the carrier itself much harder to detect in the first place. However, because of the techniques used to reject background noise in the detection process, this information will probably be invisible during the initial analysis. Information transmitted very slowly, at a few bits per second or less, may be visible. Information transmitted at a faster pace will be blurred out by the algorithms used to reject background noise.
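The blurring is easy to reproduce. The sketch below assumes a detector that averages power readings over fixed windows, a simplified stand-in for the integration a real search performs, and feeds it the same on-off bit pattern at a slow rate and at a fast rate. The window length and bit durations are arbitrary values chosen for the example.

```python
def integrate(power_samples, window):
    """Average consecutive, non-overlapping windows of readings."""
    return [sum(power_samples[i:i + window]) / window
            for i in range(0, len(power_samples) - window + 1, window)]

def on_off_stream(bits, samples_per_bit):
    """Power readings for an on-off keyed carrier (1 = on, 0 = off)."""
    return [float(bit) for bit in bits for _ in range(samples_per_bit)]


pattern = [1, 0] * 8                  # sixteen alternating bits
slow = on_off_stream(pattern, 64)     # each bit outlasts the averaging window
fast = on_off_stream(pattern, 4)      # many bits fit inside one window
WINDOW = 32

slow_view = integrate(slow, WINDOW)
fast_view = integrate(fast, WINDOW)

print("slow bits after averaging:", slow_view[:6])   # [1.0, 1.0, 0.0, 0.0, 1.0, 1.0]
print("fast bits after averaging:", fast_view)       # [0.5, 0.5]
```

The slow pattern survives averaging with its full contrast, while the fast pattern collapses to a constant level that carries no visible information.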

Hopefully the sender of such a signal will structure it so that some information is transmitted at a very slow pace using an obvious modulation scheme, and thus is visible immediately. This low-speed channel would serve primarily to notify any recipient that the signal contains additional information. It is important to note that all of this assumes the signal is designed to be easy to decode. If it is not, we will probably not detect it in the first place, or if we do, we would have no idea how to extract data from it.

This work will be conducted by astronomers in collaboration with telecommunications experts, and will be done in several phases. In the first phase, scientists will use existing equipment to look for obvious signs of low bit-rate AM and FM modulation. This work would not take much time to complete and may lead to early results. In the second phase, the detection equipment would be modified to look for other modulation schemes (phase modulation, polarity modulation, etc.). In a third, longer-term phase, new telescopes may be constructed to provide enough gain to amplify the signal to detect high-speed modulation or lower power side channels that may be used to transmit larger amounts of data.

Parsing the Message

If we have succeeded in cracking the signal's modulation schemes, we can then start collecting data from the signal. This data will most likely be in the form of a long series of apparently meaningless digits. The process of parsing and deciphering this data can best be compared to deciphering genetic information.

Before we attempt to decipher the message, we would first develop a system for archiving and tagging data as it is collected. By this point in time, telescopes worldwide will be trained on the signal. Each block of data would be tagged to indicate precisely when it was collected, its source carrier, and the facility that captured it. All of this information would be stored in a central database. Think of each block of data like a snippet of sequenced DNA, information that can later be stitched together to obtain the entire message.

As this work is being done, programs will continuously trawl through the central database to compare blocks of data collected by different facilities. After correcting for differences in the signal's arrival time at different sites, these programs will compare each series of digits to look for discrepancies. For example, if two blocks of data collected simultaneously agree perfectly, the program can assign a high confidence score to that particular series of digits. If, on the other hand, there is a discrepancy between the blocks collected at two different telescopes, the program may assume that some or all of that data is corrupted. This analysis will provide a rough measure of confidence in transcription accuracy for various parts of the message.
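One possible shape for this bookkeeping is sketched below. The `Block` record, its field names, the facility labels, and the use of a per-digit majority vote as the confidence measure are all assumptions made for illustration; the article describes only the general idea of tagging blocks and comparing them.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Block:
    """One tagged block of received digits (hypothetical record layout)."""
    facility: str       # which telescope captured it
    carrier_hz: float   # source carrier frequency
    start_time: float   # seconds, already corrected for arrival-time differences
    digits: str         # the received series of binary digits

def consensus(blocks):
    """Majority-vote each digit across simultaneous blocks.

    Returns the merged digits and, per position, the fraction of facilities
    that agreed with the winning digit (1.0 means perfect agreement).
    """
    length = min(len(block.digits) for block in blocks)
    merged, confidence = [], []
    for position in range(length):
        votes = Counter(block.digits[position] for block in blocks)
        digit, count = votes.most_common(1)[0]
        merged.append(digit)
        confidence.append(count / len(blocks))
    return "".join(merged), confidence


blocks = [
    Block("Facility A", 1.42e9, 0.0, "10110010"),
    Block("Facility B", 1.42e9, 0.0, "10110010"),
    Block("Facility C", 1.42e9, 0.0, "10100010"),   # one digit differs
]
digits, confidence = consensus(blocks)
print(digits)                                       # 10110010
print([round(value, 2) for value in confidence])    # 0.67 at the disputed digit
```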

In parallel with this analysis, a different program will analyze the data to determine how its entropy (randomness) varies over time. This analysis will reveal the segments of the message that are less random (more ordered) than others. This will be an important clue in discovering the location of highly ordered information that may serve as a primer in comprehending the bulk of the message.

This program counts the number of times that a particular series of digits occurs within the message, and then compares this frequency to the likelihood that the same series would occur through random chance. If the data is transmitted in a compressed format, it will appear to be nearly random, whereas highly ordered data will contain frequently recurring patterns (for example, long strings of zeroes, repeating strings, etc.). This analysis will not reveal anything about the meaning of the message, but it will tell us where to look for clues about how to read it.
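A minimal version of that count might look like the following. It assumes the data is binary and that, in random data, every digit is equally likely, so a given series of n digits is expected once in every 2^n positions; the function name and sample string are made up for the example.

```python
from collections import Counter

def ngram_excess(digits, n):
    """Ratio of observed to chance-expected counts for each n-digit series seen."""
    positions = len(digits) - n + 1
    expected = positions / (2 ** n)   # assumes random, equally likely binary digits
    counts = Counter(digits[i:i + n] for i in range(positions))
    return {series: count / expected for series, count in counts.items()}


ordered = "0000000011111111" * 4      # a highly ordered stretch of data
ratios = ngram_excess(ordered, 4)
for series, ratio in sorted(ratios.items(), key=lambda item: -item[1]):
    print(series, round(ratio, 2))
```

Ratios far above 1 mark series that recur much more often than chance would allow. Series that never occur, such as 0101 in this sample, are just as telling.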

These processes lend themselves nicely to distributed computing. Each participating computer would download a fairly small block of data, and would then analyze the data to search for repeating patterns, and to measure its entropy as a function of time. When this information is stitched together for the entire dataset, the result is a graph that displays the message's degree of order over time. Segments that are highly ordered (less random) compared to the bulk of the message will merit closer inspection in subsequent analysis. This analysis will also provide clues about the overall structure of the message.
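The per-block measurement can be sketched with Shannon entropy. In the example below, each call to `block_entropy` stands in for the work one volunteer computer would do on its downloaded block, and `entropy_profile` stands in for stitching the results together. Grouping digits into 4-bit symbols and using 128-digit blocks are arbitrary assumptions.

```python
import math
import random
from collections import Counter

def shannon_entropy(symbols):
    """Entropy in bits per symbol: 0 for perfect order, higher for more randomness."""
    counts = Counter(symbols)
    total = len(symbols)
    return sum((count / total) * math.log2(total / count) for count in counts.values())

def block_entropy(digits, symbol_bits=4):
    """Work for one volunteer machine: entropy of a single block of digits."""
    symbols = [digits[i:i + symbol_bits]
               for i in range(0, len(digits) - symbol_bits + 1, symbol_bits)]
    return shannon_entropy(symbols)

def entropy_profile(message, block_size):
    """Stitch per-block results into a profile of order over time."""
    return [block_entropy(message[i:i + block_size])
            for i in range(0, len(message), block_size)]


rng = random.Random(1)                                   # fixed seed for repeatability
ordered = "01" * 64                                      # stands in for a primer
scrambled = "".join(rng.choice("01") for _ in range(128))  # stands in for compressed data
profile = entropy_profile(ordered + scrambled, block_size=128)
print([round(value, 2) for value in profile])
```

The ordered block scores zero and the scrambled block scores much higher, so a plot of the profile would point straight at the segment worth inspecting first.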

This phase would proceed fairly quickly, assuming that usable data can be extracted from the signal in the first place.

Looking for a Primer

If the previous steps reveal highly ordered information, we can begin the search for instructions that may explain how to parse the bulk of the message. At this point, we will still know nothing about the contents of the message. We will not know how information is represented within the message. Is it a series of mathematical equations, similar to Hans Freudenthal's Lincos? Is it a series of bitmapped images, similar to the Arecibo message or Yvan Dutil and Stephane Dumas's Cosmic Call message? Is it a set of algorithms? Any of these scenarios and others not yet thought of are possible. All we will know is that certain segments of the message appear to be less random than others.

In this stage of analysis, we would try different ways of visualizing the data from the segments that contain highly ordered information. This step also lends itself to distributed computing, except with the assistance of humans. The human eye is adept at detecting patterns. One method to detect images within the dataset will be to cycle through different ways of translating the data into two-dimensional or three-dimensional bitmaps. Volunteers would download a program that prompts the user to score an image based on how random it appears to be. If the user sees something like the static on a television set, he assigns a low score. If the user sees an orderly image, or fragments of an orderly image, he assigns a high score. With a sufficiently large user base, it will be possible to cycle through several billion permutations per day.

This analysis will allow us to determine if there are images within the dataset, and if so, what resolutions are used. While this work could be done by trained analysts, most of this can be done by untrained viewers simply asked to score candidate images. The best candidates are handed off to trained analysts for further inspection.
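The program handed to volunteers could be quite simple at its core. The sketch below lays the same run of digits out at several candidate widths and prints each layout as text for a person to score. The sample data (a small box drawn seven pixels wide) and the range of widths tried are invented for the example.

```python
def render(bits, width):
    """Lay a string of binary digits out as rows of `width` pixels."""
    rows = [bits[i:i + width] for i in range(0, len(bits), width)]
    return "\n".join(row.replace("1", "#").replace("0", ".") for row in rows)

def candidate_renderings(bits, widths):
    """Yield (width, picture) for each candidate width, for a human to score."""
    for width in widths:
        yield width, render(bits, width)


# Hypothetical ordered segment: a hollow box, seven pixels wide and five tall.
segment = "1111111" + "1000001" * 3 + "1111111"

for width, picture in candidate_renderings(segment, range(5, 9)):
    print(f"width {width}:")
    print(picture)
    print()
```

At widths 5, 6, and 8 the output looks like scattered marks; at width 7 the box snaps into place, which is exactly the kind of result a viewer would score highly.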

Of course, the message may not contain images at all. It may be rooted entirely in mathematics, or it may be written in an algorithmic form, or something else entirely. In this case, its visual representation will be meaningless, and the results of the analysis described above will be inconclusive.

In parallel with the visual analysis, we should also be inspecting the message for repeating series that may describe basic mathematical functions. If the primer were describing a math or programming language that is used throughout the message, its purpose would be to build up a vocabulary of basic symbols. If we were building such a primer ourselves, we would use expressions such as the following:

...therefore ? means addition (+).

The basic idea is to send a series of expressions that contain the unknown symbol, and thus to prompt the reader to infer its correct meaning. This technique makes it easy to build a vocabulary of basic math and logic symbols that can be used to create a rudimentary programming language.
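The inference itself can be mimicked in code. The sketch below assumes the reader already suspects the unknown symbol is one of a few familiar operations, and that each primer expression has been reduced to a triple of two operands and a result; the triples shown are made up for the example and are not taken from any real message.

```python
import operator

# Operations the reader is prepared to consider (an assumed, tiny vocabulary).
CANDIDATES = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def infer_symbol(examples):
    """Return every candidate operation consistent with all (a, b, result) examples."""
    return [name for name, function in CANDIDATES.items()
            if all(function(a, b) == result for a, b, result in examples)]


# Hypothetical primer expressions of the form "a ? b = result".
print(infer_symbol([(2, 2, 4)]))                        # ['+', '*'] (still ambiguous)
print(infer_symbol([(1, 1, 2), (2, 2, 4), (2, 3, 5)]))  # ['+']
```

A single expression can be ambiguous, which is why a primer would repeat the unknown symbol in several different expressions before moving on.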

This type of analysis cannot be automated so easily. Spotting expressions like this, either in numeric or image form, will require manual inspection. Software developers will be especially useful in this phase of the project, partly because they will be familiar with these types of expressions, and also because they are adept at building tools to manage and visualize data in various ways. With enough people looking at the data through different lenses, the chances are increased that someone will crack the basic structure and symbolic vocabulary of the message.

Of course, the message may be organized in a manner completely different from what we expect. It is important that the data from the message, as well as tools for parsing and visualizing this information, be made widely available. The greater the number of people who have access to this information, the greater the odds that someone will spot something useful. Fortunately, the Internet will make it easy to distribute this information to anyone who wants to participate in this process.

Limits of Communication

Most people assume that meaningful communication via such a message will be impossible, if not because of presumed differences in intellect, then because of the vast delays imposed by the speed of light. This turns out not to be true if the sender has designed the message to be comprehensible or to interact with its recipient.

While it is impossible to say what such a message might say, we can make some educated guesses about the manner in which a message might be organized, and about the techniques that may be used to convey information.

The SETI literature to date anticipates three basic types of communication via an interstellar message: images, mathematical languages, and algorithmic communication. Each format has its strengths, and can be used alone or in conjunction with other methods.

Even low-resolution images can communicate a great deal. Early attempts at interstellar communication, such as the Arecibo message and Pioneer spacecraft plaques, use images to describe our appearance, location in space, and chemical composition. It is easy to transmit a black-and-white image (1 bit pixel depth) within such a message. The process of decoding the message is fairly easy, as the recipient merely needs to guess the horizontal and vertical dimensions of the image. Grayscale or multi-channel (color) images will be more difficult to decipher, but not impossible. Because images can be used to describe so many things, we may expect some type of imagery to play an important role in a message.
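A sender can make the guessing nearly trivial by choosing the image's length carefully. The Arecibo message was 1,679 bits long, and 1,679 is the product of two primes, 23 and 73, so only two rectangular layouts are possible. The short sketch below (the function name is invented) lists the candidate dimensions for any pixel count.

```python
def candidate_dimensions(pixel_count):
    """All (width, height) pairs, both greater than 1, that multiply to pixel_count."""
    return [(width, pixel_count // width)
            for width in range(2, pixel_count)
            if pixel_count % width == 0]


print(candidate_dimensions(1679))   # [(23, 73), (73, 23)]
```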

Messages crafted in mathematical languages, similar to Hans Freudenthal's Lincos (Lingua Cosmica), are another possibility. A mathematical language can be used to describe physical processes, sets and categories, and can also form the basis of a general-purpose symbolic language. For example, the sender may describe an extensive vocabulary of numeric symbols, and then use a mathematical language to describe how each symbol relates to the others. This type of semantic network is a potent tool in describing how ideas relate to each other. By coupling a mathematical language with images, the sender would be able to associate a numeric symbol with an image, and then describe how that symbol relates to others.

One of the most interesting possibilities to contemplate is an algorithmic message. This type of message would consist of a series of computer programs, as well as data to be processed by them. These programs are all derived from a small set of basic math and logic symbols. The recipient would run these programs on a virtual machine whose operations correspond to the basic symbols (just as a Windows program is ultimately reduced to the basic instruction set recognized by Intel-class CPUs).
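To make the idea concrete, here is a toy virtual machine of the kind a recipient might build once the basic symbols are understood. The four operations, their numeric codes, and the stack-based design are inventions for this example and make no claim about how a real message would be organized.

```python
# Hypothetical numeric symbols, as they might be defined by a primer.
PUSH, ADD, MUL, DUP = 1, 2, 3, 4

def run(program):
    """Execute a list of numeric symbols on a tiny stack machine."""
    stack = []
    pc = 0   # position of the next symbol to execute
    while pc < len(program):
        op = program[pc]
        if op == PUSH:
            pc += 1                      # the next symbol is the value to push
            stack.append(program[pc])
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == DUP:
            stack.append(stack[-1])      # copy the top of the stack
        else:
            raise ValueError(f"unknown symbol {op} at position {pc}")
        pc += 1
    return stack


# A "program" from the message: compute 3 * 3 + 4 * 4.
program = [PUSH, 3, DUP, MUL, PUSH, 4, DUP, MUL, ADD]
print(run(program))   # [25]
```

Everything the program does is built from the four basic symbols, so a recipient who understands those symbols can run it without any further help from the sender.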

By communicating via algorithms, the sender can overcome the most basic limitation in interstellar communication, the speed of light. Although the message itself cannot travel faster than light, algorithms will make it possible to localize most of the communication. Instead of sending a question back to the sender and waiting decades for a reply, the recipient may simply run one of these programs on a nearby machine, probe its behavior, and learn a great deal in the process. If this sounds far-fetched, stop for a moment to consider how extensively we use computers in our own communication.

These programs would not need to be especially sophisticated to allow in-depth communication. Imagine, for example, the sender wanted to describe the concept of evolution. Instead of attempting to describe the idea via diagrams or equations, the sender might send a program similar to the Tierra a-life simulation, and then refer to the output of this program elsewhere in the message. The recipient would be able to run this program and watch how it behaves. Although this program might be only a few kilobytes in size, it would say a great deal about the process of natural selection and evolution. Simulations will be powerful tools for describing how systems work, whether they are natural systems, such as a planet's climate, or a biological system, such as an ant colony.

Programs can, in turn, assist the recipient in parsing and comprehending the message itself. Some programs might, for example, extract compressed data throughout the message that has been encoded to provide very robust error correction. Other programs might look for the equivalent of metatags throughout the message, and then use this information to answer simple queries about how one symbol relates to others. Other programs might assist the recipient in visualizing data within the message, the equivalent of a JPEG viewer for example. Even if the programs are comparable to present day software in their sophistication, they would allow for many types of introspection and two-way interaction.

It is also possible that some programs contained within a message might be quite sophisticated when run on a fast enough computing grid. While one can only speculate about the capabilities of such programs, it is reasonable to assume that a program designed to run on a very fast computing grid, and authored by an advanced civilization, is probably going to be more sophisticated than anything we have built to date. If this is the case, the message may be more than evidence of intelligence: it may also contain elements that are themselves life-like and in some respects intelligent.

If SETI does succeed in detecting a signal from another civilization, tomorrow or in the distant future, software developers will play an important role in decoding any message that signal conveys, both in analyzing the message and in building tools that others can use to do so. If a signal is detected, its data will be available to anyone with a computer, and will tempt people worldwide to decipher its contents. It is hard to imagine a more interesting challenge. Indeed, it is possible that the computing industry would organize itself around this challenge, both because of the curiosity that drives many people who work in technology, and because of the public's demand for information. Who wouldn't want to download a program that displays images from an extraterrestrial civilization?

Don't get your hopes up, though. With so little telescope time, the SETI@home team can only look at one in 25 million of the candidate signals on this run. Maybe the team will get lucky this time. Most likely it will have to keep trying. Win or lose, it is a credit to the ingenuity of the people behind the SETI@home project and a sign of things to come.

Brian McConnell is an inventor, author, and serial telecom entrepreneur. He has founded three telecom startups since moving to California. The most recent, Open Communication Systems, designs cutting-edge telecom applications based on open standards telephony technology.

Copyright © 2009 O'Reilly Media, Inc.