
What if SETI Gets Lucky?

Parsing the Message

If we have succeeded in cracking the signal's modulation schemes, we can then start collecting data from the signal. This data will most likely be in the form of a long series of apparently meaningless digits. The process of parsing and deciphering this data can best be compared to deciphering genetic information.



Before we attempt to decipher the message, we would first develop a system for archiving and tagging data as it is collected. By this point in time, telescopes worldwide will be trained on the signal. Each block of data would be tagged to indicate precisely when it was collected, its source carrier, and the facility that captured it. All of this information would be stored in a central database. Think of each block of data like a snippet of sequenced DNA, information that can later be stitched together to obtain the entire message.
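As a rough illustration of the tagging scheme described above, here is a minimal Python sketch. The field names (`collected_at`, `carrier_hz`, `facility`, `digits`), the site name, and the in-memory `archive` list standing in for the central database are all assumptions made for this example, not part of any real archive format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TaggedBlock:
    """One block of received digits plus the metadata needed to stitch it later."""
    collected_at: datetime  # when the block was received (UTC)
    carrier_hz: float       # source carrier the block was demodulated from
    facility: str           # observatory that captured the block
    digits: str             # the raw data, e.g. "0110..."


def stitch(blocks):
    """Order one facility's blocks by collection time and join their digits.

    Assumes the blocks are contiguous and non-overlapping, which real
    observations would not guarantee.
    """
    ordered = sorted(blocks, key=lambda b: b.collected_at)
    return "".join(b.digits for b in ordered)


# Hypothetical records, deliberately stored out of order.
archive = [
    TaggedBlock(datetime(2030, 1, 1, 0, 0, 2, tzinfo=timezone.utc), 1.42e9, "Site A", "1100"),
    TaggedBlock(datetime(2030, 1, 1, 0, 0, 1, tzinfo=timezone.utc), 1.42e9, "Site A", "0101"),
]
print(stitch(archive))  # 01011100
```

Sorting on the timestamp is what lets snippets captured in any order be reassembled, much as sequenced DNA fragments are.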

As this work is being done, programs will continuously trawl through the central database to compare blocks of data collected by different facilities. After correcting for differences in the signal's arrival time at different sites, these programs compare each series of digits to look for discrepancies. For example, if two blocks of data collected simultaneously agree perfectly, the program can assign a high confidence score to that particular series of digits. If, on the other hand, there is a discrepancy between the blocks collected at two different telescopes, the program may assume that some or all of that data is corrupted. This analysis will provide a rough measure of confidence in transcription accuracy for various parts of the message.
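One simple way to turn this comparison into a confidence score is a per-digit majority vote across sites. The sketch below assumes the copies have already been time-aligned and are of equal length. The voting rule and the function name `consensus` are choices made for this illustration rather than anything the comparison requires.

```python
from collections import Counter


def consensus(copies):
    """Compare time-aligned copies of the same block, digit by digit.

    Returns (digits, confidence): the majority digit at each position and
    the fraction of sites that agreed with it.
    """
    if len({len(c) for c in copies}) != 1:
        raise ValueError("copies must be aligned to the same length")
    digits, confidence = [], []
    for column in zip(*copies):
        digit, votes = Counter(column).most_common(1)[0]
        digits.append(digit)
        confidence.append(votes / len(copies))
    return "".join(digits), confidence


# Three hypothetical sites; the second site has one corrupted digit.
text, conf = consensus(["010110", "010010", "010110"])
print(text)
print(conf)
```

Positions where every site agrees score 1.0. The one disputed digit scores lower, flagging that part of the transcription for closer review.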

In parallel with this analysis, a different program will analyze the data to determine how its entropy (randomness) varies over time. This analysis will reveal the segments of the message that are less random (more ordered) than others. This will be an important clue in discovering the location of highly ordered information that may serve as a primer in comprehending the bulk of the message.

This program counts the number of times that a particular series of digits occurs within the message, and then compares this frequency to the likelihood that the same series would occur through random chance. If the data is transmitted in a compressed format, it will appear to be nearly random, whereas highly ordered data will contain frequently recurring patterns (for example, long strings of zeroes, repeating strings, etc.). This analysis will not reveal anything about the meaning of the message, but it will tell us where to look for clues about how to read it.
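The counting step can be sketched in a few lines of Python. This version assumes the data is a string of digits and that "random chance" means every digit is equally likely and independent. The function name `overrepresented` and the `alphabet_size` parameter are illustrative.

```python
from collections import Counter


def overrepresented(digits, n, alphabet_size=2):
    """Return {pattern: observed / expected} for each n-digit pattern seen.

    `expected` is the average count a uniformly random stream would give:
    (number of windows) / alphabet_size ** n.
    """
    windows = len(digits) - n + 1
    if windows <= 0:
        return {}
    counts = Counter(digits[i:i + n] for i in range(windows))
    expected = windows / alphabet_size ** n
    return {pattern: count / expected for pattern, count in counts.items()}


# A long string of zeroes: "00" occurs four times as often as chance predicts.
print(overrepresented("0" * 10, 2))
```

Ratios close to 1 for every pattern suggest random or compressed data. Ratios well above 1 point to the ordered segments worth a closer look.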

These processes lend themselves nicely to distributed computing. Each participating computer would download a fairly small block of data, and would then analyze the data to search for repeating patterns, and to measure its entropy as a function of time. When this information is stitched together for the entire dataset, the result is a graph that displays the message's degree of order over time. Segments that are highly ordered (less random) compared to the bulk of the message will merit closer inspection in subsequent analysis. This analysis will also provide clues about the overall structure of the message.
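The entropy-over-time graph could be built roughly as follows. This sketch assumes fixed-size, non-overlapping windows and measures only single-symbol Shannon entropy, so it will miss structure such as a strictly alternating "0101..." pattern. A fuller analysis would also look at longer patterns. The function names and window size are illustrative.

```python
import math
from collections import Counter


def shannon_entropy(block):
    """Entropy, in bits per symbol, of the symbol distribution in `block`."""
    total = len(block)
    return sum((c / total) * math.log2(total / c)
               for c in Counter(block).values())


def entropy_profile(digits, window):
    """Entropy of each consecutive, non-overlapping window of the data.

    Each window could be handed to a different volunteer machine; the
    results are simply concatenated in order.
    """
    return [
        shannon_entropy(digits[start:start + window])
        for start in range(0, len(digits) - window + 1, window)
    ]


# Hypothetical data: an ordered run of zeroes followed by mixed digits.
profile = entropy_profile("00000000" + "01100101", window=8)
print(profile)  # [0.0, 1.0]
```

Plotting `profile` against window position gives the kind of order-over-time graph described above, with dips marking the highly ordered segments.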

This phase would proceed fairly quickly, assuming that usable data can be extracted from the signal in the first place.

Looking for a Primer

If the previous steps reveal highly ordered information, we can begin the search for instructions that may explain how to parse the bulk of the message. At this point, we will still know nothing about the contents of the message. We will not know how information is represented within the message. Is it a series of mathematical equations, similar to Hans Freudenthal's Lincos? Is it a series of bitmapped images, similar to the Arecibo message or Yvan Dutil and Stephane Dumas's Cosmic Call message? Is it a set of algorithms? Any of these scenarios, and others not yet thought of, are possible. All we will know is that certain segments of the message appear to be less random than others.

In this stage of analysis, we would try different ways of visualizing the data from the segments that contain highly ordered information. This step also lends itself to distributed computing, this time with the assistance of humans. The human eye is adept at detecting patterns. One method to detect images within the dataset will be to cycle through different ways of translating the data into two-dimensional or three-dimensional bitmaps. Volunteers would download a program that prompts the user to score an image based on how orderly it appears to be. A user who sees something like the static on a television set assigns a low score. A user who sees an orderly image, or fragments of an orderly image, assigns a high score. With a sufficiently large user base, it will be possible to cycle through several billion permutations per day.
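For two-dimensional, 1-bit images, "cycling through different ways of translating the data" largely means trying every rectangle whose area equals the segment length. The sketch below lists those candidate shapes and draws one as text. The function names and the "#"/"." rendering are choices made for this illustration. The Arecibo message, at 1679 bits, is a convenient test because its length has only two nontrivial factors.

```python
def candidate_shapes(length):
    """All (rows, cols) pairs with rows * cols == length and both > 1."""
    return [(length // cols, cols)
            for cols in range(2, length)
            if length % cols == 0]


def render(bits, cols):
    """Lay a bit string out as rows of `cols` pixels for a human to score."""
    rows = [bits[i:i + cols] for i in range(0, len(bits), cols)]
    return "\n".join(row.replace("1", "#").replace("0", ".") for row in rows)


print(candidate_shapes(1679))  # [(73, 23), (23, 73)]
print(render("110011", 3))
```

A volunteer program would show each `render` result in turn and record the score the viewer assigns.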

This analysis will allow us to determine if there are images within the dataset, and if so, what resolutions are used. While this work could be done by trained analysts, most of this can be done by untrained viewers simply asked to score candidate images. The best candidates are handed off to trained analysts for further inspection.

Of course, the message may not contain images at all. It may be rooted entirely in mathematics, or it may be written in an algorithmic form, or something else entirely. In this case, its visual representation will be meaningless, and the results of the analysis described above will be inconclusive.

In parallel with the visual analysis, we should also be inspecting the message for repeating series that may describe basic mathematical functions. If the primer were describing a math or programming language that is used throughout the message, its purpose would be to build up a vocabulary of basic symbols. If we were building such a primer ourselves, we would use expressions such as the following:

  • 1?1=2
  • 1?2=3
  • 2?1=3
  • 2?2=4
  • 2?3=5

...therefore ? means addition (+).

The basic idea is to send a series of expressions that contain the unknown symbol, and thus to prompt the reader to infer its correct meaning. This technique makes it easy to build a vocabulary of basic math and logic symbols that can be used to create a rudimentary programming language.
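The inference a reader is expected to make can be mimicked by testing each example against a list of candidate meanings. This is only a sketch. The `CANDIDATES` table is a hypothetical, hand-picked set of guesses, and a real primer would not come with one.

```python
import operator

# Hypothetical candidate meanings a human or a search program might try.
CANDIDATES = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "max": max,
}


def infer_symbol(examples):
    """Return the candidate operations consistent with every example.

    `examples` is a list of (left, right, result) triples.
    """
    return [name for name, fn in CANDIDATES.items()
            if all(fn(a, b) == result for a, b, result in examples)]


primer = [(1, 1, 2), (1, 2, 3), (2, 1, 3), (2, 2, 4), (2, 3, 5)]
print(infer_symbol(primer))       # ['+']
print(infer_symbol([(2, 2, 4)]))  # ['+', '*']
```

The second call shows why a primer needs several expressions per symbol: 2?2=4 alone cannot distinguish addition from multiplication.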

This type of analysis cannot be automated so easily. Spotting expressions like this, either in numeric or image form, will require manual inspection. Software developers will be especially useful in this phase of the project, partly because they will be familiar with these types of expressions, and also because they are adept at building tools to manage and visualize data in various ways. With enough people looking at the data through different lenses, the chances are increased that someone will crack the basic structure and symbolic vocabulary of the message.

Of course, the message may be organized in a manner completely different from what we expect. It is important that the data from the message, as well as tools for parsing and visualizing this information, be made widely available. The greater the number of people who have access to this information, the greater the odds that someone will spot something useful. Fortunately, the Internet will make it easy to distribute this information to anyone who wants to participate in this process.

Limits of Communication

Most people assume that meaningful communication via such a message will be impossible, if not because of presumed differences in intellect, then because of the vast delays imposed by the speed of light. This turns out not to be true if the sender has designed the message to be comprehensible or to interact with its recipient.

While it is impossible to say what such a message might say, we can make some educated guesses about the manner in which a message might be organized, and about the techniques that may be used to convey information.

The SETI literature to date anticipates three basic types of communication via an interstellar message: images, mathematical languages, and algorithmic communication. Each format has its strengths, and can be used alone or in conjunction with other methods.

Even low-resolution images can communicate a great deal. Early attempts at interstellar communication, such as the Arecibo message and Pioneer spacecraft plaques, use images to describe our appearance, location in space, and chemical composition. It is easy to transmit a black-and-white image (1 bit pixel depth) within such a message. The process of decoding the message is fairly easy, as the recipient merely needs to guess the horizontal and vertical dimensions of the image. Grayscale or multi-channel (color) images will be more difficult to decipher, but not impossible. Because images can be used to describe so many things, we may expect some type of imagery to play an important role in a message.
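Guessing the dimensions of a 1-bit image can also be partly automated. One heuristic, assumed here purely for illustration, is that real pictures tend to have pixels that match the pixel directly below them, so the correct width should maximize that agreement. The function names and the small test image are hypothetical.

```python
def row_agreement(bits, cols):
    """Fraction of pixels that match the pixel directly below them."""
    pairs = [(bits[i], bits[i + cols]) for i in range(len(bits) - cols)]
    return sum(a == b for a, b in pairs) / len(pairs)


def best_width(bits):
    """Divisor of len(bits) whose layout has the highest row agreement."""
    widths = [c for c in range(2, len(bits)) if len(bits) % c == 0]
    if not widths:
        raise ValueError("length is prime; no rectangular layout exists")
    return max(widths, key=lambda c: row_agreement(bits, c))


# Hypothetical 7-wide, 5-tall image of the letter "T".
image = "1111111" + "0001000" * 4
print(best_width(image))  # 7
```

On noisy or textured images this heuristic can pick the wrong width, so it is best used to rank candidates for human review rather than to replace it.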

Messages crafted in mathematical languages, similar to Hans Freudenthal's Lincos (Lingua Cosmica), are another possibility. A mathematical language can be used to describe physical processes, sets and categories, and can also form the basis of a general-purpose symbolic language. For example, the sender may describe an extensive vocabulary of numeric symbols, and then use a mathematical language to describe how each symbol relates to the others. This type of semantic network is a potent tool in describing how ideas relate to each other. By coupling a mathematical language with images, the sender would be able to associate a numeric symbol with an image, and then describe how that symbol relates to others.

One of the most interesting possibilities to contemplate is an algorithmic message. This type of message would consist of a series of computer programs, as well as data to be processed by them. These programs would all be derived from a small set of basic math and logic symbols. The recipient would run these programs on a virtual machine whose operations correspond to the basic symbols (just as a Windows program is ultimately reduced to the basic instruction set recognized by Intel-class CPUs).
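To make the virtual machine idea concrete, here is a minimal sketch of a stack machine in Python. The four opcodes (`PUSH`, `ADD`, `MUL`, `DUP`) are an invented instruction set for illustration. In a real message, each opcode would correspond to a symbol whose meaning the primer had already established.

```python
def run(program):
    """Execute a list of (opcode, argument) pairs on a tiny stack machine."""
    stack = []
    for op, arg in program:
        if op == "PUSH":    # place a literal number on the stack
            stack.append(arg)
        elif op == "ADD":   # replace the top two values with their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":   # replace the top two values with their product
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "DUP":   # copy the top value
            stack.append(stack[-1])
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack


# (2 + 3) * 4
program = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PUSH", 4), ("MUL", None)]
print(run(program))  # [20]
```

Once an interpreter like this exists, any program in the message that is built from the same symbols can be run locally and its behavior studied.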

By communicating via algorithms, the sender can overcome the most basic limitation in interstellar communication, the speed of light. Although the message itself cannot travel faster than light, algorithms will make it possible to localize most of the communication. Instead of sending a question back to the sender and waiting decades for a reply, the recipient may simply run one of these programs on a nearby machine, probe its behavior, and learn a great deal in the process. If this sounds far-fetched, stop for a moment to consider how extensively we use computers in our own communication.

These programs would not need to be especially sophisticated to allow in-depth communication. Imagine, for example, the sender wanted to describe the concept of evolution. Instead of attempting to describe the idea via diagrams or equations, the sender might send a program similar to the Tierra a-life simulation, and then refer to the output of this program elsewhere in the message. The recipient would be able to run this program and watch how it behaves. Although this program might be only a few kilobytes in size, it would say a great deal about the process of natural selection and evolution. Simulations will be powerful tools for describing how systems work, whether they are physical systems, such as a planet's climate, or biological systems, such as an ant colony.

Programs can, in turn, assist the recipient in parsing and comprehending the message itself. Some programs might, for example, extract compressed data throughout the message that has been encoded to provide very robust error correction. Other programs might look for the equivalent of metatags throughout the message, and then use this information to answer simple queries about how one symbol relates to others. Other programs might assist the recipient in visualizing data within the message, the equivalent of a JPEG viewer for example. Even if the programs are comparable to present day software in their sophistication, they would allow for many types of introspection and two-way interaction.

It is also possible that some programs contained within a message might be quite sophisticated when run on a fast enough computing grid. While one can only speculate about the capabilities of such programs, it is reasonable to assume that a program designed to run on a very fast computing grid, and authored by an advanced civilization, is probably going to be more sophisticated than anything we have built to date. If this is the case, the message itself may be more than evidence of intelligence: it may also contain elements that are themselves life-like and in some respects intelligent.

If SETI does succeed in detecting a signal from another civilization, tomorrow or in the distant future, software developers will play an important role in decoding any message that signal conveys, both in analyzing the message and in building tools that others can use to do so. If a signal is detected, its data will be available to anyone with a computer, and will tempt people worldwide to decipher its contents. It is hard to imagine a more interesting challenge. Indeed, it is possible that the computing industry would organize itself around this challenge, both because of the curiosity that drives many people who work in technology, and because of the public's demand for information. Who wouldn't want to download a program that displays images from an extraterrestrial civilization?

Don't get your hopes up, though. With so little telescope time, the SETI@home team can only look at one in 25 million of the candidate signals on this run. Maybe the team will get lucky this time. Most likely it will have to keep trying. Win or lose, it is a credit to the ingenuity of the people behind the SETI@home project and a sign of things to come.

Brian McConnell is an inventor, author, and serial telecom entrepreneur. He has founded three telecom startups since moving to California. The most recent, Open Communication Systems, designs cutting-edge telecom applications based on open standards telephony technology.

