Wolfram Alpha Computes Answers To Factual Questions. This Is Going To Be Big.

Editor’s note: Below is a guest post from Nova Spivack, CEO of Radar Networks, about a new computational knowledge engine called Wolfram Alpha being developed by computer scientist Stephen Wolfram. Spivack originally published it on Twine, and it is republished here with his permission. Some of the sections have been rearranged for clarity.


Stephen Wolfram is building something new — and it is really impressive and significant. In fact it may be as important for the Web (and the world) as Google, but for a different purpose.

Stephen was kind enough to spend two hours with me last week to demo his new online service — Wolfram Alpha (scheduled to open in May). In the course of our conversation we took a close look at Wolfram Alpha’s capabilities, discussed where it might go, and what it means for the Web, and even the Semantic Web.

Stephen has not released many details of his project publicly yet, so I will respect that and not give a visual description of exactly what I saw. However, he has revealed it a bit in a recent article, and so below I will give my reactions to what I saw and what I think it means. And from that you should be able to get at least some idea of the power of this new system.

A Computational Knowledge Engine for the Web

In a nutshell, Wolfram and his team have built what he calls a “computational knowledge engine” for the Web. OK, so what does that really mean? Basically it means that you can ask it factual questions and it computes answers for you.

It doesn’t simply return documents that (might) contain the answers, like Google does, and it isn’t just a giant database of knowledge, like the Wikipedia. It doesn’t simply parse natural language and then use that to retrieve documents, like Powerset, for example. Instead, Wolfram Alpha actually computes the answers to a wide range of questions — questions that have factual answers, such as “What country is Timbuktu in?” or “How many protons are in a hydrogen atom?” or “What is the average rainfall in Seattle?”

Think about that for a minute. It computes the answers. Wolfram Alpha doesn’t simply contain huge amounts of manually entered pairs of questions and answers, nor does it search for answers in a database of facts. Instead, it understands and then computes answers to certain kinds of questions.
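
To make that distinction concrete, here is a deliberately tiny sketch of my own (this is not Wolfram Alpha’s code, and the figures in it are made up): a search engine matches query terms against documents, while a knowledge engine computes a number from structured data.

```python
# A toy contrast between retrieval and computation. Not Wolfram Alpha's code;
# the documents and rainfall figures below are invented for illustration.

documents = [
    "Seattle is known for its rainy weather and its coffee culture.",
    "Timbuktu is a city in Mali, on the edge of the Sahara.",
]

# Hypothetical structured data: average monthly rainfall in Seattle, in inches.
seattle_monthly_rainfall = [5.6, 3.5, 3.7, 2.7, 1.9, 1.6,
                            0.7, 0.9, 1.5, 3.5, 5.7, 6.0]

def retrieve(query):
    """Search-engine style: return documents that merely mention the query terms."""
    terms = query.lower().split()
    return [doc for doc in documents if any(term in doc.lower() for term in terms)]

def average_monthly_rainfall():
    """Knowledge-engine style: compute the answer from the structured data."""
    return sum(seattle_monthly_rainfall) / len(seattle_monthly_rainfall)

print(retrieve("rainfall Seattle"))              # documents that mention the topic
print(round(average_monthly_rainfall(), 2))      # a computed numeric answer
```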

How Does it Work?

Wolfram Alpha is a system for computing the answers to questions. To accomplish this it uses built-in models of fields of knowledge, complete with data and algorithms, that represent real-world knowledge.

For example, it contains formal models of much of what we know about science — massive amounts of data about various physical laws and properties, as well as data about the physical world.

Based on this you can ask it scientific questions and it can compute the answers for you, even if it has not been explicitly programmed to answer each question you might ask it.
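
As a rough illustration of what a model of a field of knowledge might look like in principle (a toy example of my own, not a peek at Wolfram’s internals), a couple of physical constants plus a formula can answer every question of a given form, without any individual answer being stored in advance:

```python
# A tiny, hypothetical "domain model": physical constants plus one formula.
# It can answer any photon-energy question, not just ones entered by hand.
# This structure is my own illustration, not Wolfram Alpha's internals.

PLANCK_CONSTANT = 6.626e-34   # joule-seconds
SPEED_OF_LIGHT = 2.998e8      # meters per second

def photon_energy(wavelength_nm):
    """Energy in joules of a photon of the given wavelength in nanometers (E = h*c/wavelength)."""
    wavelength_m = wavelength_nm * 1e-9
    return PLANCK_CONSTANT * SPEED_OF_LIGHT / wavelength_m

# "What is the energy of a 500 nm photon?" is computed on demand, never stored.
print(f"{photon_energy(500):.3e} J")   # roughly 3.97e-19 J
```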

But science is just one of the domains it knows about — it also knows about technology, geography, weather, cooking, business, travel, people, music, and more.

It also has a natural language interface for asking it questions. This interface allows you to ask questions in plain language, or even in various forms of abbreviated notation, and then provides detailed answers.
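
Purely as an illustration of the idea (I have not seen how the real parser works, and it is surely far more sophisticated), a natural language front end can be thought of as mapping a question onto a structured request that the underlying knowledge models can then compute on. The patterns and property names below are invented:

```python
import re

# A toy question parser: turn a plain-language question into a
# (property, entity) pair. The patterns here are invented for illustration.

PATTERNS = [
    re.compile(r"what is the (?P<prop>[\w\s]+?) (?:of|in) (?P<entity>[\w\s]+?)\??", re.I),
    re.compile(r"how many (?P<prop>[\w\s]+?) are in (?P<entity>[\w\s]+?)\??", re.I),
]

def parse(question):
    """Return a (property, entity) pair, or None if the question is not understood."""
    for pattern in PATTERNS:
        match = pattern.fullmatch(question.strip())
        if match:
            return match["prop"].strip(), match["entity"].strip()
    return None

print(parse("What is the average rainfall in Seattle?"))   # ('average rainfall', 'Seattle')
print(parse("How many protons are in a hydrogen atom?"))   # ('protons', 'a hydrogen atom')
```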

The vision seems to be to create a system which can do for formal knowledge (all the formally definable systems, heuristics, algorithms, rules, methods, theorems, and facts in the world) what search engines have done for informal knowledge (all the text and documents in various forms of media).

Building Blocks for Knowledge Computing

Wolfram Alpha is almost more of an engineering accomplishment than a scientific one — Wolfram has broken down the set of factual questions we might ask, and the computational models and data necessary for answering them, into basic building blocks — a kind of basic language for knowledge computing, if you will. Then, with these building blocks in hand, his system is able to compute with them — to break down questions into the basic building blocks and computations necessary to answer them, and then to actually build up computations and compute the answers on the fly.

Wolfram’s team manually entered, and in some cases automatically pulled in, masses of raw factual data about various fields of knowledge, plus models and algorithms for doing computations with the data. By building all of this in a modular fashion on top of the Mathematica engine, they have built a system that is able to actually do computations over vast data sets representing real-world knowledge. More importantly, it enables anyone to easily construct their own computations — simply by asking questions.
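
Here is a hedged sketch of what composing such building blocks might look like. The blocks, the data, and the sample question are all invented; the point is only that the answer gets assembled from reusable pieces rather than looked up whole:

```python
# Invented building blocks and placeholder data, purely to illustrate the idea
# of assembling a computation on the fly instead of storing every answer.

FACTS = {
    ("Seattle", "annual_rainfall_mm"): 950.0,   # placeholder values
    ("Phoenix", "annual_rainfall_mm"): 200.0,
}

def lookup(entity, prop):
    """Building block: fetch a raw fact from the curated data."""
    return FACTS[(entity, prop)]

def mm_to_inches(mm):
    """Building block: unit conversion."""
    return mm / 25.4

def ratio(a, b):
    """Building block: compare two quantities."""
    return a / b

# A small "plan" assembled for the (invented) question:
# "How many times rainier is Seattle than Phoenix, in inches?"
seattle_inches = mm_to_inches(lookup("Seattle", "annual_rainfall_mm"))
phoenix_inches = mm_to_inches(lookup("Phoenix", "annual_rainfall_mm"))
print(f"Seattle: {seattle_inches:.1f} in, Phoenix: {phoenix_inches:.1f} in, "
      f"ratio: {ratio(seattle_inches, phoenix_inches):.2f}x")
```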

The scientific and philosophical underpinnings of Wolfram Alpha are similar to those of the cellular automata systems he describes in his book, “A New Kind of Science” (NKS). Just as with cellular automata (such as the famous “Game of Life” algorithm that many have seen on screensavers), a set of simple rules and data can be used to generate surprisingly diverse, even lifelike patterns. One of the observations of NKS is that incredibly rich, even unpredictable, patterns can be generated from tiny sets of simple rules and data when they are applied to their own output over and over again.

In fact, cellular automata, by using just a few simple repetitive rules, can compute anything any computer or computer program can compute, in theory at least. But actually using such systems to build real computers or useful programs (such as Web browsers) has never been practical because they are so low-level it would not be efficient (it would be like trying to build a giant computer, starting from the atomic level).

The simplicity and elegance of cellular automata prove that anything that may be computed — and potentially anything that may exist in nature — can be generated from very simple building blocks and rules that interact locally with one another. There is no top-down control, there is no overarching model. Instead, from a bunch of low-level parts that interact only with other nearby parts, complex global behaviors emerge that, for example, can simulate physical systems such as fluid flow, optics, population dynamics in nature, voting behaviors, and perhaps even the very nature of space-time. This is the main point of the NKS book in fact, and Wolfram draws numerous examples from nature and cellular automata to make his case.
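
For readers who have not seen an elementary cellular automaton, here is a minimal one in Python: Rule 110, whose entire rule set is an eight-entry lookup table over a cell and its two neighbors, yet which produces remarkably intricate patterns and has even been shown to be Turing-complete. This is only meant to illustrate the NKS principle; as noted below, Wolfram Alpha itself is not built out of cellular automata.

```python
# Rule 110, an elementary cellular automaton of the kind NKS explores.
# Each new cell depends only on three neighboring cells in the previous row,
# yet a single seed cell grows into a surprisingly intricate pattern.

RULE = 110          # the rule number encodes the 8-entry lookup table in binary
WIDTH, STEPS = 64, 32

def step(cells):
    """Compute the next row from the current one (wrapping at the edges)."""
    next_row = []
    for i in range(len(cells)):
        left = cells[i - 1]
        center = cells[i]
        right = cells[(i + 1) % len(cells)]
        neighborhood = (left << 2) | (center << 1) | right   # a number from 0 to 7
        next_row.append((RULE >> neighborhood) & 1)          # look up the rule bit
    return next_row

row = [0] * WIDTH
row[WIDTH // 2] = 1                     # start from a single "on" cell
for _ in range(STEPS):
    print("".join("#" if cell else "." for cell in row))
    row = step(row)
```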

But with all its focus on recombining simple bits of information and simple rules, the cellular automata approach is not a reductionist one — in fact, it is much more focused on synthesizing complex emergent behaviors from simple elements than on reducing complexity back to simple units. The highly synthetic philosophy behind NKS is the paradigm shift at the basis of Wolfram Alpha’s approach too. It is a system that is very much “bottom-up” in orientation.

Wolfram has created a set of building blocks for working with formal knowledge to generate useful computations, and in turn, by putting these computations together you can answer even more sophisticated questions, and so on. It’s a system for synthesizing sophisticated computations from simple computations. Of course anyone who understands computer programming will recognize this as the very essence of good software design. But the key is that instead of forcing users to write programs to do this in Mathematica, Wolfram Alpha enables them to simply ask questions in natural language and then automatically assembles the programs to compute the answers they need.

This is not to say that Wolfram Alpha IS a cellular automaton itself — but rather that it is similarly based on fundamental rules and data that are recombined to form highly sophisticated structures. The knowledge and intelligence it contains are extremely modularized and can be used to synthesize answers to factual questions nobody has asked yet. The questions are broken down into their basic parts, then simple reasoning takes place, and answers are computed over the vast knowledge base in the system. It appears the system can make inferences and do some basic reasoning across what it knows — it is not purely reductionist in that respect; it is generative, and it can synthesize new knowledge if asked to.

Wolfram Alpha perhaps represents a new approach to creating an “intelligent machine” that does away with much of the manual labor of explicitly building top-down expert systems about fields of knowledge (the traditional AI approach, such as that taken by the Cyc project), while simultaneously avoiding the complexities of trying to do anything reasonable with the messy distributed knowledge on the Web (the open-standards Semantic Web approach). It’s simpler than top-down AI and easier than the original vision of the Semantic Web.

Generally if someone had proposed doing this to me, I would have said it was not practical. But Wolfram seems to have figured out a way to do it. The proof is that he’s done it. It works. I’ve seen it myself.

The Hairy Questions

Of course, questions abound. It remains to be seen just how smart Wolfram Alpha really is, or can be. How easily extensible is it? Will it get increasingly hard to add and maintain knowledge as more is added to it? Will it ever make mistakes? What forms of knowledge will it be able to handle in the future?

I think Wolfram would agree that it is probably never going to be able to give relationship or career advice, for example, because that is “fuzzy” — there is often no single right answer to such questions. And I don’t know how comprehensive it is, or how it will be able to keep up with all the new knowledge in the world (the knowledge in the system is exclusively added by Wolfram’s team right now, which is a labor-intensive process). But Wolfram is an ambitious guy. He seems confident that he has figured out how to add new knowledge to the system at a fairly rapid pace, and he seems to be planning to make the system extremely broad.

And there is the question of bias, which we addressed as well. Is there any risk of bias in the answers the system gives because all the knowledge is entered by Wolfram’s team? Those who enter the knowledge and design the formal models in the system are in a position to define the way the system thinks — both the questions and the answers it can handle. Wolfram believes that by focusing on factual knowledge — things like you might find in the Wikipedia or textbooks or reports — the bias problem can be avoided. At least he is focusing the system on questions that have only one answer — not questions for which there might be many different opinions. Everyone generally agrees, for example, that the closing price of GOOG on a certain date is a particular dollar amount. It is not debatable. These are the kinds of questions the system addresses.

But even for some supposedly factual questions, there are potential biases in the answers one might come up with, depending on the data sources and paradigms used to compute them. Thus the choice of data sources has to be made carefully to reflect as unbiased a view as possible. Wolfram’s strategy is to rely on widely accepted data sources: well-known scientific models, public data about factual things like the weather, geography, and the stock market published by reputable organizations and government agencies, etc. But of course even this is a particular worldview and reflects certain implicit or explicit assumptions about which data sources are authoritative.

This is a system that reflects one perspective — that of Wolfram and his team — which probably is a close approximation of the mainstream consensus scientific worldview of our modern civilization. It is a tool — a tool for answering questions about the world today, based on what we generally agree that we know about it. Still, this is potentially murky philosophical territory, at least for some kinds of questions. Consider global warming — not all scientists even agree it is taking place, let alone what it signifies or where the trends are headed. Similarly, in economics, depending on certain assumptions and measurements, we are either experiencing only mild inflation right now or significant inflation. There is not necessarily one right answer — there are valid alternative perspectives.

I agree with Wolfram that bias in the data choices will not be a problem, at least for a while. But even scientists don’t always agree on the answers to factual questions, or on what models to use to describe the world — and this disagreement is in fact essential to progress in science. If there were only one “right” answer to any question there could never be progress, or even different points of view. Fortunately, Wolfram is designing his system to link to alternative questions and answers at least, and even to sources for more information about the answers (such as the Wikipedia, for example). In this way he can provide unambiguous factual answers, yet also connect to more information and points of view about them at the same time. This is important.

It is ironic that a system like Wolfram Alpha, which is designed to answer questions factually, will probably bring up a broad range of questions that don’t themselves have unambiguous factual answers — questions about philosophy, perspective, and even public policy in the future (if it becomes very widely used). It is a system that has the potential to touch our lives as deeply as Google. Yet how widely it will be used is an open question too.

The system is beautiful, and the user interface is already quite simple and clean. In addition, answers include computationally generated diagrams and graphs — not just text. It looks really cool. But it is also designed by and for people with IQs somewhere in the altitude of Wolfram’s — some work will need to be done dumbing it down a few hundred IQ points so as to not overwhelm the average consumer with answers so comprehensive that they require a graduate degree to fully understand.

It also remains to be seen how much the average consumer thirsts for answers to factual questions. I do think all consumers have a need for this kind of intelligence once in a while, but perhaps not as often as they need something like Google. But I am sure that academics, researchers, students, government employees, journalists, and a broad range of professionals in all fields definitely need a tool like this and will use it every day.

How Smart is it and Will it Take Over the World?

Wolfram Alpha is like plugging into a vast electronic brain. It provides extremely impressive and thorough answers to a wide range of questions asked in many different ways, and it computes answers, it doesn’t merely look them up in a big database.

In this respect it is vastly smarter than (and different from) Google. Google simply retrieves documents based on keyword searches. Google doesn’t understand the question or the answer, and doesn’t compute answers based on models of various fields of human knowledge.

But as intelligent as it seems, Wolfram Alpha is not HAL 9000, and it wasn’t intended to be. It doesn’t have a sense of self or opinions or feelings. It’s not artificial intelligence in the sense of being a simulation of a human mind. Instead, it is a system that has been engineered to provide really rich knowledge about human knowledge — it’s a very powerful calculator that doesn’t just work for math problems — it works for many other kinds of questions that have unambiguous (computable) answers.

There is no risk of Wolfram Alpha becoming too smart, or taking over the world. It’s good at answering factual questions; it’s a computing machine, a tool — not a mind.

One of the most surprising aspects of this project is that Wolfram has been able to keep it secret for so long. I say this because it is a monumental effort (and achievement) and almost absurdly ambitious. The project involves more than a hundred people working in stealth to create a vast system of reusable, computable knowledge, from terabytes of raw data, statistics, algorithms, data feeds, and expertise. But he appears to have done it, and kept it quiet for a long time while it was being developed.

Relationship to the Semantic Web

During our discussion, after I tried and failed to poke holes in his natural language parser for a while, we turned to the question of just what this thing is, and how it relates to other approaches like the Semantic Web.

The first question was whether Wolfram Alpha could (or even should) be built using the Semantic Web in some manner, rather than (or as well as) the Mathematica engine it is currently built on. Is anything missed by not building it with the Semantic Web’s languages (RDF, OWL, SPARQL, etc.)?

The answer is that there is no reason that one MUST use the Semantic Web stack to build something like Wolfram Alpha. In fact, in my opinion it would be far too difficult to try to explicitly represent everything Wolfram Alpha knows and can compute using OWL ontologies. It is too wide a range of human knowledge and giant OWL ontologies are just too difficult to build and curate.

It would of course at some point be beneficial to integrate with the Semantic Web so that the knowledge in Wolfram Alpha could be accessed, linked with, and reasoned with by other semantic applications on the Web, and perhaps to make it easier to pull knowledge in from outside as well. In this area, the standards of the Semantic Web could be quite useful to the project. However, for the internal knowledge representation and reasoning that takes place in the system, it appears Wolfram has found a pragmatic and efficient representation of his own, and I don’t think he needs the Semantic Web at that level. It seems to be doing just fine without it.
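
To give a sense of what such an integration might look like (this is pure speculation on my part, with invented names and a placeholder value, not anything Wolfram described), a computed fact could be published as RDF triples and queried over SPARQL by any other semantic application:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import XSD

# Speculative sketch: expose a computed fact as RDF so other Semantic Web
# applications could link to and query it. The namespace, property name,
# and value below are invented placeholders.

EX = Namespace("http://example.org/facts/")

g = Graph()
g.add((EX["Seattle"],
       EX["averageAnnualRainfallInches"],
       Literal(37.5, datatype=XSD.decimal)))   # placeholder value

# Any SPARQL-speaking client could then ask for the fact.
results = g.query("""
    PREFIX ex: <http://example.org/facts/>
    SELECT ?rain WHERE { ex:Seattle ex:averageAnnualRainfallInches ?rain . }
""")
for record in results:
    print(record.rain)
```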

Wolfram Alpha is built on hand-curated knowledge and expertise. Wolfram and his team have somehow figured out a way to make that practical where all others who have tried this have failed to achieve their goals. The task is gargantuan — there is just so much diverse knowledge in the world. Representing even a small segment of it formally turns out to be extremely difficult and time-consuming.

It has generally not been considered feasible for any one group to hand-curate all knowledge about every subject. This is why the Semantic Web was invented — by enabling everyone to curate their own knowledge about their own documents and topics in parallel, in principle at least, more knowledge could be represented and shared in less time by more people — in an interoperable manner. At least that is the vision of the Semantic Web.

But doing anything as sophisticated as Wolfram Alpha on existing decentralized Semantic Web data would simply not be practical today, if ever. I think Wolfram’s approach is more pragmatic. The centralized hand-curation of Wolfram Alpha is simply more manageable and efficient for a project of this scale and complexity. It’s also a potential bottleneck and most certainly a cost-center. But it appears to be a tradeoff that Wolfram can afford to make, and one worth making as well.

Competition

Where Google is a system for FINDING things that we as a civilization collectively publish, Wolfram Alpha is for ANSWERING questions about what we as a civilization collectively know. It’s the next step in the distribution of knowledge and intelligence around the world — a new leap in the intelligence of our collective “Global Brain.” And like any big next-step, Wolfram Alpha works in a new way — it computes answers instead of just looking them up.

Wolfram Alpha, at its heart, is quite different from a brute-force statistical search engine like Google. And it is not going to replace Google — it is not a general search engine: you would probably not use Wolfram Alpha to shop for a new car, find blog posts about a topic, or choose a resort for your honeymoon. It is not a system that will understand the nuances of what you consider to be the perfect romantic getaway, for example — there is still no substitute for manual, human-guided search for that. Where it appears to excel is when you want facts about something, or when you need to compute a factual answer to some set of questions about factual data.

I think the folks at Google will be surprised by Wolfram Alpha, and they will probably want to own it, but not because it risks cutting into their core search engine traffic. Instead, it will be because it opens up an entirely new field of potential traffic around questions, answers and computations that you can’t do on Google today.

The services that are probably going to be most threatened by a service like Wolfram Alpha are the Wikipedia, Metaweb’s Freebase, and any natural language search engines (such as Microsoft’s upcoming search engine, based perhaps in part on Powerset’s technology among others), and other services that are trying to build comprehensive factual knowledge bases.

As a side-note my own service, Twine.com, is NOT trying to do what Wolfram Alpha is trying to do, fortunately. Instead, Twine uses the Semantic Web to help people filter the Web, organize knowledge, and track their interests. It’s a very different goal. And I’m glad, because I would not want to be competing with Wolfram Alpha. It’s a force to be reckoned with.

Future Steps

I think there is more potential to this system than Stephen has revealed so far. I think he has bigger ambitions for it in the long-term future. I believe it has the potential to be THE online service for computing factual answers. THE system for factual knowledge on the Web. More than that, it may eventually have the potential to learn and even to make new discoveries. We’ll have to wait and see where Wolfram takes it.

Maybe Wolfram Alpha could even do a better job of retrieving documents than Google, for certain kinds of questions — by first understanding what you really want, then computing the answer, and then giving you links to documents that related to the answer. But even if it is never applied to document retrieval, I think it has the potential to play a leading role in all our daily lives — it could function like a kind of expert assistant, with all the facts and computational power in the world at our fingertips.

I would expect that Wolfram Alpha will open up various API’s in the future and then we’ll begin to see some interesting new, intelligent, applications begin to emerge based on its underlying capabilities and what it knows already.

In May, Wolfram plans to open up what I believe will be a first version of Wolfram Alpha. Anyone interested in a smarter Web will find it quite interesting, I think. Meanwhile, I look forward to learning more about this project as Stephen reveals more in months to come.

One thing is certain: Wolfram Alpha is quite impressive, and Stephen Wolfram deserves all the congratulations he is soon going to get.