Why artificial intelligence is succeeding: Then and now
Artificial intelligence has a checkered past. It has gone through multiple waves of huge expectations followed by incredible disappointments. We have seen the rise and fall of expert systems, neural networks, logic (hard and fuzzy) and statistical models of reasoning.
We seem to be, once again, in an era of heightened expectations regarding A.I. We now have Siri, IBM Watson, self-driving cars and the proliferation of machine learning, data mining and predictive systems that promise an unprecedented, even frightening, level of machine intelligence.
But how is this current rise of A.I. different from what we have experienced before? What has changed to make us believe that the technology will make good on its countless promises?
I would argue that, ironically, the core technologies of A.I. have not changed drastically and today’s A.I. engines are, in most ways, similar to those of years past. The techniques of yesteryear fell short, not due to inadequate design, but because the required foundation and environment weren’t built yet. In short, the biggest difference between A.I. then and now is that the necessary computational capacity, raw volumes of data, and processing speed are readily available so the technology can really shine.
For example, IBM Watson leverages the idea that facts are expressed in multiple forms and that each match against each possible form equals evidence of the answer. The technology first analyzes the language input to extract the elements and relationships needed to determine what you might be looking for, and then uses thousands of patterns made up of the words in the original query to find matches in massive corpora of text. Each match provides a single piece of evidence, and the pieces of evidence are added up to give Watson a score for each candidate answer. While Watson is an exceptional system, there is not a lot of new A.I. technology at play in it. Watson tends to converge on the right answer because of the sheer volume of the different matches against the truth as it is expressed in the text.
If it has only a small number of documents to examine, however, the odds of it finding the information it needs, expressed in a way that it can understand, are small. In parallel, the odds that faulty information is going to get in the way of finding the truth are reduced as the corpus size grows. So, in much the same way that search results are improved as the data sets are expanded, the raw volume of text available to Watson directly correlates with the probability that the system will get to the right answer.
This means that in trying to figure out the “largest city in the U.S.,” Watson is not distracted by phrases like “Chicago is the city of big shoulders” or “Los Angeles is the biggest exporter of U.S. culture,” because it can find thousands of variants of phrases like “New York is the largest city,” “the largest city in the U.S. is New York” and “no city is bigger than New York.”
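To make the evidence-counting idea concrete, here is a minimal sketch in Python. It is not Watson's actual implementation; the five-sentence corpus, the three patterns and the function name `score_answers` are all assumptions made up for illustration. It shows only the basic mechanic described above: every pattern match is one piece of evidence, and the evidence is summed per candidate answer.

```python
import re
from collections import Counter

# Hypothetical toy corpus. A real system would search millions of documents.
CORPUS = [
    "New York is the largest city in the U.S.",
    "The largest city in the U.S. is New York.",
    "No city is bigger than New York.",
    "Chicago is the city of big shoulders.",
    "Los Angeles is the biggest exporter of U.S. culture.",
]

# One or more capitalized words, captured as the candidate answer.
CITY = r"(?P<answer>[A-Z][a-z]+(?: [A-Z][a-z]+)*)"

# Hypothetical surface patterns built from the words of the question.
PATTERNS = [
    re.compile(CITY + r" is the largest city"),
    re.compile(r"largest city in the U\.S\. is " + CITY),
    re.compile(r"[Nn]o city is bigger than " + CITY),
]


def score_answers(corpus, patterns):
    """Count one piece of evidence per pattern match, summed per candidate."""
    scores = Counter()
    for sentence in corpus:
        for pattern in patterns:
            for match in pattern.finditer(sentence):
                scores[match.group("answer")] += 1
    return scores


scores = score_answers(CORPUS, PATTERNS)
print(scores.most_common())
```

The Chicago and Los Angeles sentences contribute nothing, because none of the patterns fits the way they are phrased. The three New York sentences each match a different pattern, so New York ends up with all of the evidence.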
It is the data that makes Watson work.
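The claim that a bigger corpus drowns out faulty information can be checked with a little arithmetic. The sketch below rests on a deliberately crude assumption that is mine, not IBM's: each document independently supports the right answer with probability 0.6 and a single wrong answer otherwise, and the system picks whichever answer has the strict majority. The function name `majority_correct` and the 0.6 figure are hypothetical.

```python
from math import comb


def majority_correct(num_documents, p_correct=0.6):
    """Probability that a strict majority of documents back the right answer.

    Assumes independent documents, each correct with probability p_correct
    (an illustrative number, not a measured one).
    """
    smallest_majority = num_documents // 2 + 1
    return sum(
        comb(num_documents, k)
        * p_correct ** k
        * (1 - p_correct) ** (num_documents - k)
        for k in range(smallest_majority, num_documents + 1)
    )


# Odd corpus sizes avoid ties between the two answers.
for n in (1, 11, 101):
    print(n, round(majority_correct(n), 3))
```

Under these assumptions, a single document is right only 60 percent of the time, while the majority of 101 documents is right far more often. Nothing about the voting rule changed; only the amount of data did.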
The same holds true of Google’s deep learning work. Deep learning algorithms make microscopic changes to networks that feed input values through layers of nodes that feed into a set of output values. A well-trained network can take a set of pixel values, feed those values through layers of transformations and then produce a set of outputs that express “cat” or “dog” or “car.” The training needed to perform this action is based on small changes to the network driven by the examination of multiple examples.
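The "small changes driven by examples" idea can be shown with the smallest possible network: a single sigmoid node trained by gradient descent. This is a simplified sketch, not Google's system, and a real deep network stacks many layers of such nodes. The two-"pixel" data set, the bright/dark labels, the learning rate and all function names here are assumptions chosen for illustration.

```python
import math

# Hypothetical toy data: two pixel intensities in [0, 1]; 1 = bright, 0 = dark.
EXAMPLES = [
    ((0.9, 0.8), 1),
    ((0.8, 0.95), 1),
    ((0.7, 0.9), 1),
    ((0.1, 0.2), 0),
    ((0.2, 0.05), 0),
    ((0.3, 0.1), 0),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, pixels):
    """Feed the pixel values through one node and return a score in (0, 1)."""
    return sigmoid(sum(w * x for w, x in zip(weights, pixels)) + bias)


def mean_loss(weights, bias, examples):
    """Average cross-entropy between the predictions and the labels."""
    total = 0.0
    for pixels, label in examples:
        p = predict(weights, bias, pixels)
        total -= label * math.log(p) + (1 - label) * math.log(1 - p)
    return total / len(examples)


def train(examples, epochs=2000, learning_rate=0.1):
    """Nudge the weights slightly after looking at each example."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for pixels, label in examples:
            # How far off the prediction is; this drives the size of the nudge.
            error = predict(weights, bias, pixels) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, pixels)]
            bias -= learning_rate * error
    return weights, bias


weights, bias = train(EXAMPLES)
print("loss before:", mean_loss([0.0, 0.0], 0.0, EXAMPLES))
print("loss after: ", mean_loss(weights, bias, EXAMPLES))
```

No single update does much. It is the accumulation of thousands of tiny adjustments, each prompted by one example, that moves the node from guessing to separating bright inputs from dark ones.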
While the core idea of how to train these networks has existed for over 30 years, the techniques were applied to problem sets of thousands or tens of thousands of examples and run on machines that by today’s standards would be considered painfully slow. Now this same technique can be applied to hundreds of billions of examples and run on machines with specialized chips that allow them to learn from these examples much faster.
This same dynamic holds true for recommendation systems, predictive learning algorithms, and the world of intelligent data mining. It is not just that the algorithms have been improved; it is the data available to them that allows them to reliably extract the signal out of the noise and act intelligently.
Yes, the current rise in expectations around A.I. seems much more robust than it has ever been before, but it is not the A.I. technology alone that is feeding the frenzy. It is the environment and foundation that exist today, providing the technologies with the required data and speed.
And the most exciting thing? The opportunity is boundless. The data keeps growing while machines become faster, so yesterday’s, today’s and tomorrow’s A.I. systems can only keep succeeding.