Monday, December 26, 2011

The agent learning loop

The great difficulty with software agents is how to program them to behave in an intelligent manner. Programming intelligence is difficult, tricky, and error-prone, when it is practical at all. Sometimes we do come up with simple, clever heuristics that seem to approximate intelligence (e.g., ELIZA), or even heuristics for a very limited and constrained domain that come very close to approximating human-level intelligence (e.g., Deep Blue or Watson). All too often, though, our attempts at programming intelligence fall woefully short: they are merely amusing, outright lame, or horribly dysfunctional when situated in the real world. We will continue to pursue the vision of intelligent machines through programmed intelligence, but ultimately there is only one path to true intelligence: the ability to learn.
Computer software that mimics intelligence focuses primarily on programming a library of encoded information and patterns that represent knowledge. That can enable a computer to answer questions or respond to environmental conditions, but only in a pre-programmed sense. The beauty of human-level intelligence is that the human mind has the ability to learn, to teach itself new facts, to recognize new patterns, to actually produce new knowledge.
We can build computers that replicate a fair amount of the processing that occurs in the human mind, but we are still stymied by the mind's vast ability to learn and to produce knowledge and know-how on its own.
Part of the reason for our lack of progress on the learning front is simply that much of the demand for intelligent machines has been to replace humans in relatively mindless, rote activities. In other words, the focus has been on the economics of predictable production rather than on creative and intuitive activities.
I would like to propose an overall sequence for a path forward toward intelligent machines. I call it the agent learning loop:
  1. We (humans) program an agent or collection of agents with some "basic" intelligence (knowledge and pattern recognition abilities).
  2. We (humans) program those agents with the "basic" ability to "learn."
  3. We (humans) also program those agents with the ability to communicate their knowledge and learnings to other agents, as well as the ability simply to observe the activity of other agents and learn from their successes and failures.
  4. We (humans) program those agents with the ability to create new intelligent agents with comparable abilities of intelligence and abilities to learn. These agents are then able to learn and "know" incrementally more than us, their creators.
  5. These agents then create new agents, incrementally more intelligent than themselves, and the "loop" repeats from step #1.
The theory is that each iteration of the loop incrementally increases the intelligence of the agents.
One key point is that multiple agents are needed at each step, and that delegation, collaboration, and competition are critical factors in learning.
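As a purely illustrative sketch, the loop can be modeled in Python under heavy simplifying assumptions: an agent's "knowledge" is just a set of facts drawn from a hypothetical environment, "learning" means discovering one unknown fact per generation, communication means pooling facts with every other agent, and creating a new agent means copying the creator's knowledge. The names `Agent`, `run_learning_loop`, and their parameters are hypothetical; this shows the shape of the loop, not real learning machinery.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Agent:
    # Hypothetical model: an agent's "knowledge" is simply a set of facts.
    knowledge: set = field(default_factory=set)

    def learn(self, environment, rng):
        # Step 2: discover one fact the agent does not yet know, if any remain.
        unknown = sorted(environment - self.knowledge)
        if unknown:
            self.knowledge.add(rng.choice(unknown))

    def create_child(self):
        # Step 4: the new agent starts out knowing everything its creator knew.
        return Agent(set(self.knowledge))


def run_learning_loop(environment, population_size=4, generations=5,
                      seed_knowledge=(), seed=0):
    rng = random.Random(seed)
    # Step 1: humans seed a population of agents with some "basic" knowledge.
    agents = [Agent(set(seed_knowledge)) for _ in range(population_size)]
    history = []  # how many facts the population knows after each generation
    for _ in range(generations):
        for agent in agents:
            agent.learn(environment, rng)
        # Step 3: every agent communicates what it learned to every other agent.
        pooled = set().union(*(agent.knowledge for agent in agents))
        for agent in agents:
            agent.knowledge |= pooled
        history.append(len(pooled))
        # Step 5: the new agents replace their creators and the loop repeats.
        agents = [agent.create_child() for agent in agents]
    return history
```

With `facts = {f"fact{i}" for i in range(10)}`, a lone agent (`run_learning_loop(facts, population_size=1, generations=3)`) returns `[1, 2, 3]`, one new fact per generation. Adding agents lets each generation discover several facts in parallel, which is the point about needing multiple agents at each step.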
There is also a significant degree of Darwinian evolution in play here. True learning involves taking some degree of risk, such as intuitive leaps, and sometimes even random selection when alternatives seem comparable in value, or occasionally random selection even when one choice seems purely "rational." With a single agent, risk-taking is genuinely risky, but with multiple agents, alternatives can be pursued in parallel. Agents that learn poorly or incorrectly will be at a disadvantage in the next generation and will likely die off, although short-term "poor" behavior can sometimes flip over in future generations to have unexpected value as the environment itself evolves.
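The risk-taking and die-off described above resemble two standard techniques: epsilon-greedy choice with random tie-breaking among comparable options, and truncation selection as used in genetic algorithms. The sketch below illustrates those techniques under stated assumptions; it is not something the loop itself prescribes. The names `choose`, `next_generation`, `epsilon`, `tolerance`, `survival_rate`, and `mutation` are hypothetical, and each agent is reduced to a single numeric behavior parameter.

```python
import random


def choose(alternatives, value, rng, epsilon=0.1, tolerance=0.05):
    # Occasionally take a purely random "risk", even when one option looks best.
    if rng.random() < epsilon:
        return rng.choice(alternatives)
    best = max(value(a) for a in alternatives)
    # Options within `tolerance` of the best are treated as comparable in value,
    # so pick among them at random instead of always taking the same one.
    comparable = [a for a in alternatives if best - value(a) <= tolerance]
    return rng.choice(comparable)


def next_generation(population, fitness, rng, survival_rate=0.5, mutation=0.1):
    # All agents are evaluated side by side; the weakest learners die off.
    ranked = sorted(population, key=fitness, reverse=True)
    keep = max(1, int(len(ranked) * survival_rate))
    survivors = ranked[:keep]
    # Survivors create new agents whose behavior varies randomly from theirs.
    children = [rng.choice(survivors) + rng.gauss(0, mutation)
                for _ in range(len(population) - keep)]
    return survivors + children
```

For example, with a hypothetical environment that rewards behavior near 3.0 (`fitness = lambda x: -abs(x - 3.0)`), repeatedly calling `next_generation` on a random population drifts it toward that target while the best agent so far is never lost. Because `fitness` is an ordinary function, a changing environment can be modeled by swapping it between generations, which is one way an agent that scored poorly earlier might later come out ahead.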
Communication between agents is critical, as is the ability to learn from failures. In fact, agents may "learn" to seek out failure as a faster path to "successful" knowledge.