Here are some popular AI techniques, roughly in ascending order of capability:

* State machines. A collection of states, usually attached to bits of code that simply implement behaviors (move, chase etc.). Each state has several possible transitions to other states triggered by simple pattern recognisers or timers. This is what 90% of game AI and most toy robotics uses. Sometimes used as a small component of general AI. Usually no learning ability.
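
As a rough illustration of the state/transition structure described above, here is a minimal sketch of a game-style state machine. The state names, the `player_dist` input and the distance thresholds are all made up for the example; a real game would hang behaviour code off each state.

```python
# Hypothetical guard AI: state names and distance thresholds are illustrative only.
# Each state maps to a list of (trigger, target_state) pairs, checked in order.
TRANSITIONS = {
    "patrol": [(lambda w: w["player_dist"] < 10, "chase")],
    "chase": [
        (lambda w: w["player_dist"] < 2, "attack"),
        (lambda w: w["player_dist"] >= 15, "patrol"),
    ],
    "attack": [(lambda w: w["player_dist"] >= 2, "chase")],
}


def step(state, world):
    """Return the next state: the first transition whose trigger fires, else stay put."""
    for trigger, target in TRANSITIONS[state]:
        if trigger(world):
            return target
    return state


def run(state, distances):
    """Feed a sequence of observed player distances through the machine."""
    trace = [state]
    for d in distances:
        state = step(state, {"player_dist": d})
        trace.append(state)
    return trace
```

Note there is nothing here that could learn: the triggers and transitions are fixed by hand, which is the limitation mentioned above.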

* Production systems. These aren't really AI at all; they're just a fiddly parallel programming language. Nevertheless, there are plenty of academic 'model of the human mind' AI projects (usually started by second-rate psychologists) that consist of some hardcoded input mechanisms, some hardcoded output mechanisms, and a 'central cognition' module that consists of a production rule engine. It usually has no functional learning ability, so the researchers cobble together a few simple demos and leave it at that.

* Classic propositional logic engines, which work on statements of the form 'RABBITS ARE MAMMALS' and 'THERE IS AT LEAST ONE BLUE CUBE'. These were the original plan for general AI, in the first wave of hype (in the 60s). You feed in a bunch of axioms, from a handful to a huge knowledge base of millions, and let them do Boolean inference to generate conclusions, and possibly action plans. Several major problems with this; the search control mechanisms are weak, so it's hard to avoid exponential explosion (and slowdown into uselessness) during inference, the ability to process uncertainty is weak to nonexistent, it's difficult to interface with real sensory input (vision etc) and learning can only occur within the framework of the human-built concepts (if there is any learning at all). Worst of all, they can't perform 'rich' reasoning about properties of systems that don't fit neatly into compact pseudo-verbal descriptions (thus lots of criticism from philosophers and connectionists about how classic AI is full of 'empty symbols' - largely justified). Largely discredited in the late 80s, a lot of the researchers are still around but now working on 'semantic web' stuff (which doesn't work either) rather than general AI. There are still a few people trying to build general AIs based on this (e.g. Cyc). A transhuman AGI could probably use this approach to make a really good chatbot / natural language interface, e.g. the Virtual Intelligences in Mass Effect.
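
To make 'feed in axioms and let it do Boolean inference' concrete, here is a minimal sketch of forward chaining over if-then rules. This is only one simple inference strategy (Horn-clause forward chaining), not how any particular historical system worked, and the rule set is invented for the example.

```python
def forward_chain(facts, rules):
    """Apply rules of the form (premises, conclusion) until nothing new is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return known


# Hypothetical toy knowledge base; every symbol is just a human-chosen label.
RULES = [
    (("rabbit",), "mammal"),
    (("mammal",), "animal"),
    (("animal", "hungry"), "seeks_food"),
]
```

Even this toy shows two of the problems listed above: every fact is either known or not (no uncertainty), and the symbols mean nothing to the program beyond the rules that mention them.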

* Classic artificial neural networks (strictly the second wave of ANNs, but the original perceptrons weren't terribly relevant). These don't actually resemble biological nervous systems much, but hey, it helped get grant funding. Classic ANNs consist of a big 2D grid of 'neurons', usually with a lot more rows ('layers') than columns. Each neuron is either on or off. The first layer of neurons is set to match some input data. There is a connection between every neuron in the first layer and every neuron in the second layer, and each of these connections has a weight. For each neuron in the second layer, the weights of all connections to first layer neurons that are turned on are summed, and compared to a threshold. If the sum exceeds the threshold, the neuron is on, otherwise it is off. The same process occurs for each layer until you get to the output neurons; often there is just one, 'pattern present' vs 'pattern absent'. The NN is trained using input data with known correct output data, and using the error at each neuron to adjust the weights in a process called backpropagation.

That is the simplest possible ANN. In practice people have tried all kinds of weight-adjustment and threshold algorithms; some of the former don't backpropagate (e.g. simulated annealing) and some of the latter use analogue levels of neuron activation rather than digital. The capabilities are similar though; classic ANNs are quite robust and widely applicable for recognising patterns within a specific context, but they're slow, relatively inaccurate and have a strict complexity limit after which they just don't work at all. Still, they were hugely popular in the 80s and early 90s because they could solve some new problems and because they had the black box mystique of 'real intelligence' to a lot of researchers (and journalists). Lots of people were trying to build general AIs out of these in the early 90s (and were very confident of success); a few amateurs still are. Since classic ANNs don't have any internal state, people either introduce loops (making a 'recurrent NN') or add some sort of external memory. Generally, this has been a miserable failure; in particular no one has come up with a training algorithm that can cope with large networks, non-uniform connection schemes and recurrence.
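
The layer-by-layer threshold rule described above (sum the weights of connections from neurons that are on, compare to a threshold) can be sketched as follows. This only shows the forward pass; the weights in `XOR_NET` are picked by hand for the example rather than trained, and all the names are illustrative.

```python
def layer_output(inputs, weights, thresholds):
    """inputs: list of 0/1 values; weights[j][i] is the weight from input i to neuron j."""
    out = []
    for w_row, theta in zip(weights, thresholds):
        # Only connections from neurons that are 'on' contribute to the sum.
        total = sum(w for w, x in zip(w_row, inputs) if x)
        out.append(1 if total > theta else 0)
    return out


def forward(inputs, layers):
    """Run the input through each (weights, thresholds) layer in turn."""
    for weights, thresholds in layers:
        inputs = layer_output(inputs, weights, thresholds)
    return inputs


# Hand-set two-layer network computing XOR (an assumption for the demo, not a trained net).
XOR_NET = [
    ([[1, 1], [1, 1]], [0.5, 1.5]),  # hidden layer: an OR unit and an AND unit
    ([[1, -1]], [0.5]),              # output: on when OR is on and AND is off
]
```

For instance, `forward([1, 0], XOR_NET)` turns the single output neuron on, while `forward([1, 1], XOR_NET)` leaves it off.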

* Genetic algorithms. Basically you write some general-purpose transform function that takes a vector of control bits as an input along with the input data. You generate a few hundred of these control vectors at random, then for each vector run the function on all your training cases and score them based on how close the output is to the correct answer. Call that a 'generation' of 'individuals'. Pretend that the bit vectors are actually base sequences in biological genes. Create a new set of vectors by combining the highest scoring examples from the first generation; either pick bits at random from the two parents (it's always two for some reason, even though there's no such software limit) or use a crude simulation of crossover (usually single point). Flip a few bits at random to simulate mutation. Re-run the test set, rinse, repeat until the aggregate performance plateaus. If you want you can do this to state machines driving robots, or artificial neural networks (as well as or instead of backpropagation learning).

Genetic algorithms are extremely compute intensive. They work fairly well for tweaking a few parameters in functions designed by human experts, though they suffer from local optima. They're about equivalent to backprop for training NNs, but a lot slower. For general pattern recognition (without careful framing of the problem) they're pretty sucky, well behind NNs. They're very good at two things: making cute little 'artificial life' demos to show at the department open day, and justifying big grant requests for a prestigious compute cluster (or back in the day, a Connection Machine) to run them on. GAs got a lot of hype in the mid to late 90s as the average researcher desktop got powerful enough to run them; the pitch was that they were obviously the route to general AI, since they mirror the process that produced humans, etc. etc. Like classic neural nets, they're actually horribly limited and a very crude reflection of the biological process they're named after. GAs are also the most unstable, unpredictable and generally finicky technique - they suck even as components of general AIs. Of course dyed-in-the-wool emergence fans love them.
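
The generate/score/breed/mutate loop described above can be sketched in a few lines. This is a toy under stated assumptions: the `fitness` callback stands in for 'run the transform function on all your training cases and score it', the parameter defaults are arbitrary, and carrying the best individual over unchanged (elitism) is my addition so that the best score never goes backwards.

```python
import random


def evolve(fitness, n_bits, pop_size=50, generations=40, mutation_rate=0.02, seed=0):
    """Evolve bit vectors; returns (best vector, best score seen at each generation)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        parents = pop[: pop_size // 2]      # highest scoring half gets to breed
        children = [pop[0][:]]              # elitism: an assumption, not in the text above
        while len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)  # single-point crossover
            child = mum[:cut] + dad[cut:]
            # Flip each bit with a small probability to simulate mutation.
            child = [b ^ 1 if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness), history
```

Calling `evolve(sum, 20)` uses 'number of bits set' as a stand-in fitness function. Note how many fitness evaluations even this toy burns through, which is where the compute cost comes from.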

* Support vector machines, and more generally, statistical space partitioning techniques. A big grab bag of fairly simple algorithms that classify input into categories. Used by search engines, data mining, lots of simple machine learning applications. They usually outperform neural networks in both learning rate (for a given number of training examples) and computational cost, but are somewhat narrower in the range of problems they work on. When SVMs became popular in the mid 90s it was really funny to watch the ANN-boosters have their hype deflated. SVMs don't do any sort of inference, they are for fuzzy pattern recognition only, particularly in very unstructured data (although you can combine them with preprocessors to work better on images etc). Rational AGI designs will probably custom-design an algorithm like this for each individual low-level pattern processing task; certainly that's what we're aiming to do, along with a few projects trying to use GP to make them (see below).

There's another whole raft of approaches used for signal processing; for example many AI vision systems use very similar software technology to video codecs. These are the bits of AI closest to conventional software engineering; they're challenging, but essentially linear and predictable. Good for certain narrow problems, but not generalisable.

* Heuristic and hybrid logic systems. Symbolic AI has been around a long time, and unsurprisingly a lot of people have tried to improve it. A common technique is assigning weights to statements rather than just true/false - in the vast majority of cases this is not formal probability or utility. The last wave of commercially successful 'expert systems' at the end of the 80s tended to use these; heuristic systems are a bit less brittle than pure propositional ones, and having some 'meta-heuristics' to direct inference significantly improved reasoning speed. The immediate successor to that was 'case based reasoning', a buzzword which covers a number of approaches that were basically crude simulations of human analogy making (statistical and logical/structural). Finally there were people who saw symbolic AI doing some useful things, neural networks doing some other useful things, and tried to duct tape them together. The results were marginally better than useless. The whole thing was a bit of a last gasp of the AI push right before the 'AI winter', when a lot of companies failed and funding for academic projects was cut right back. Most of this is way out of fashion now, but you see some of the same approaches cropping up in modern 'hybrid' general AI designs.

* Semantic networks; essentially an attempt to directly combine the mechanics of a neural network with the semantics of symbolic logic. In practice these are actually quite similar to heuristic logic systems, but more data-driven rather than code based, and with connectionist style high interconnectivity and weight adjustment mechanisms. There are lots of nodes, which are supposed to stand for concepts, properties, actions or objects. Each node has an activation level; a few elaborate designs have multiple kinds of activation and/or co-activation mechanisms (that provide context and temporary structures). Activation is injected into the system by sensory input and/or active goals, and it propagates along links, moderated by weights. Unlike neural networks, which almost always have a simple mathematical description, semantic network systems can have quite complex and varied activation spreading mechanisms. Eventually the activation propagates to nodes attached to output mechanisms, which makes something happen.

Psychologists, philosophers and people who like naive models of the mind in general love these. They have an intuitive appeal for how humans seem to reason. These kinds of systems tend to produce really impressive and deep sounding books, while being spectacularly useless in practice (spewing random words is a favourite outcome). When the more capable researchers have forced their semantic network systems to do something useful, it's generally by carefully designing the network and treating the system as a horribly obfuscated programming language, or by abandoning the claimed semantics and letting it act as a classic NN or statistical system (on some small pattern recognition problem).
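
For reference, the basic 'activation propagates along links, moderated by weights' mechanism looks something like this. It is a minimal sketch of one simple spreading rule (decay plus weighted sum); the node names, the `decay` parameter and the update rule itself are assumptions for illustration, and real systems use far more elaborate schemes.

```python
def spread(links, activation, steps=3, decay=0.5):
    """links: {source: [(target, weight), ...]}; activation: {node: level}."""
    act = dict(activation)
    for _ in range(steps):
        incoming = {}
        for src, level in act.items():
            for dst, weight in links.get(src, []):
                incoming[dst] = incoming.get(dst, 0.0) + level * weight
        nodes = set(act) | set(incoming)
        # Each node keeps a decayed share of its old activation plus whatever flowed in.
        act = {n: decay * act.get(n, 0.0) + incoming.get(n, 0.0) for n in nodes}
    return act


# Hypothetical three-node network: a sensory node feeding two concept nodes.
LINKS = {
    "see_rabbit": [("rabbit", 1.0)],
    "rabbit": [("mammal", 0.5)],
}
```

Injecting activation at `see_rabbit` and calling `spread(LINKS, {"see_rabbit": 1.0}, steps=2)` leaves some activation on `mammal`. Whether that number means anything is exactly the question raised above.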

* Agent systems. This is another very general category, covering any system where you have lots of semi-independent chunks of code doing local processing and exchanging information. The original versions were chunked at quite a coarse level; these are called 'blackboard architectures' (with one or more shared data areas, called 'blackboards' after the analogy of researchers collaborating on a blackboard). I suppose neural net / symbolic hybrid systems are a degenerate case with only two modules. 'Classifier networks' and 'classifier committees' are along the same lines; usually this means 'we have lots of sucky pattern recognition algorithms, maybe if we run them all at once and average the results, perhaps with some rough heuristics to weight them based on the situation, that'll be better than running any one of them'. This does actually work, due to the mitigation of uncorrelated errors - the best Netflix prize entrant was a huge committee of assorted narrow AI algorithms. However it expends a large amount of computing power for a marginal performance improvement and virtually no improvement in generality.

At the other end of the scale there are very fine grained agent systems, such as the 'codelet architectures' Hofstadter's team used, and the more connectionist/pattern-based designs that Minsky advocated. The latter blurs into semantic networks; in fact all of this is frequently combined into a 'bubbling emergent stew' (yes, people really say that, and proudly). The chunky bits are the heavyweight agents, doing things like vision recognition with large blocks of signal processing code.
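
Going back to the classifier committees mentioned above, the core of 'run them all at once and average the results, with some weights' is tiny. This is a bare sketch with made-up models; the three lambdas are hypothetical predictors that each err in a different direction, to show uncorrelated errors partially cancelling.

```python
def committee_predict(models, weights, x):
    """Weighted average of the predictions of several models for input x."""
    total = sum(weights)
    return sum(w * m(x) for m, w in zip(models, weights)) / total


# Hypothetical weak predictors of the identity function, each with its own bias.
MODELS = [
    lambda x: x + 0.3,
    lambda x: x - 0.2,
    lambda x: x - 0.1,
]
```

With equal weights the biases here cancel almost exactly. In practice the errors are only partly uncorrelated, so the gain is much smaller, and you pay for running every model.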

* Bayesian networks. These attempt to model the causal structure of reality. They're a network of nodes and links kind of like neural networks, but instead of arbitrary 'activation' flowing through it you have real event probabilities. Topology can be a 'brute force' NN-like structure that connects everything to everything else, or it can be a sparse, carefully constructed one (either by hand or various learning algorithms). The 'weights' are conditional probabilities that are updated by Bayes' rule. There are some equivalent Bayesian techniques for bulk data analysis that use the same theory but on bulk matrix processing rather than compact networks.

Bayesian networks are extremely effective at classification tasks, if the network structure matches the target domain. However the vast majority of networks use 'naive Bayes', where the conditional probabilities (of event A leading to event B, vs event C leading to event D) are considered independently, and boolean event occurred/event didn't occur distinctions. This is fine for things like spam filters, where you can just have an independent conditional probability from every word occurrence to the email being spam and get ok performance. It doesn't work at all in lots of other domains - there are various approaches to the 'hidden node problem', getting the network to auto-insert nodes to represent hidden parts of the event causality structure, but frankly none of the published ones work all that well. For example there's a lot of use of Bayesian nets in automated trading software, but it almost all has hand-optimised structure. The network structure problem gets even harder once you start trying to process complex probability distributions over variables (or spaces) rather than just single event probabilities. Some people have tried to use genetic algorithms on the Bayes net structure, but that tends to cause destructive interference that just breaks everything. On top of all that, there's the fact that simple Bayes nets combine some of the limitations of symbolic and connectionist processing; they work on arbitrary human-defined symbols the way symbolic logic does, yet are also restricted to fixed classification functions like NNs.
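
The spam filter case mentioned above is small enough to sketch directly. This is a minimal naive Bayes classifier under stated assumptions: it only looks at words that are present in the message (ignoring absent ones), uses add-one smoothing so unseen words don't zero everything out, and needs at least one training message of each class. Function names and the data layout are invented for the example.

```python
import math
from collections import Counter


def train(messages):
    """messages: list of (words, is_spam). Returns per-class message and word counts."""
    docs = Counter()
    words = {True: Counter(), False: Counter()}
    for tokens, is_spam in messages:
        docs[is_spam] += 1
        words[is_spam].update(set(tokens))  # count each word once per message
    return docs, words


def spam_probability(tokens, docs, words):
    """P(spam | words), treating each word as independent evidence given the class."""
    total = docs[True] + docs[False]
    log_score = {}
    for label in (True, False):
        score = math.log(docs[label] / total)  # prior
        for w in set(tokens):
            # Add-one smoothed estimate of P(word present | class).
            score += math.log((words[label][w] + 1) / (docs[label] + 2))
        log_score[label] = score
    # Normalise the two class scores into a probability via Bayes' rule.
    m = max(log_score.values())
    odds = {k: math.exp(v - m) for k, v in log_score.items()}
    return odds[True] / (odds[True] + odds[False])
```

The independence assumption is visible in the inner loop: each word multiplies in its own factor with no regard to which other words are present, which is why this works for spam and falls over in domains with real causal structure.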


* Genetic programming. Earlier I mentioned genetic algorithms, which attempt to 'evolve' an intelligent agent with a (usually crude) simulation of natural selection. GAs use a fixed context and relatively small 'evolvable' segment, either just the parameters for a fixed algorithm or mapping table, or evolving a function tree (e.g. an equation, or signal processing kernel) that isn't Turing-complete. The input/output and control systems are fixed, and there is usually no storage or looping. Genetic programming goes further by applying the genetic operators to actual program code, with looping and storage. Unfortunately (or rather, fortunately, since making AGI this way is horribly dangerous) programs are far harder to evolve than algorithms, firstly because most possible programs do not work at all, and secondly because the 'fitness space' of program code is very, very uneven. Natural selection relies on relatively smooth gradients and shallow local optima, which is ok for indirectly encoded NNs (e.g. human brains), but incompatible with brittle program code. The workable GP systems operate on abstract syntax trees rather than raw code (i.e. the kind of intermediate representation that compilers use for optimisation), and all kinds of tricks have been tried with changing the GP operators (e.g. using templates and expression networks) and making the control system smarter. Even still, no one has gotten GP to work on systems larger in scope than a single data processing algorithm. Evolved neural networks have had some success, but they're not fundamentally any more capable than backpropagation or Hebbian-trained ones.
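
To show what 'operating on abstract syntax trees' means in practice, here is a minimal sketch of subtree mutation on a tiny expression tree. Note the caveat: this toy has only arithmetic nodes, with no looping or storage, so by the distinction drawn above it sits at the limited, non-Turing-complete end; the tree encoding (nested tuples), the operator set and the probabilities are all assumptions made for the example.

```python
import operator
import random

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def evaluate(tree, x):
    """A tree is 'x', a number, or a tuple (op_name, left_subtree, right_subtree)."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree


def random_tree(rng, depth):
    """Grow a random expression, bottoming out in 'x' or a small integer constant."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(["x", rng.randint(-2, 2)])
    return (rng.choice(sorted(OPS)), random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def mutate(tree, rng, depth=2):
    """Replace one randomly chosen subtree with a freshly generated one."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return random_tree(rng, depth)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, rng, depth), right)
    return (op, left, mutate(right, rng, depth))
```

Because mutation swaps whole subtrees, every mutant is at least a syntactically valid expression, which is the point of working on trees rather than raw code. It does nothing about the uneven fitness landscape, though.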

* General probabilistic reasoners. These are symbolic logic systems that actually treat limited information and uncertainty correctly, which is to say that they use Bayesian probability calculus. There aren't many of these around; Pei Wang's NARS is the only one I can think of that got much attention in the community (and it still didn't get much). The basic problem seems to be that the work being done on Bayesian networks is carried out by connectionists, who have already fundamentally rejected the grand symbolic general AI dream as infeasible. Anyway, going probabilistic solves some of the problems of classic symbolic AI, in that it hides some of the brittleness, improves search control, and allows limited learning (updating of probabilities for facts in the knowledge base) to work quite well. However, it is still fundamentally limited in that it cannot create its own representations, needed to capture new fields of knowledge, or new code needed to tackle compute-intensive tasks. NARS at least still suffers from the classic symbolic problems of lack of levels of detail and 'this symbol means apple because the designer called it 'apple''.

* Spiking neural networks. The original artificial neural networks really weren't very brainlike at all; they were either on or off (or in a few cases, had continuous activation levels) and were globally updated via a simple threshold function. Biological neurons process 'spike trains'; depolarisation waves come in along numerous axons at irregular intervals, which causes the neuron to fire at irregular intervals. Analysis has shown complex frequency structure in many spike trains and use of phase differences as a critical part of processing in local neural circuits; the exact timing is crucial. Spiking neural networks attempt to emulate that, by simulating neurons as real-time signal processors. This subfield is split into two camps, the people trying to do fully accurate brain simulation (currently ascendant and by far the best funded single approach in AI) and the people who still treat it merely as inspiration (e.g. the people messing about with crazy schemes for evolved spiking NNs, such as evolving a lossy wavelet description of the network). Both of these suffice for general AI with enough work. Neither of them is a good idea IMHO, since these designs are quite opaque and not guaranteed to produce anything terribly human even when explicitly neuromorphic. Still, they're a better idea than the next category.

* Recursive heuristic systems. This was the first real attempt at 'automated programming'. The prototypical system was Eurisko, which was essentially a symbolic AI system combined with quite a sophisticated genetic programming system (for 1980), where the GP system could modify its own mutation operators and templates. Very few people followed up on this; Eurisko is bizarre in that it's a landmark program that got more attention outside of the field (e.g. Drexler's glowing description in Engines of Creation) than within it. I haven't seen any modern versions as well developed, but some people are playing around with quite dangerous improved variants that use graph-based GP techniques, proper probability and utility theory, and some form of Kolmogorov prior. I was messing around with this stuff myself in my first serious attempts to research AGI implementation strategy. The overwhelming problem is stability; with a full self-modification capability, the system can easily trash its own cognitive capabilities. Eurisko was actually the first AI program to discover the 'wireheading' problem, of self-modifying to simply declare the problem solved instead of actually solving the problem. There is another raft of techniques people have tried in order to enforce stability, but mostly they just replace obvious problems with subtle ones. Nevertheless this approach does suffice for general AI, will directly produce a rapidly self-enhancing 'seed' AI, and will almost certainly produce an uncontrollable and opaque one.

* High level approximations inspired by spiking NNs. Various researchers have put forward functional theories of microcolumns (and other brain structures) that they are highly confident in, and have claimed that this allows them to create a brain-like general AI without messing about with the details of simulating neurons (most famously, Jeff Hawkins at Numenta). So far every one of these has been a miserable failure. IMHO they are all the usual collection of 'cobble together a series of mechanisms that sound cool and might be adequate, but don't bother to actually prove capability or design and verify functionality the hard way'. They all rely on some degree of 'emergence', without sticking closely to the one known design where such emergence actually works (humans). This approach could work in principle, but frankly I doubt anyone is going to hit on a workable design until we've already got human uploads (or very neuromorphic AGIs) that we can abstract from.

* Kitchen sink designs. A popular approach in general AI, quite possibly /the/ most popular approach in terms of number of people who give it a try, is 'let's take everything that we've found to work in some domain or other, or even which just looks promising, and combine it all into one monster patchwork architecture'. Often there is a period of filtering afterwards when they actually try to implement this monstrosity; certainly there seems to have been with Goertzel, though his design is still pretty crazy. As AGI designs go these are usually relatively harmless, because the designer tends not to have a clue how to actually integrate everything into something that works, and all the elements just fail to mesh and work at cross purposes. However, the more dangerous geniuses might manage to create a working AI in the middle of the mess - most likely a Eurisko-style recursive heuristic system, though possibly something more like an abstracted spiking NN for the very connectionist ones. This would be bad news, because an evolving AI seed could actually make use of all the mismatched AI code around it as functional pieces in its own generated code, potentially taking it over the fast recursive self-enhancement threshold earlier than it otherwise would have managed. Fortunately no one has come close yet, though with a lot of projects it's very hard to tell from just the external PR...

* Recursive approximation of normative reasoning, driving rational code generation. This is the approach we're using, which I personally think is both the most powerful (in reasoning and computational performance) and the only AGI approach that can produce a safe, controlled outcome to recursive self-enhancement. It's essentially a general probabilistic reasoner with complex layered models replacing empty symbols, combined with a recursive heuristic system that uses 'constraint system refinement' (similar to formal verification but applied to the whole design process) to generate code. This ensures that AI-generated code actually works first time, or at the very least does not break anything and is not used until fully tested. Combined with a full reflective model and some actual work put into goal system analysis, it completely eliminates (so far, fingers crossed) the system stability issues formerly associated with self-modifying AI code. The major problems are the extremely high code complexity (codebase size, but more the interconnectedness and sheer difficulty of wrapping your head around the concepts involved) and the fact that the formal theory still stops well short of being able to dictate a full specification (though the amount of guesswork is low compared to 'emergence based' approaches).