0.1 What is a Metaheuristic?
Metaheuristics is a rather unfortunate1 term often used to describe a major subfield, indeed the primary subfield, of stochastic optimization. Stochastic optimization is the general class of algorithms and techniques which employ some degree of randomness to find optimal (or as optimal as possible) solutions to hard problems. Metaheuristics are the most general of these kinds of algorithms, and are applied to a very wide range of problems.
What kinds of problems? In Jacobellis v. Ohio (1964, regarding obscenity), the United States Supreme Court Justice Potter Stewart famously wrote,
I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description; and perhaps I could never succeed in intelligibly doing so. But I know it when I see it, and the motion picture involved in this case is not that.
Metaheuristics are applied to I know it when I see it problems. They’re algorithms used to find answers to problems when you have very little to help you: you don’t know beforehand what the optimal solution looks like, you don’t know how to go about finding it in a principled way, you have very little heuristic information to go on, and brute-force search is out of the question because the space is too large. But if you’re given a candidate solution to your problem, you can test it and assess how good it is. That is, you know a good one when you see it.
For example: imagine if you’re trying to find an optimal set of robot behaviors for a soccer goalie robot. You have a simulator for the robot and can test any given robot behavior set and assign it a quality (you know a good one when you see it). And you’ve come up with a definition for what robot behavior sets look like in general. But you have no idea what the optimal behavior set is, nor even how to go about finding it.
The simplest thing you could do in this situation is Random Search: just try random behavior sets as long as you have time, and return the best one you discovered. But before you give up and start doing random search, consider the following alternative, known as Hill-Climbing. Start with a random behavior set. Then make a small, random modification to it and try the new version. If the new version is better, throw the old one away. Else throw the new version away. Now make another small, random modification to your current version (which ever one you didn’t throw away). If this newest version is better, throw away your current version, else throw away the newest version. Repeat as long as you can.
Hill-climbing is a simple metaheuristic algorithm. It exploits a heuristic belief about your space of candidate solutions which is usually true for many problems: that similar solutions tend to behave similarly (and tend to have similar quality), so small modifications will generally result in small, well-behaved changes in quality, allowing us to “climb the hill” of quality up to good solutions. This heuristic belief is one of the central defining features of metaheuristics: indeed, nearly all metaheuristics are essentially elaborate combinations of hill-climbing and random search.
The “I know it when I see it” problems tackled by metaheuristics are a subclass of inverse problems. An inverse problem is one in which you have a test function f which takes a candidate solution and produces an assessment of it, but in which it’s difficult or impossible to construct the inverse function f−1 which takes an assessment and returns a candidate solution which would have had that assessment.2 In our example, our robot simulator and test procedure is f . But what we really want is an inverse function f−1 which takes an assessment and returns a robot behavior set. That way, if we were lucky, we could plug in the optimal assessment value into f−1 and get the optimal robot behavior set.
Optimization methods (such as metheuristics) are designed to overcome inverse problems. But many classic optimization techniques, such as Gradient Ascent (Algorithm 1) make strong assumptions about the nature of f : for example, that we also know its first derivative f ′. Metaheuristics make far weaker assumptions, and sometimes make none at all. This means that metaheuristics are very general, but also means that they’re often best thought of as last-ditch methods, used when no other known technique works. As it so happens, that’s the case for an enormous, important, and growing collection of problems.