Simulated Annealing
General Principle
Simulated annealing is a probabilistic optimization technique inspired by the metallurgical process of annealing, where metals are heated and slowly cooled to reduce defects and reach a low-energy state.
The key idea is to allow the algorithm to:
- Explore the solution space through random movements (like thermal fluctuations)
- Exploit promising regions by moving toward better solutions
- Gradually reduce randomness over time (cooling), converging to a good solution
This balance between exploration and exploitation helps escape local minima—a critical advantage over greedy algorithms like Lloyd’s method.
Mechanisms in K-means Context
In kmeanssa-ng, cluster centers are dynamic entities that move through
the metric space using two complementary mechanisms:
Brownian Motion (Exploration)
Centers perform random walks in the metric space, characterized by:
- Random direction selection
- Distance traveled proportional to \(\sqrt{\Delta t}\) (diffusion scaling)
- Allows escape from local minima through stochastic exploration
For a center \(c\) and time parameter \(\Delta t\):
Drift (Exploitation)
Centers are pulled toward the observations assigned to their cluster:
- Each center drifts toward a randomly selected point in its cluster
- Distance traveled: proportion \(\alpha\) of the geodesic distance
- Reduces cluster energy by moving centers closer to their observations
For a center \(c\), target observation \(x\), and drift proportion \(\alpha \in [0,1]\):
(where the notation is geometric; on graphs this means moving along the geodesic path)
Temperature Schedule
The “temperature” controls the balance between exploration and
exploitation over time. kmeanssa-ng uses an inhomogeneous Poisson
process to generate a decreasing temperature schedule:
where \(E_i \sim \text{Exp}(\lambda)\) are exponential random variables.
Key properties:
- Temperature decreases over time (cooling)
- Controlled by parameter \(\lambda\) (intensity)
- Stochastic schedule adds robustness
Two Algorithm Variants
kmeanssa-ng implements two strategies for combining Brownian motion
and drift:
Interleaved (run_interleaved)
At each iteration:
- Randomly select one observation \(x_i\)
- Perform Brownian motion on all centers: \(c_j \leftarrow c_j + \text{Brownian}(\Delta t)\)
- Find nearest center \(c^*\) to \(x_i\)
- Apply drift: \(c^* \leftarrow c^* + \alpha \cdot (x_i - c^*)\)
This approach alternates exploration and exploitation at a fine-grained level.
Sequential (run_sequential)
Separates exploration and exploitation into distinct phases:
- Brownian phase: Iterate through all observations, performing only Brownian motion
- Drift phase: Iterate through all observations, performing only drift
This approach separates the two mechanisms temporally.
Choice: V1 (interleaved) is generally recommended as the default, but V2 can be useful for specific problem structures.
Algorithm Parameters
The behavior of the simulated annealing algorithm is controlled by three main parameters. Understanding these parameters is crucial for achieving good clustering results.
lambda0: Brownian Motion Intensity
Controls the initial strength of random exploration in the metric space.
Mathematical role: Scales the diffusion coefficient in the Brownian motion component. The standard deviation of each random step is proportional to \(\lambda_0 \sqrt{\Delta t}\).
Practical effects:
- Higher values (\(\lambda_0 \in [1.5, 3.0]\)):
- More exploration of the solution space
- Better escape from local minima
- Slower convergence
- Recommended for complex landscapes with many local optima
- Lower values (\(\lambda_0 \in [0.3, 0.8]\)):
- Less exploration, more focused search
- Faster convergence
- Higher risk of getting trapped in local minima
- Suitable for simpler problems or when speed is critical
- Default: \(\lambda_0 = 1.0\) provides a good balance for most use cases
beta0: Drift Intensity
Controls how strongly centers are pulled toward observations.
Mathematical role: The drift proportion at time \(t\) is computed as: \(\(\alpha(t) = \min(h \cdot \beta_0 \cdot \log(1 + t), 1)\)\) where \(h\) is the time interval. This controls the strength of attraction toward the nearest observation.
Practical effects:
- Higher values (\(\beta_0 \in [2.0, 5.0]\)):
- Stronger drift toward observations
- Faster convergence
- More exploitation of currently good positions
- Risk of premature convergence
- Lower values (\(\beta_0 \in [0.3, 0.8]\)):
- Weaker drift
- More time for exploration
- Slower convergence
- Better for difficult optimization landscapes
- Default: \(\beta_0 \in [1.0, 2.0]\) works well for most problems
step_size: Time Discretization
Controls the temporal resolution of the stochastic process simulation.
Mathematical role: Euler discretization step \(\Delta t\) for solving the stochastic differential equation (SDE). Smaller values provide more accurate simulation of the continuous process.
Practical effects:
- Smaller values (\(\Delta t \in [0.001, 0.01]\)):
- More accurate simulation of the theoretical process
- Slower computation (more steps needed)
- Better numerical stability
- Larger values (\(\Delta t \in [0.05, 0.1]\)):
- Faster computation
- Less accurate approximation
- Risk of numerical instability
- Default: \(\Delta t = 0.01\) provides a good accuracy/speed tradeoff
Rule of thumb: Choose step_size much smaller than the
characteristic time scale of your Poisson process (approximately
\(1/\lambda_0\)).
Choosing Parameter Combinations
Different parameter combinations suit different optimization scenarios:
Quick Convergence (when you trust your initialization)
sa = SimulatedAnnealing(
points, k=5,
lambda0=0.5, # Reduced exploration
beta0=3.0, # Strong drift
step_size=0.01
)
Thorough Search (complex problems, many local minima)
sa = SimulatedAnnealing(
points, k=5,
lambda0=2.0, # More exploration
beta0=1.0, # Gentler drift
step_size=0.01
)
Balanced (default, works well in most cases)
sa = SimulatedAnnealing(
points, k=5,
lambda0=1.0, # Moderate exploration
beta0=2.0, # Moderate drift
step_size=0.01
)
Theoretical Foundation
For detailed mathematical derivations and theoretical justification of these parameters, see:
Robustification
To improve stability, kmeanssa-ng uses robustification: instead of
returning the final centers, it averages results from the last \(p\%\) of
iterations (default: 10%). This reduces sensitivity to late-iteration
fluctuations.