Simulated Annealing

General Principle

Simulated annealing is a probabilistic optimization technique inspired by the metallurgical process of annealing, where metals are heated and slowly cooled to reduce defects and reach a low-energy state.

The key idea is to allow the algorithm to:

Explore the solution space through random movements (like thermal fluctuations)
Exploit promising regions by moving toward better solutions
Gradually reduce randomness over time (cooling), converging to a good solution

This balance between exploration and exploitation helps escape local minima—a critical advantage over greedy algorithms like Lloyd’s method.

Mechanisms in K-means Context

In kmeanssa-ng, cluster centers are dynamic entities that move through the metric space using two complementary mechanisms:

Brownian Motion (Exploration)

Centers perform random walks in the metric space, characterized by:

Random direction selection
Distance traveled proportional to \(\sqrt{\Delta t}\) (diffusion scaling)
Allows escape from local minima through stochastic exploration

For a center \(c\) and time parameter \(\Delta t\):

\[ c \leftarrow c + \text{Brownian}(\Delta t) \]

Drift (Exploitation)

Centers are pulled toward the observations assigned to their cluster:

Each center drifts toward a randomly selected point in its cluster
Distance traveled: proportion \(\alpha\) of the geodesic distance
Reduces cluster energy by moving centers closer to their observations

For a center \(c\), target observation \(x\), and drift proportion \(\alpha \in [0,1]\):

\[ c \leftarrow c + \alpha \cdot (x - c) \]

(where the notation is geometric; on graphs this means moving along the geodesic path)

Temperature Schedule

The “temperature” controls the balance between exploration and exploitation over time. kmeanssa-ng uses an inhomogeneous Poisson process to generate a decreasing temperature schedule:

\[ T(n) = \sqrt{\sum_{i=1}^{n} E_i + 1} - 1 \]

where \(E_i \sim \text{Exp}(\lambda)\) are exponential random variables.

Key properties:

Temperature decreases over time (cooling)
Controlled by parameter \(\lambda\) (intensity)
Stochastic schedule adds robustness

Two Algorithm Variants

kmeanssa-ng implements two strategies for combining Brownian motion and drift:

Interleaved (`run_interleaved`)

At each iteration:

Randomly select one observation \(x_i\)
Perform Brownian motion on all centers: \(c_j \leftarrow c_j + \text{Brownian}(\Delta t)\)
Find nearest center \(c^*\) to \(x_i\)
Apply drift: \(c^* \leftarrow c^* + \alpha \cdot (x_i - c^*)\)

This approach alternates exploration and exploitation at a fine-grained level.

Sequential (`run_sequential`)

Separates exploration and exploitation into distinct phases:

Brownian phase: Iterate through all observations, performing only Brownian motion
Drift phase: Iterate through all observations, performing only drift

This approach separates the two mechanisms temporally.

Choice: V1 (interleaved) is generally recommended as the default, but V2 can be useful for specific problem structures.

Algorithm Parameters

The behavior of the simulated annealing algorithm is controlled by three main parameters. Understanding these parameters is crucial for achieving good clustering results.

`lambda0`: Brownian Motion Intensity

Controls the initial strength of random exploration in the metric space.

Mathematical role: Scales the diffusion coefficient in the Brownian motion component. The standard deviation of each random step is proportional to \(\lambda_0 \sqrt{\Delta t}\).

Practical effects:

Higher values (\(\lambda_0 \in [1.5, 3.0]\)):
More exploration of the solution space
Better escape from local minima
Slower convergence
Recommended for complex landscapes with many local optima
Lower values (\(\lambda_0 \in [0.3, 0.8]\)):
Less exploration, more focused search
Faster convergence
Higher risk of getting trapped in local minima
Suitable for simpler problems or when speed is critical
Default: \(\lambda_0 = 1.0\) provides a good balance for most use cases

`beta0`: Drift Intensity

Controls how strongly centers are pulled toward observations.

Mathematical role: The drift proportion at time \(t\) is computed as: \(\(\alpha(t) = \min(h \cdot \beta_0 \cdot \log(1 + t), 1)\)\) where \(h\) is the time interval. This controls the strength of attraction toward the nearest observation.

Practical effects:

Higher values (\(\beta_0 \in [2.0, 5.0]\)):
Stronger drift toward observations
Faster convergence
More exploitation of currently good positions
Risk of premature convergence
Lower values (\(\beta_0 \in [0.3, 0.8]\)):
Weaker drift
More time for exploration
Slower convergence
Better for difficult optimization landscapes
Default: \(\beta_0 \in [1.0, 2.0]\) works well for most problems

`step_size`: Time Discretization

Controls the temporal resolution of the stochastic process simulation.

Mathematical role: Euler discretization step \(\Delta t\) for solving the stochastic differential equation (SDE). Smaller values provide more accurate simulation of the continuous process.

Practical effects:

Smaller values (\(\Delta t \in [0.001, 0.01]\)):
More accurate simulation of the theoretical process
Slower computation (more steps needed)
Better numerical stability
Larger values (\(\Delta t \in [0.05, 0.1]\)):
Faster computation
Less accurate approximation
Risk of numerical instability
Default: \(\Delta t = 0.01\) provides a good accuracy/speed tradeoff

Rule of thumb: Choose step_size much smaller than the characteristic time scale of your Poisson process (approximately \(1/\lambda_0\)).

Choosing Parameter Combinations

Different parameter combinations suit different optimization scenarios:

Quick Convergence (when you trust your initialization)

sa = SimulatedAnnealing(
    points, k=5,
    lambda0=0.5,    # Reduced exploration
    beta0=3.0,      # Strong drift
    step_size=0.01
)

Thorough Search (complex problems, many local minima)

sa = SimulatedAnnealing(
    points, k=5,
    lambda0=2.0,    # More exploration
    beta0=1.0,      # Gentler drift
    step_size=0.01
)

Balanced (default, works well in most cases)

sa = SimulatedAnnealing(
    points, k=5,
    lambda0=1.0,    # Moderate exploration
    beta0=2.0,      # Moderate drift
    step_size=0.01
)

Theoretical Foundation

For detailed mathematical derivations and theoretical justification of these parameters, see:

Robustification

To improve stability, kmeanssa-ng uses robustification: instead of returning the final centers, it averages results from the last \(p\%\) of iterations (default: 10%). This reduces sensitivity to late-iteration fluctuations.

Simulated Annealing

General Principle

Mechanisms in K-means Context

Brownian Motion (Exploration)

Drift (Exploitation)

Temperature Schedule

Two Algorithm Variants

Interleaved (run_interleaved)

Sequential (run_sequential)

Algorithm Parameters

lambda0: Brownian Motion Intensity

beta0: Drift Intensity

step_size: Time Discretization

Choosing Parameter Combinations

Quick Convergence (when you trust your initialization)

Thorough Search (complex problems, many local minima)

Balanced (default, works well in most cases)

Theoretical Foundation

Robustification

Interleaved (`run_interleaved`)

Sequential (`run_sequential`)

`lambda0`: Brownian Motion Intensity

`beta0`: Drift Intensity

`step_size`: Time Discretization