Core API
kmeanssa_ng.core
Core abstractions and algorithms for k-means on metric spaces.
Center
Bases: Point
Abstract base class for cluster centers.
A center is a special type of point that can move through the space using two mechanisms: - Brownian motion: Random exploration - Drift: Directed movement toward a target point
This class is used in simulated annealing for k-means clustering.
Source code in kmeanssa_ng/core/abstract.py
brownian_motion(time_to_travel)
abstractmethod
Perform random Brownian motion in the space.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
time_to_travel
|
float
|
Time parameter controlling the magnitude of motion. Typical distance traveled is proportional to sqrt(time_to_travel). |
required |
Source code in kmeanssa_ng/core/abstract.py
drift(target_point, prop_to_travel)
abstractmethod
Move toward a target point.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
target_point
|
Point
|
The point to move toward. |
required |
prop_to_travel
|
float
|
Proportion of the distance to travel (between 0 and 1). 0 means no movement, 1 means move all the way to target. |
required |
Source code in kmeanssa_ng/core/abstract.py
Point
Bases: ABC
Abstract base class for points in a metric space.
A point is an element of a metric space with a fixed location. Concrete implementations must define which space the point belongs to.
Source code in kmeanssa_ng/core/abstract.py
space
abstractmethod
property
The metric space this point belongs to.
Returns:
| Type | Description |
|---|---|
Space
|
The Space instance containing this point. |
SimulatedAnnealing
Simulated annealing for offline k-means clustering.
This algorithm solves the k-means problem on arbitrary metric spaces using simulated annealing. Centers perform Brownian motion (exploration) and drift toward observations (exploitation), with temperature controlled by an inhomogeneous Poisson process.
Attributes:
| Name | Type | Description |
|---|---|---|
space |
Space
|
The metric space containing the observations. |
k |
int
|
Number of clusters. |
observations |
list[Point]
|
List of points to cluster. |
centers |
list[Center]
|
Current cluster centers. |
Example
Source code in kmeanssa_ng/core/simulated_annealing.py
22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 | |
centers
property
Current cluster centers.
k
property
Number of clusters.
n
property
Number of observations.
observations
property
List of observation points.
space
property
Metric space containing the observations.
__init__(observations, k, lambda0=1.0, beta0=1.0, step_size=0.1, energy_mode='uniform', random_state=None)
Initialize the simulated annealing algorithm.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
observations
|
list[Point]
|
List of points to cluster, all in the same metric space. |
required |
k
|
int
|
Number of clusters. |
required |
lambda0
|
float
|
Initial Brownian motion intensity parameter (must be > 0). Controls the magnitude of random exploration. Mathematical role: Scales the diffusion coefficient in the Brownian motion component. The standard deviation of each Brownian step is proportional to lambda0 * sqrt(step_size). Practical effect: - Higher values (1.5-3.0): More exploration, slower convergence, better escape from local minima - Lower values (0.3-0.8): Less exploration, faster convergence, higher risk of local minima - Recommended default: 1.0 for balanced exploration/exploitation TODO: Add reference to article on HAL/ArXiv when available. |
1.0
|
beta0
|
float
|
Initial drift intensity parameter (must be > 0). Controls how strongly centers are pulled toward observations. Mathematical role: The drift proportion at time t is computed as alpha(t) = min(h * beta0 * log(1 + t), 1) where h is the time interval. This controls the strength of attraction toward the nearest observation. Practical effect: - Higher values (2.0-5.0): Stronger drift, faster convergence, more exploitation of current best positions - Lower values (0.3-0.8): Weaker drift, more exploration, slower convergence - Recommended default: 1.0-2.0 for most cases TODO: Add reference to article on HAL/ArXiv when available. |
1.0
|
step_size
|
float
|
Time discretization step for the SDE solver (must be > 0). Controls the temporal resolution of the stochastic process. Mathematical role: Euler discretization step Δt for solving the stochastic differential equation. Smaller values give more accurate simulation at the cost of more computation. Practical effect: - Smaller values (0.001-0.01): More accurate simulation, slower - Larger values (0.05-0.1): Faster but less accurate - Recommended default: 0.01 for good accuracy/speed tradeoff - Rule of thumb: Use step_size much smaller than the typical time scale of the Poisson process (~ 1/lambda0) |
0.1
|
energy_mode
|
str
|
Energy calculation mode, either "uniform" or "obs". TODO: Document the difference between these modes. |
'uniform'
|
random_state
|
int | Generator | None
|
Controls randomness for reproducibility. Determines random number generation for all random operations: - Shuffling observations - Poisson process time generation - Brownian motion (via centers) - Initialization strategies (KMeansPlusPlus, RandomInit) - Space-specific random operations Pass an int for reproducible results across multiple function calls. When an int is provided, both random.seed() and np.random.seed() are set globally to ensure full reproducibility across all components. Pass a Generator instance for fine-grained control without affecting global state (advanced usage). Pass None for non-deterministic behavior (default). Example: >>> # Reproducible with seed (recommended) >>> sa1 = SimulatedAnnealing(points, k=3, random_state=42) >>> sa2 = SimulatedAnnealing(points, k=3, random_state=42) >>> # sa1 and sa2 will produce identical results >>> >>> # Advanced: use Generator without global state >>> rng = np.random.default_rng(42) >>> sa = SimulatedAnnealing(points, k=3, random_state=rng) >>> # Note: This only controls operations using self._rng >>> # Other components may still use global random state |
None
|
Raises:
| Type | Description |
|---|---|
ValueError
|
If observations is empty, k <= 0, points are in different spaces, or hyperparameters are invalid. |
References
TODO: Add reference to your article: [1] Your Name. "Title of your paper". HAL/ArXiv, 2025. URL: https://...
Example
Quick convergence setup
sa = SimulatedAnnealing( ... points, k=5, ... lambda0=0.5, # Less exploration ... beta0=3.0, # Stronger drift ... step_size=0.01 ... )
Thorough search setup (avoid local minima)
sa = SimulatedAnnealing( ... points, k=5, ... lambda0=2.0, # More exploration ... beta0=1.0, # Gentler drift ... step_size=0.01 ... )
Source code in kmeanssa_ng/core/simulated_annealing.py
48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 | |
calculate_energy(centers)
Calculate k-means energy for given centers based on the energy mode.
Source code in kmeanssa_ng/core/simulated_annealing.py
calculate_energy_fallback(centers, points)
Calculate k-means energy for given centers.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
centers
|
list[Center]
|
List of cluster centers. |
required |
points
|
list[Point]
|
List of points. |
required |
Returns:
| Type | Description |
|---|---|
float
|
Average squared distance to nearest center. |
Source code in kmeanssa_ng/core/simulated_annealing.py
run_interleaved(initialization_strategy, robustification_strategy, robust_prop=0.0)
Run SA with interleaved drift and brownian motion.
Source code in kmeanssa_ng/core/simulated_annealing.py
run_sequential(initialization_strategy, robustification_strategy, robust_prop=0.0)
Run SA with sequential brownian motion then drift.
Source code in kmeanssa_ng/core/simulated_annealing.py
Space
Bases: ABC
Abstract base class for metric spaces.
A metric space provides: - Distance computation between points - Sampling of random points and centers - Cluster computation and energy calculation
Source code in kmeanssa_ng/core/abstract.py
66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 | |
calculate_energy(centers)
abstractmethod
Calculate the k-means energy (distortion) for given centers.
The energy is the sum of squared distances from each point to its nearest center.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
centers
|
list[Center]
|
List of cluster centers. |
required |
Returns:
| Type | Description |
|---|---|
float
|
The total energy (sum of squared distances). |
Source code in kmeanssa_ng/core/abstract.py
center_from_point(point)
abstractmethod
compute_clusters(centers)
abstractmethod
Assign points to their nearest center.
This method typically updates internal state or annotations indicating which cluster each point belongs to.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
centers
|
list[Center]
|
List of cluster centers. |
required |
Source code in kmeanssa_ng/core/abstract.py
distance(p1, p2)
abstractmethod
distances_from_centers(centers, target)
abstractmethod
Compute distances from multiple centers to a single target point.
This method is used by the simulated annealing algorithm to efficiently find the nearest center to a given observation point.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
centers
|
list[Center]
|
List of k centers to compute distances from. |
required |
target
|
Point
|
The target point. |
required |
Returns:
| Type | Description |
|---|---|
|
Array of shape (k,) with distances from each center to target. |
Example
Source code in kmeanssa_ng/core/abstract.py
sample_points(n, strategy)
Sample n points using the specified sampling strategy.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
n
|
int
|
Number of points to sample |
required |
strategy
|
Sampling strategy defining the probability distribution. Must be a SamplingStrategy instance specific to the space type. |
required |
Returns:
| Type | Description |
|---|---|
list[Point]
|
List of n sampled points |
Example
# For quantum graphs
from kmeanssa_ng.quantum_graph.sampling import UniformNodeSampling
points = graph.sample_points(100, strategy=UniformNodeSampling())
# For Riemannian manifolds
from kmeanssa_ng.riemannian_manifold.sampling import UniformManifoldSampling
points = manifold.sample_points(100, strategy=UniformManifoldSampling())
Note
The strategy parameter is REQUIRED to avoid ambiguity about which probability distribution to use. Each space type has its own specific sampling strategies in space-specific modules.
Source code in kmeanssa_ng/core/abstract.py
run_parallel(space, n_points, k, sampling_strategy, initialization_strategy, robustification_strategy, n_runs=10, algorithm='interleaved', lambda0=1, beta0=1.0, step_size=0.1, energy_mode='uniform', robust_prop=0.0, n_jobs=-1, seeds=None, return_all=False, mp_context=None)
Run simulated annealing multiple times in parallel with different seeds.
This function executes n_runs independent simulated annealing runs in parallel, each with a different random seed. Each run samples its own observations, generates its own Poisson process, and initializes differently, ensuring complete independence between runs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
space
|
'Space'
|
The metric space to sample points from. |
required |
n_points
|
int
|
Number of points to sample for each run. |
required |
k
|
int
|
Number of clusters. |
required |
n_runs
|
int
|
Number of parallel runs to execute. |
10
|
algorithm
|
Literal['interleaved', 'sequential']
|
Which algorithm variant to use ("interleaved" or "sequential"). |
'interleaved'
|
lambda_param
|
Poisson process intensity parameter (must be > 0). |
required | |
beta
|
Inverse temperature parameter (must be > 0). |
required | |
step_size
|
float
|
Time step for updating centers (must be > 0). |
0.1
|
sampling_strategy
|
SamplingStrategy
|
Strategy for sampling points from the space (required). |
required |
initialization_strategy
|
InitializationStrategy
|
Strategy for initializing centers (required). |
required |
robustification_strategy
|
RobustificationStrategy
|
Strategy for robustifying results (required). |
required |
robust_prop
|
float
|
Proportion of final observations to use for robustification (0-1). |
0.0
|
n_jobs
|
int
|
Number of parallel jobs. -1 uses all available cores. |
-1
|
seeds
|
list[int] | None
|
Optional list of specific seeds to use. If None, generates random seeds. |
None
|
return_all
|
bool
|
If True, return all results; if False, return only the best. |
False
|
mp_context
|
Literal['fork', 'spawn', 'forkserver'] | None
|
Multiprocessing context to use ('fork', 'spawn', 'forkserver'). If None, uses the system default. Use 'fork' for Jupyter/Quarto compatibility. |
None
|
Returns:
| Type | Description |
|---|---|
list[Center] | tuple[list[Center], list[tuple[list[Center], float, int]]]
|
If return_all is False: List of best centers (lowest energy). |
list[Center] | tuple[list[Center], list[tuple[list[Center], float, int]]]
|
If return_all is True: Tuple of (best_centers, all_results) where all_results is a list of (centers, energy, seed) tuples sorted by energy. |
Raises:
| Type | Description |
|---|---|
ValueError
|
If n_runs <= 0 or other parameters are invalid. |
Example
from kmeanssa_ng import run_parallel
# Generate a graph
graph = QuantumGraph(...)
# Run 10 parallel executions, each sampling its own 100 points
best_centers = run_parallel(graph, n_points=100, k=5, n_runs=10)
# Get all results for analysis
best, all_results = run_parallel(graph, n_points=100, k=5, n_runs=10, return_all=True)
for centers, energy, seed in all_results:
print(f"Seed {seed}: energy = {energy:.4f}")
Source code in kmeanssa_ng/core/parallel.py
101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 | |
run_parallel_with_callback(space, n_points, k, sampling_strategy, initialization_strategy, robustification_strategy, n_runs=10, algorithm='interleaved', lambda0=1.0, beta0=1.0, step_size=0.1, energy_mode='uniform', robust_prop=0.0, n_jobs=-1, seeds=None, callback=None, mp_context=None)
Run parallel simulated annealing with progress callback.
Similar to run_parallel but calls a callback function after each run completes, useful for progress tracking and real-time monitoring. Each run samples its own observations with its specific seed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
space
|
'Space'
|
The metric space to sample points from. |
required |
n_points
|
int
|
Number of points to sample for each run. |
required |
k
|
int
|
Number of clusters. |
required |
n_runs
|
int
|
Number of parallel runs to execute. |
10
|
algorithm
|
Literal['interleaved', 'sequential']
|
Which algorithm variant to use. |
'interleaved'
|
lambda_param
|
Poisson process intensity parameter. |
required | |
beta
|
Inverse temperature parameter. |
required | |
step_size
|
float
|
Time step for updating centers. |
0.1
|
robust_prop
|
float
|
Proportion for robustification. |
0.0
|
n_jobs
|
int
|
Number of parallel jobs (-1 = all cores). |
-1
|
seeds
|
list[int] | None
|
Optional list of specific seeds. |
None
|
callback
|
Callable[[int, int, float], None] | None
|
Optional function(run_index, seed, energy) called after each run. |
None
|
mp_context
|
Literal['fork', 'spawn', 'forkserver'] | None
|
Multiprocessing context to use ('fork', 'spawn', 'forkserver'). If None, uses the system default. Use 'fork' for Jupyter/Quarto compatibility. |
None
|
Returns:
| Type | Description |
|---|---|
list[Center]
|
List of best centers (lowest energy). |
Example
Source code in kmeanssa_ng/core/parallel.py
248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 | |