Generators
Graph generators are essential tools for testing, benchmarking, and
experimenting with clustering algorithms on quantum graphs. The
kmeanssa-ng library provides several types of generators, from simple
two-cluster test cases to complex stochastic block models and converters
for existing NetworkX graphs.
This guide covers the main generators and their use cases.
Simple Graphs
The generate_simple_graph() function creates symmetric two-cluster
graphs connected by a bridge. These graphs are ideal for quick testing
and algorithm development because they have a predictable, controlled
structure.
from kmeanssa_ng import generate_simple_graph
# Create a basic two-cluster graph
simple_graph = generate_simple_graph(
n_a=5, # 5 neighbors for each central node
n_aa=3, # 3 second-level neighbors for each first-level node
bridge_length=2.0 # Distance between clusters
)
print(f"Simple graph: {simple_graph.number_of_nodes()} nodes, {simple_graph.number_of_edges()} edges")
print(f"Bridge length: {simple_graph.get_edge_data('A0', 'B0')['length']}")
# Graph comes pre-computed and ready to use
from kmeanssa_ng.quantum_graph.sampling import UniformNodeSampling
points = simple_graph.sample_points(50, strategy=UniformNodeSampling())
print(f"Sampled {len(points)} points for clustering")
Simple graph: 42 nodes, 41 edges
Bridge length: 2.0
Sampled 50 points for clustering
The structure consists of two symmetric clusters (A and B), each with a
central node (A0 and B0) connected to first-level neighbors, which
themselves connect to second-level neighbors. The two clusters are
linked by a single bridge edge whose length controls the separation
between clusters. All nodes are named with prefixes A and B to
identify their cluster membership.
Stochastic Block Models (SBM)
Stochastic Block Models generate graphs with multiple blocks (clusters) where edge probabilities are controlled by a probability matrix. This allows you to create graphs with tunable clustering difficulty—perfect for benchmarking.
from kmeanssa_ng import generate_sbm
# Create a two-block SBM
sbm_graph = generate_sbm(
sizes=[40, 60], # 40 nodes in block 1, 60 in block 2
p=[[0.8, 0.1], [0.1, 0.7]] # Probability matrix
)
print(f"SBM graph: {sbm_graph.number_of_nodes()} nodes, {sbm_graph.number_of_edges()} edges")
SBM graph: 100 nodes, 2124 edges
The probability matrix p[i][j] controls the probability of edges
between blocks i and j. High diagonal values (like 0.8 and 0.7
above) create tight clusters with many internal edges, while low
off-diagonal values (like 0.1) create sparse connections between
clusters. Each node receives a block attribute indicating its true
cluster membership.
SBM with Custom Edge Lengths
By default, all edges have length 1.0. You can use
generate_random_sbm() to specify different edge length distributions
for intra-cluster and inter-cluster edges:
from kmeanssa_ng import generate_random_sbm
# Create SBM with different intra/inter-cluster distances
custom_sbm = generate_random_sbm(
sizes=[50, 50],
p=[[0.7, 0.1], [0.1, 0.7]],
weights=[1, 1], # Node weights (for sampling)
lengths=[[1.0, 4.0], [4.0, 1.0]] # Long inter-cluster edges
)
print(f"Custom SBM: {custom_sbm.number_of_nodes()} nodes, {custom_sbm.number_of_edges()} edges")
# Analyze edge length distribution
edge_lengths = [d['length'] for u, v, d in custom_sbm.edges(data=True)]
print(f"Edge lengths: min={min(edge_lengths):.1f}, max={max(edge_lengths):.1f}")
print(f"Short edges (~1.0): {sum(1 for l in edge_lengths if l < 2)}")
print(f"Long edges (~4.0): {sum(1 for l in edge_lengths if l > 3)}")
Custom SBM: 100 nodes, 1925 edges
Edge lengths: min=1.0, max=4.0
Short edges (~1.0): 1680
Long edges (~4.0): 245
This is particularly useful when you want to model scenarios where inter-cluster distances are significantly larger than intra-cluster distances, making the clustering problem more realistic.
Converting NetworkX Graphs
If you already have a NetworkX graph (from your domain, from a standard
benchmark, or generated by NetworkX), you can convert it to a quantum
graph using as_quantum_graph():
import networkx as nx
from kmeanssa_ng import as_quantum_graph
# Convert the famous Karate Club graph
karate = nx.karate_club_graph()
karate_qg = as_quantum_graph(
karate,
node_weight=1.0,
edge_length=1.5,
edge_weight=1.0
)
print(f"Karate Club: {karate_qg.number_of_nodes()} nodes, {karate_qg.number_of_edges()} edges")
print(f"All edges have length 1.5: {all(d['length'] == 1.5 for u,v,d in karate_qg.edges(data=True))}")
# Ready for clustering
from kmeanssa_ng.quantum_graph.sampling import UniformNodeSampling
karate_points = karate_qg.sample_points(60, strategy=UniformNodeSampling())
print(f"Sampled {len(karate_points)} points from Karate Club graph")
Karate Club: 34 nodes, 78 edges
All edges have length 1.5: True
Sampled 60 points from Karate Club graph
This is the most common approach when working with real-world networks or standard graph datasets. You can specify uniform attributes for all nodes and edges, or preserve existing attributes from the original graph.
Complete Graphs from Distance Matrices
When you have precomputed pairwise distances between objects (e.g., from a similarity metric or dissimilarity measure), you can create a complete quantum graph where every pair of nodes is connected:
import numpy as np
from kmeanssa_ng import complete_quantum_graph
# Create complete graph from pairwise distances
objects = ['A', 'B', 'C', 'D', 'E']
distances = np.array([
[0.0, 1.0, 3.0, 3.2, 3.1], # A close to B, far from others
[1.0, 0.0, 3.1, 3.0, 2.9], # B close to A, far from others
[3.0, 3.1, 0.0, 1.2, 1.0], # C close to D,E, far from A,B
[3.2, 3.0, 1.2, 0.0, 0.8], # D close to C,E, far from A,B
[3.1, 2.9, 1.0, 0.8, 0.0] # E close to C,D, far from A,B
])
complete_graph = complete_quantum_graph(
objects=objects,
similarities=distances,
true_labels=[0, 0, 1, 1, 1] # True clustering: {A,B} vs {C,D,E}
)
print(f"Complete graph: {complete_graph.number_of_nodes()} nodes, {complete_graph.number_of_edges()} edges")
print(f"Distance A-B: {complete_graph.get_edge_data(0, 1)['length']:.1f}")
print(f"Distance A-C: {complete_graph.get_edge_data(0, 2)['length']:.1f}")
Complete graph: 5 nodes, 10 edges
Distance A-B: 1.0
Distance A-C: 3.0
This approach is useful when your data doesn’t naturally form a graph
structure but you have a distance or similarity measure. The optional
true_labels parameter stores ground-truth cluster assignments as a
group node attribute, which is helpful for evaluation.
Best Practices
When working with graph generators, keep these guidelines in mind:
- Always call
precomputing()after generation. The clustering algorithm relies on precomputed shortest paths for efficient distance queries. - Check connectivity for random graphs. If using
as_quantum_graph()with random graph models like Erdős-Rényi, verify the graph is connected or extract the largest connected component. - Choose the right generator for your use case:
generate_simple_graph(): Quick testing and developmentgenerate_sbm()orgenerate_random_sbm(): Controlled benchmarkingas_quantum_graph(): Real-world networks or standard datasetscomplete_quantum_graph(): Distance/similarity matrices- Start small: Test with small graphs first (50-100 nodes) before scaling up to larger instances.