Generators

Graph generators are essential tools for testing, benchmarking, and experimenting with clustering algorithms on quantum graphs. The kmeanssa-ng library provides several types of generators, from simple two-cluster test cases to complex stochastic block models and converters for existing NetworkX graphs.

This guide covers the main generators and their use cases.

Simple Graphs

The generate_simple_graph() function creates symmetric two-cluster graphs connected by a bridge. These graphs are ideal for quick testing and algorithm development because they have a predictable, controlled structure.

from kmeanssa_ng import generate_simple_graph

# Create a basic two-cluster graph
simple_graph = generate_simple_graph(
    n_a=5,              # 5 neighbors for each central node
    n_aa=3,             # 3 second-level neighbors for each first-level node  
    bridge_length=2.0   # Distance between clusters
)

print(f"Simple graph: {simple_graph.number_of_nodes()} nodes, {simple_graph.number_of_edges()} edges")
print(f"Bridge length: {simple_graph.get_edge_data('A0', 'B0')['length']}")

# Graph comes pre-computed and ready to use
from kmeanssa_ng.quantum_graph.sampling import UniformNodeSampling
points = simple_graph.sample_points(50, strategy=UniformNodeSampling())
print(f"Sampled {len(points)} points for clustering")

Simple graph: 42 nodes, 41 edges
Bridge length: 2.0
Sampled 50 points for clustering

The structure consists of two symmetric clusters (A and B), each with a central node (A0 and B0) connected to first-level neighbors, which themselves connect to second-level neighbors. The two clusters are linked by a single bridge edge whose length controls the separation between clusters. All nodes are named with prefixes A and B to identify their cluster membership.

Stochastic Block Models (SBM)

Stochastic Block Models generate graphs with multiple blocks (clusters) where edge probabilities are controlled by a probability matrix. This allows you to create graphs with tunable clustering difficulty—perfect for benchmarking.

from kmeanssa_ng import generate_sbm

# Create a two-block SBM
sbm_graph = generate_sbm(
    sizes=[40, 60],                    # 40 nodes in block 1, 60 in block 2
    p=[[0.8, 0.1], [0.1, 0.7]]        # Probability matrix
)

print(f"SBM graph: {sbm_graph.number_of_nodes()} nodes, {sbm_graph.number_of_edges()} edges")

SBM graph: 100 nodes, 2124 edges

The probability matrix p[i][j] controls the probability of edges between blocks i and j. High diagonal values (like 0.8 and 0.7 above) create tight clusters with many internal edges, while low off-diagonal values (like 0.1) create sparse connections between clusters. Each node receives a block attribute indicating its true cluster membership.

SBM with Custom Edge Lengths

By default, all edges have length 1.0. You can use generate_random_sbm() to specify different edge length distributions for intra-cluster and inter-cluster edges:

from kmeanssa_ng import generate_random_sbm

# Create SBM with different intra/inter-cluster distances
custom_sbm = generate_random_sbm(
    sizes=[50, 50],
    p=[[0.7, 0.1], [0.1, 0.7]],
    weights=[1, 1],                    # Node weights (for sampling)
    lengths=[[1.0, 4.0], [4.0, 1.0]]  # Long inter-cluster edges
)

print(f"Custom SBM: {custom_sbm.number_of_nodes()} nodes, {custom_sbm.number_of_edges()} edges")

# Analyze edge length distribution
edge_lengths = [d['length'] for u, v, d in custom_sbm.edges(data=True)]
print(f"Edge lengths: min={min(edge_lengths):.1f}, max={max(edge_lengths):.1f}")
print(f"Short edges (~1.0): {sum(1 for l in edge_lengths if l < 2)}")
print(f"Long edges (~4.0): {sum(1 for l in edge_lengths if l > 3)}")

Custom SBM: 100 nodes, 1925 edges
Edge lengths: min=1.0, max=4.0
Short edges (~1.0): 1680
Long edges (~4.0): 245

This is particularly useful when you want to model scenarios where inter-cluster distances are significantly larger than intra-cluster distances, making the clustering problem more realistic.

Converting NetworkX Graphs

If you already have a NetworkX graph (from your domain, from a standard benchmark, or generated by NetworkX), you can convert it to a quantum graph using as_quantum_graph():

import networkx as nx
from kmeanssa_ng import as_quantum_graph

# Convert the famous Karate Club graph
karate = nx.karate_club_graph()
karate_qg = as_quantum_graph(
    karate,
    node_weight=1.0,
    edge_length=1.5,
    edge_weight=1.0
)

print(f"Karate Club: {karate_qg.number_of_nodes()} nodes, {karate_qg.number_of_edges()} edges")
print(f"All edges have length 1.5: {all(d['length'] == 1.5 for u,v,d in karate_qg.edges(data=True))}")

# Ready for clustering
from kmeanssa_ng.quantum_graph.sampling import UniformNodeSampling
karate_points = karate_qg.sample_points(60, strategy=UniformNodeSampling())
print(f"Sampled {len(karate_points)} points from Karate Club graph")

Karate Club: 34 nodes, 78 edges
All edges have length 1.5: True
Sampled 60 points from Karate Club graph

This is the most common approach when working with real-world networks or standard graph datasets. You can specify uniform attributes for all nodes and edges, or preserve existing attributes from the original graph.

Complete Graphs from Distance Matrices

When you have precomputed pairwise distances between objects (e.g., from a similarity metric or dissimilarity measure), you can create a complete quantum graph where every pair of nodes is connected:

import numpy as np
from kmeanssa_ng import complete_quantum_graph

# Create complete graph from pairwise distances
objects = ['A', 'B', 'C', 'D', 'E']
distances = np.array([
    [0.0, 1.0, 3.0, 3.2, 3.1],  # A close to B, far from others
    [1.0, 0.0, 3.1, 3.0, 2.9],  # B close to A, far from others  
    [3.0, 3.1, 0.0, 1.2, 1.0],  # C close to D,E, far from A,B
    [3.2, 3.0, 1.2, 0.0, 0.8],  # D close to C,E, far from A,B
    [3.1, 2.9, 1.0, 0.8, 0.0]   # E close to C,D, far from A,B
])

complete_graph = complete_quantum_graph(
    objects=objects,
    similarities=distances,
    true_labels=[0, 0, 1, 1, 1]  # True clustering: {A,B} vs {C,D,E}
)

print(f"Complete graph: {complete_graph.number_of_nodes()} nodes, {complete_graph.number_of_edges()} edges")
print(f"Distance A-B: {complete_graph.get_edge_data(0, 1)['length']:.1f}")
print(f"Distance A-C: {complete_graph.get_edge_data(0, 2)['length']:.1f}")

Complete graph: 5 nodes, 10 edges
Distance A-B: 1.0
Distance A-C: 3.0

This approach is useful when your data doesn’t naturally form a graph structure but you have a distance or similarity measure. The optional true_labels parameter stores ground-truth cluster assignments as a group node attribute, which is helpful for evaluation.

Best Practices

When working with graph generators, keep these guidelines in mind:

Always call precomputing() after generation. The clustering algorithm relies on precomputed shortest paths for efficient distance queries.
Check connectivity for random graphs. If using as_quantum_graph() with random graph models like Erdős-Rényi, verify the graph is connected or extract the largest connected component.
Choose the right generator for your use case:
generate_simple_graph(): Quick testing and development
generate_sbm() or generate_random_sbm(): Controlled benchmarking
as_quantum_graph(): Real-world networks or standard datasets
complete_quantum_graph(): Distance/similarity matrices
Start small: Test with small graphs first (50-100 nodes) before scaling up to larger instances.