kmeans_ari

Perform K-means clustering and compute Adjusted Rand Index.

This function clusters the input data using FAISS K-means and computes the Adjusted Rand Index (ARI) between the predicted clusters and true labels. The ARI measures the similarity between two clusterings, adjusted for chance.

Parameters:

X (torch.Tensor or np.ndarray of shape (n_samples, n_features)) – Input data to cluster.
labels (torch.Tensor or np.ndarray of shape (n_samples,)) – True labels for computing ARI.
n_clusters (int, optional) – Number of clusters. If None, uses the number of unique labels.
niter (int, default=20) – Maximum number of K-means iterations.
nredo (int, default=1) – Number of times to run K-means with different initializations, keeping the best result (lowest objective).
device (str, optional) – Device to use for ARI computation. If None, uses the input device.
random_state (int, optional) – Random seed for reproducibility.
verbose (bool, default=False) – Whether to print progress information.

Returns:

ari_score (float or torch.Tensor) – Adjusted Rand Index between predicted clusters and true labels. Values range from -1 to 1, where 1 indicates perfect agreement, 0 indicates random labeling, and negative values indicate systematic disagreement. Returns numpy float if inputs are numpy, torch.Tensor if inputs are torch.
predicted_labels (np.ndarray or torch.Tensor of shape (n_samples,)) – Cluster assignments from K-means. Returns same type as input X.

Raises:

ImportError – If FAISS or torchmetrics is not installed.
ValueError – If n_clusters is less than 1 or greater than n_samples.
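The default and the bounds check described above can be sketched in a few lines. This is an illustration only, not the library's code: `resolve_n_clusters` is a hypothetical helper name, and it assumes `labels` is a plain Python sequence of hashable values rather than a tensor or array.

```python
def resolve_n_clusters(labels, n_clusters=None):
    """Hypothetical helper: pick a cluster count and validate it."""
    n_samples = len(labels)
    if n_clusters is None:
        # Documented default: one cluster per unique true label.
        n_clusters = len(set(labels))
    if n_clusters < 1 or n_clusters > n_samples:
        raise ValueError(
            f"n_clusters must be between 1 and n_samples={n_samples}, "
            f"got {n_clusters}"
        )
    return n_clusters
```

Calling `resolve_n_clusters([0, 1, 1, 2])` falls back to the three unique labels, while asking for more clusters than there are samples raises `ValueError`.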

Examples

>>> import torch
>>> from torchdr.eval.kmeans import kmeans_ari
>>>
>>> # Generate sample data
>>> X = torch.randn(1000, 50)
>>> true_labels = torch.randint(0, 5, (1000,))
>>>
>>> # Compute ARI score
>>> ari_score, pred_labels = kmeans_ari(X, true_labels)
>>> print(f"ARI Score: {ari_score:.3f}")

Notes

The Adjusted Rand Index is a measure of clustering quality that:

- Accounts for chance agreement between clusterings
- Is symmetric (swapping predicted and true labels gives the same result)
- Has an expected value of 0 for random clusterings
- Has a maximum value of 1 for identical clusterings
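These properties can be checked with a small reference implementation based on pair counts. The sketch below uses only the standard library and is not the code path used by `kmeans_ari`, which relies on torchmetrics; `adjusted_rand_index` is an illustrative name, and the handling of degenerate inputs (fewer than two samples, or a single cluster on both sides) is an assumption made here for convenience.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Illustrative ARI from pair counts over two label sequences."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # assumption: fewer than two samples counts as agreement

    # Pairs of samples placed together in both clusterings.
    together_both = sum(
        comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values()
    )
    # Pairs placed together within each clustering separately.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / total_pairs  # chance agreement
    max_index = (together_a + together_b) / 2
    if max_index == expected:
        return 1.0  # assumption: degenerate case, e.g. one cluster on both sides
    return (together_both - expected) / (max_index - expected)


true_labels = [0, 0, 1, 1, 2, 2]
relabeled = [1, 1, 0, 0, 2, 2]  # same partition, different cluster ids
print(adjusted_rand_index(true_labels, relabeled))
print(adjusted_rand_index(relabeled, true_labels))
```

Because the score depends only on which samples are grouped together, renaming the cluster ids leaves it unchanged, which is why K-means output can be compared with true labels without matching ids first.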

FAISS K-means uses Lloyd’s algorithm with optional multiple runs. GPU acceleration is automatically used if FAISS-GPU is installed and X is on GPU.
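To make the roles of `niter` and `nredo` concrete, here is a minimal NumPy sketch of Lloyd's algorithm with restarts. It is not FAISS's implementation: the function name `lloyd_kmeans`, the initialization from randomly chosen samples, and the choice to leave empty clusters unchanged are all assumptions made for this illustration.

```python
import numpy as np


def lloyd_kmeans(X, n_clusters, niter=20, nredo=1, seed=0):
    """Illustrative Lloyd's algorithm; keeps the restart with the lowest objective."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(nredo):
        # Assumed initialization: distinct random samples as starting centroids.
        idx = rng.choice(len(X), size=n_clusters, replace=False)
        centroids = X[idx].copy()
        for _ in range(niter):
            # Assignment step: nearest centroid by squared Euclidean distance.
            dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            assign = dists.argmin(axis=1)
            # Update step: move each centroid to the mean of its members.
            for k in range(n_clusters):
                members = X[assign == k]
                if len(members) > 0:  # empty clusters keep their centroid
                    centroids[k] = members.mean(axis=0)
        # Final assignment and objective (sum of squared distances).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        objective = float(dists[np.arange(len(X)), assign].sum())
        if best is None or objective < best[0]:
            best = (objective, assign, centroids)
    return best


# Two well-separated blobs as toy data.
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(50, 2)),
    rng.normal(10.0, 0.5, size=(50, 2)),
])
objective, assignments, centroids = lloyd_kmeans(X, n_clusters=2, nredo=3)
```

Each restart draws a new initialization, so a larger `nredo` can only keep or lower the best objective found, at the cost of proportionally more computation.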
