kmeans_ari
- torchdr.kmeans_ari(X: Tensor | ndarray, labels: Tensor | ndarray, n_clusters: int | None = None, niter: int = 20, nredo: int = 1, device: str | None = None, random_state: int | None = None, verbose: bool = False)
Perform K-means clustering and compute Adjusted Rand Index.
This function clusters the input data using FAISS K-means and computes the Adjusted Rand Index (ARI) between the predicted clusters and true labels. The ARI measures the similarity between two clusterings, adjusted for chance.
- Parameters:
X (torch.Tensor or np.ndarray of shape (n_samples, n_features)) – Input data to cluster.
labels (torch.Tensor or np.ndarray of shape (n_samples,)) – True labels for computing ARI.
n_clusters (int, optional) – Number of clusters. If None, uses the number of unique labels.
niter (int, default=20) – Maximum number of K-means iterations.
nredo (int, default=1) – Number of times to run K-means with different initializations, keeping the best result (lowest objective).
device (str, optional) – Device to use for ARI computation. If None, uses the input device.
random_state (int, optional) – Random seed for reproducibility.
verbose (bool, default=False) – Whether to print progress information.
- Returns:
ari_score (float or torch.Tensor) – Adjusted Rand Index between predicted clusters and true labels. A value of 1 indicates perfect agreement, values near 0 indicate agreement no better than chance, and negative values indicate less agreement than expected by chance. Returns a NumPy float if the inputs are NumPy arrays and a torch.Tensor if the inputs are torch tensors.
predicted_labels (np.ndarray or torch.Tensor of shape (n_samples,)) – Cluster assignments from K-means. Returns same type as input X.
- Raises:
ImportError – If FAISS or torchmetrics is not installed.
ValueError – If n_clusters is less than 1 or greater than n_samples.
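The default and the validity check for n_clusters described above can be illustrated with a small sketch. The helper name resolve_n_clusters is hypothetical and is not part of torchdr; it only mirrors the documented behavior (fall back to the number of unique labels, reject values below 1 or above n_samples).

```python
def resolve_n_clusters(labels, n_samples, n_clusters=None):
    """Hypothetical helper mirroring the documented n_clusters rules."""
    if n_clusters is None:
        # Documented default: one cluster per distinct true label.
        n_clusters = len(set(labels))
    if n_clusters < 1 or n_clusters > n_samples:
        raise ValueError(
            f"n_clusters must be in [1, {n_samples}], got {n_clusters}"
        )
    return n_clusters


labels = [0, 1, 1, 2]
print(resolve_n_clusters(labels, n_samples=len(labels)))
```

The library's actual error message and internal structure may differ; only the conditions are taken from the documentation above.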
Examples
>>> import torch
>>> from torchdr.eval.kmeans import kmeans_ari
>>>
>>> # Generate sample data
>>> X = torch.randn(1000, 50)
>>> true_labels = torch.randint(0, 5, (1000,))
>>>
>>> # Compute ARI score
>>> ari_score, pred_labels = kmeans_ari(X, true_labels)
>>> print(f"ARI Score: {ari_score:.3f}")
Notes
The Adjusted Rand Index is a measure of clustering quality that:
- Accounts for chance agreement between clusterings
- Is symmetric (swapping predicted and true labels gives the same result)
- Has an expected value of 0 for random clusterings
- Has a maximum value of 1 for identical clusterings
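To make these properties concrete, here is a dependency-free sketch of the standard pair-counting ARI formula. It is not the implementation used by kmeans_ari (which relies on torchmetrics); the function name adjusted_rand_index and the handling of the degenerate case are assumptions made for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Illustrative pure-Python ARI based on the contingency table."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    n = len(labels_a)

    # Contingency counts and the two marginals.
    pair_counts = Counter(zip(labels_a, labels_b))
    a_counts = Counter(labels_a)
    b_counts = Counter(labels_b)

    # Number of sample pairs that fall together in each grouping.
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in a_counts.values())
    sum_b = sum(comb(c, 2) for c in b_counts.values())
    total = comb(n, 2)

    # Chance correction: subtract the expected index, then normalise.
    expected = sum_a * sum_b / total if total else 0.0
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:
        # Assumption: treat degenerate partitions as perfect agreement.
        return 1.0
    return (sum_pairs - expected) / (max_index - expected)


truth = [0, 0, 1, 1, 2, 2]
relabelled = [2, 2, 0, 0, 1, 1]   # same partition, different cluster ids
coarser = [0, 0, 0, 1, 1, 1]      # a different partition

print(adjusted_rand_index(truth, relabelled))
print(adjusted_rand_index(truth, coarser), adjusted_rand_index(coarser, truth))
```

Because the score depends only on which samples are grouped together, renaming cluster ids leaves it unchanged, which is why K-means cluster indices need not match the true label values.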
FAISS K-means uses Lloyd’s algorithm with optional multiple runs. GPU acceleration is automatically used if FAISS-GPU is installed and X is on GPU.
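For readers unfamiliar with Lloyd's algorithm, the following toy sketch shows the assignment and update steps together with restarts that keep the lowest objective, which is the role the niter and nredo parameters play. It is a plain-Python illustration under simplifying assumptions (random-point initialisation, early stop on convergence, empty clusters keep their old centroid), not the FAISS implementation; the function name lloyd_kmeans is hypothetical.

```python
import random


def lloyd_kmeans(points, k, niter=20, nredo=1, seed=0):
    """Toy Lloyd's algorithm; returns (objective, centroids, assignments)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    if niter < 1 or nredo < 1:
        raise ValueError("niter and nredo must be at least 1")
    rng = random.Random(seed)
    dim = len(points[0])

    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    best = None
    for _run in range(nredo):
        # Assumed initialisation: k distinct input points chosen at random.
        centroids = [list(p) for p in rng.sample(points, k)]
        assign = None
        for _step in range(niter):
            # Assignment step: index of the nearest centroid for each point.
            new_assign = [
                min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                for p in points
            ]
            if new_assign == assign:
                break  # assignments stopped changing
            assign = new_assign
            # Update step: move each centroid to the mean of its members.
            for j in range(k):
                members = [p for p, a in zip(points, assign) if a == j]
                if members:
                    centroids[j] = [
                        sum(m[d] for m in members) / len(members)
                        for d in range(dim)
                    ]
        # Objective: sum of squared distances to the assigned centroid.
        objective = sum(sq_dist(p, centroids[a]) for p, a in zip(points, assign))
        if best is None or objective < best[0]:
            best = (objective, centroids, assign)
    return best


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
objective, centroids, assign = lloyd_kmeans(points, k=2, nredo=3, seed=0)
print(assign)
```

Each restart can converge to a different local minimum, so running several and keeping the one with the lowest objective trades extra compute for a more reliable clustering.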