kmeans_ari

torchdr.kmeans_ari(X: Tensor | ndarray, labels: Tensor | ndarray, n_clusters: int | None = None, niter: int = 20, nredo: int = 1, device: str | None = None, random_state: int | None = None, verbose: bool = False)

Perform K-means clustering and compute Adjusted Rand Index.

This function clusters the input data using FAISS K-means and computes the Adjusted Rand Index (ARI) between the predicted clusters and true labels. The ARI measures the similarity between two clusterings, adjusted for chance.

Parameters:
  • X (torch.Tensor or np.ndarray of shape (n_samples, n_features)) – Input data to cluster.

  • labels (torch.Tensor or np.ndarray of shape (n_samples,)) – True labels for computing ARI.

  • n_clusters (int, optional) – Number of clusters. If None, uses the number of unique labels.

  • niter (int, default=20) – Maximum number of K-means iterations.

  • nredo (int, default=1) – Number of times to run K-means with different initializations, keeping the best result (lowest objective).

  • device (str, optional) – Device to use for ARI computation. If None, uses the input device.

  • random_state (int, optional) – Random seed for reproducibility.

  • verbose (bool, default=False) – Whether to print progress information.

Returns:

  • ari_score (float or torch.Tensor) – Adjusted Rand Index between predicted clusters and true labels. Values range from -1 to 1: 1 indicates perfect agreement, values near 0 indicate agreement no better than chance, and negative values indicate agreement worse than chance. Returned as a NumPy float if the inputs are NumPy arrays, or as a torch.Tensor if the inputs are torch tensors.

  • predicted_labels (np.ndarray or torch.Tensor of shape (n_samples,)) – Cluster assignments from K-means. Returns same type as input X.

Raises:
  • ImportError – If FAISS or torchmetrics is not installed.

  • ValueError – If n_clusters is less than 1 or greater than n_samples.
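The default for n_clusters and the ValueError condition above can be illustrated with a small sketch. The helper name resolve_n_clusters is hypothetical and is not part of torchdr; it only mirrors the documented behavior and makes no claim about how the library implements it.

```python
import numpy as np


def resolve_n_clusters(labels, n_samples, n_clusters=None):
    """Hypothetical helper (not a torchdr function): choose and validate n_clusters."""
    if n_clusters is None:
        # Documented default: use the number of unique labels.
        n_clusters = int(np.unique(np.asarray(labels).ravel()).size)
    if n_clusters < 1 or n_clusters > n_samples:
        # Documented error condition.
        raise ValueError(
            f"n_clusters must be between 1 and n_samples={n_samples}, got {n_clusters}."
        )
    return n_clusters
```

For example, labels [0, 1, 1, 2] with four samples resolve to three clusters, while requesting five clusters for four samples raises ValueError.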

Examples

>>> import torch
>>> from torchdr.eval.kmeans import kmeans_ari
>>>
>>> # Generate sample data
>>> X = torch.randn(1000, 50)
>>> true_labels = torch.randint(0, 5, (1000,))
>>>
>>> # Compute ARI score
>>> ari_score, pred_labels = kmeans_ari(X, true_labels)
>>> print(f"ARI Score: {ari_score:.3f}")

Notes

The Adjusted Rand Index is a measure of clustering quality that:

  • Accounts for chance agreement between clusterings

  • Is symmetric (swapping predicted and true labels gives the same result)

  • Has an expected value of 0 for random clusterings

  • Has a maximum value of 1 for identical clusterings
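To make the ARI properties above concrete, here is a minimal NumPy sketch of the standard pair-counting formula computed from a contingency table. It is for illustration only: kmeans_ari itself relies on torchmetrics for this step, and the function name adjusted_rand_index below is an assumption of this sketch, not a torchdr API.

```python
import numpy as np


def adjusted_rand_index(labels_a, labels_b):
    """Illustrative ARI from the contingency table of two labelings."""
    labels_a = np.asarray(labels_a).ravel()
    labels_b = np.asarray(labels_b).ravel()
    n = labels_a.size
    if n < 2:
        # No pairs to compare; treat as perfect agreement.
        return 1.0

    # Map arbitrary label values to 0..k-1 indices.
    _, a_idx = np.unique(labels_a, return_inverse=True)
    _, b_idx = np.unique(labels_b, return_inverse=True)

    # contingency[i, j] = samples in cluster i of A and cluster j of B.
    contingency = np.zeros((a_idx.max() + 1, b_idx.max() + 1), dtype=np.int64)
    np.add.at(contingency, (a_idx, b_idx), 1)

    def pairs(counts):
        # Number of unordered pairs within each count: C(c, 2).
        return counts * (counts - 1) / 2.0

    sum_cells = pairs(contingency).sum()
    sum_rows = pairs(contingency.sum(axis=1)).sum()
    sum_cols = pairs(contingency.sum(axis=0)).sum()
    total_pairs = n * (n - 1) / 2.0

    # Chance correction: subtract the expected index under random labeling.
    expected = sum_rows * sum_cols / total_pairs
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        # Degenerate case, e.g. both labelings use a single cluster.
        return 1.0
    return float((sum_cells - expected) / (max_index - expected))
```

Because only the contingency table matters, relabeling clusters (for example swapping cluster ids 0 and 1) leaves the score unchanged, and swapping the two arguments transposes the table without affecting the result.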

FAISS K-means uses Lloyd’s algorithm, optionally restarted several times from different initializations (see nredo). GPU acceleration is used automatically if FAISS-GPU is installed and X is on a GPU.
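The roles of niter and nredo can be sketched with a plain NumPy version of Lloyd's algorithm with restarts. This is a simplified illustration, not the FAISS implementation: the function name lloyd_kmeans, the random-sample initialization, and the handling of empty clusters are all assumptions of this sketch.

```python
import numpy as np


def lloyd_kmeans(X, n_clusters, niter=20, nredo=1, seed=None):
    """Illustrative Lloyd's algorithm; returns (objective, centroids, assignments)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=np.float64)
    best = None
    for _ in range(nredo):
        # Initialize centroids from distinct random samples (an assumption here).
        init = rng.choice(len(X), size=n_clusters, replace=False)
        centroids = X[init].copy()
        for _ in range(niter):
            # Assignment step: nearest centroid by squared Euclidean distance.
            dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            assign = dists.argmin(axis=1)
            # Update step: move each centroid to the mean of its members.
            for k in range(n_clusters):
                members = X[assign == k]
                if len(members) > 0:
                    centroids[k] = members.mean(axis=0)
        # Final assignment and objective (sum of squared distances).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        objective = float(dists[np.arange(len(X)), assign].sum())
        # Keep the restart with the lowest objective, as nredo describes.
        if best is None or objective < best[0]:
            best = (objective, centroids, assign)
    return best
```

Each restart runs at most niter assignment and update steps, and only the run with the lowest objective is kept, so increasing nredo trades compute time for a lower chance of ending in a poor local minimum.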