knn_label_accuracy

torchdr.knn_label_accuracy(X: Tensor | ndarray, labels: Tensor | ndarray, k: int = 10, metric: str = 'euclidean', backend: str | FaissConfig | None = 'faiss', exclude_self: bool = True, distributed: bool | str = 'auto', return_per_sample: bool = False, device: str | None = None)

Compute k-NN label accuracy to evaluate class structure preservation.

This metric measures how well local class structure is preserved in the data representation (original or embedded space). For each point, it computes the proportion of its k-nearest neighbors that share the same class label.
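For intuition, here is a naive O(n²) sketch of the quantity being computed, written in plain PyTorch. This is an illustrative reimplementation, not torchdr's actual code, which uses the faster backends described below:

>>> import torch
>>>
>>> def naive_knn_label_accuracy(X, labels, k=10):
...     D = torch.cdist(X, X)           # dense (n, n) Euclidean distances
...     D.fill_diagonal_(float("inf"))  # exclude each point from its own neighborhood
...     idx = D.topk(k, largest=False).indices     # (n, k) nearest-neighbor indices
...     same = labels[idx] == labels.unsqueeze(1)  # does each neighbor share the label?
...     return same.float().mean()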

Parameters:
  • X (torch.Tensor or np.ndarray of shape (n_samples, n_features)) – Data representation (can be original features or embeddings).

  • labels (torch.Tensor or np.ndarray of shape (n_samples,)) – Class labels for each sample. Can be integers or any comparable type.

  • k (int, default=10) – Number of nearest neighbors to consider for each point.

  • metric (str, default='euclidean') – Distance metric to use for computing nearest neighbors. Options: 'euclidean', 'sqeuclidean', 'manhattan', 'angular'.

  • backend ({'keops', 'faiss', None} or FaissConfig, default='faiss') – Backend to use for the k-NN computation (see the sketch after this parameter list):
      - 'keops': memory-efficient symbolic computations
      - 'faiss': fast approximate nearest neighbors (recommended for large datasets)
      - None: standard PyTorch operations
      - FaissConfig object: FAISS with a custom configuration

  • exclude_self (bool, default=True) – Whether to exclude the point itself from its neighborhood. Usually True when evaluating on the same dataset used for k-NN search.

  • distributed (bool or 'auto', default='auto') – Whether to use multi-GPU distributed computation:
      - 'auto': automatically detects if torch.distributed is initialized
      - True: forces distributed mode (requires torch.distributed to be initialized)
      - False: disables distributed mode
    When enabled:
      - Each GPU computes accuracy for its assigned chunk of samples
      - The device is automatically set to the local GPU rank
      - The backend is forced to 'faiss' for efficient distributed k-NN
      - Per-chunk results are returned (no automatic gathering across GPUs)
    Requires launching with torchrun: torchrun --nproc_per_node=N script.py

  • return_per_sample (bool, default=False) – If True, returns per-sample accuracies instead of the mean. Shape: (n_samples,) or (chunk_size,) in distributed mode.

  • device (str, optional) – Device to use for computation. If None, uses input device.
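A quick illustration of the backend options. The commented FaissConfig usage is hypothetical; check the FaissConfig documentation for its actual import path and constructor arguments:

>>> acc_exact = knn_label_accuracy(X, labels, k=10, backend=None)     # plain PyTorch
>>> acc_keops = knn_label_accuracy(X, labels, k=10, backend='keops')  # needs pykeops
>>> # Hypothetical custom-FAISS usage; the import path is an assumption:
>>> # from torchdr import FaissConfig
>>> # acc_faiss = knn_label_accuracy(X, labels, k=10, backend=FaissConfig(...))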

Returns:

accuracy – If return_per_sample=False, the mean k-NN label accuracy across all samples; if return_per_sample=True, the per-sample k-NN label accuracies. Values lie between 0 and 1, where 1 means every neighbor shares the point's label. A numpy float/array is returned if the inputs are numpy arrays, a torch.Tensor otherwise.

Return type:

float or torch.Tensor

Examples

>>> import torch
>>> from torchdr.eval import knn_label_accuracy
>>>
>>> # Generate example data with 3 classes
>>> X = torch.randn(300, 50)
>>> labels = torch.repeat_interleave(torch.arange(3), 100)
>>>
>>> # Compute k-NN label accuracy
>>> accuracy = knn_label_accuracy(X, labels, k=10)
>>> print(f"k-NN label accuracy: {accuracy:.3f}")
>>>
>>> # Get per-sample accuracies
>>> per_sample = knn_label_accuracy(X, labels, k=10, return_per_sample=True)
>>> print(f"Mean: {per_sample.mean():.3f}, Std: {per_sample.std():.3f}")
>>>
>>> # Distributed computation (launch with: torchrun --nproc_per_node=4 script.py)
>>> accuracy_chunk = knn_label_accuracy(X, labels, k=10, distributed=True)
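As noted above, NumPy inputs yield NumPy-typed results:

>>> import numpy as np
>>> X_np = np.random.randn(300, 50).astype(np.float32)
>>> labels_np = np.repeat(np.arange(3), 100)
>>> acc_np = knn_label_accuracy(X_np, labels_np, k=10)  # float, not torch.Tensor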

Notes

This metric is useful for:
  • Evaluating how well embeddings preserve class structure
  • Comparing different DR methods on classification tasks
  • Assessing the quality of learned representations

Higher values indicate better preservation of local class homogeneity. The metric is sensitive to class imbalance (points in small classes have fewer same-label neighbors available) and to label noise.

In distributed mode, each GPU computes accuracy for its chunk of the dataset. To get the global accuracy, gather results from all GPUs and compute the mean.
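A minimal sketch of that reduction, assuming the script was launched with torchrun and that per-sample results live on each rank's local GPU (as described above). Reducing a (sum, count) pair also handles unequal chunk sizes across ranks:

>>> import torch
>>> import torch.distributed as dist
>>> from torchdr.eval import knn_label_accuracy
>>>
>>> per_sample = knn_label_accuracy(X, labels, k=10, distributed=True,
...                                 return_per_sample=True)
>>> count = torch.tensor(per_sample.numel(), dtype=per_sample.dtype,
...                      device=per_sample.device)
>>> stats = torch.stack([per_sample.sum(), count])  # [sum of accuracies, chunk size]
>>> dist.all_reduce(stats)                          # default op is SUM
>>> global_accuracy = (stats[0] / stats[1]).item()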