neighborhood_preservation

Compute K-ary neighborhood preservation between input data and embeddings.

This metric measures how well local neighborhood structure is preserved when reducing from high-dimensional input data (X) to low-dimensional embeddings (Z).

Parameters:

X (torch.Tensor or np.ndarray of shape (n_samples, n_features)) – Original high-dimensional data.
Z (torch.Tensor or np.ndarray of shape (n_samples, n_features_reduced)) – Reduced low-dimensional embeddings.
K (int) – Neighborhood size (number of nearest neighbors to consider).
metric (str, default='euclidean') – Distance metric to use for computing nearest neighbors. Options: 'euclidean', 'sqeuclidean', 'manhattan', 'angular'.
backend ({'keops', 'faiss', None} or FaissConfig, optional) – Backend to use for k-NN computation:
    - 'keops': Memory-efficient symbolic computations.
    - 'faiss': Fast approximate nearest neighbors (recommended for large datasets).
    - None: Standard PyTorch operations.
    - FaissConfig object: FAISS with custom configuration.
device (str, optional) – Device to use for computation. If None, uses input device.
distributed (bool or 'auto', default='auto') – Whether to use multi-GPU distributed computation.
    - 'auto': Automatically detects whether torch.distributed is initialized.
    - True: Forces distributed mode (requires torch.distributed to be initialized).
    - False: Disables distributed mode.
    When enabled:
    - Each GPU computes preservation for its assigned chunk of samples.
    - A DistributedContext is created automatically if torch.distributed is initialized.
    - The device is set automatically to the local GPU rank.
    - The backend is forced to 'faiss' for efficient distributed k-NN.
    - Per-chunk results are returned (no automatic gathering across GPUs).
    Requires launching with torchrun: torchrun --nproc_per_node=N script.py
return_per_sample (bool, default=False) – If True, returns per-sample preservation scores instead of the mean. The result then has shape (n_samples,), or (chunk_size,) in distributed mode.

Returns:

score – If return_per_sample=False, the mean neighborhood preservation across all samples; if return_per_sample=True, the per-sample neighborhood preservation scores. Values lie between 0 and 1, where 1 indicates perfect preservation. Returns a NumPy array or float if the inputs are NumPy arrays, and a torch.Tensor otherwise.

Return type:

float, np.ndarray, or torch.Tensor

Examples

>>> import torch
>>> from torchdr.eval.neighborhood_preservation import neighborhood_preservation
>>>
>>> # Generate example data
>>> X = torch.randn(100, 50)  # High-dimensional data
>>> Z = torch.randn(100, 2)   # Low-dimensional embedding
>>>
>>> # Compute preservation score
>>> score = neighborhood_preservation(X, Z, K=10)
>>> print(f"Neighborhood preservation: {score:.3f}")

Notes

The metric computes the Jaccard similarity (intersection over union) between the K-nearest neighbor sets in the original and reduced spaces for each point, then averages across all points.
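
Written out, the description above corresponds to the following formula. The symbols are notation introduced here for illustration: N_K^X(i) and N_K^Z(i) denote the K-nearest-neighbor index sets of point i in the input and embedding spaces, and n is the number of samples.

```latex
\[
\mathrm{score}_i
  = \frac{\lvert N_K^X(i) \cap N_K^Z(i) \rvert}{\lvert N_K^X(i) \cup N_K^Z(i) \rvert},
\qquad
\mathrm{score} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{score}_i .
\]
```

Because both sets contain exactly K points, the union has size 2K minus the size of the intersection, so a point that keeps m of its K neighbors scores m / (2K - m).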

For large datasets, backend='faiss' is recommended for efficiency. The metric excludes self-neighbors: a point is never counted among its own nearest neighbors.
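
To make the notes concrete, here is a minimal brute-force NumPy sketch of the computation they describe: exact Euclidean k-NN with the point itself excluded, followed by a per-point Jaccard score. It is a reference illustration only, not the library's implementation; the helper names knn_indices and jaccard_preservation are hypothetical, and it assumes no distance ties among the K nearest neighbors.

```python
import numpy as np


def knn_indices(A, K):
    """Indices of the K nearest neighbors of each row of A (hypothetical helper)."""
    # Pairwise squared Euclidean distances, shape (n, n). Brute force: O(n^2) memory.
    diff = A[:, None, :] - A[None, :, :]
    D = np.sum(diff ** 2, axis=-1)
    # Exclude self-neighbors by making each point infinitely far from itself.
    np.fill_diagonal(D, np.inf)
    return np.argsort(D, axis=1, kind="stable")[:, :K]


def jaccard_preservation(X, Z, K):
    """Per-sample Jaccard similarity between K-NN sets in X and Z (hypothetical helper)."""
    nx = knn_indices(np.asarray(X, dtype=float), K)
    nz = knn_indices(np.asarray(Z, dtype=float), K)
    scores = np.empty(len(nx))
    for i in range(len(nx)):
        inter = len(np.intersect1d(nx[i], nz[i]))
        union = 2 * K - inter  # both sets have exactly K elements
        scores[i] = inter / union
    return scores


# Tiny 1-D example with K=1: only the second point changes its nearest neighbor.
X = np.array([[0.0], [1.0], [3.0], [7.0]])
Z = np.array([[0.0], [10.0], [11.0], [30.0]])
scores = jaccard_preservation(X, Z, K=1)
print(scores)         # per-sample scores: 1, 0, 1, 1
print(scores.mean())  # 0.75
```

Taking the mean of the per-sample array mirrors the return_per_sample=False behavior described above, while the array itself mirrors return_per_sample=True.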
