neighborhood_preservation
- torchdr.neighborhood_preservation(X: Tensor | ndarray, Z: Tensor | ndarray, K: int, metric: str = 'euclidean', backend: str | FaissConfig | None = None, device: str | None = None, distributed: bool | str = 'auto', return_per_sample: bool = False)
Compute K-ary neighborhood preservation between input data and embeddings.
This metric measures how well local neighborhood structure is preserved when reducing from high-dimensional input data (X) to low-dimensional embeddings (Z).
- Parameters:
X (torch.Tensor or np.ndarray of shape (n_samples, n_features)) – Original high-dimensional data.
Z (torch.Tensor or np.ndarray of shape (n_samples, n_features_reduced)) – Reduced low-dimensional embeddings.
K (int) – Neighborhood size (number of nearest neighbors to consider).
metric (str, default='euclidean') – Distance metric to use for computing nearest neighbors. Options: 'euclidean', 'sqeuclidean', 'manhattan', 'angular'.
backend ({'keops', 'faiss', None} or FaissConfig, optional) – Backend to use for k-NN computation:
  - 'keops': Memory-efficient symbolic computations
  - 'faiss': Fast approximate nearest neighbors (recommended for large datasets)
  - None: Standard PyTorch operations
  - FaissConfig object: FAISS with custom configuration
device (str, optional) – Device to use for computation. If None, uses input device.
distributed (bool or 'auto', default='auto') – Whether to use multi-GPU distributed computation.
  - 'auto': Automatically detects whether torch.distributed is initialized
  - True: Forces distributed mode (requires torch.distributed to be initialized)
  - False: Disables distributed mode
  When enabled:
  - Each GPU computes preservation for its assigned chunk of samples
  - A DistributedContext is created automatically if torch.distributed is initialized
  - The device is automatically set to the local GPU rank
  - The backend is forced to 'faiss' for efficient distributed k-NN
  - Per-chunk results are returned (no automatic gathering across GPUs)
  Requires launching with torchrun:
  torchrun --nproc_per_node=N script.py
return_per_sample (bool, default=False) – If True, returns per-sample preservation scores instead of the mean. Shape: (n_samples,) or (chunk_size,) in distributed mode.
- Returns:
score – Neighborhood preservation, with values between 0 and 1, where 1 indicates perfect preservation.
  - If return_per_sample=False: Mean neighborhood preservation across all samples.
  - If return_per_sample=True: Per-sample neighborhood preservation scores.
  Returns a numpy array or float if the inputs are numpy arrays, and a torch.Tensor otherwise.
- Return type:
float, torch.Tensor, or np.ndarray
Examples
>>> import torch
>>> from torchdr.eval.neighborhood_preservation import neighborhood_preservation
>>>
>>> # Generate example data
>>> X = torch.randn(100, 50)  # High-dimensional data
>>> Z = torch.randn(100, 2)   # Low-dimensional embedding
>>>
>>> # Compute preservation score
>>> score = neighborhood_preservation(X, Z, K=10)
>>> print(f"Neighborhood preservation: {score:.3f}")
Notes
The metric computes the Jaccard similarity (intersection over union) between the K-nearest neighbor sets in the original and reduced spaces for each point, then averages across all points.
For large datasets, using backend='faiss' is recommended for efficiency. The metric excludes self-neighbors (i.e., the point itself).
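The Jaccard computation described in these notes can be sketched in plain Python. This is a minimal brute-force illustration of the definition, not torchdr's implementation: the helper names knn_indices and jaccard_preservation are hypothetical, only Euclidean distance is covered, and ties are broken by index as an arbitrary assumption.

```python
import math
import random


def knn_indices(points, k):
    """Return, for each point, the set of indices of its k nearest neighbors.

    Hypothetical helper: brute-force Euclidean search that excludes the
    point itself, mirroring the self-neighbor exclusion described above.
    """
    neighbors = []
    for i, p in enumerate(points):
        # Rank every other point by distance to p (ties broken by index).
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (math.dist(p, points[j]), j),
        )
        neighbors.append(set(others[:k]))
    return neighbors


def jaccard_preservation(X, Z, k):
    """Return (per-sample Jaccard scores, mean score) for k-NN sets in X and Z."""
    nn_x = knn_indices(X, k)
    nn_z = knn_indices(Z, k)
    # Intersection over union of the two neighbor sets, one value per point.
    per_sample = [len(a & b) / len(a | b) for a, b in zip(nn_x, nn_z)]
    return per_sample, sum(per_sample) / len(per_sample)


random.seed(0)
X = [[random.gauss(0, 1) for _ in range(5)] for _ in range(30)]
Z_same = [row[:] for row in X]  # embedding with identical geometry
Z_rand = [[random.gauss(0, 1) for _ in range(2)] for _ in range(30)]  # unrelated embedding

_, score_same = jaccard_preservation(X, Z_same, k=5)
_, score_rand = jaccard_preservation(X, Z_rand, k=5)
print(score_same)  # 1.0: every neighbor set is preserved exactly
print(score_rand)  # some value in [0, 1]; low for an unrelated embedding
```

An embedding that reproduces every neighbor set scores exactly 1, while an unrelated embedding scores close to 0. The per-sample list corresponds to what return_per_sample=True is documented to return, and the mean to the default output.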