neighborhood_preservation

torchdr.neighborhood_preservation(X: Tensor | ndarray, Z: Tensor | ndarray, K: int, metric: str = 'euclidean', backend: str | FaissConfig | None = None, device: str | None = None, distributed: bool | str = 'auto', return_per_sample: bool = False)

Compute K-ary neighborhood preservation between input data and embeddings.

This metric measures how well local neighborhood structure is preserved when reducing from high-dimensional input data (X) to low-dimensional embeddings (Z).

Parameters:
  • X (torch.Tensor or np.ndarray of shape (n_samples, n_features)) – Original high-dimensional data.

  • Z (torch.Tensor or np.ndarray of shape (n_samples, n_features_reduced)) – Reduced low-dimensional embeddings.

  • K (int) – Neighborhood size (number of nearest neighbors to consider).

  • metric (str, default='euclidean') – Distance metric to use for computing nearest neighbors. Options: ‘euclidean’, ‘sqeuclidean’, ‘manhattan’, ‘angular’.

  • backend ({'keops', 'faiss', None} or FaissConfig, optional) – Backend to use for the k-NN computation:

    - 'keops': memory-efficient symbolic computations.
    - 'faiss': fast approximate nearest neighbors (recommended for large datasets).
    - None: standard PyTorch operations.
    - FaissConfig object: FAISS with a custom configuration.

  • device (str, optional) – Device to use for computation. If None, uses input device.

  • distributed (bool or 'auto', default='auto') – Whether to use multi-GPU distributed computation:

    - 'auto': automatically detects whether torch.distributed is initialized.
    - True: forces distributed mode (requires torch.distributed to be initialized).
    - False: disables distributed mode.

    When enabled:

    - Each GPU computes preservation for its assigned chunk of samples.
    - A DistributedContext is created automatically if torch.distributed is initialized.
    - The device is automatically set to the local GPU rank.
    - The backend is forced to 'faiss' for efficient distributed k-NN.
    - Per-chunk results are returned (no automatic gathering across GPUs).

    Requires launching with torchrun: torchrun --nproc_per_node=N script.py (see the launch sketch after this parameter list).

  • return_per_sample (bool, default=False) – If True, returns per-sample preservation scores instead of the mean. Shape: (n_samples,) or (chunk_size,) in distributed mode.
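
A minimal launch sketch for the distributed mode described above, assuming the user initializes torch.distributed before the call; the final all-reduce averaging is an illustrative assumption (it treats the per-rank chunks as equally sized) and is not performed by the function itself:

    # script.py -- launch with: torchrun --nproc_per_node=N script.py
    import os
    import torch
    import torch.distributed as dist
    from torchdr.eval.neighborhood_preservation import neighborhood_preservation

    dist.init_process_group(backend="nccl")  # torchrun provides the rank/world-size env vars
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

    X = torch.randn(10_000, 50)  # high-dimensional data (same on every rank)
    Z = torch.randn(10_000, 2)   # low-dimensional embedding (same on every rank)

    # Each rank scores only its assigned chunk of samples; the backend is forced to 'faiss'.
    local_score = neighborhood_preservation(X, Z, K=15, distributed=True)

    # Illustrative gathering step (not done by the function): average the per-chunk
    # means across ranks, assuming chunks of roughly equal size.
    local_score = torch.as_tensor(local_score, device="cuda")
    dist.all_reduce(local_score, op=dist.ReduceOp.SUM)
    global_score = local_score / dist.get_world_size()

    dist.destroy_process_group()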

Returns:

score – If return_per_sample=False: mean neighborhood preservation across all samples. If return_per_sample=True: per-sample neighborhood preservation scores. Values lie between 0 and 1, where 1 indicates perfect preservation. Returns a NumPy array/float if the inputs are NumPy arrays, a torch.Tensor otherwise.

Return type:

float or torch.Tensor

Examples

>>> import torch
>>> from torchdr.eval.neighborhood_preservation import neighborhood_preservation
>>>
>>> # Generate example data
>>> X = torch.randn(100, 50)  # High-dimensional data
>>> Z = torch.randn(100, 2)   # Low-dimensional embedding
>>>
>>> # Compute preservation score
>>> score = neighborhood_preservation(X, Z, K=10)
>>> print(f"Neighborhood preservation: {score:.3f}")

Notes

The metric computes the Jaccard similarity (intersection over union) between the K-nearest neighbor sets in the original and reduced spaces for each point, then averages across all points.

For large datasets, using backend=’faiss’ is recommended for efficiency. The metric excludes self-neighbors (i.e., the point itself).
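
As an illustration of the computation described above, here is a minimal, non-optimized sketch in plain PyTorch (the helper name is hypothetical and this is not the library's implementation); it excludes self-neighbors and averages the per-point Jaccard scores:

    import torch

    def knn_jaccard_preservation(X, Z, K):
        # Pairwise Euclidean distances in the original and reduced spaces.
        d_X = torch.cdist(X, X)
        d_Z = torch.cdist(Z, Z)
        # K + 1 nearest neighbors, then drop the first column (the point itself).
        nn_X = d_X.topk(K + 1, largest=False).indices[:, 1:]
        nn_Z = d_Z.topk(K + 1, largest=False).indices[:, 1:]
        scores = []
        for i in range(X.shape[0]):
            a, b = set(nn_X[i].tolist()), set(nn_Z[i].tolist())
            scores.append(len(a & b) / len(a | b))  # Jaccard: intersection over union
        return torch.tensor(scores).mean()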