# Torch Dimensionality Reduction
TorchDR is a high-performance dimensionality reduction library built on PyTorch. It provides GPU and multi-GPU accelerated DR methods in a unified framework with a simple, scikit-learn-compatible API.
## Key Features

| Feature | Description |
|---|---|
| **High Performance** | Engineered for speed with GPU acceleration and `torch.compile` support. |
| **Multi-GPU Support** | Scale to massive datasets with built-in distributed computing. Use the `torchdr` CLI to parallelize across GPUs with no code changes. |
| **Modular by Design** | Every component is designed to be easily customized, extended, or replaced to fit your specific needs. |
| **Memory-Efficient** | Natively handles sparsity and memory-efficient symbolic operations. Supports PyTorch `DataLoader` for streaming large datasets. |
| **Seamless Integration** | Fully compatible with the scikit-learn and PyTorch ecosystems. Use familiar APIs and integrate effortlessly into your existing workflows. |
| **Minimal Dependencies** | Requires only PyTorch, NumPy, and scikit-learn; optionally add Faiss for fast k-NN or KeOps for symbolic computation. |
## Getting Started

TorchDR offers a user-friendly API similar to scikit-learn, where dimensionality reduction modules are called with the `fit_transform` method. It seamlessly accepts both NumPy arrays and PyTorch tensors as input, and the output matches the type and backend of the input.
```python
from sklearn.datasets import fetch_openml
from torchdr import UMAP

# Load MNIST as float32 (70,000 samples, 784 features)
x = fetch_openml("mnist_784").data.astype("float32")

# Compute a 2D embedding with UMAP
z = UMAP(n_neighbors=30).fit_transform(x)
```
**GPU Acceleration:** Set `device="cuda"` to run on GPU. By default (`device="auto"`), TorchDR uses the input data's device.

```python
z = UMAP(n_neighbors=30, device="cuda").fit_transform(x)
```
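Because the output matches the input's type and device, passing a CUDA tensor keeps the whole pipeline on the GPU. A minimal sketch (assuming a CUDA-capable machine and randomly generated data):

```python
import torch
from torchdr import UMAP

x = torch.randn(10_000, 50, device="cuda")  # data already lives on the GPU
z = UMAP(n_neighbors=30).fit_transform(x)   # device="auto" follows the input
print(type(z), z.device)  # a torch tensor on the same CUDA device
```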
**Multi-GPU:** Use the `torchdr` CLI to parallelize across GPUs with no code changes:

```bash
torchdr my_script.py           # use all available GPUs
torchdr --gpus 4 my_script.py  # use 4 GPUs
```
**torch.compile:** Enable `compile=True` for additional speed on PyTorch 2.0+.
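For example, a sketch combining `compile=True` with GPU execution (reusing the MNIST data `x` from above):

```python
# compile=True applies torch.compile to the internal computations (PyTorch 2.0+)
z = UMAP(n_neighbors=30, device="cuda", compile=True).fit_transform(x)
```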
**Backends:** The `backend` parameter controls k-NN and memory-efficient computations:

| Backend | Description |
|---|---|
| `"faiss"` | Fast approximate k-NN via Faiss (recommended). |
| `"keops"` | Exact symbolic computation via KeOps with linear memory. |
| `None` | Raw PyTorch. |
**DataLoader for Large Datasets:** Pass a PyTorch `DataLoader` instead of a tensor to stream data batch-by-batch. Requires `backend="faiss"`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torchdr import UMAP

X = torch.randn(1_000_000, 128)  # placeholder for a dataset too large to fit in GPU memory
dataloader = DataLoader(TensorDataset(X), batch_size=10000, shuffle=False)
z = UMAP(backend="faiss").fit_transform(dataloader)
```
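Keeping `shuffle=False` preserves the dataset order, so row *i* of the returned embedding should correspond to row *i* of `X` (assuming TorchDR concatenates batch results in order).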
## Methods

### Neighbor Embedding

TorchDR provides a suite of neighbor embedding methods, well suited for data visualization.
Each method is documented with its computational complexity, multi-GPU support, and reference paper. The linear-complexity O(n) methods support multi-GPU execution (all but one), while the quadratic O(n²) methods run on a single GPU.
**Note:** Quadratic methods support `backend="keops"` for exact computation with linear memory usage.
### Spectral Embedding

TorchDR provides various spectral embedding methods: `PCA`, `IncrementalPCA`, `ExactIncrementalPCA`, `KernelPCA`, `PHATE`. `PCA` and `ExactIncrementalPCA` support multi-GPU distributed training via the `distributed="auto"` parameter.
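For example, a sketch of distributed PCA (`n_components` is assumed here, following the scikit-learn convention):

```python
from torchdr import PCA

# distributed="auto" enables multi-GPU training when several GPUs are visible
z = PCA(n_components=50, distributed="auto").fit_transform(x)
```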
## Benchmarks

TorchDR delivers orders-of-magnitude runtime improvements over CPU-based implementations. See the code.
## Examples

See the examples folder for all examples.

**MNIST.** (Code) A comparison of various neighbor embedding methods on the MNIST digits dataset.

**CIFAR100.** (Code) Visualizing the CIFAR100 dataset using DINO features and TSNE.
## Advanced Features

### Affinities

TorchDR features a wide range of affinities which can then be used as building blocks for DR algorithms. It includes:

- Affinities based on k-NN normalizations: `SelfTuningAffinity`, `MAGICAffinity`, `UMAPAffinity`, `PHATEAffinity`, `PACMAPAffinity`.
- Doubly stochastic affinities: `SinkhornAffinity`, `DoublyStochasticQuadraticAffinity`.
- Adaptive affinities with entropy control: `EntropicAffinity`, `SymmetricEntropicAffinity`.
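As a sketch of using an affinity on its own (assuming affinities expose a scikit-learn-style `fit_transform` and that `EntropicAffinity` takes the usual `perplexity` parameter):

```python
import torch
from torchdr import EntropicAffinity

x = torch.randn(1_000, 50)
affinity = EntropicAffinity(perplexity=30)
P = affinity.fit_transform(x)  # pairwise affinity computed from x
```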
### Evaluation Metrics

TorchDR provides efficient GPU-compatible evaluation metrics: `silhouette_score`, `knn_label_accuracy`, `neighborhood_preservation`, `kmeans_ari`.
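For instance, a sketch of scoring an embedding (assuming the metrics are importable from the top-level `torchdr` namespace and follow the scikit-learn `(X, labels)` argument order):

```python
from torchdr import silhouette_score

# z: embedding returned by fit_transform, y: ground-truth labels
score = silhouette_score(z, y)
```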
## Installation

Install the core `torchdr` library from PyPI:

```bash
pip install torchdr  # or: uv pip install torchdr
```

**Note:** `torchdr` does not install `faiss-gpu` or `pykeops` by default. You need to install them separately to use the corresponding backends.
**Faiss (Recommended):** For the fastest k-NN computations, install Faiss. Please follow their official installation guide. A common method is using conda:

```bash
conda install -c pytorch -c nvidia faiss-gpu
```
**KeOps:** For memory-efficient symbolic computations, install PyKeOps:

```bash
pip install pykeops
```
### Installation from Source

If you want to use the latest, unreleased version of `torchdr`, you can install it directly from GitHub:

```bash
pip install git+https://github.com/torchdr/torchdr
```
## Finding Help

If you have any questions or suggestions, feel free to open an issue on the issue tracker or contact Hugues Van Assel directly.