Torch Dimensionality Reduction
TorchDR is an open-source dimensionality reduction (DR) library built on PyTorch. It provides GPU-accelerated implementations of popular DR algorithms in a single unified framework.
DR aims to construct a low-dimensional representation (or embedding) of an input dataset that best preserves its geometry, encoded via a pairwise affinity matrix. To this end, DR methods optimize the embedding so that its associated pairwise affinity matrix matches the input affinity. TorchDR provides a general framework for solving problems of this form: defining a DR algorithm only requires choosing or implementing an Affinity object for both the input and the embedding, as well as an objective function.
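The affinity-matching formulation can be made concrete with a minimal sketch in plain PyTorch. This is not TorchDR's API. It uses a Gaussian affinity on the input, a Student-t affinity on the embedding, and the KL divergence between them as the objective, which is roughly the t-SNE recipe with a single fixed bandwidth. The function names and the fixed `sigma` are illustrative assumptions. A real implementation would calibrate bandwidths per point and exploit sparsity.

```python
import torch


def pairwise_sq_dists(x: torch.Tensor) -> torch.Tensor:
    # (n, n) squared Euclidean distances; the explicit difference keeps gradients finite at zero
    diff = x.unsqueeze(1) - x.unsqueeze(0)
    return diff.pow(2).sum(-1)


def input_affinity(x: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    # Gaussian joint affinity over pairs i != j, normalized to sum to 1
    n = x.shape[0]
    logits = -pairwise_sq_dists(x) / (2 * sigma**2)
    logits = logits.masked_fill(torch.eye(n, device=x.device).bool(), float("-inf"))
    return torch.softmax(logits.flatten(), dim=0).view(n, n)


def embedding_affinity(z: torch.Tensor) -> torch.Tensor:
    # Student-t joint affinity (as in t-SNE), normalized over pairs i != j
    n = z.shape[0]
    w = 1.0 / (1.0 + pairwise_sq_dists(z))
    w = w.masked_fill(torch.eye(n, device=z.device).bool(), 0.0)
    return w / w.sum()


def kl_loss(p: torch.Tensor, q: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    # KL(P || Q); diagonal entries of P are 0 and contribute nothing
    return (p * (torch.log(p + eps) - torch.log(q + eps))).sum()


def fit_embedding(x: torch.Tensor, n_components: int = 2, n_iter: int = 300, lr: float = 0.1, seed: int = 0):
    # Optimize the embedding so that its affinity matches the fixed input affinity
    torch.manual_seed(seed)
    p = input_affinity(x)
    z = (1e-2 * torch.randn(x.shape[0], n_components, device=x.device)).requires_grad_()
    opt = torch.optim.Adam([z], lr=lr)
    losses = []
    for _ in range(n_iter):
        opt.zero_grad()
        loss = kl_loss(p, embedding_affinity(z))
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return z.detach(), losses
```

Swapping `input_affinity`, `embedding_affinity`, or `kl_loss` for other choices gives other DR objectives. That interchangeability is the modularity described above.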
Benefits of TorchDR
Speed: supports GPU acceleration, leverages sparsity and contrastive learning / negative sampling techniques.
Modularity: everything is written in Python in a highly modular way, making it easy to create or transform components.
Memory efficiency: relies on sparsity and/or symbolic tensors to avoid memory overflows.
Compatibility: implemented methods are fully compatible with the sklearn and torch ecosystems.
Getting Started
TorchDR offers a user-friendly API similar to scikit-learn, where dimensionality reduction modules can be called with the fit_transform method. It seamlessly accepts both NumPy arrays and PyTorch tensors as input, ensuring that the output matches the type and backend of the input.
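from sklearn.datasets import fetch_openml
from torchdr import PCA, UMAP
# Load MNIST as a float32 NumPy array (as_frame=False avoids a pandas DataFrame)
x = fetch_openml("mnist_784", as_frame=False).data.astype("float32")
# Reduce to 50 dimensions with PCA, then embed in 2D with UMAP
x_ = PCA(n_components=50).fit_transform(x)
z = UMAP(n_neighbors=30).fit_transform(x_)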
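The type-preserving behavior (NumPy in, NumPy out; tensor in, tensor out) can be illustrated with a small decorator. This is a hypothetical sketch of the general pattern, not TorchDR's internal code. `preserve_input_type` and the toy `center` transform are illustrative names.

```python
import functools

import numpy as np
import torch


def preserve_input_type(fn):
    """Hypothetical decorator: run `fn` on a torch tensor and return the result
    in the same container type (NumPy array or torch tensor) as the input."""

    @functools.wraps(fn)
    def wrapper(x, *args, **kwargs):
        is_numpy = isinstance(x, np.ndarray)
        x_t = torch.as_tensor(x) if is_numpy else x
        out = fn(x_t, *args, **kwargs)
        # Convert back to NumPy only when the caller passed NumPy
        return out.detach().cpu().numpy() if is_numpy else out

    return wrapper


@preserve_input_type
def center(x: torch.Tensor) -> torch.Tensor:
    # Toy "transform": subtract the column means
    return x - x.mean(dim=0, keepdim=True)
```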
TorchDR is fully GPU compatible, enabling significant speed-ups when a GPU is available. To run computations on the GPU, simply set device="cuda", as shown in the example below:
z_gpu = UMAP(n_neighbors=30, device="cuda").fit_transform(x_)
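To keep one script portable across machines with and without a GPU, the device string can be chosen at runtime. The sketch below uses only standard PyTorch calls. `pick_device` is a hypothetical helper, and passing its result to `UMAP(..., device=...)` assumes the `device` keyword also accepts `"cpu"`.

```python
import numpy as np
import torch


def pick_device(prefer: str = "cuda") -> str:
    # Use the GPU when one is visible to PyTorch, otherwise fall back to the CPU
    if prefer == "cuda" and torch.cuda.is_available():
        return "cuda"
    return "cpu"


device = pick_device()
# Example: move a NumPy array to the selected device as a float32 tensor
x_t = torch.as_tensor(np.zeros((4, 3), dtype=np.float32), device=device)
```

Under that assumption, the GPU example becomes `UMAP(n_neighbors=30, device=pick_device()).fit_transform(x_)`.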
Methods
Neighbor Embedding (optimal for data visualization)
TorchDR provides a suite of neighbor embedding methods.
Linear-time (Contrastive Learning). State-of-the-art speed on large datasets: UMAP, LargeVis, InfoTSNE, PACMAP.
Quadratic-time (Exact Repulsion). Compute the full pairwise repulsion: SNE, TSNE, TSNEkhorn, COSNE.
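The difference between the two families comes down to how the repulsive term is computed. The sketch below contrasts a UMAP/LargeVis-style negative-sampling loss with an exact t-SNE-style repulsion in plain PyTorch. It is an illustration of the general technique under simplifying assumptions (Student-t similarity with UMAP's `a = b = 1`, uniform negative sampling). It is not TorchDR's loss code. The negative-sampling loss attracts k-NN pairs and repels a few randomly drawn pairs, so its cost grows linearly with the number of points. The exact term sums over all `n * (n - 1)` pairs.

```python
import torch


def student_t(d2: torch.Tensor) -> torch.Tensor:
    # Heavy-tailed similarity q = 1 / (1 + d^2), as in t-SNE and UMAP with a = b = 1
    return 1.0 / (1.0 + d2)


def negative_sampling_loss(z, edges, n_negatives=5, eps=1e-6):
    """Attract k-NN pairs and repel a few random pairs per edge.

    Cost is O(n_edges * (1 + n_negatives)), i.e. linear in n for a fixed k."""
    i, j = edges  # 1D index tensors: edge (i[m], j[m]) links two neighbors
    d2_pos = (z[i] - z[j]).pow(2).sum(-1)
    attraction = -torch.log(student_t(d2_pos) + eps).mean()

    n = z.shape[0]
    anchors = i.repeat_interleave(n_negatives)
    # Adding a random offset in [1, n) modulo n never samples the anchor itself
    offsets = torch.randint(1, n, (anchors.numel(),), device=z.device)
    negatives = (anchors + offsets) % n
    d2_neg = (z[anchors] - z[negatives]).pow(2).sum(-1)
    repulsion = -torch.log(1.0 - student_t(d2_neg) + eps).mean()
    return attraction + repulsion


def exact_repulsion(z: torch.Tensor) -> torch.Tensor:
    """t-SNE-style log-normalizer over all n * (n - 1) pairs: O(n^2) work and memory."""
    n = z.shape[0]
    d2 = (z.unsqueeze(1) - z.unsqueeze(0)).pow(2).sum(-1)
    w = student_t(d2).masked_fill(torch.eye(n, device=z.device).bool(), 0.0)
    return torch.log(w.sum())
```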
Remark. For quadratic-time algorithms, TorchDR provides exact implementations that scale linearly in memory using backend="keops". For TSNE specifically, one can also explore fast approximations, such as FIt-SNE implemented in tsne-cuda, which bypass full pairwise repulsion.
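An exact quadratic-time quantity can be computed without ever storing the full `n x n` matrix. The sketch below processes the rows in blocks using plain PyTorch, which is a swapped-in technique and not KeOps. KeOps reaches the same goal by compiling a lazy symbolic reduction. The example computes the t-SNE repulsion normalizer, the sum over `i != j` of `1 / (1 + ||z_i - z_j||^2)`. The block size is an illustrative parameter.

```python
import torch


def chunked_repulsion_sum(z: torch.Tensor, chunk_size: int = 1024) -> torch.Tensor:
    """Exact sum over pairs i != j of 1 / (1 + ||z_i - z_j||^2).

    Rows are processed in blocks, so peak memory is O(chunk_size * n * d)
    instead of O(n^2 * d). This is plain PyTorch chunking, not KeOps."""
    n = z.shape[0]
    total = z.new_zeros(())
    cols = torch.arange(n, device=z.device)
    for start in range(0, n, chunk_size):
        block = z[start:start + chunk_size]
        rows = torch.arange(start, start + block.shape[0], device=z.device)
        d2 = (block.unsqueeze(1) - z.unsqueeze(0)).pow(2).sum(-1)  # (block, n)
        # Zero out each point's similarity with itself
        w = (1.0 / (1.0 + d2)).masked_fill(rows[:, None] == cols[None, :], 0.0)
        total = total + w.sum()
    return total
```

The result is identical to the dense computation up to floating-point rounding. Only the peak memory changes.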
Spectral Embedding
TorchDR provides various spectral embedding methods: PCA, IncrementalPCA, KernelPCA, PHATE.
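As a reference point for the simplest of these methods, PCA amounts to projecting centered data onto the leading right singular vectors. The function below is a minimal sketch using `torch.linalg.svd`. It is not TorchDR's PCA implementation, which may use a different solver.

```python
import torch


def pca(x: torch.Tensor, n_components: int) -> torch.Tensor:
    """Project centered data onto its top principal directions via a thin SVD."""
    x_centered = x - x.mean(dim=0, keepdim=True)
    # Rows of vh are principal directions, ordered by decreasing singular value
    _, _, vh = torch.linalg.svd(x_centered, full_matrices=False)
    return x_centered @ vh[:n_components].T
```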
Benchmarks
Relying on TorchDR yields orders-of-magnitude faster runtimes than CPU-based implementations. See the code.
Examples
See the examples folder for all examples.
MNIST. (Code) A comparison of various neighbor embedding methods on the MNIST digits dataset.
CIFAR100. (Code) Visualizing the CIFAR100 dataset using DINO features and TSNE.
Advanced Features
Affinities
TorchDR features a wide range of affinities that can be used as building blocks for DR algorithms. It includes:
Affinities based on k-NN normalizations: SelfTuningAffinity, MAGICAffinity, UMAPAffinityIn, PHATEAffinity, PACMAPAffinity.
Doubly stochastic affinities: SinkhornAffinity, DoublyStochasticQuadraticAffinity.
Adaptive affinities with entropy control: EntropicAffinity, SymmetricEntropicAffinity.
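Two of these ideas can be sketched in a few lines of PyTorch. The first is an entropic affinity, where each row's Gaussian bandwidth is tuned by bisection so that the row's perplexity `exp(H(P_i))` hits a target. The second is a symmetric doubly stochastic affinity obtained with log-domain Sinkhorn iterations. Both are illustrative sketches under stated assumptions (a fixed bisection bracket, damped symmetric updates, dense `n x n` matrices). They are not TorchDR's `EntropicAffinity` or `SinkhornAffinity` implementations.

```python
import math

import torch


def entropic_affinity(x: torch.Tensor, perplexity: float = 10.0, n_iter: int = 64) -> torch.Tensor:
    """Row-stochastic Gaussian affinity whose rows each have perplexity exp(H(P_i)) = `perplexity`.

    Each row's precision beta_i is found by bisection on log(beta_i)."""
    n = x.shape[0]
    d2 = torch.cdist(x, x).pow(2)
    self_mask = torch.eye(n, device=x.device).bool()
    target = math.log(perplexity)
    # Assumed-wide bracket for log(beta_i)
    lo = torch.full((n, 1), -20.0, dtype=x.dtype, device=x.device)
    hi = torch.full((n, 1), 20.0, dtype=x.dtype, device=x.device)
    for _ in range(n_iter):
        log_beta = (lo + hi) / 2
        logits = (-d2 * log_beta.exp()).masked_fill(self_mask, float("-inf"))
        p = torch.softmax(logits, dim=1)
        entropy = -(p * torch.log(p.clamp_min(1e-30))).sum(1, keepdim=True)
        # Entropy decreases as beta grows: rows that are too flat need a larger beta
        too_flat = entropy > target
        lo = torch.where(too_flat, log_beta, lo)
        hi = torch.where(too_flat, hi, log_beta)
    return p


def sinkhorn_affinity(x: torch.Tensor, eps: float = 1.0, n_iter: int = 1000) -> torch.Tensor:
    """Symmetric doubly stochastic affinity P_ij = exp(f_i + f_j - ||x_i - x_j||^2 / eps), P_ii = 0.

    Uses damped (averaged) symmetric Sinkhorn updates in the log domain."""
    n = x.shape[0]
    self_mask = torch.eye(n, device=x.device).bool()
    log_k = (-torch.cdist(x, x).pow(2) / eps).masked_fill(self_mask, float("-inf"))
    f = torch.zeros(n, dtype=x.dtype, device=x.device)
    for _ in range(n_iter):
        # The row-sum condition sum_j P_ij = 1 gives f_i = -logsumexp_j(f_j + log K_ij)
        f = 0.5 * (f - torch.logsumexp(log_k + f[None, :], dim=1))
    return torch.exp(f[:, None] + f[None, :] + log_k)
```

Because `P` is built symmetrically from a single potential `f`, unit row sums also imply unit column sums.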
Evaluation Metric
TorchDR provides efficient GPU-compatible evaluation metrics: silhouette_score.
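For reference, the silhouette coefficient of a point is `(b - a) / max(a, b)`. Here `a` is its mean distance to the other members of its own cluster and `b` is the smallest mean distance to any other cluster. The score is the mean over all points. The dense PyTorch sketch below runs on whatever device its inputs live on. It is not TorchDR's `silhouette_score`, and it assumes at least two clusters with at least two points each.

```python
import torch


def silhouette_score(x: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Mean silhouette coefficient with Euclidean distances, using O(n^2) memory.

    Assumes at least two clusters and at least two points per cluster."""
    d = torch.cdist(x, x)  # (n, n) pairwise distances
    classes, own = labels.unique(return_inverse=True)  # own[i] = cluster index of point i
    onehot = torch.nn.functional.one_hot(own, len(classes)).to(x.dtype)  # (n, c)
    counts = onehot.sum(0)  # cluster sizes
    sums = d @ onehot  # sums[i, c] = total distance from point i to cluster c
    # a: mean distance to the other members of the point's own cluster (self excluded)
    a = sums.gather(1, own[:, None]).squeeze(1) / (counts[own] - 1)
    # b: smallest mean distance to any other cluster
    b = (sums / counts).masked_fill(onehot.bool(), float("inf")).min(dim=1).values
    return ((b - a) / torch.maximum(a, b)).mean()
```

For the 1D points `0, 1, 10, 11` split into clusters `{0, 1}` and `{10, 11}`, the per-point scores are 19/21, 17/19, 17/19, and 19/21, giving a mean of 718/798 ≈ 0.8997.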
Backends
The backend keyword specifies which tool to use for kNN computations and memory-efficient symbolic computations.
Set backend="faiss" to rely on Faiss for fast kNN computations.
To perform exact symbolic tensor computations on the GPU without materializing full pairwise matrices in memory, set backend="keops" to use the KeOps library, which can also compute kNN graphs.
Finally, setting backend=None uses raw PyTorch for all computations.
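The sketch below computes an exact kNN graph using nothing but PyTorch, which is the kind of work the backend options handle. It processes query rows in chunks so that memory stays bounded. The `knn` helper and its `chunk_size` parameter are hypothetical and do not reproduce TorchDR's `backend=None` code path. Faiss and KeOps provide faster or more memory-efficient alternatives for the same task.

```python
import torch


def knn(x: torch.Tensor, k: int, chunk_size: int = 4096):
    """Exact k-nearest neighbors (excluding self) with plain PyTorch.

    Query rows are processed in chunks, so peak memory is O(chunk_size * n)."""
    n = x.shape[0]
    all_dists, all_idx = [], []
    for start in range(0, n, chunk_size):
        block = x[start:start + chunk_size]
        d = torch.cdist(block, x)  # (block, n) Euclidean distances
        rows = torch.arange(block.shape[0], device=x.device)
        d[rows, rows + start] = float("inf")  # a point is not its own neighbor
        dists, idx = d.topk(k, dim=1, largest=False)  # sorted, nearest first
        all_dists.append(dists)
        all_idx.append(idx)
    return torch.cat(all_dists), torch.cat(all_idx)
```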
Installation
Install the core torchdr library from PyPI:
pip install torchdr
TorchDR does not install faiss-gpu or pykeops by default. You need to install them separately to use the corresponding backends.
Faiss (Recommended): For the fastest k-NN computations, install Faiss. Please follow their official installation guide. A common method is using conda:
conda install -c pytorch -c nvidia faiss-gpu
KeOps: For memory-efficient symbolic computations, install PyKeOps:
pip install pykeops
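After installation, you can check which optional backends are importable in the current environment without actually importing them. This standard-library sketch uses `importlib.util.find_spec`. It assumes that faiss-gpu exposes the module `faiss` and that PyKeOps exposes `pykeops`. The preference order Faiss, then KeOps, then `None` in the hypothetical `choose_backend` helper is an assumption based on the recommendation above. Note that `find_spec` only checks that a package can be located. A broken install may still fail at import time.

```python
import importlib.util


def available_backends() -> dict:
    """Map each backend value to whether its Python package can be located."""
    return {
        "faiss": importlib.util.find_spec("faiss") is not None,  # faiss-gpu installs module `faiss`
        "keops": importlib.util.find_spec("pykeops") is not None,
        None: importlib.util.find_spec("torch") is not None,  # raw PyTorch
    }


def choose_backend():
    """Hypothetical helper: prefer Faiss, then KeOps, else raw PyTorch (backend=None)."""
    available = available_backends()
    for name in ("faiss", "keops"):
        if available[name]:
            return name
    return None
```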
Installation from Source
If you want to use the latest, unreleased version of torchdr, you can install it directly from GitHub:
pip install git+https://github.com/torchdr/torchdr
Finding Help
If you have any questions or suggestions, feel free to open an issue on the issue tracker or contact Hugues Van Assel directly.