Torch Dimensionality Reduction

TorchDR is an open-source dimensionality reduction (DR) library using PyTorch. Its goal is to provide fast GPU-compatible implementations of DR algorithms, as well as to accelerate the development of new DR methods by providing a common simplified framework.

DR aims to construct a low-dimensional representation (or embedding) of an input dataset that best preserves its geometry, which is encoded via a pairwise affinity matrix. To this end, DR methods optimize the embedding so that its pairwise affinity matrix matches the input affinity. TorchDR provides a general framework for solving problems of this form: defining a DR algorithm only requires choosing or implementing an Affinity object for the input and for the embedding, together with an objective function.
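Written as a formula, the problem described above can be sketched as follows. The notation is an assumption of this sketch rather than something defined in the text: X is the input data, Z the n × d embedding, A_X and A_Z the input and embedding affinity matrices, and L the objective function.

```latex
% General affinity-matching problem
\min_{\mathbf{Z} \in \mathbb{R}^{n \times d}} \; \mathcal{L}\left(\mathbf{A_X}, \mathbf{A_Z}\right)

% Example: the t-SNE objective (Kullback-Leibler divergence)
\mathcal{L}\left(\mathbf{A_X}, \mathbf{A_Z}\right)
  = \sum_{i \neq j} [\mathbf{A_X}]_{ij} \log \frac{[\mathbf{A_X}]_{ij}}{[\mathbf{A_Z}]_{ij}}
```

In the t-SNE case, A_X is a Gaussian-based affinity computed on the input and A_Z is a normalized Student-t affinity computed on the embedding.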

Benefits of TorchDR

Speed: supports GPU acceleration and leverages sparsity as well as sampling strategies from contrastive learning.
Modularity: written entirely in Python in a highly modular way, making it easy to create or transform components.
Memory efficiency: relies on sparsity and/or symbolic tensors to avoid memory overflows.
Compatibility: implemented methods are fully compatible with the sklearn API and the torch ecosystem.

Getting Started

TorchDR offers a user-friendly API similar to scikit-learn where dimensionality reduction modules can be called with the fit_transform method. It seamlessly accepts both NumPy arrays and PyTorch tensors as input, ensuring that the output matches the type and backend of the input.

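from sklearn.datasets import fetch_openml
from torchdr import PCA, TSNE

# Load MNIST (70,000 images of 784 pixels each) as float32
x = fetch_openml("mnist_784").data.astype("float32")

# Reduce to 50 dimensions with PCA, then embed the result with t-SNE
x_ = PCA(n_components=50).fit_transform(x)
z = TSNE(perplexity=30).fit_transform(x_)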

TorchDR is fully GPU compatible, enabling significant speed-ups when a GPU is available. To run computations on the GPU, simply set device="cuda" as shown in the example below:

z_gpu = TSNE(perplexity=30, device="cuda").fit_transform(x_)
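When the same script has to run on machines with and without a GPU, the device string can be chosen at runtime. The helper below is a minimal sketch: `pick_device` is a hypothetical name, not part of TorchDR, and it relies only on PyTorch's `torch.cuda.is_available()`.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:  # PyTorch is missing, so there is no GPU path to use
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"selected device: {device}")
```

The resulting string can then be passed as the device argument, for example TSNE(perplexity=30, device=device).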

Backends

The backend keyword specifies which tool to use for handling kNN computations and memory-efficient symbolic computations.

To perform symbolic tensor computations on the GPU without memory limitations, you can leverage the KeOps library, which can also compute kNN graphs. To enable KeOps, set backend="keops".
Alternatively, you can use backend="faiss" to rely on Faiss for fast kNN computations.
Finally, setting backend=None will use raw PyTorch for all computations.
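Since KeOps and Faiss are separate packages, a script may want to select the backend according to what is installed. The sketch below uses a hypothetical helper, `pick_backend`, which is not part of TorchDR. It assumes the packages are importable as `pykeops` and `faiss`, and the preference order (KeOps first, then Faiss) is an arbitrary choice made for illustration.

```python
from importlib.util import find_spec
from typing import Optional


def pick_backend() -> Optional[str]:
    """Return "keops" or "faiss" if the package is importable, else None."""
    if find_spec("pykeops") is not None:  # assumed import name of KeOps
        return "keops"
    if find_spec("faiss") is not None:  # assumed import name of Faiss
        return "faiss"
    return None  # fall back to raw PyTorch


backend = pick_backend()
print(f"selected backend: {backend!r}")
```

The returned value can be passed directly as the backend keyword, for example TSNE(perplexity=30, backend=backend).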

Benchmarks

In the benchmarks below, TorchDR UMAP on GPU runs roughly 8 to 10 times faster and uses about 4 times less memory than the classic CPU implementation. See the code. Stay tuned for additional benchmarks.

Dataset	Samples	Method	Runtime (sec)	Memory (MB)
Macosko	44,808	Classic UMAP (CPU)	61.3	410.9
Macosko	44,808	TorchDR UMAP (GPU)	7.7	100.4
10x Mouse Zheng	1,306,127	Classic UMAP (CPU)	1910.4	11278.1
10x Mouse Zheng	1,306,127	TorchDR UMAP (GPU)	184.4	2699.7
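The improvement factors can be read off the table with a few lines of arithmetic. The snippet below only restates the numbers reported above and adds no new measurement.

```python
# (dataset, CPU runtime s, GPU runtime s, CPU memory MB, GPU memory MB),
# copied from the benchmark table
benchmarks = [
    ("Macosko", 61.3, 7.7, 410.9, 100.4),
    ("10x Mouse Zheng", 1910.4, 184.4, 11278.1, 2699.7),
]

ratios = {}
for name, cpu_s, gpu_s, cpu_mb, gpu_mb in benchmarks:
    # Ratio of CPU to GPU figures: larger means a bigger gain for the GPU run
    ratios[name] = (cpu_s / gpu_s, cpu_mb / gpu_mb)
    speedup, memory_gain = ratios[name]
    print(f"{name}: {speedup:.1f}x faster, {memory_gain:.1f}x less memory")
```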

Examples

See the examples folder for all examples.

MNIST. (Code) A comparison of various neighbor embedding methods on the MNIST digits dataset.

[Figure: various neighbor embedding methods on MNIST]

Single-cell genomics. (Code) Visualizing cells using LargeVis from TorchDR.

[Figure: single-cell embeddings]

CIFAR100. (Code) Visualizing the CIFAR100 dataset using DINO features and TSNE.

[Figure: TSNE on CIFAR100 DINO features]

Implemented Features (to date)

Affinities

TorchDR features a wide range of affinities which can then be used as a building block for DR algorithms. It includes:

Usual affinities: ScalarProductAffinity, GaussianAffinity, StudentAffinity.
Affinities based on k-NN normalizations: SelfTuningAffinity, MAGICAffinity.
Doubly stochastic affinities: SinkhornAffinity, DoublyStochasticQuadraticAffinity.
Adaptive affinities with entropy control: EntropicAffinity, SymmetricEntropicAffinity.
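To make the affinity families above concrete, here is a minimal sketch in plain PyTorch of a Gaussian kernel, a Student-t kernel, and a Sinkhorn-style doubly stochastic normalization. It does not use TorchDR's Affinity classes, whose parameters and normalizations may differ. The function names `gaussian_kernel`, `student_kernel` and `sinkhorn_normalize` are hypothetical and chosen for illustration only.

```python
import torch


def gaussian_kernel(x: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Unnormalized Gaussian affinity exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq_dists = torch.cdist(x, x) ** 2
    return torch.exp(-sq_dists / (2 * sigma**2))


def student_kernel(z: torch.Tensor) -> torch.Tensor:
    """Unnormalized Student-t affinity 1 / (1 + ||z_i - z_j||^2)."""
    sq_dists = torch.cdist(z, z) ** 2
    return 1.0 / (1.0 + sq_dists)


def sinkhorn_normalize(kernel: torch.Tensor, n_iter: int = 100) -> torch.Tensor:
    """Alternately rescale rows and columns until both sum to one."""
    p = kernel.clone()
    for _ in range(n_iter):
        p = p / p.sum(dim=1, keepdim=True)  # make every row sum to one
        p = p / p.sum(dim=0, keepdim=True)  # make every column sum to one
    return p


# Five 2-D points used as a toy dataset
x = torch.tensor([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.5]])
k = gaussian_kernel(x)
q = student_kernel(x)
p = sinkhorn_normalize(k)
print(p.sum(dim=0), p.sum(dim=1))
```

Both kernels are symmetric with ones on the diagonal, and after normalization every row and every column of p sums to one up to numerical precision.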

Dimensionality Reduction Algorithms

Spectral. TorchDR provides spectral embeddings calculated via eigenvalue decomposition of the affinities or their Laplacian: PCA, KernelPCA, IncrementalPCA.

Neighbor Embedding. TorchDR includes various neighbor embedding methods: SNE, TSNE, TSNEkhorn, UMAP, LargeVis, InfoTSNE.

Evaluation Metric

TorchDR provides efficient GPU-compatible evaluation metrics: silhouette_score.

Installation

You can install the toolbox through PyPI with:

pip install torchdr

To get the latest version, you can install it from the source code as follows:

pip install git+https://github.com/torchdr/torchdr

Finding Help

If you have any questions or suggestions, feel free to open an issue on the issue tracker or contact Hugues Van Assel directly.