Torch Dimensionality Reduction#
TorchDR is an open-source dimensionality reduction (DR) library using PyTorch. Its goal is to provide fast GPU-compatible implementations of DR algorithms, as well as to accelerate the development of new DR methods by providing a common simplified framework.
DR aims to construct a low-dimensional representation (or embedding) of an input dataset that best preserves its geometry, encoded via a pairwise affinity matrix. To this end, DR methods optimize the embedding so that its pairwise affinity matrix matches the input affinity. TorchDR provides a general framework for solving problems of this form. Defining a DR algorithm only requires choosing or implementing an Affinity object for both the input and the embedding, as well as an objective function.
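One compact way to read the paragraph above is as an optimization problem. The notation here is an assumption made for illustration and is not taken from TorchDR: X is the input data with n samples, Z is the embedding in d dimensions, A_X and A_Z are the input and embedding affinities, and L is the objective function.

```latex
\min_{Z \in \mathbb{R}^{n \times d}} \; \mathcal{L}\bigl( A_X(X),\, A_Z(Z) \bigr)
```

Under this reading, a particular DR method corresponds to a particular choice of A_X, A_Z and L.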
Benefits of TorchDR#
Speed: supports GPU acceleration and leverages sparsity and sampling strategies with contrastive learning techniques.
Modularity: written entirely in Python in a highly modular way, making it easy to create or transform components.
Memory efficiency: relies on sparsity and/or symbolic tensors to avoid memory overflows.
Compatibility: implemented methods are fully compatible with the sklearn API and torch ecosystem.
Getting Started#
TorchDR offers a user-friendly API similar to scikit-learn where dimensionality reduction modules can be called with the fit_transform method. It seamlessly accepts both NumPy arrays and PyTorch tensors as input, ensuring that the output matches the type and backend of the input.
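from sklearn.datasets import fetch_openml
from torchdr import PCA, TSNE

# Load MNIST (70,000 samples, 784 features) as float32.
x = fetch_openml("mnist_784").data.astype("float32")
# Reduce to 50 dimensions with PCA before running t-SNE.
x_ = PCA(n_components=50).fit_transform(x)
# Embed the PCA output with t-SNE.
z = TSNE(perplexity=30).fit_transform(x_)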
TorchDR is fully GPU compatible, enabling significant speed-ups when a GPU is available. To run computations on the GPU, simply set device="cuda" as shown in the example below:
z_gpu = TSNE(perplexity=30, device="cuda").fit_transform(x_)
Backends#
The backend keyword specifies which tool to use for handling kNN computations and memory-efficient symbolic computations.
To perform symbolic tensor computations on the GPU without memory limitations, you can leverage the KeOps Library. This library also allows computing kNN graphs. To enable KeOps, set backend="keops". Alternatively, you can use backend="faiss" to rely on Faiss for fast kNN computations. Finally, setting backend=None will use raw PyTorch for all computations.
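Since KeOps and Faiss are optional packages, it can be convenient to choose the backend from what is installed. The following is a minimal sketch using only the standard library. The helper name pick_backend and the preference order are hypothetical, and the check only confirms that the package can be found, not that it works on your machine.

```python
import importlib.util


def pick_backend(preferred=("keops", "faiss")):
    """Return the first backend in `preferred` whose package is importable, else None."""
    # Import names of the underlying packages (KeOps is imported as `pykeops`).
    modules = {"keops": "pykeops", "faiss": "faiss"}
    for name in preferred:
        if importlib.util.find_spec(modules[name]) is not None:
            return name
    return None  # None means raw PyTorch, as described above


backend = pick_backend()
print(f"backend={backend!r}")
```

The returned value can then be passed as the backend keyword, for instance TSNE(perplexity=30, backend=backend).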
Benchmarks#
In the benchmarks below, TorchDR UMAP on GPU runs roughly 8 to 10 times faster and uses about 4 times less memory than the classic CPU-based UMAP implementation. See the code. Stay tuned for additional benchmarks.
| Dataset | Samples | Method | Runtime (sec) | Memory (MB) |
|---|---|---|---|---|
| Macosko | 44,808 | Classic UMAP (CPU) | 61.3 | 410.9 |
| Macosko | 44,808 | TorchDR UMAP (GPU) | 7.7 | 100.4 |
| 10x Mouse Zheng | 1,306,127 | Classic UMAP (CPU) | 1910.4 | 11278.1 |
| 10x Mouse Zheng | 1,306,127 | TorchDR UMAP (GPU) | 184.4 | 2699.7 |
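The ratios implied by the table can be checked with a few lines of arithmetic. This sketch only reuses the numbers reported above; the dictionary layout and the helper name ratios are illustrative choices.

```python
# (runtime in seconds, memory in MB), copied from the benchmark table.
results = {
    "Macosko": {"cpu": (61.3, 410.9), "gpu": (7.7, 100.4)},
    "10x Mouse Zheng": {"cpu": (1910.4, 11278.1), "gpu": (184.4, 2699.7)},
}


def ratios(entry):
    """Return (speed-up, memory reduction) of the GPU run relative to the CPU run."""
    (cpu_time, cpu_mem), (gpu_time, gpu_mem) = entry["cpu"], entry["gpu"]
    return cpu_time / gpu_time, cpu_mem / gpu_mem


for dataset, entry in results.items():
    speedup, mem_reduction = ratios(entry)
    print(f"{dataset}: {speedup:.1f}x faster, {mem_reduction:.1f}x less memory")
```

This gives about 8.0x (Macosko) and 10.4x (10x Mouse Zheng) in runtime, and about 4.1x and 4.2x in memory.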
Examples#
See the examples folder for all examples.
MNIST. (Code) A comparison of various neighbor embedding methods on the MNIST digits dataset.
Single-cell genomics. (Code) Visualizing cells using LargeVis from TorchDR.
CIFAR100. (Code) Visualizing the CIFAR100 dataset using DINO features and TSNE.
Implemented Features (to date)#
Affinities#
TorchDR features a wide range of affinities which can then be used as building blocks for DR algorithms. It includes:
Usual affinities: ScalarProductAffinity, GaussianAffinity, StudentAffinity.
Affinities based on k-NN normalizations: SelfTuningAffinity, MAGICAffinity.
Doubly stochastic affinities: SinkhornAffinity, DoublyStochasticQuadraticAffinity.
Adaptive affinities with entropy control: EntropicAffinity, SymmetricEntropicAffinity.
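To make the idea of affinities as building blocks concrete, here is a minimal sketch in raw PyTorch that matches a Gaussian input affinity with a Student embedding affinity by minimizing a KL divergence. It does not use TorchDR's classes and is not their implementation: the function names are hypothetical, the kernels are generic textbook forms, and the bandwidth, learning rate and step count are arbitrary choices on random toy data.

```python
import torch


def sq_dists(x):
    """Squared Euclidean distance between every pair of rows, shape (n, n)."""
    diff = x[:, None, :] - x[None, :, :]
    return (diff ** 2).sum(dim=-1)


def normalize_off_diagonal(k):
    """Zero the diagonal and scale the matrix so that its entries sum to 1."""
    k = k * (1.0 - torch.eye(k.shape[0]))
    return k / k.sum()


def gaussian_affinity(x, sigma=1.0):
    """Gaussian kernel on the input data (illustrative, not TorchDR's class)."""
    return normalize_off_diagonal(torch.exp(-sq_dists(x) / sigma))


def student_affinity(z):
    """Heavy-tailed Student kernel on the embedding (illustrative)."""
    return normalize_off_diagonal(1.0 / (1.0 + sq_dists(z)))


torch.manual_seed(0)
x = torch.randn(100, 20)                     # toy input data
z = torch.randn(100, 2, requires_grad=True)  # embedding to optimize
p = gaussian_affinity(x, sigma=20.0)         # fixed input affinity
off_diag = ~torch.eye(100, dtype=torch.bool)

optimizer = torch.optim.Adam([z], lr=0.1)
losses = []
for _ in range(50):
    optimizer.zero_grad()
    q = student_affinity(z)
    # KL divergence between input and embedding affinities (off-diagonal entries only).
    loss = (p[off_diag] * (p[off_diag].log() - q[off_diag].log())).sum()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"KL: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Swapping either kernel or the loss changes the resulting method while the optimization loop stays the same, which is the sense in which affinities act as interchangeable components.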
Dimensionality Reduction Algorithms#
Spectral. TorchDR provides spectral embeddings calculated via eigenvalue decomposition of the affinities or their Laplacian: PCA, KernelPCA, IncrementalPCA.
Neighbor Embedding. TorchDR includes various neighbor embedding methods: SNE, TSNE, TSNEkhorn, UMAP, LargeVis, InfoTSNE.
Evaluation Metric#
TorchDR provides efficient GPU-compatible evaluation metrics: silhouette_score.
Installation#
You can install the toolbox through PyPI with:
pip install torchdr
To get the latest version, you can install it from the source code as follows:
pip install git+https://github.com/torchdr/torchdr
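After either command, one way to check which version ended up installed is to query the package metadata. This is a small standard-library sketch; the helper name installed_version is hypothetical, and it assumes the distribution is named torchdr as in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


print(installed_version("torchdr"))
```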
Finding Help#
If you have any questions or suggestions, feel free to open an issue on the issue tracker or contact Hugues Van Assel directly.