Torch Dimensionality Reduction
GitHub repository: https://github.com/torchdr/torchdr/.
Documentation: https://torchdr.github.io/.
TorchDR is an open-source dimensionality reduction (DR) library using PyTorch. Its goal is to accelerate the development of new DR methods by providing a common simplified framework.
DR aims to construct a low-dimensional representation (or embedding) of an input dataset that best preserves its geometry, encoded via a pairwise affinity matrix. To this end, DR methods optimize the embedding so that its associated pairwise affinity matrix matches the input affinity. TorchDR provides a general framework for solving problems of this form. Defining a DR algorithm only requires choosing or implementing an Affinity object for both the input and the embedding, as well as an objective function.
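The affinity-matching idea can be illustrated with a minimal sketch in plain PyTorch. It does not use the TorchDR API: the function names, the Gaussian input affinity, the Student-t embedding affinity, the cross-entropy objective, and the Adam settings are all assumptions chosen for illustration, loosely following t-SNE.

```python
import torch


def pairwise_sqdist(x):
    """Squared Euclidean distances between all rows of x."""
    return ((x[:, None, :] - x[None, :, :]) ** 2).sum(dim=-1)


def gaussian_affinity(x, sigma=1.0):
    """Input affinity: symmetrized, row-normalized Gaussian kernel (sums to 1)."""
    logits = -pairwise_sqdist(x) / (2 * sigma**2)
    logits.fill_diagonal_(float("-inf"))  # exclude self-affinities
    p = torch.softmax(logits, dim=1)
    return (p + p.T) / (2 * x.shape[0])


def student_affinity(z):
    """Embedding affinity: Student-t kernel normalized over all pairs."""
    w = 1.0 / (1.0 + pairwise_sqdist(z))
    w = w * (1.0 - torch.eye(z.shape[0], dtype=z.dtype))  # exclude self-affinities
    return w / w.sum()


def fit_embedding(x, n_components=2, n_steps=200, lr=0.1, seed=0):
    """Optimize z so that its affinity matches the input affinity."""
    torch.manual_seed(seed)
    p = gaussian_affinity(x)
    z = (1e-2 * torch.randn(x.shape[0], n_components, dtype=x.dtype)).requires_grad_()
    optimizer = torch.optim.Adam([z], lr=lr)
    losses = []
    for _ in range(n_steps):
        optimizer.zero_grad()
        q = student_affinity(z)
        loss = -(p * torch.log(q + 1e-12)).sum()  # cross-entropy between p and q
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return z.detach(), losses
```

Calling `fit_embedding(x)` on a float tensor `x` of shape `(n, d)` returns an `(n, 2)` embedding and the loss history. Swapping either affinity function or the loss yields a different method, which is the kind of modularity the framework is built around.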
Benefits of TorchDR include:
Modularity | All of it is written in Python in a highly modular way, making it easy to create or transform components.
Speed | Supports GPU acceleration and leverages sparsity and batching strategies with contrastive learning techniques.
Memory efficiency | Relies on sparsity and/or
Compatibility | Implemented methods are fully compatible with the
Getting Started
TorchDR offers a user-friendly API similar to scikit-learn, where dimensionality reduction modules can be called with the fit_transform method. It accepts both NumPy arrays and PyTorch tensors as input and returns output that matches the type and backend of the input.
from sklearn.datasets import fetch_openml
from torchdr import PCA, TSNE
x = fetch_openml("mnist_784").data.astype("float32")
x_ = PCA(n_components=50).fit_transform(x)
z = TSNE(perplexity=30).fit_transform(x_)
TorchDR supports GPU acceleration and relies on the KeOps library to avoid running out of memory on large datasets. Both can be enabled as follows:
z_gpu = TSNE(perplexity=30, device="cuda", keops=True).fit_transform(x_)
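To give a rough intuition for why this helps with memory, the sketch below computes the row sums of a Gaussian kernel matrix without ever storing the full n × n matrix. It is a plain-PyTorch stand-in that processes blocks of rows, not KeOps itself and not TorchDR's internals; the function name and the `chunk_size` parameter are hypothetical.

```python
import torch


def gaussian_row_sums(x, sigma=1.0, chunk_size=1024):
    """Row sums of the Gaussian kernel matrix, computed one block of rows at a time."""
    out = torch.empty(x.shape[0], dtype=x.dtype, device=x.device)
    for start in range(0, x.shape[0], chunk_size):
        block = x[start : start + chunk_size]
        # Only a (chunk_size, n) slice of the kernel matrix exists at any time.
        d2 = torch.cdist(block, x) ** 2
        out[start : start + chunk_size] = torch.exp(-d2 / (2 * sigma**2)).sum(dim=1)
    return out
```

Peak memory here scales with `chunk_size * n` rather than `n * n`. KeOps goes further by representing the pairwise matrix symbolically and evaluating reductions on the fly.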
MNIST example. Here is a comparison of various neighbor embedding methods on the MNIST digits dataset.
The code to generate this figure is available here.
Single cell example. Here is an example of single cell embeddings using TorchDR, where the embeddings are colored by cell type and the number of cells is indicated in each title.
The code for this figure is here.
Implemented Features (to date)
Affinities
TorchDR features a wide range of affinities which can then be used as building blocks for DR algorithms. It includes:
Usual affinities such as scalar product, Gaussian and Student kernels.
Affinities based on k-NN normalizations such as self-tuning affinities [22] and MAGIC [23].
Doubly stochastic affinities with entropic [5] [6] [7] [16] and quadratic [10] projections.
Adaptive affinities with entropy control [1] [4] and their symmetric version [3].
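As an illustration of entropy control, the following sketch builds a row-stochastic Gaussian affinity in which each point gets its own bandwidth, found by bisection, so that every row reaches a target perplexity. It is written in plain PyTorch under assumed names and defaults and is not the TorchDR implementation.

```python
import math

import torch


def entropic_affinity(x, perplexity=5.0, n_iter=100):
    """Row-stochastic Gaussian affinity whose rows all have the target perplexity."""
    n = x.shape[0]
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(dim=-1)
    eye = torch.eye(n, dtype=torch.bool)
    target = math.log(perplexity)  # target Shannon entropy of each row
    lo = torch.zeros(n, dtype=x.dtype)
    hi = torch.full((n,), float("inf"), dtype=x.dtype)
    beta = torch.ones(n, dtype=x.dtype)  # per-point precision (inverse bandwidth)
    for _ in range(n_iter):
        logits = (-beta[:, None] * d2).masked_fill(eye, float("-inf"))
        log_p = torch.log_softmax(logits, dim=1)
        p = log_p.exp()
        entropy = -(p * log_p.masked_fill(eye, 0.0)).sum(dim=1)
        too_flat = entropy > target  # entropy too high: sharpen by raising beta
        lo = torch.where(too_flat, beta, lo)
        hi = torch.where(too_flat, hi, beta)
        # Double beta until an upper bound is found, then bisect.
        beta = torch.where(torch.isinf(hi), beta * 2, (lo + hi) / 2)
    return p
```

The perplexity must be smaller than the number of other points, since a row with `n - 1` non-zero entries has entropy at most `log(n - 1)`.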
Dimensionality Reduction Algorithms
Spectral. TorchDR provides spectral embeddings [11] calculated via eigenvalue decomposition of the affinities or their Laplacian.
Neighbor Embedding. TorchDR includes various neighbor embedding methods such as SNE [1], t-SNE [2], t-SNEkhorn [3], UMAP [8], LargeVis [13] and InfoTSNE [15].
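For the spectral case, a minimal sketch of an embedding obtained from the eigendecomposition of a graph Laplacian is shown here. It uses plain PyTorch with an assumed function name and the symmetric normalized Laplacian as one possible choice; TorchDR's own spectral modules may differ.

```python
import torch


def laplacian_eigenmaps(affinity, n_components=2):
    """Embed points with eigenvectors of the symmetric normalized Laplacian."""
    n = affinity.shape[0]
    d_inv_sqrt = affinity.sum(dim=1).rsqrt()
    normalized = d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    laplacian = torch.eye(n, dtype=affinity.dtype) - normalized
    eigvals, eigvecs = torch.linalg.eigh(laplacian)  # eigenvalues in ascending order
    # Skip the first eigenvector, which is trivial for a connected graph.
    return eigvecs[:, 1 : n_components + 1], eigvals


# Toy graph: two 5-node cliques joined by weak edges of weight 0.01.
block = torch.ones(5, 5, dtype=torch.float64)
affinity = torch.cat(
    [torch.cat([block, 0.01 * block], dim=1), torch.cat([0.01 * block, block], dim=1)]
)
affinity.fill_diagonal_(0.0)
embedding, eigvals = laplacian_eigenmaps(affinity, n_components=1)
# The sign of the first non-trivial eigenvector separates the two cliques.
```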
Evaluation Metric
TorchDR provides efficient GPU-compatible evaluation metrics: silhouette score [24].
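For reference, the silhouette score of a point is (b - a) / max(a, b), where a is its mean distance to the other members of its cluster and b is its smallest mean distance to any other cluster. The sketch below computes the mean score with dense tensor operations in plain PyTorch, so it runs on whichever device holds the inputs. It is an illustrative version under assumed names, not TorchDR's implementation, and it expects at least two clusters.

```python
import torch


def silhouette_score(x, labels):
    """Mean silhouette coefficient from Euclidean distances (dense, O(n^2) memory)."""
    dist = torch.cdist(x, x)
    _, inverse = labels.unique(return_inverse=True)  # relabel clusters as 0..k-1
    one_hot = torch.nn.functional.one_hot(inverse).to(x.dtype)  # shape (n, k)
    counts = one_hot.sum(dim=0)  # cluster sizes
    sums = dist @ one_hot  # total distance from each point to each cluster
    own = one_hot.bool()
    own_count = counts[inverse]
    # a: mean distance to the other members of the point's own cluster.
    a = sums[own] / (own_count - 1).clamp_min(1.0)
    # b: smallest mean distance to any other cluster.
    b = (sums / counts).masked_fill(own, float("inf")).min(dim=1).values
    s = (b - a) / torch.maximum(a, b).clamp_min(torch.finfo(x.dtype).tiny)
    s = torch.where(own_count > 1, s, torch.zeros_like(s))  # singleton clusters score 0
    return s.mean()
```

Scores close to 1 indicate compact, well-separated clusters, while negative scores suggest that points sit closer to another cluster than to their own.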
Installation
You can install the toolbox through PyPI with:
pip install torchdr
To get the latest version, you can install it from the source code as follows:
pip install git+https://github.com/torchdr/torchdr
Finding Help
If you have any questions or suggestions, feel free to open an issue on the issue tracker or contact Hugues Van Assel directly.
Citation
If you use TorchDR in your research, please cite the following reference:
Van Assel H., Courty N., Flamary R., Garivier A., Massias M., Vayer T., Vincent-Cuaz C. TorchDR. URL: https://torchdr.github.io/
or in BibTeX format:
@misc{vanassel2024torchdr,
author = {Van Assel, Hugues and Courty, Nicolas and Flamary, Rémi and Garivier, Aurélien and Massias, Mathurin and Vayer, Titouan and Vincent-Cuaz, Cédric},
title = {TorchDR},
url = {https://torchdr.github.io/},
year = {2024}
}