API and Modules#

Dimensionality Reduction sklearn Compatible Estimators#

TorchDR provides a set of classes that are compatible with the sklearn API. For example, TSNE can be run in exactly the same way as sklearn.manifold.TSNE, with the same parameters. Note that the TorchDR classes work seamlessly with both NumPy arrays and PyTorch tensors.

For all methods, TorchDR supports GPU acceleration via device='cuda', as well as KeOps LazyTensor objects (via keops=True), which allow fitting large-scale models directly on the GPU without memory overflows.

TorchDR supports a variety of dimensionality reduction methods. They are presented in the following sections.

Spectral Embedding#

These classes perform classical spectral embedding from a torchdr.Affinity object defined on the input data. They produce the same output as torchdr.AffinityMatcher with that same torchdr.Affinity in the input space and a torchdr.ScalarProductAffinity in the embedding space. However, torchdr.AffinityMatcher relies on a gradient-based solver, while the spectral embedding classes rely on the eigendecomposition of the affinity matrix.
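The eigendecomposition route can be illustrated directly in NumPy (a self-contained sketch of the principle, not TorchDR's implementation): for a scalar-product affinity on centered data, the leading eigenvectors of the affinity matrix, scaled by the square roots of their eigenvalues, recover the PCA embedding.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
X = X - X.mean(axis=0)          # center the data

# Scalar-product affinity (Gram matrix) on the input data.
K = X @ X.T

# Spectral embedding: eigendecompose the affinity matrix and keep
# the 2 leading eigenvectors, scaled by sqrt(eigenvalue).
evals, evecs = np.linalg.eigh(K)
Z = evecs[:, -2:][:, ::-1] * np.sqrt(evals[-2:][::-1])
print(Z.shape)  # (30, 2)
```

Up to the sign of each column, Z coincides with the 2-component PCA scores of X.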

PCA([n_components, device, verbose, ...])

Principal Component Analysis module.

KernelPCA(affinity, n_components, device, ...)

Kernel Principal Component Analysis module.

IncrementalPCA([n_components, copy, ...])

Incremental Principal Components Analysis (IPCA) leveraging PyTorch for GPU acceleration.

Neighbor Embedding#

TorchDR supports the following neighbor embedding methods.

SNE([perplexity, n_components, lr, ...])

Stochastic Neighbor Embedding (SNE) introduced in [Hinton and Roweis, 2002].

TSNE([perplexity, n_components, lr, ...])

t-distributed Stochastic Neighbor Embedding (t-SNE) introduced in [Van der Maaten and Hinton, 2008].

TSNEkhorn([perplexity, n_components, lr, ...])

TSNEkhorn algorithm introduced in [Van Assel et al., 2024].

InfoTSNE([perplexity, n_components, lr, ...])

InfoTSNE algorithm introduced in [Damrich et al., 2022].

LargeVis([perplexity, n_components, lr, ...])

LargeVis algorithm introduced in [Tang et al., 2016].

UMAP([n_neighbors, n_components, min_dist, ...])

UMAP introduced in [McInnes et al., 2018] and further studied in [Damrich and Hamprecht, 2021].

Advanced Dimensionality Reduction with TorchDR#

TorchDR provides a set of generic classes that can be used to implement new dimensionality reduction methods. These classes provide a modular and extensible framework that allows you to focus on the core components of your method.

Base Classes#

The torchdr.DRModule class is the base class for all dimensionality reduction estimators in TorchDR.

torchdr.AffinityMatcher is the base class for all DR methods that use gradient-based optimization to minimize a loss function constructed from two affinities, one in the input space and one in the embedding space.

DRModule([n_components, device, backend, ...])

Base class for DR methods.

AffinityMatcher(affinity_in, affinity_out[, ...])

Perform dimensionality reduction by matching two affinity matrices.

Base Neighbor Embedding Modules#

Neighbor embedding base modules inherit from the torchdr.AffinityMatcher class and implement strategies common to all neighbor embedding methods, such as early exaggeration.

In particular, torchdr.SparseNeighborEmbedding relies on the sparsity of the input affinity to compute the attractive term in linear time. torchdr.SampledNeighborEmbedding inherits from this class and adds the ability to approximate the repulsive term of the loss via negative samples.
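The negative-sampling idea can be sketched independently of TorchDR's classes: instead of summing the repulsive term over all O(n²) pairs, sum over a few randomly drawn "negative" indices per point and rescale. The pairwise term used here (log1p of squared distance, a Student-style repulsion) is illustrative only; the actual term depends on the method.

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 2))              # current embedding
n, n_neg = len(Z), 50

# Illustrative Student-style repulsive contribution for pairs (zi, zj).
def rep(zi, zj):
    return np.log1p(((zi - zj) ** 2).sum(-1))

# Exact repulsion: sum over all pairs i != j -- O(n^2).
D2 = ((Z[:, None] - Z[None, :]) ** 2).sum(-1)
full = np.log1p(D2).sum()                  # diagonal adds log1p(0) = 0

# Negative sampling: n_neg random negatives per point, rescaled by
# (n - 1) / n_neg so the estimate is unbiased -- O(n * n_neg).
est = 0.0
for i in range(n):
    js = rng.choice(n - 1, size=n_neg, replace=False)
    js = js + (js >= i)                    # map onto {0..n-1} \ {i}
    est += (n - 1) / n_neg * rep(Z[i], Z[js]).sum()
print(full, est)  # the sampled estimate approximates the full sum
```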

NeighborEmbedding(affinity_in, affinity_out)

Solves the neighbor embedding problem.

SparseNeighborEmbedding(affinity_in, ...[, ...])

Solves the neighbor embedding problem with a sparse input affinity matrix.

SampledNeighborEmbedding(affinity_in, ...[, ...])

Solves the neighbor embedding problem with both sparsity and sampling.

Affinity Classes#

The following classes are used to compute the affinities between the data points. Broadly speaking, they define a notion of similarity between samples.

Simple Affinities#

GaussianAffinity([sigma, metric, zero_diag, ...])

Compute the Gaussian affinity matrix.

StudentAffinity([degrees_of_freedom, ...])

Compute the Student affinity matrix based on the Student-t distribution.

ScalarProductAffinity([device, backend, verbose])

Compute the scalar product affinity matrix.

NormalizedGaussianAffinity([sigma, metric, ...])

Compute the Gaussian affinity matrix which can be normalized along a dimension.

NormalizedStudentAffinity([...])

Compute the Student affinity matrix which can be normalized along a dimension.
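The first of these, the Gaussian affinity, can be written out directly. Below is a NumPy sketch of the formula with a hypothetical 2σ² bandwidth convention; TorchDR's classes add options such as zeroing the diagonal, choosing the metric, and normalizing along a dimension.

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    """exp(-||x_i - x_j||^2 / (2 * sigma^2)) with a zeroed diagonal."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    A = np.exp(-sq_dists / (2 * sigma**2))
    np.fill_diagonal(A, 0.0)  # no self-affinity
    return A

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
A = gaussian_affinity(X)
print(A.shape)  # (10, 10)
```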

Affinities Normalized by kNN Distances#

SelfTuningAffinity([K, normalization_dim, ...])

Self-tuning affinity introduced in [Zelnik-Manor and Perona, 2004].

MAGICAffinity([K, metric, zero_diag, ...])

Compute the MAGIC affinity introduced in [Van Dijk et al., 2018].

Entropic Affinities#

SinkhornAffinity([eps, tol, max_iter, ...])

Compute the symmetric doubly stochastic affinity matrix.

EntropicAffinity([perplexity, tol, ...])

Solve the directed entropic affinity problem introduced in [Hinton and Roweis, 2002].

SymmetricEntropicAffinity([perplexity, lr, ...])

Compute the symmetric entropic affinity (SEA) introduced in [Van Assel et al., 2024].
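The per-point calibration behind the perplexity parameter can be sketched in NumPy: for each point, a precision is binary-searched so that the entropy of its conditional distribution matches log(perplexity). This illustrates the principle only, not TorchDR's solver.

```python
import numpy as np

def calibrate_row(sq_dists, perplexity, n_iter=100):
    """Binary-search a precision beta so that the conditional
    distribution p_j = softmax(-beta * d_j^2) has entropy log(perplexity)."""
    target = np.log(perplexity)
    lo, hi = 1e-10, 1e10
    for _ in range(n_iter):
        beta = (lo + hi) / 2
        logits = -beta * sq_dists
        logits -= logits.max()             # numerical stability
        p = np.exp(logits)
        p /= p.sum()
        entropy = -(p * np.log(p)).sum()
        if entropy > target:               # too flat: sharpen (increase beta)
            lo = beta
        else:
            hi = beta
    return p

rng = np.random.default_rng(0)
d2 = rng.uniform(0.1, 4.0, size=20)        # squared distances to neighbors
p = calibrate_row(d2, perplexity=5.0)
print(np.exp(-(p * np.log(p)).sum()))      # achieved perplexity, ~5.0
```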

Quadratic Affinities#

DoublyStochasticQuadraticAffinity([eps, ...])

Compute the symmetric doubly stochastic affinity.

UMAP Affinities#

UMAPAffinityIn([n_neighbors, tol, max_iter, ...])

Compute the input affinity used in UMAP [McInnes et al., 2018].

UMAPAffinityOut([min_dist, spread, a, b, ...])

Compute the affinity used in embedding space in UMAP [McInnes et al., 2018].

Scores#

The following functions are used to evaluate the quality of embeddings.

silhouette_score(X, labels[, weights, ...])

Compute the Silhouette score as the mean of silhouette coefficients.
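As a reminder of what the score measures, here is a self-contained NumPy check on a toy example (not a TorchDR call): each silhouette coefficient compares a point's mean intra-cluster distance a to its mean distance b to the nearest other cluster, via (b - a) / max(a, b).

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated clusters in 2D.
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
labels = np.array([0] * 20 + [1] * 20)

D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
scores = []
for i in range(len(X)):
    same = labels == labels[i]
    same[i] = False                        # exclude the point itself
    a = D[i, same].mean()                  # mean intra-cluster distance
    b = D[i, labels != labels[i]].mean()   # mean distance to the other cluster
    scores.append((b - a) / max(a, b))
print(np.mean(scores))  # close to 1 for well-separated clusters
```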

Utils#

The following functions perform various operations, such as computing pairwise distances between data points and solving root-finding problems.

pairwise_distances(X[, Y, metric, backend, ...])

Compute the pairwise distance matrix between points in two datasets.

binary_search(f, n[, begin, end, max_iter, ...])

Implement the binary search root finding method.

false_position(f, n[, begin, end, max_iter, ...])

Implement the false position root finding method.
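The root-finding helpers solve f(x) = 0 elementwise over a batch; the principle can be sketched in NumPy (an illustration of batched bisection, not TorchDR's implementation):

```python
import numpy as np

def binary_search(f, lo, hi, n_iter=60):
    """Elementwise bisection: find x in [lo, hi] with f(x) = 0,
    assuming f is increasing and changes sign on the interval."""
    lo = np.asarray(lo, dtype=float).copy()
    hi = np.asarray(hi, dtype=float).copy()
    for _ in range(n_iter):
        mid = (lo + hi) / 2
        neg = f(mid) < 0          # root is to the right where f < 0
        lo = np.where(neg, mid, lo)
        hi = np.where(neg, hi, mid)
    return (lo + hi) / 2

# Solve x**2 = c for a batch of targets c, i.e. f(x) = x**2 - c.
c = np.array([2.0, 9.0, 16.0])
roots = binary_search(lambda x: x**2 - c, np.zeros(3), np.full(3, 10.0))
print(roots)  # ~ [1.414, 3.0, 4.0]
```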