User Guide#

Overview #

General Formulation of Dimensionality Reduction#

DR aims to construct a low-dimensional representation (or embedding) \(\mathbf{Z} = (\mathbf{z}_1, ..., \mathbf{z}_n)^\top\) of an input dataset \(\mathbf{X} = (\mathbf{x}_1, ..., \mathbf{x}_n)^\top\) that best preserves its geometry, encoded via a pairwise affinity matrix \(\mathbf{P}\). To this end, DR methods optimize \(\mathbf{Z}\) such that a pairwise affinity matrix in the embedding space (denoted \(\mathbf{Q}\)) matches \(\mathbf{P}\). This general problem is as follows

\[\min_{\mathbf{Z}} \: \mathcal{L}( \mathbf{P}, \mathbf{Q}) \quad \text{(DR)}\]

where \(\mathcal{L}\) is typically the \(\ell_2\) or cross-entropy loss. Each DR method is thus characterized by a triplet \((\mathcal{L}, \mathbf{P}, \mathbf{Q})\).

TorchDR is structured around the above formulation \(\text{(DR)}\). Defining a DR algorithm solely requires providing an Affinity object for both input and embedding as well as a loss function \(\mathcal{L}\).

All modules follow the sklearn [Pedregosa et al., 2011] API and can be used in sklearn pipelines.

Torch GPU support and automatic differentiation#

TorchDR is built on top of torch [Paszke et al., 2019], offering GPU support and automatic differentiation. This foundation enables efficient computations and straightforward implementation of new DR methods.

To utilize GPU support, set device="cuda" when initializing any module. For CPU computations, set device="cpu".

Note

DR particularly benefits from GPU acceleration as most computations, including affinity calculations and the DR objective, involve matrix reductions that are highly parallelizable.

Handling the quadratic cost via sparsity or symbolic tensors#

Affinities naturally incur a quadratic memory cost, which can be particularly problematic when dealing with large numbers of samples, especially when using GPUs.

Many affinity metrics only require computing distances to each point’s k nearest neighbors. For large datasets (typically when \(n > 10^4\)) where the full pairwise-distance matrix won’t fit in GPU memory, TorchDR can offload these computations to the GPU-compatible kNN library Faiss. Simply set backend to faiss to leverage Faiss’s efficient implementations.

Alternatively, for exact computations or affinities that can’t be limited to kNNs, you can use symbolic (lazy) tensors to avoid memory overflows. TorchDR integrates with pykeops[Charlier et al., 2021], representing tensors as mathematical expressions evaluated directly on your data samples. By computing on-the-fly formulas instead of storing full matrices, this approach removes memory constraints entirely. However, for very large datasets (typically when \(n > 10^5\)), the computational cost becomes prohibitive. Simply set backend to keops in any module to enable it.

The above figure is taken from here.

Affinities #

Affinities are the essential building blocks of dimensionality reduction methods. TorchDR provides a wide range of affinities. See API and Modules for a complete list.

Base structure#

Affinities inherit the structure of the following Affinity() class.

torchdr.Affinity

Base class for affinity matrices.

If computations can be performed in log domain, the LogAffinity() class should be used.

torchdr.LogAffinity

Base class for affinity matrices in log domain.

Affinities are objects that can directly be called. The outputted affinity matrix is a square matrix of size (n, n) where n is the number of input samples.

Here is an example with the GaussianAffinity:

>>> import torch, torchdr
>>>
>>> n = 100
>>> data = torch.randn(n, 2)
>>> affinity = torchdr.GaussianAffinity()
>>> affinity_matrix = affinity(data)
>>> print(affinity_matrix.shape)
(100, 100)

Spotlight on affinities based on entropic projections#

A widely used family of affinities focuses on controlling the entropy of the affinity matrix. It is notably a crucial component of Neighbor-Embedding methods (see Neighbor Embedding).

These affinities are normalized such that each row sums to one, allowing the affinity matrix to be viewed as a Markov transition matrix. An adaptive bandwidth parameter then determines how the mass from each point spreads to its neighbors. The bandwidth is based on the perplexity hyperparameter which controls the number of effective neighbors for each point.

The resulting affinities can be viewed as a soft approximation of a k-nearest neighbor graph, where perplexity takes the role of k. This allows for capturing more nuances than binary weights, as closer neighbors receive a higher weight compared to those farther away. Ultimately, perplexity is an interpretable hyperparameter that governs the scale of dependencies represented in the affinity.

The following table outlines the aspects controlled by different formulations of entropic affinities. Marginal indicates whether each row of the affinity matrix has a controlled sum. Symmetry indicates whether the affinity matrix is symmetric. Entropy indicates whether each row of the affinity matrix has controlled entropy, dictated by the perplexity hyperparameter.

Affinity (associated DR method)	Marginal	Symmetry	Entropy
`NormalizedGaussianAffinity`	✅	✅	❌
`SinkhornAffinity`	✅	✅	❌
`EntropicAffinity`	✅	❌	✅
`SymmetricEntropicAffinity`	✅	✅	✅

More details on these affinities can be found in the SNEkhorn paper [Van Assel et al., 2024].

Examples using `EntropicAffinity`:#

Entropic Affinities can adapt to varying noise levels

Neighbor Embedding on genomics & equivalent affinity matcher formulation

Other various affinities#

TorchDR features other affinities that can be used in various contexts.

For instance, the UMAP [McInnes et al., 2018] algorithm relies on the affinity UMAPAffinity in input space. UMAPAffinity follows a similar construction as entropic affinities to ensure a constant number of effective neighbors, with n_neighbors playing the role of the perplexity hyperparameter.

Another example is the doubly stochastic normalization of a base affinity under the \(\ell_2\) geometry that has recently been proposed for DR [Zhang et al., 2023]. This method is analogous to SinkhornAffinity where the Shannon entropy is replaced by the \(\ell_2\) norm to recover a sparse affinity. It is available at DoublyStochasticQuadraticAffinity.

Dimensionality Reduction Modules #

TorchDR provides a wide range of dimensionality reduction (DR) methods. All DR estimators inherit the structure of the DRModule() class:

torchdr.DRModule

Base class for DR methods.

They are sklearn.base.BaseEstimator and sklearn.base.TransformerMixin classes which can be called with the fit_transform method.

Outside of Spectral methods, a closed-form solution to the DR problem is typically not available. The problem can then be solved using gradient-based optimizers.

The following classes serve as parent classes for this approach, requiring the user to provide affinity objects for the input and output spaces, referred to as affinity_in and affinity_out.

torchdr.AffinityMatcher

Perform dimensionality reduction by matching two affinity matrices.

In what follows we briefly present two families of DR algorithms: neighbor embedding methods and spectral methods.

Spectral methods#

Spectral methods correspond to choosing the scalar product affinity \(P_{ij} = \langle \mathbf{z}_i, \mathbf{z}_j \rangle\) for the embeddings and the \(\ell_2\) loss.

\[\min_{\mathbf{Z}} \: \sum_{ij} ( P_{ij} - \langle \mathbf{z}_i, \mathbf{z}_j \rangle )^{2}\]

When \(\mathbf{P}\) is positive semi-definite, this problem is commonly known as kernel Principal Component Analysis [Ham et al., 2004] and an optimal solution is given by

\[\mathbf{Z}^{\star} = (\sqrt{\lambda_1} \mathbf{v}_1, ..., \sqrt{\lambda_d} \mathbf{v}_d)^\top\]

where \(\lambda_1, ..., \lambda_d\) are the largest eigenvalues of the centered kernel matrix \(\mathbf{P}\) and \(\mathbf{v}_1, ..., \mathbf{v}_d\) are the corresponding eigenvectors.

Note

PCA (available at torchdr.PCA) corresponds to choosing \(P_{ij} = \langle \mathbf{x}_i, \mathbf{x}_j \rangle\).

Neighbor Embedding#

TorchDR aims to implement most popular neighbor embedding (NE) algorithms. In these methods, \(\mathbf{P}\) and \(\mathbf{Q}\) can be viewed as soft neighborhood graphs, hence the term neighbor embedding.

NE objectives share a common structure: they aim to minimize the weighted sum of an attractive term and a repulsive term. Interestingly, the attractive term is often the cross-entropy between the input and output affinities. Additionally, the repulsive term is typically a function of the output affinities only. Thus, the NE problem can be formulated as the following minimization problem:

\[\min_{\mathbf{Z}} \: - \lambda \sum_{ij} P_{ij} \log Q_{ij} + \mathcal{L}_{\mathrm{rep}}(\mathbf{Q}) \:.\]

In the above, \(\mathcal{L}_{\mathrm{rep}}(\mathbf{Q})\) represents the repulsive part of the loss function while \(\lambda\) is a hyperparameter that controls the balance between attraction and repulsion. The latter is called early_exaggeration_coeff in TorchDR because it is often set to a value larger than one at the beginning of the optimization.

Many NE methods can be represented within this framework. See below for some examples.

Method	Repulsive term \(\mathcal{L}_{\mathrm{rep}}\)	Affinity input \(\mathbf{P}\)	Affinity output \(\mathbf{Q}\)
`SNE`	\(\sum_{i} \log(\sum_j Q_{ij})\)	`EntropicAffinity`	`GaussianAffinity`
`TSNE`	\(\log(\sum_{ij} Q_{ij})\)	`EntropicAffinity`	`StudentAffinity`
`TSNEkhorn`	\(\sum_{ij} Q_{ij}\)	`SymmetricEntropicAffinity`	`SinkhornAffinity(base_kernel="student")`
`InfoTSNE`	\(\sum_i \log(\sum_{j \in \mathrm{Neg}(i)} Q_{ij})\)	`EntropicAffinity`	`StudentAffinity`
`UMAP`	\(- \sum_{i, j \in \mathrm{Neg}(i)} \log (1 - Q_{ij})\)	`UMAPAffinity`	None
`LargeVis`	\(- \sum_{i, j \in \mathrm{Neg}(i)} \log (1 - Q_{ij})\)	`EntropicAffinity`	`StudentAffinity`

In the above table, \(\mathrm{Neg}(i)\) denotes the set of negative samples for point \(i\). They are usually sampled uniformly at random from the dataset.

User Guide#

Overview#

General Formulation of Dimensionality Reduction#

Torch GPU support and automatic differentiation#

Handling the quadratic cost via sparsity or symbolic tensors#

Affinities#

Base structure#

Spotlight on affinities based on entropic projections#

Examples using EntropicAffinity:#

Other various affinities#

Dimensionality Reduction Modules#

Spectral methods#

Neighbor Embedding#

This Page

Overview #

Affinities #

Examples using `EntropicAffinity`:#

Dimensionality Reduction Modules #