SinkhornAffinity

class torchdr.SinkhornAffinity(eps: float = 1.0, tol: float = 1e-05, max_iter: int = 1000, base_kernel: str = 'gaussian', tolog: bool = False, metric: str = 'sqeuclidean', zero_diag: bool = True, device: str = 'auto', keops: bool = False, verbose: bool = False, with_grad: bool = False)[source]

Bases: LogAffinity

Compute the symmetric doubly stochastic affinity matrix.

The algorithm computes the doubly stochastic matrix \(\mathbf{P}^{\mathrm{ds}}\) with controlled global entropy using the symmetric Sinkhorn algorithm [S67].

The algorithm computes the optimal dual variable \(\mathbf{f}^\star \in \mathbb{R}^n\) such that

\[\mathbf{P}^{\mathrm{ds}} \mathbf{1} = \mathbf{1} \quad \text{where} \quad \forall (i,j), \: P^{\mathrm{ds}}_{ij} = \exp(f^\star_i + f^\star_j - C_{ij} / \varepsilon)\]

where:

  • \(\mathbf{C}\): symmetric pairwise distance matrix between the samples.

  • \(\varepsilon\): entropic regularization parameter.

  • \(\mathbf{1} := (1,...,1)^\top\): all-ones vector.

\(\mathbf{f}^\star\) is computed by performing dual ascent via the Sinkhorn fixed-point iteration (eq. 25 in [F19]).
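This fixed point can be sketched in plain NumPy (a minimal illustration under the log-domain parameterization above, not torchdr's implementation; the function name is chosen for this example). Enforcing \(\mathbf{P} \mathbf{1} = \mathbf{1}\) gives \(f_i = -\mathrm{logsumexp}_j(f_j - C_{ij}/\varepsilon)\), and the symmetrized iteration averages this update with the current iterate:

```python
import numpy as np

def symmetric_sinkhorn(C, eps=1.0, max_iter=1000, tol=1e-5):
    """Minimal sketch of the symmetric Sinkhorn fixed point (not torchdr's code).

    Finds f such that P_ij = exp(f_i + f_j - C_ij / eps) has unit row sums.
    """
    f = np.zeros(C.shape[0])
    log_K = -C / eps  # log of the Gaussian kernel K_eps
    for _ in range(max_iter):
        # Row-sum constraint gives f_i = -logsumexp_j (f_j - C_ij / eps);
        # averaging with the current f is the symmetrized update (eq. 25 in [F19]).
        M = log_K + f[None, :]
        m = M.max(axis=1)  # stabilized logsumexp over j
        f_new = -(m + np.log(np.exp(M - m[:, None]).sum(axis=1)))
        f_next = 0.5 * (f + f_new)
        if np.abs(f_next - f).max() < tol:
            f = f_next
            break
        f = f_next
    return np.exp(f[:, None] + f[None, :] + log_K)
```

On a small symmetric cost matrix the output is symmetric by construction, and its rows sum to one up to the tolerance.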

Convex problem. The affinity solves the following symmetric entropic optimal transport problem [C13]:

\[\mathbf{P}^{\mathrm{ds}} \in \mathop{\arg\min}_{\mathbf{P} \in \mathcal{DS}} \: \langle \mathbf{C}, \mathbf{P} \rangle - \varepsilon \mathrm{H}(\mathbf{P})\]

where:

  • \(\mathcal{DS} := \left\{ \mathbf{P} \in \mathbb{R}_+^{n \times n}: \: \mathbf{P} = \mathbf{P}^\top \:,\: \mathbf{P} \mathbf{1} = \mathbf{1} \right\}\): set of symmetric doubly stochastic matrices.

  • \(\mathrm{H}\): (global) Shannon entropy such that \(\mathrm{H}(\mathbf{P}) := - \sum_{ij} P_{ij} (\log P_{ij} - 1)\).
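The role of \(\varepsilon\) in this objective can be seen on two simple doubly stochastic matrices: a permutation matrix (minimal entropy) and the uniform matrix \(\frac{1}{n}\mathbf{1}\mathbf{1}^\top\) (maximal entropy). The snippet below (a hypothetical helper, evaluating the cost minus \(\varepsilon\) times the entropy \(\mathrm{H}\) defined above, with the convention \(0 \log 0 = 0\)) shows which matrix achieves the lower objective flips as \(\varepsilon\) grows:

```python
import numpy as np

def entropic_objective(P, C, eps):
    """<C, P> - eps * H(P), with H(P) = -sum_ij P_ij (log P_ij - 1)."""
    pos = P > 0  # convention: 0 log 0 = 0
    H = -(P[pos] * (np.log(P[pos]) - 1)).sum()
    return (C * P).sum() - eps * H

n = 4
C = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :]).astype(float)
I = np.eye(n)                 # zero-cost permutation: minimal entropy
U = np.full((n, n), 1.0 / n)  # uniform coupling: maximal entropy
```

For small \(\varepsilon\) the low-cost permutation wins; for large \(\varepsilon\) the high-entropy uniform matrix wins, which is why \(\varepsilon\) controls the global entropy of \(\mathbf{P}^{\mathrm{ds}}\).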

Bregman projection. Equivalently, \(\mathbf{P}^{\mathrm{ds}}\) is the KL projection of the Gaussian kernel \(\mathbf{K}_\varepsilon = \exp(- \mathbf{C} / \varepsilon)\) onto the set of symmetric doubly stochastic matrices:

\[\mathbf{P}^{\mathrm{ds}} = \mathrm{Proj}_{\mathcal{DS}}^{\mathrm{KL}}(\mathbf{K}_\varepsilon) := \mathop{\arg\min}_{\mathbf{P} \in \mathcal{DS}} \: \mathrm{KL}(\mathbf{P} \| \mathbf{K}_\varepsilon)\]

where \(\mathrm{KL}(\mathbf{P} \| \mathbf{Q}) := \sum_{ij} P_{ij} (\log (P_{ij} / Q_{ij}) - 1) + Q_{ij}\) is the (generalized) Kullback-Leibler divergence between \(\mathbf{P}\) and \(\mathbf{Q}\).
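This is the generalized (unnormalized) KL divergence: each term \(x (\log(x/y) - 1) + y\) is minimized at \(x = y\), where it equals zero, so the divergence is nonnegative and vanishes only when \(\mathbf{P} = \mathbf{Q}\) entrywise. A small sanity check, with a hypothetical helper name and strictly positive entries:

```python
import numpy as np

def generalized_kl(P, Q):
    """Generalized KL: sum_ij P_ij (log(P_ij / Q_ij) - 1) + Q_ij (entries > 0)."""
    return (P * (np.log(P / Q) - 1) + Q).sum()
```

Minimizing this quantity over the doubly stochastic set, rather than over all nonnegative matrices, is exactly the projection computed by the Sinkhorn iterations.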

Parameters:
  • eps (float, optional) – Entropic regularization parameter for the Sinkhorn algorithm (default 1.0).

  • tol (float, optional) – Precision threshold at which the algorithm stops.

  • max_iter (int, optional) – Maximum number of iterations for the algorithm.

  • base_kernel ({"gaussian", "student"}, optional) – Which base kernel to normalize as doubly stochastic.

  • tolog (bool, optional) – Whether to store intermediate results in a dictionary.

  • metric (str, optional) – Metric to use for computing distances (default “sqeuclidean”).

  • zero_diag (bool, optional) – Whether to set the diagonal of the distance matrix to 0.

  • device (str, optional) – Device to use for computation.

  • keops (bool, optional) – Whether to use KeOps for computation.

  • verbose (bool, optional) – Verbosity. Default is False.

  • with_grad (bool, optional (default=False)) – If True, the Sinkhorn iterations are performed with gradient tracking; if False, they run under torch.no_grad().

References

[S67]

Richard Sinkhorn, Paul Knopp (1967). Concerning nonnegative matrices and doubly stochastic matrices. Pacific Journal of Mathematics, 21(2), 343-348.

[C13]

Marco Cuturi (2013). Sinkhorn Distances: Lightspeed Computation of Optimal Transport. Advances in Neural Information Processing Systems 26 (NeurIPS).

[F19]

Jean Feydy, Thibault Séjourné, François-Xavier Vialard, Shun-ichi Amari, Alain Trouvé, Gabriel Peyré (2019). Interpolating between Optimal Transport and MMD using Sinkhorn Divergences. International Conference on Artificial Intelligence and Statistics (AISTATS).