PCA#

class torchdr.PCA(n_components: int = 2, device: str = 'auto', distributed: str | bool = 'auto', verbose: bool = False, random_state: float | None = None, svd_driver: str | None = None, **kwargs)[source]#

Bases: DRModule

Principal Component Analysis module.

Parameters:
  • n_components (int, default=2) – Number of components to project the input data onto.

  • device (str, default="auto") – Device on which the computations are performed.

  • distributed (str or bool, default="auto") –

    Whether to use distributed mode for multi-GPU computation.

    • "auto": Automatically detect if torch.distributed is initialized and use distributed mode if available.

    • True: Force distributed mode (requires torch.distributed to be initialized).

    • False: Disable distributed mode.

    In distributed mode, each GPU computes local statistics which are then aggregated using all-reduce operations. This is communication-efficient when the number of samples is much larger than the number of features (n >> d).

  • verbose (bool, default=False) – Whether to print information during the computations.

  • random_state (float, default=None) – Random seed for reproducibility.

  • svd_driver (str, optional) – Name of the cuSOLVER method to be used for torch.linalg.svd. This keyword argument only works on CUDA inputs. Available options are: None, gesvd, gesvdj and gesvda. Defaults to None.
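
As a quick sketch of how these options fit together (the values below are illustrative only, and a CUDA device is assumed so that svd_driver takes effect):

import torch
from torchdr import PCA

X = torch.randn(5000, 128, device="cuda")   # synthetic data on the GPU
# svd_driver picks the cuSOLVER routine for torch.linalg.svd; it only applies to CUDA inputs
pca = PCA(n_components=16, device="cuda", svd_driver="gesvdj", verbose=True, random_state=0)
X_reduced = pca.fit_transform(X)            # shape (5000, 16)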

mean_#

Per-feature empirical mean, calculated from the training set.

Type:

torch.Tensor of shape (1, n_features)

components_#

Principal axes in feature space, representing the directions of maximum variance in the data.

Type:

torch.Tensor of shape (n_components, n_features)

embedding_#

The transformed data after calling fit_transform.

Type:

torch.Tensor
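
A short sketch of how these attributes relate, assuming the standard PCA convention in which transform subtracts mean_ and projects onto components_:

import torch
from torchdr import PCA

X = torch.randn(500, 20)
pca = PCA(n_components=5)
X_reduced = pca.fit_transform(X)                      # same values as pca.embedding_, shape (500, 5)

# components_ has shape (5, 20) and mean_ has shape (1, 20);
# assuming the standard convention, an approximate reconstruction is:
X_approx = X_reduced @ pca.components_ + pca.mean_    # shape (500, 20)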

Examples

Standard single-GPU usage:

from torchdr import PCA
import torch

X = torch.randn(1000, 50)
pca = PCA(n_components=10)
X_reduced = pca.fit_transform(X)

Multi-GPU distributed usage (launch with torchrun --nproc_per_node=4):

import torch
import torch.distributed as dist
from torchdr import PCA

# Initialize the process group (torchrun sets the required environment variables);
# skip this step if your launcher already initializes torch.distributed
dist.init_process_group(backend="nccl")

# Each GPU selects its chunk of the data
full_data = torch.randn(100_000, 512)  # placeholder: replace with your dataset
rank = dist.get_rank()
world_size = dist.get_world_size()
chunk_size = len(full_data) // world_size
X_local = full_data[rank * chunk_size:(rank + 1) * chunk_size]

# Distributed PCA - handles communication automatically
pca = PCA(n_components=50, distributed="auto")
X_transformed = pca.fit_transform(X_local)

Notes

In distributed mode:

  • Requires torch.distributed to be initialized (use torchrun or TorchDR CLI)

  • Automatically uses local_rank for GPU assignment

  • Each GPU only needs its data chunk in memory

  • Uses the covariance method: computes X.T @ X locally and aggregates via all-reduce, which has O(d^2) communication cost (a minimal sketch follows these notes)
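
To make the covariance-method note concrete, here is a minimal sketch of how the aggregation could look with torch.distributed. It is illustrative only: distributed_pca_sketch is a hypothetical helper, not part of the TorchDR API, and it assumes the process group is already initialized.

import torch
import torch.distributed as dist

def distributed_pca_sketch(X_local, n_components):
    # Local statistics on each GPU: O(d^2) memory and communication, independent of n
    n_local = torch.tensor([X_local.shape[0]], dtype=X_local.dtype, device=X_local.device)
    sum_local = X_local.sum(dim=0, keepdim=True)   # shape (1, d)
    gram_local = X_local.T @ X_local               # shape (d, d)

    # Aggregate across all GPUs with all-reduce (default reduction op is SUM)
    dist.all_reduce(n_local)
    dist.all_reduce(sum_local)
    dist.all_reduce(gram_local)

    # Assemble the global mean and covariance from the aggregated statistics
    mean = sum_local / n_local                     # shape (1, d)
    cov = gram_local / n_local - mean.T @ mean     # shape (d, d)

    # Eigendecomposition of the d x d covariance; eigenvalues come back in ascending order
    eigvals, eigvecs = torch.linalg.eigh(cov)
    components = eigvecs[:, -n_components:].flip(-1).T  # shape (n_components, d)
    return mean, components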

transform(X: Tensor | ndarray) → Tensor | ndarray[source]#

Project the input data onto the PCA components.

Parameters:

X (torch.Tensor or np.ndarray of shape (n_samples, n_features)) – Data to project onto the PCA components.

Returns:

X_new – Projected data.

Return type:

torch.Tensor or np.ndarray of shape (n_samples, n_components)
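
A minimal usage sketch, reusing the fit_transform workflow shown above to project held-out data:

import torch
from torchdr import PCA

X_train = torch.randn(1000, 50)
X_new = torch.randn(200, 50)

pca = PCA(n_components=10)
pca.fit_transform(X_train)            # fits mean_ and components_ on the training data
X_new_reduced = pca.transform(X_new)  # projected data of shape (200, 10)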

Examples using PCA:#

PCA via SVD and via AffinityMatcher

Incremental PCA on GPU