IncrementalPCA#

class torchdr.IncrementalPCA(n_components: int | None = None, copy: bool | None = True, batch_size: int | None = None, svd_driver: str | None = None, lowrank: bool = False, lowrank_q: int | None = None, lowrank_niter: int = 4, device: str = 'auto', verbose: bool = False, random_state: float | None = None)[source]#

Bases: DRModule

Incremental Principal Components Analysis (IPCA) leveraging PyTorch for GPU acceleration.

This class provides methods to fit the model on data incrementally in batches, and to transform new data based on the principal components learned during the fitting process.

It is particularly useful when the dataset to be decomposed is too large to fit in memory. Adapted from Scikit-learn Incremental PCA.

Parameters:
  • n_components (int, optional) – Number of components to keep. If None, it’s set to the minimum of the number of samples and features. Defaults to None.

  • copy (bool) – If False, input data will be overwritten. Defaults to True.

  • batch_size (int, optional) – The number of samples to use for each batch. Only needed if self.fit is called. If None, it’s inferred from the data and set to 5 * n_features. Defaults to None.

  • svd_driver (str, optional) – Name of the cuSOLVER method to be used for torch.linalg.svd. This keyword argument only works on CUDA inputs. Available options are: None, gesvd, gesvdj and gesvda. Defaults to None.

  • lowrank (bool, optional) – Whether to use torch.svd_lowrank, which can be faster, instead of torch.linalg.svd. Defaults to False.

  • lowrank_q (int, optional) – Rank parameter q passed to torch.svd_lowrank. For an adequate approximation of n_components, defaults to 2 * n_components.

  • lowrank_niter (int, optional) – Number of subspace iterations to conduct for torch.svd_lowrank. Defaults to 4.

  • device (str, optional) – Device on which the computations are performed. Defaults to “auto”.

  • verbose (bool, optional) – Whether to print information during the computations. Defaults to False.

  • random_state (float, optional) – Random seed for reproducibility. Defaults to None.
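
A minimal usage sketch; the data and sizes below are illustrative, not part of the API:

    import torch
    from torchdr import IncrementalPCA

    X = torch.randn(10_000, 50)  # 10,000 samples, 50 features

    # Keep the top 10 components, fitting in minibatches of 500 samples.
    ipca = IncrementalPCA(n_components=10, batch_size=500)
    ipca.fit(X)

    X_reduced = ipca.transform(X)  # shape: (10_000, 10)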

fit(X: Tensor | ndarray, check_input: bool = True)[source]#

Fit the model with data X using minibatches of size batch_size.

Parameters:
  • X (torch.Tensor or np.ndarray) – The input data tensor with shape (n_samples, n_features).

  • check_input (bool, optional) – If True, validates the input. Defaults to True.

Returns:

The fitted IPCA model.

Return type:

IncrementalPCA
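
A hedged sketch of fit on an in-memory tensor; when batch_size is left unset it is inferred from the data as described above:

    import torch
    from torchdr import IncrementalPCA

    X = torch.randn(4_000, 16)

    # batch_size is unset, so it is inferred as 5 * n_features = 80.
    ipca = IncrementalPCA(n_components=4).fit(X)  # fit returns the model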

fit_transform(X: Tensor | ndarray)[source]#

Fit the model with data X and apply the dimensionality reduction.

Parameters:

X (torch.Tensor or np.ndarray of shape (n_samples, n_features)) – Data on which to fit the PCA model and project onto the components.

Returns:

X_new – Projected data.

Return type:

torch.Tensor or np.ndarray of shape (n_samples, n_components)
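
A short sketch combining fitting and projection in one call (synthetic data for illustration):

    import torch
    from torchdr import IncrementalPCA

    X = torch.randn(2_000, 100)
    X_new = IncrementalPCA(n_components=2, batch_size=200).fit_transform(X)
    print(X_new.shape)  # torch.Size([2000, 2])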

static gen_batches(n: int, batch_size: int, min_batch_size: int = 0)[source]#

Generate slices containing batch_size elements from 0 to n.

The last slice may contain fewer than batch_size elements when batch_size does not evenly divide n.

Parameters:
  • n (int) – Size of the sequence.

  • batch_size (int) – Number of elements in each batch.

  • min_batch_size (int, optional) – Minimum number of elements in each batch. Defaults to 0.

Yields:

slice – A slice of batch_size elements.
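
For example, gen_batches(7, 3) yields slices covering the range 0..7, with a shorter final slice:

    from torchdr import IncrementalPCA

    for s in IncrementalPCA.gen_batches(7, 3):
        print(s)
    # slice(0, 3, None)
    # slice(3, 6, None)
    # slice(6, 7, None)  # last batch holds the remaining element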

partial_fit(X, check_input=True)[source]#

Incrementally fit the model with batch data X.

Parameters:
  • X (torch.Tensor) – The batch input data tensor with shape (n_samples, n_features).

  • check_input (bool, optional) – If True, validates the input. Defaults to True.

Returns:

The updated IPCA model after processing the batch.

Return type:

IncrementalPCA
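
A sketch of the out-of-core pattern this method enables; here in-memory chunks stand in for batches that would be streamed from disk in a real setting:

    import torch
    from torchdr import IncrementalPCA

    X = torch.randn(5_000, 30)
    ipca = IncrementalPCA(n_components=5)

    # Each chunk stands in for a batch that would be loaded from disk
    # in a real out-of-core setting.
    for s in IncrementalPCA.gen_batches(X.shape[0], batch_size=1_000):
        ipca.partial_fit(X[s])

    X_reduced = ipca.transform(X)  # shape: (5_000, 5)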

set_fit_request(*, check_input: bool | None | str = '$UNCHANGED$') → IncrementalPCA#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

check_input (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for check_input parameter in fit.

Returns:

self – The updated object.

Return type:

object
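
A hedged sketch of enabling routing and requesting check_input (assumes scikit-learn >= 1.3; set_partial_fit_request below works analogously for partial_fit):

    import sklearn
    from torchdr import IncrementalPCA

    # Metadata routing must be enabled globally (scikit-learn >= 1.3).
    sklearn.set_config(enable_metadata_routing=True)

    # Ask meta-estimators (e.g. a Pipeline) to forward `check_input` to fit.
    ipca = IncrementalPCA(n_components=2).set_fit_request(check_input=True)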

set_partial_fit_request(*, check_input: bool | None | str = '$UNCHANGED$') → IncrementalPCA#

Request metadata passed to the partial_fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to partial_fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to partial_fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

check_input (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for check_input parameter in partial_fit.

Returns:

self – The updated object.

Return type:

object

transform(X: Tensor | ndarray)[source]#

Apply dimensionality reduction to X.

The input data X is projected onto the first principal components previously extracted from a training set.

Parameters:

X (torch.Tensor or np.ndarray) – New data tensor with shape (n_samples, n_features) to be transformed.

Returns:

Transformed data tensor with shape (n_samples, n_components).

Return type:

torch.Tensor
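
A sketch projecting held-out data onto components learned from a training set (synthetic data for illustration):

    import torch
    from torchdr import IncrementalPCA

    X_train = torch.randn(1_000, 20)
    X_test = torch.randn(100, 20)

    ipca = IncrementalPCA(n_components=3, batch_size=250).fit(X_train)
    X_test_reduced = ipca.transform(X_test)  # shape: (100, 3)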

Examples using IncrementalPCA:#

Incremental PCA on GPU