IncrementalPCA
- class torchdr.IncrementalPCA(n_components: int | None = None, copy: bool | None = True, batch_size: int | None = None, svd_driver: str | None = None, lowrank: bool = False, lowrank_q: int | None = None, lowrank_niter: int = 4, device: str = 'auto', verbose: bool = False, random_state: float | None = None)
Bases: DRModule
Incremental Principal Components Analysis (IPCA) leveraging PyTorch for GPU acceleration.
This class provides methods to fit the model on data incrementally in batches, and to transform new data based on the principal components learned during the fitting process.
It is particularly useful when the dataset to be decomposed is too large to fit in memory. Adapted from scikit-learn's IncrementalPCA.
- Parameters:
n_components (int, optional) – Number of components to keep. If None, it’s set to the minimum of the number of samples and features. Defaults to None.
copy (bool) – If False, input data will be overwritten. Defaults to True.
batch_size (int, optional) – The number of samples to use for each batch. Only needed if self.fit is called. If None, it’s inferred from the data and set to 5 * n_features. Defaults to None.
svd_driver (str, optional) – Name of the cuSOLVER method to be used for torch.linalg.svd. This keyword argument only works on CUDA inputs. Available options are: None, gesvd, gesvdj and gesvda. Defaults to None.
lowrank (bool, optional) – Whether to use torch.svd_lowrank, which can be faster, instead of torch.linalg.svd. Defaults to False.
lowrank_q (int, optional) – Rank parameter q passed to torch.svd_lowrank. For an adequate approximation of the first n_components, it defaults to n_components * 2.
lowrank_niter (int, optional) – Number of subspace iterations to conduct for torch.svd_lowrank. Defaults to 4.
device (str, optional) – Device on which the computations are performed. Defaults to “auto”.
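The svd_driver and lowrank* parameters refer to two PyTorch routines. The sketch below calls them directly, outside of torchdr, on a synthetic rank-5 matrix to show what q and niter control; how IncrementalPCA wires these calls internally is not shown here, and the data and variable names are illustrative assumptions.

```python
import torch

torch.manual_seed(0)

# Synthetic matrix of exact rank 5 (illustrative data only).
A = torch.randn(200, 5, dtype=torch.float64) @ torch.randn(5, 50, dtype=torch.float64)

# Exact SVD. On CUDA inputs, a `driver=` keyword selects the cuSOLVER method.
U, S, Vh = torch.linalg.svd(A, full_matrices=False)

# Randomized low-rank SVD: q is the working rank, niter the number of
# subspace iterations. q = 2 * n_components mirrors the documented default.
n_components = 5
U_lr, S_lr, V_lr = torch.svd_lowrank(A, q=2 * n_components, niter=4)

print(S[:n_components])
print(S_lr[:n_components])
```

Note that torch.svd_lowrank returns V rather than its transpose, and only q singular triplets, so the leading n_components of them are the ones compared against the exact result.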
- fit(X: Tensor | ndarray, check_input: bool = True)
Fit the model with data X using minibatches of size batch_size.
- Parameters:
X (torch.Tensor or np.ndarray) – The input data tensor with shape (n_samples, n_features).
check_input (bool, optional) – If True, validates the input. Defaults to True.
- Returns:
The fitted IPCA model.
- Return type:
IncrementalPCA
- fit_transform(X: Tensor | ndarray)
Fit the model with data X and apply the dimensionality reduction.
- Parameters:
X (torch.Tensor or np.ndarray of shape (n_samples, n_features)) – Data on which to fit the PCA model and project onto the components.
- Returns:
X_new – Projected data.
- Return type:
torch.Tensor or np.ndarray of shape (n_samples, n_components)
- static gen_batches(n: int, batch_size: int, min_batch_size: int = 0)
Generate slices containing batch_size elements from 0 to n.
The last slice may contain fewer than batch_size elements when batch_size does not divide n.
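A pure-Python sketch of this slicing logic follows. It is modelled on scikit-learn's utility of the same name, on the assumption that the torchdr version behaves the same way; gen_batches_sketch is a hypothetical name, and the handling of min_batch_size in particular may differ in the actual implementation.

```python
from typing import Iterator


def gen_batches_sketch(n: int, batch_size: int, min_batch_size: int = 0) -> Iterator[slice]:
    """Yield slices of length batch_size covering range(n) (hypothetical helper)."""
    start = 0
    for _ in range(n // batch_size):
        end = start + batch_size
        # Skip a cut that would leave fewer than min_batch_size trailing elements.
        if end + min_batch_size > n:
            continue
        yield slice(start, end)
        start = end
    # Whatever remains becomes the final slice.
    if start < n:
        yield slice(start, n)


print(list(gen_batches_sketch(7, 3)))
print(list(gen_batches_sketch(7, 3, min_batch_size=2)))
```

With n=7 and batch_size=3 the sketch yields the slices 0:3, 3:6, and 6:7. With min_batch_size=2 the one-element tail is merged into the previous batch, giving 0:3 and 3:7.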
- partial_fit(X, check_input=True)
Fit incrementally the model with batch data X.
- Parameters:
X (torch.Tensor) – The batch input data tensor with shape (n_samples, n_features).
check_input (bool, optional) – If True, validates the input. Defaults to True.
- Returns:
The updated IPCA model after processing the batch.
- Return type:
IncrementalPCA
- set_fit_request(*, check_input: bool | None | str = '$UNCHANGED$') → IncrementalPCA
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- set_partial_fit_request(*, check_input: bool | None | str = '$UNCHANGED$') → IncrementalPCA
Request metadata passed to the partial_fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to partial_fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to partial_fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- transform(X: Tensor | ndarray)
Apply dimensionality reduction to X.
The input data X is projected on the first principal components previously extracted from a training set.
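The projection can be sketched in plain PyTorch as follows. This assumes the usual PCA convention, as in scikit-learn, of centering new data with the training mean before multiplying by the transposed components; the variable names mean and components are illustrative and are not taken from the torchdr source.

```python
import torch

torch.manual_seed(0)

X_train = torch.randn(100, 8, dtype=torch.float64)
n_components = 3

# Learn a mean and principal axes with a plain (non-incremental) SVD.
mean = X_train.mean(dim=0)
_, _, Vh = torch.linalg.svd(X_train - mean, full_matrices=False)
components = Vh[:n_components]  # shape (n_components, n_features)

# Project new data: center with the training mean, then multiply.
X_new = torch.randn(10, 8, dtype=torch.float64)
Z = (X_new - mean) @ components.T  # shape (n_samples, n_components)
print(tuple(Z.shape))
```

Because the rows of components are orthonormal, each column of Z is the coordinate of the centered samples along one principal axis.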
- Parameters:
X (torch.Tensor or np.ndarray) – New data tensor with shape (n_samples, n_features) to be transformed.
- Returns:
Transformed data tensor with shape (n_samples, n_components).
- Return type: