silhouette_score

torchdr.silhouette_score(X: Tensor | ndarray, labels: Tensor | ndarray, weights: Tensor | ndarray | None = None, metric: str = 'euclidean', device: str | None = None, backend: str | None = None, sample_size: int | None = None, random_state: int | None = None, warn: bool = True)

Compute the Silhouette score as the mean of silhouette coefficients.

Each coefficient is calculated using the mean intra-cluster distance (\(a\)) and the mean nearest-cluster distance (\(b\)) of the sample, according to the formula \((b - a) / \max(a, b)\). The best value is 1 and the worst value is -1. Values near 0 indicate overlapping clusters. Negative values generally indicate that a sample has been assigned to the wrong cluster, as a different cluster is more similar.

See [Rousseeuw, 1987] for more information.
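The formula above can be illustrated with a minimal pure-Python sketch. It is not the torchdr implementation: the helper name `silhouette_coefficients` is hypothetical, it assumes Euclidean distance, uniform weights, and at least two clusters, and it assigns 0 to samples in singleton clusters (a common convention, assumed here).

```python
import math


def silhouette_coefficients(points, labels):
    """Per-sample silhouette coefficients (illustrative helper, not torchdr's code).

    Assumes Euclidean distance, uniform weights, and at least two clusters.
    """
    n = len(points)
    clusters = sorted(set(labels))
    coeffs = []
    for i in range(n):
        # Mean distance from sample i to every cluster, excluding i itself.
        mean_dist = {}
        for c in clusters:
            members = [j for j in range(n) if labels[j] == c and j != i]
            if members:
                total = sum(math.dist(points[i], points[j]) for j in members)
                mean_dist[c] = total / len(members)
        own = labels[i]
        if own not in mean_dist:
            # Singleton cluster: coefficient set to 0 (assumed convention).
            coeffs.append(0.0)
            continue
        a = mean_dist[own]  # mean intra-cluster distance
        b = min(d for c, d in mean_dist.items() if c != own)  # nearest other cluster
        denom = max(a, b)
        coeffs.append((b - a) / denom if denom > 0 else 0.0)
    return coeffs


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]

# Well-separated assignment: every coefficient is positive.
good = silhouette_coefficients(points, [0, 0, 1, 1])
good_score = sum(good) / len(good)

# Deliberately wrong assignment: the coefficients become negative.
bad = silhouette_coefficients(points, [0, 1, 1, 0])
bad_score = sum(bad) / len(bad)
```

With the well-separated labels the mean coefficient is about 0.80, while the scrambled labels give a negative score, matching the interpretation given above.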

Parameters:
  • X (torch.Tensor or np.ndarray of shape (n_samples_x, n_samples_x) if metric="precomputed", else (n_samples_x, n_features)) – Input data as a pairwise distance matrix or a feature matrix.

  • labels (torch.Tensor or np.ndarray of shape (n_samples_x,)) – Cluster labels associated with the samples in X.

  • weights (torch.Tensor or np.ndarray of shape (n_samples_x,), optional) – Probability vector taking into account the relative importance of samples in X. The default is None and considers uniform weights.

  • metric (str, optional) – The distance to use for computing pairwise distances. Must be one of ["euclidean", "manhattan", "hyperbolic", "precomputed"]. The default is "euclidean".

  • device (str, optional) – Device to use for computations.

  • backend ({"keops", "faiss", None}, optional) – Which backend to use for handling sparsity and memory efficiency. Default is None.

  • sample_size (int, optional) – Number of samples to use when computing the score on a random subset. If sample_size is None, no sampling is used.

  • random_state (int, optional) – Random state for selecting a subset of samples. Used when sample_size is not None.

  • warn (bool, optional) – Whether to output warnings when edge cases are identified.

Returns:

silhouette_score – Mean silhouette coefficient over all samples.

Return type:

float