cca_zoo.linear¶
Linear CCA methods. All classes are sklearn.base.BaseEstimator subclasses.
Base class¶
BaseModel ¶
Bases: BaseEstimator, ABC
Abstract base class for all multiview CCA models.
Subclasses must implement :meth:fit. All other public methods
(transform, fit_transform, score, pairwise_correlations,
average_pairwise_correlations, get_factor_loadings) are provided
here using the weights_ attribute set by fit.
This class inherits from :class:sklearn.base.BaseEstimator so that
get_params / set_params round-trip correctly and sklearn model
selection utilities work out of the box.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| latent_dimensions | int | Number of latent dimensions to fit. Default is 1. | 1 |
| center | bool | Whether to subtract per-view column means before fitting; the fitted means are stored and subtracted again at transform time. | True |
weights property ¶
Fitted weight matrices, one per view.
Shape is (n_features_i, latent_dimensions) for each view.
Raises:

| Type | Description |
|---|---|
| NotFittedError | If the model has not been fitted. |

fit abstractmethod ¶
Fit the model to multiview data.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| views | list[ArrayLike] | List of arrays, each of shape (n_samples, n_features_i). All arrays must have the same number of rows. | required |
| y | None | Ignored. Present for scikit-learn API compatibility. | None |

Returns:

| Name | Type | Description |
|---|---|---|
| self | BaseModel | Fitted estimator. |

Raises:

| Type | Description |
|---|---|
| ValueError | If fewer than 2 views are provided. |
| ValueError | If views have inconsistent numbers of samples. |
transform ¶
Project views into the latent space using the fitted weights.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| views | list[ArrayLike] | List of arrays, each of shape (n_samples, n_features_i). | required |

Returns:

| Type | Description |
|---|---|
| list[ndarray] | List of arrays, each of shape (n_samples, latent_dimensions). |

Raises:

| Type | Description |
|---|---|
| NotFittedError | If the model has not been fitted. |
fit_transform ¶
Fit and then transform the training data.
Equivalent to self.fit(views).transform(views) but may be more
efficient for some subclasses.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| views | list[ArrayLike] | List of arrays, each of shape (n_samples, n_features_i). | required |
| y | None | Ignored. | None |

Returns:

| Type | Description |
|---|---|
| list[ndarray] | List of arrays, each of shape (n_samples, latent_dimensions). |
score ¶
Return average pairwise canonical correlations for each dimension.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| views | list[ArrayLike] | List of arrays, each of shape (n_samples, n_features_i). | required |
| y | None | Ignored. | None |

Returns:

| Type | Description |
|---|---|
| ndarray | Array of shape (latent_dimensions,) containing the average pairwise correlation for each canonical dimension. |
pairwise_correlations ¶
Compute the full pairwise correlation matrix per latent dimension.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| views | list[ArrayLike] | List of arrays, each of shape (n_samples, n_features_i). | required |

Returns:

| Type | Description |
|---|---|
| ndarray | Array of shape (n_views, n_views, latent_dimensions), where entry (i, j, d) is the correlation between the d-th canonical variate of view i and that of view j. |
average_pairwise_correlations ¶
Return the mean off-diagonal pairwise correlation per dimension.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| views | list[ArrayLike] | List of arrays, each of shape (n_samples, n_features_i). | required |

Returns:

| Type | Description |
|---|---|
| ndarray | Array of shape (latent_dimensions,) containing the mean off-diagonal pairwise correlation for each canonical dimension. |
get_factor_loadings ¶
Compute canonical factor loadings for each view.
A loading is the Pearson correlation between an original feature and a canonical variate. Loadings indicate which original variables drive each canonical direction.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| views | list[ArrayLike] | List of arrays, each of shape (n_samples, n_features_i). | required |

Returns:

| Type | Description |
|---|---|
| list[ndarray] | List of arrays, each of shape (n_features_i, latent_dimensions), where entry (f, d) is the correlation between feature f of view i and the d-th canonical variate of view i. |
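The loading computation is just a column-wise Pearson correlation; a minimal NumPy sketch, using random stand-ins for one view and its canonical variates rather than real cca_zoo output:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = rng.standard_normal((n, 6))   # one view's features
Z = rng.standard_normal((n, 2))   # its canonical variates (stand-ins)

Xc = X - X.mean(axis=0)
Zc = Z - Z.mean(axis=0)
# Pearson correlation between every feature and every canonical variate
loadings = (Xc.T @ Zc) / np.outer(
    np.linalg.norm(Xc, axis=0), np.linalg.norm(Zc, axis=0)
)
```

Each column of `loadings` can then be inspected to see which features drive that canonical direction.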
Two-view exact methods¶
CCA ¶
Bases: rCCA
Canonical Correlation Analysis.
Finds the pair of linear projections that maximise the Pearson correlation between two views subject to unit within-view variance constraints:
.. math::
\max_{\mathbf{w}_1, \mathbf{w}_2}
\mathbf{w}_1^\top X_1^\top X_2 \mathbf{w}_2
\text{subject to }
\mathbf{w}_i^\top X_i^\top X_i \mathbf{w}_i = 1
This is a special case of :class:rCCA with c=0. The solution uses
PCA whitening followed by an SVD of the cross-covariance matrix, which is
numerically stable even for high-dimensional views.
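The whitening-then-SVD route can be sketched in plain NumPy (illustrative only; the library implementation additionally handles regularisation, multiple dimensions, and rank issues):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
X1 = rng.standard_normal((n, 6))
X2 = rng.standard_normal((n, 4))
X1 -= X1.mean(axis=0)
X2 -= X2.mean(axis=0)

def whiten(X):
    # PCA whitening matrix: (X @ W) has identity sample covariance
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt.T / s * np.sqrt(n - 1)

W1, W2 = whiten(X1), whiten(X2)
Z1, Z2 = X1 @ W1, X2 @ W2

# SVD of the whitened cross-covariance yields canonical correlations rho
U, rho, Vt = np.linalg.svd(Z1.T @ Z2 / (n - 1), full_matrices=False)
w1, w2 = W1 @ U[:, 0], W2 @ Vt[0]

# The correlation of the projected views matches rho[0]
r = np.corrcoef(X1 @ w1, X2 @ w2)[0, 1]
```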
References
Hotelling, H. (1936). Relations between two sets of variates. Biometrika, 28(3/4), 321–377.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| latent_dimensions | int | Number of latent dimensions. Default is 1. | 1 |
| center | bool | Whether to subtract column means before fitting. Default True. | True |
Example

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.standard_normal((50, 10))
X2 = rng.standard_normal((50, 8))
model = CCA(latent_dimensions=2).fit([X1, X2])
corrs = model.score([X1, X2])
```
Source code in cca_zoo/linear/_cca.py
fit ¶
Fit the CCA model.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| views | list[ArrayLike] | List of exactly two arrays, each (n_samples, n_features_i). | required |
| y | None | Ignored. | None |

Returns:

| Name | Type | Description |
|---|---|---|
| self | CCA | Fitted estimator. |

Raises:

| Type | Description |
|---|---|
| ValueError | If the number of views is not exactly 2. |
| ValueError | If views have inconsistent numbers of samples. |
Example

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.standard_normal((50, 10))
X2 = rng.standard_normal((50, 8))
model = CCA(latent_dimensions=2).fit([X1, X2])
```
Source code in cca_zoo/linear/_cca.py
rCCA ¶
Bases: BaseModel
Regularised Canonical Correlation Analysis (canonical ridge).
Finds the pair of linear projections of two views that maximise their correlation subject to regularised within-view variance constraints:
.. math::
\max_{\mathbf{w}_1, \mathbf{w}_2}
\mathbf{w}_1^\top X_1^\top X_2 \mathbf{w}_2
\text{subject to }
\mathbf{w}_i^\top
\bigl((1 - c_i) X_i^\top X_i + c_i I\bigr) \mathbf{w}_i = 1
The solution is found by whitening each view with its regularised covariance matrix and computing the SVD of the resulting cross-covariance.
:class:CCA (c=0) and :class:PLS (c=1) are special cases.
References
Vinod, H. D. (1976). Canonical ridge and econometrics of joint production. Journal of Econometrics, 4(2), 147–166.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| latent_dimensions | int | Number of latent dimensions. Default is 1. | 1 |
| center | bool | Whether to subtract column means before fitting. Default True. | True |
| c | float \| list[float] | Ridge regularisation parameter(s) in [0, 1]. | 0.0 |
Example

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.standard_normal((50, 10))
X2 = rng.standard_normal((50, 8))
model = rCCA(latent_dimensions=2, c=0.1).fit([X1, X2])
scores = model.transform([X1, X2])
```
Source code in cca_zoo/linear/_rcca.py
fit ¶
Fit the rCCA model.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| views | list[ArrayLike] | List of exactly two arrays, each (n_samples, n_features_i). | required |
| y | None | Ignored. | None |

Returns:

| Name | Type | Description |
|---|---|---|
| self | rCCA | Fitted estimator. |

Raises:

| Type | Description |
|---|---|
| ValueError | If the number of views is not exactly 2. |
| ValueError | If views have inconsistent numbers of samples. |
Source code in cca_zoo/linear/_rcca.py
PLS ¶
Bases: rCCA
Partial Least Squares (two-view).
Finds the pair of unit-norm weight vectors that maximise the covariance between the projected views:
.. math::
\max_{\mathbf{w}_1, \mathbf{w}_2}
\mathbf{w}_1^\top X_1^\top X_2 \mathbf{w}_2
\text{subject to }
\|\mathbf{w}_i\|_2 = 1
This is equivalent to the truncated SVD of the sample cross-covariance
matrix :math:X_1^\top X_2 / (n - 1), and corresponds to :class:rCCA
with c=1.
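The SVD equivalence is easy to check in plain NumPy (a self-contained sketch, not the library code):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X1 = rng.standard_normal((n, 10))
X2 = rng.standard_normal((n, 8))
X1 -= X1.mean(axis=0)
X2 -= X2.mean(axis=0)

# Sample cross-covariance matrix (10 x 8)
C = X1.T @ X2 / (n - 1)

# Truncated SVD: the singular vectors are the unit-norm PLS weights
U, s, Vt = np.linalg.svd(C, full_matrices=False)
k = 2
W1, W2 = U[:, :k], Vt[:k].T

# Covariance of the first pair of scores equals the top singular value
cov = (X1 @ W1[:, 0]) @ (X2 @ W2[:, 0]) / (n - 1)
```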
References
Wold, H. (1975). Soft modelling by latent variables: the nonlinear iterative partial least squares (NIPALS) approach. Perspectives in Probability and Statistics, 117–142.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| latent_dimensions | int | Number of latent dimensions. Default is 1. | 1 |
| center | bool | Whether to subtract column means before fitting. Default True. | True |
|
Example

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.standard_normal((50, 10))
X2 = rng.standard_normal((50, 8))
model = PLS(latent_dimensions=2).fit([X1, X2])
scores = model.transform([X1, X2])
```
Source code in cca_zoo/linear/_pls.py
fit ¶
Fit the PLS model.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| views | list[ArrayLike] | List of exactly two arrays, each (n_samples, n_features_i). | required |
| y | None | Ignored. | None |

Returns:

| Name | Type | Description |
|---|---|---|
| self | PLS | Fitted estimator. |

Raises:

| Type | Description |
|---|---|
| ValueError | If the number of views is not exactly 2. |
| ValueError | If views have inconsistent numbers of samples. |
Example

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.standard_normal((50, 10))
X2 = rng.standard_normal((50, 8))
model = PLS(latent_dimensions=2).fit([X1, X2])
```
Source code in cca_zoo/linear/_pls.py
Multiview methods¶
MCCA ¶
MCCA(latent_dimensions: int = 1, center: bool = True, c: float | list[float] = 0.0, pca: bool = True, eps: float = 1e-06)
Bases: BaseModel
Multiset Canonical Correlation Analysis.
Finds linear projections of multiple (>=2) views that maximise the sum of
pairwise cross-view covariances subject to within-view variance constraints.
A ridge regularisation parameter c controls the trade-off between
correlation and variance explained.
The primal objective is:
.. math::
\max_{\mathbf{w}} \sum_{i \neq j} \mathbf{w}_i^\top X_i^\top X_j
\mathbf{w}_j
\text{subject to } \mathbf{w}_i^\top
\bigl((1-c_i) X_i^\top X_i + c_i I\bigr) \mathbf{w}_i = 1
This is solved as a generalised eigenvalue problem:
.. math::
A \mathbf{v} = \lambda B \mathbf{v}
where :math:A is the between-view block covariance matrix and :math:B
is the block-diagonal regularised within-view covariance matrix.
When pca=True (default), each view is first reduced to its principal
components, which makes the problem numerically stable for
high-dimensional data and allows an efficient closed-form :math:B.
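The generalised eigenvalue formulation can be sketched directly in NumPy/SciPy, without the PCA pre-processing step (illustrative only; block layout and regularisation follow the objective above):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n, dims = 100, [5, 4, 3]
views = [rng.standard_normal((n, p)) for p in dims]
views = [X - X.mean(axis=0) for X in views]
c = 0.1

starts = np.concatenate([[0], np.cumsum(dims)])
p = starts[-1]
A = np.zeros((p, p))   # between-view block covariance
B = np.zeros((p, p))   # block-diagonal regularised within-view covariance
for i, Xi in enumerate(views):
    for j, Xj in enumerate(views):
        blk = Xi.T @ Xj / (n - 1)
        if i == j:
            B[starts[i]:starts[i+1], starts[i]:starts[i+1]] = (
                (1 - c) * blk + c * np.eye(dims[i])
            )
        else:
            A[starts[i]:starts[i+1], starts[j]:starts[j+1]] = blk

# Generalised eigenproblem A v = lambda B v; the top eigenvector
# stacks the per-view weight vectors
evals, evecs = eigh(A, B)
v = evecs[:, -1]
```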
References
Kettenring, J. R. (1971). Canonical analysis of several sets of variables. Biometrika, 58(3), 433–451.
Vinod, H. D. (1976). Canonical ridge and econometrics of joint production. Journal of Econometrics, 4(2), 147–166.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| latent_dimensions | int | Number of latent dimensions. Default is 1. | 1 |
| center | bool | Whether to subtract column means before fitting. Default True. | True |
| c | float \| list[float] | Ridge regularisation parameter(s). Either a single float applied to all views or a list of per-view floats in [0, 1]. | 0.0 |
| pca | bool | Whether to apply full PCA whitening as a pre-processing step before solving the eigenvalue problem. Highly recommended for high-dimensional data. Default is True. | True |
| eps | float | Small constant added to the eigenvalues of B to ensure positive definiteness. Default is 1e-6. | 1e-06 |
Example

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.standard_normal((50, 10))
X2 = rng.standard_normal((50, 8))
X3 = rng.standard_normal((50, 6))
model = MCCA(latent_dimensions=2).fit([X1, X2, X3])
scores = model.transform([X1, X2, X3])
```
Source code in cca_zoo/linear/_mcca.py
fit ¶
Fit the MCCA model.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| views | list[ArrayLike] | List of arrays, each (n_samples, n_features_i). | required |
| y | None | Ignored. | None |

Returns:

| Name | Type | Description |
|---|---|---|
| self | MCCA | Fitted estimator. |

Raises:

| Type | Description |
|---|---|
| ValueError | If fewer than 2 views are provided. |
| ValueError | If views have inconsistent numbers of samples. |
Source code in cca_zoo/linear/_mcca.py
GCCA ¶
GCCA(latent_dimensions: int = 1, center: bool = True, c: float | list[float] = 0.0, view_weights: list[float] | None = None, eps: float = 1e-06)
Bases: BaseModel
Generalised Canonical Correlation Analysis.
Finds linear projections of multiple (>=2) views that maximise their joint correlation with a shared auxiliary latent vector:
.. math::
\max_{\mathbf{w}_i, T}
\sum_{i=1}^M \mathbf{w}_i^\top X_i^\top T
\text{subject to }
T^\top T = I
The solution is obtained by constructing the weighted projection matrix:
.. math::
Q = \sum_{i=1}^M \mu_i X_i
\bigl((1-c_i) X_i^\top X_i + c_i I\bigr)^{-1} X_i^\top
and computing its top-k eigenvectors :math:V, then recovering the
per-view weights as :math:\mathbf{w}_i = X_i^+ V.
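The construction of Q and the recovery of per-view weights can be sketched in plain NumPy (illustrative only; the library adds the eps floor and optional view weighting):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 60
views = [rng.standard_normal((n, p)) for p in (8, 6, 5)]
views = [X - X.mean(axis=0) for X in views]
c, k = 0.1, 2

# Weighted sum of regularised projection matrices (n x n), equal weights
Q = np.zeros((n, n))
for X in views:
    R = (1 - c) * (X.T @ X) + c * np.eye(X.shape[1])
    Q += X @ np.linalg.inv(R) @ X.T

# Top-k eigenvectors give the shared latent variables T
evals, evecs = np.linalg.eigh(Q)
T = evecs[:, -k:][:, ::-1]

# Per-view weights via the pseudo-inverse
weights = [np.linalg.pinv(X) @ T for X in views]
```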
References
Tenenhaus, A., & Tenenhaus, M. (2011). Regularized generalized canonical correlation analysis. Psychometrika, 76(2), 257–284.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| latent_dimensions | int | Number of latent dimensions. Default is 1. | 1 |
| center | bool | Whether to subtract column means before fitting. Default True. | True |
| c | float \| list[float] | Ridge regularisation parameter(s) in [0, 1]. | 0.0 |
| view_weights | list[float] \| None | Per-view weights :math:\mu_i. Default is None (equal weights). | None |
| eps | float | Regularisation floor for within-view matrices. Default is 1e-6. | 1e-06 |
Example

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.standard_normal((50, 10))
X2 = rng.standard_normal((50, 8))
X3 = rng.standard_normal((50, 6))
model = GCCA(latent_dimensions=2).fit([X1, X2, X3])
scores = model.transform([X1, X2, X3])
```
Source code in cca_zoo/linear/_gcca.py
fit ¶
Fit the GCCA model.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| views | list[ArrayLike] | List of arrays, each (n_samples, n_features_i). | required |
| y | None | Ignored. | None |

Returns:

| Name | Type | Description |
|---|---|---|
| self | GCCA | Fitted estimator. |

Raises:

| Type | Description |
|---|---|
| ValueError | If fewer than 2 views are provided. |
| ValueError | If views have inconsistent numbers of samples. |
Source code in cca_zoo/linear/_gcca.py
TCCA ¶
TCCA(latent_dimensions: int = 1, center: bool = True, c: float | list[float] = 0.0, eps: float = 1e-06, random_state: int | None = None)
Bases: BaseModel
Tensor Canonical Correlation Analysis.
Extends CCA to more than two views by exploiting higher-order cross-view correlations via a tensor product structure. The method constructs the order-M cross-moment tensor:
.. math::
\mathcal{M}_{p_1 p_2 \ldots p_M}
= \frac{1}{n} \sum_{i=1}^n
\tilde{x}_{1,i}^{(p_1)}
\tilde{x}_{2,i}^{(p_2)}
\cdots
\tilde{x}_{M,i}^{(p_M)}
where :math:\tilde{X}_j = X_j \Sigma_j^{-1/2} are the whitened views,
and then decomposes :math:\mathcal{M} using PARAFAC to recover the
canonical directions.
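For three views, building the cross-moment tensor is a single einsum (a sketch with random stand-ins for the whitened views; the PARAFAC decomposition, which needs a tensor library, is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
# Stand-ins for the whitened views
Z = [rng.standard_normal((n, p)) for p in (4, 3, 5)]

# Order-3 cross-moment tensor: M[p1, p2, p3] is the mean over samples
# of z1[p1] * z2[p2] * z3[p3]
M = np.einsum("na,nb,nc->abc", Z[0], Z[1], Z[2]) / n
```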
References
Kim, T.-K., Wong, S.-F., & Cipolla, R. (2007). Tensor canonical correlation analysis for action classification. CVPR 2007. IEEE.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| latent_dimensions | int | Number of latent dimensions. Default is 1. | 1 |
| center | bool | Whether to subtract column means before fitting. Default True. | True |
| c | float \| list[float] | Ridge regularisation in [0, 1]. | 0.0 |
| eps | float | Regularisation floor for within-view covariance matrices. | 1e-06 |
| random_state | int \| None | Seed for reproducibility (passed to PARAFAC). | None |
Example

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.standard_normal((50, 5))
X2 = rng.standard_normal((50, 5))
X3 = rng.standard_normal((50, 5))
model = TCCA(latent_dimensions=2, random_state=0).fit([X1, X2, X3])
scores = model.transform([X1, X2, X3])
```
Source code in cca_zoo/linear/_tcca.py
fit ¶
Fit the TCCA model.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| views | list[ArrayLike] | List of arrays, each (n_samples, n_features_i). | required |
| y | None | Ignored. | None |

Returns:

| Name | Type | Description |
|---|---|---|
| self | TCCA | Fitted estimator. |

Raises:

| Type | Description |
|---|---|
| ValueError | If fewer than 2 views are provided. |
| ValueError | If views have inconsistent numbers of samples. |
Source code in cca_zoo/linear/_tcca.py
Gradient-descent methods¶
PLS_EY ¶
PLS_EY(latent_dimensions: int = 1, center: bool = True, learning_rate: float = 0.01, max_iter: int = 1000, batch_size: int | None = None, tol: float = 1e-06, random_state: int | None = None)
Bases: BaseModel
Stochastic Eckart-Young PLS for large-scale data.
Optimises the Eckart-Young (EY) objective for PLS by mini-batch Riemannian gradient descent on the Stiefel manifold:
.. math::
\min_{U, V \,:\, U^\top U = I,\, V^\top V = I}
\left\| X_1 U - X_2 V \right\|_F^2
which is equivalent to maximising :math:\mathrm{tr}(U^\top X_1^\top X_2 V)
(the PLS objective). At each step the Euclidean gradient is projected onto
the tangent space of the Stiefel manifold, and the result is retracted back
to the manifold via polar decomposition.
Suitable for high-dimensional or streaming data where forming the full (p × p) cross-covariance matrix is too expensive.
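One tangent-space projection and polar retraction, as described above, can be sketched as follows (illustrative only; a random matrix stands in for the actual loss gradient):

```python
import numpy as np

def stiefel_step(W, G, lr):
    """One Riemannian gradient step on the Stiefel manifold.

    Projects the Euclidean gradient G onto the tangent space at W,
    then retracts back to the manifold via polar decomposition.
    """
    # Tangent-space projection: remove the symmetric part of W^T G
    sym = (W.T @ G + G.T @ W) / 2
    G_tan = G - W @ sym
    # Gradient step followed by polar retraction (SVD-based)
    Y = W - lr * G_tan
    U, _, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ Vt

rng = np.random.default_rng(0)
W = np.linalg.qr(rng.standard_normal((10, 3)))[0]  # point on the manifold
G = rng.standard_normal((10, 3))                   # stand-in gradient
W_new = stiefel_step(W, G, 0.1)
```

After the step, `W_new` still has orthonormal columns, so the constraint is maintained exactly at every iteration.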
References
Gemp, I., McWilliams, B., Vernade, C., & Graepel, T. (2022). EigenGame Unloaded: When playing games is better than optimizing. ICLR 2022.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| latent_dimensions | int | Number of latent dimensions. Default is 1. | 1 |
| center | bool | Whether to subtract column means. Default True. | True |
| learning_rate | float | Riemannian gradient step size. Default is 1e-2. | 0.01 |
| max_iter | int | Number of gradient iterations. Default is 1000. | 1000 |
| batch_size | int \| None | Mini-batch size. Default is None (full-batch updates). | None |
| tol | float | Convergence tolerance on the objective change. Default is 1e-6. | 1e-06 |
| random_state | int \| None | Seed for reproducibility. | None |
Example

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.standard_normal((200, 500))
X2 = rng.standard_normal((200, 400))
model = PLS_EY(latent_dimensions=4, batch_size=64, random_state=0)
model.fit([X1, X2])
```
Source code in cca_zoo/linear/_gradient.py
fit ¶
Fit PLS_EY by Riemannian gradient descent.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| views | list[ArrayLike] | List of arrays, each (n_samples, n_features_i). | required |
| y | None | Ignored. | None |

Returns:

| Name | Type | Description |
|---|---|---|
| self | PLS_EY | Fitted estimator. |

Raises:

| Type | Description |
|---|---|
| ValueError | If fewer than 2 views are provided. |
| ValueError | If views have inconsistent numbers of samples. |
Source code in cca_zoo/linear/_gradient.py
CCA_EY ¶
CCA_EY(latent_dimensions: int = 1, center: bool = True, c: float | list[float] = 0.0, learning_rate: float = 0.01, max_iter: int = 1000, batch_size: int | None = None, tol: float = 1e-06, random_state: int | None = None)
Bases: PLS_EY
Eckart-Young CCA for large-scale data.
Equivalent to :class:PLS_EY but applies per-view PCA whitening before
the gradient updates, so the resulting objective is the CCA correlation
rather than covariance. This makes the method applicable to views with
very different scales.
The whitening pre-processing is computed once at the start of fit
using the full data, then the gradient updates operate in the whitened
space.
References
Gemp, I., McWilliams, B., Vernade, C., & Graepel, T. (2022). EigenGame Unloaded: When playing games is better than optimizing. ICLR 2022.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| latent_dimensions | int | Number of latent dimensions. Default is 1. | 1 |
| center | bool | Whether to subtract column means. Default True. | True |
| c | float \| list[float] | Ridge regularisation parameter(s) in [0, 1]. | 0.0 |
| learning_rate | float | Riemannian gradient step size. Default is 1e-2. | 0.01 |
| max_iter | int | Number of gradient iterations. Default is 1000. | 1000 |
| batch_size | int \| None | Mini-batch size. Default is None (full-batch updates). | None |
| tol | float | Convergence tolerance. Default is 1e-6. | 1e-06 |
| random_state | int \| None | Seed for reproducibility. | None |
|
Example

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.standard_normal((200, 500))
X2 = rng.standard_normal((200, 400))
model = CCA_EY(latent_dimensions=4, c=0.1, batch_size=64, random_state=0)
model.fit([X1, X2])
```
Source code in cca_zoo/linear/_gradient.py
fit ¶
Fit CCA_EY with whitening pre-processing.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| views | list[ArrayLike] | List of arrays, each (n_samples, n_features_i). | required |
| y | None | Ignored. | None |

Returns:

| Name | Type | Description |
|---|---|---|
| self | CCA_EY | Fitted estimator. |

Raises:

| Type | Description |
|---|---|
| ValueError | If fewer than 2 views are provided. |
| ValueError | If views have inconsistent numbers of samples. |
Source code in cca_zoo/linear/_gradient.py
MCCA_EY ¶
MCCA_EY(latent_dimensions: int = 1, center: bool = True, c: float | list[float] = 0.0, learning_rate: float = 0.01, max_iter: int = 1000, batch_size: int | None = None, tol: float = 1e-06, random_state: int | None = None)
Bases: CCA_EY
Eckart-Young multiview CCA for large-scale data (>=2 views).
Extends :class:CCA_EY to handle more than two views by optimising
the multiview EY loss:
.. math::
\min_{\{W_i\}} \sum_{i \neq j}
\left\| \tilde{X}_i W_i - \tilde{X}_j W_j \right\|_F^2
where :math:\tilde{X}_i are the whitened views, and all weight
matrices are constrained to lie on the Stiefel manifold.
References
Gemp, I., McWilliams, B., Vernade, C., & Graepel, T. (2022). EigenGame Unloaded: When playing games is better than optimizing. ICLR 2022.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| latent_dimensions | int | Number of latent dimensions. Default is 1. | 1 |
| center | bool | Whether to subtract column means. Default True. | True |
| c | float \| list[float] | Ridge regularisation parameter(s) in [0, 1]. | 0.0 |
| learning_rate | float | Riemannian gradient step size. Default is 1e-2. | 0.01 |
| max_iter | int | Number of gradient iterations. Default is 1000. | 1000 |
| batch_size | int \| None | Mini-batch size. Default is None (full-batch updates). | None |
| tol | float | Convergence tolerance. Default is 1e-6. | 1e-06 |
| random_state | int \| None | Seed for reproducibility. | None |
|
Example

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.standard_normal((200, 500))
X2 = rng.standard_normal((200, 400))
X3 = rng.standard_normal((200, 300))
model = MCCA_EY(latent_dimensions=4, c=0.1, batch_size=64, random_state=0)
model.fit([X1, X2, X3])
```
Source code in cca_zoo/linear/_gradient.py
fit ¶
Fit MCCA_EY for 2 or more views.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| views | list[ArrayLike] | List of arrays, each (n_samples, n_features_i). | required |
| y | None | Ignored. | None |

Returns:

| Name | Type | Description |
|---|---|---|
| self | MCCA_EY | Fitted estimator. |

Raises:

| Type | Description |
|---|---|
| ValueError | If fewer than 2 views are provided. |
| ValueError | If views have inconsistent numbers of samples. |
Source code in cca_zoo/linear/_gradient.py
Sparse / iterative methods¶
PLS_ALS ¶
PLS_ALS(latent_dimensions: int = 1, center: bool = True, max_iter: int = 500, tol: float = 1e-06, random_state: int | None = None)
Bases: _BaseIterative
Alternating Least Squares variant of Partial Least Squares.
Maximises the sum of cross-view covariances using simple power-iteration updates, without regularisation:
.. math::
\mathbf{w}_i \leftarrow
\frac{X_i^\top \bar{\mathbf{s}}_{\neg i}}
{\|X_i^\top \bar{\mathbf{s}}_{\neg i}\|_2}
where :math:\bar{\mathbf{s}}_{\neg i} is the normalised sum of
projected scores from all views except :math:i.
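The power-iteration update can be sketched in a few lines of NumPy (illustrative, single-dimension case):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
views = [rng.standard_normal((n, p)) for p in (10, 8)]
ws = [rng.standard_normal(p) for p in (10, 8)]
ws = [w / np.linalg.norm(w) for w in ws]

for _ in range(100):
    scores = [X @ w for X, w in zip(views, ws)]
    for i, X in enumerate(views):
        # Normalised sum of the other views' scores
        t = sum(s for j, s in enumerate(scores) if j != i)
        t = t / np.linalg.norm(t)
        # Power-iteration update, renormalised to unit norm
        w = X.T @ t
        ws[i] = w / np.linalg.norm(w)
        scores[i] = X @ ws[i]
```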
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| latent_dimensions | int | Number of latent dimensions. Default is 1. | 1 |
| center | bool | Whether to subtract column means. Default True. | True |
| max_iter | int | Maximum ALS iterations per dimension. Default is 500. | 500 |
| tol | float | Convergence tolerance. Default is 1e-6. | 1e-06 |
| random_state | int \| None | Seed for reproducibility. | None |
|
Example

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.standard_normal((50, 10))
X2 = rng.standard_normal((50, 8))
model = PLS_ALS(latent_dimensions=2, random_state=0).fit([X1, X2])
```
Source code in cca_zoo/linear/_iterative.py
SCCA_PMD ¶
SCCA_PMD(latent_dimensions: int = 1, center: bool = True, tau: float | list[float] = 1.0, max_iter: int = 500, tol: float = 1e-06, random_state: int | None = None)
Bases: _BaseIterative
Sparse CCA via Penalized Matrix Decomposition.
Maximises the cross-view covariance subject to L1 norm constraints on each weight vector:
.. math::
\max_{\mathbf{w}_1, \mathbf{w}_2}
\mathbf{w}_1^\top X_1^\top X_2 \mathbf{w}_2
\text{subject to }
\|\mathbf{w}_i\|_1 \leq \tau_i \sqrt{p_i},\quad
\|\mathbf{w}_i\|_2 = 1
The update for each view uses bisection to find the soft-threshold that satisfies the L1 constraint exactly.
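A sketch of that bisection search (illustrative NumPy; `l1_constrained_unit` is a hypothetical helper name, not a cca_zoo function):

```python
import numpy as np

def soft(x, t):
    """Element-wise soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def l1_constrained_unit(a, tau):
    """Soft-threshold `a` so that, after L2 normalisation, the L1 norm
    is at most `tau`; the threshold is found by bisection."""
    w = a / np.linalg.norm(a)
    if np.linalg.norm(w, 1) <= tau:
        return w
    lo, hi = 0.0, np.abs(a).max()
    for _ in range(50):
        mid = (lo + hi) / 2
        w = soft(a, mid) / np.linalg.norm(soft(a, mid))
        if np.linalg.norm(w, 1) > tau:
            lo = mid   # threshold too small: not sparse enough
        else:
            hi = mid   # threshold large enough: tighten from above
    return soft(a, hi) / np.linalg.norm(soft(a, hi))

rng = np.random.default_rng(0)
a = rng.standard_normal(20)
w = l1_constrained_unit(a, tau=2.0)
```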
References
Witten, D. M., Tibshirani, R., & Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, 10(3), 515–534.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| latent_dimensions | int | Number of latent dimensions. Default is 1. | 1 |
| center | bool | Whether to subtract column means. Default True. | True |
| tau | float \| list[float] | L1 bound scaling factor(s) in (0, 1]. | 1.0 |
| max_iter | int | Maximum ALS iterations. Default is 500. | 500 |
| tol | float | Convergence tolerance. Default is 1e-6. | 1e-06 |
| random_state | int \| None | Seed for reproducibility. | None |
|
Example

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.standard_normal((50, 10))
X2 = rng.standard_normal((50, 8))
model = SCCA_PMD(tau=0.5, random_state=0).fit([X1, X2])
```
Source code in cca_zoo/linear/_iterative.py
fit ¶
Fit the SCCA_PMD model.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| views | list[ArrayLike] | List of arrays, each (n_samples, n_features_i). | required |
| y | None | Ignored. | None |

Returns:

| Name | Type | Description |
|---|---|---|
| self | SCCA_PMD | Fitted estimator. |

Raises:

| Type | Description |
|---|---|
| ValueError | If fewer than 2 views are provided. |
Source code in cca_zoo/linear/_iterative.py
SCCA_ADMM ¶
SCCA_ADMM(latent_dimensions: int = 1, center: bool = True, tau: float | list[float] = 0.1, mu: float = 1.0, max_iter: int = 500, tol: float = 1e-06, random_state: int | None = None)
Bases: _BaseIterative
Sparse CCA via Alternating Direction Method of Multipliers.
Solves the sparse CCA problem using ADMM to enforce both the L1 sparsity constraint on weight vectors and the unit-norm constraint on the projected scores simultaneously.
For view :math:i the ADMM sub-problems are:
- :math:\mathbf{w}_i update — proximal gradient step w.r.t. the data fidelity term.
- Auxiliary variable :math:\mathbf{z}_i update — soft thresholding.
- Dual variable update.
References
Suo, X., Mineiro, P., & Anandkumar, A. (2017). Sparse canonical correlation analysis. arXiv:1705.10865.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| latent_dimensions | int | Number of latent dimensions. Default is 1. | 1 |
| center | bool | Whether to subtract column means. Default True. | True |
| tau | float \| list[float] | L1 regularisation weight(s). Default is 0.1. | 0.1 |
| mu | float | ADMM penalty parameter (step size). Default is 1.0. | 1.0 |
| max_iter | int | Maximum outer iterations. Default is 500. | 500 |
| tol | float | Convergence tolerance. Default is 1e-6. | 1e-06 |
| random_state | int \| None | Seed for reproducibility. | None |
|
Example

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.standard_normal((50, 10))
X2 = rng.standard_normal((50, 8))
model = SCCA_ADMM(tau=0.1, random_state=0).fit([X1, X2])
```
Source code in cca_zoo/linear/_iterative.py
SCCA_IPLS ¶
SCCA_IPLS(latent_dimensions: int = 1, center: bool = True, alpha: float | list[float] = 0.0, l1_ratio: float | list[float] = 1.0, max_iter: int = 500, tol: float = 1e-06, random_state: int | None = None)
Bases: _BaseIterative
Iterative PLS with elastic net penalty on weight vectors.
Alternates between penalised regression sub-problems. For view :math:i:
.. math::
\hat{\mathbf{w}}_i = \arg\min_{\mathbf{w}}
\frac{1}{2n} \|X_i \mathbf{w} - \bar{\mathbf{s}}_{\neg i}\|_2^2
+ \alpha_i \Bigl(
l_1 \|\mathbf{w}\|_1
+ \tfrac{1-l_1}{2} \|\mathbf{w}\|_2^2
\Bigr)
followed by a normalisation step to enforce unit variance of the score.
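Each sub-problem is an ordinary elastic net regression, so one iteration for a single view can be sketched with scikit-learn's ElasticNet (illustrative; the target score here is a random stand-in and the parameter values are arbitrary):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
n = 50
X1 = rng.standard_normal((n, 10))
X2 = rng.standard_normal((n, 8))

# Stand-in target: the other view's score, scaled to unit variance
s = X2 @ rng.standard_normal(8)
s = s / np.std(s)

# One penalised regression sub-problem for view 1 (lasso case)
reg = ElasticNet(alpha=0.01, l1_ratio=1.0, fit_intercept=False)
w1 = reg.fit(X1, s).coef_

# Normalisation step: rescale so the score has unit variance
w1 = w1 / np.std(X1 @ w1)
```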
References
Mai, Q., & Zhang, X. (2019). An iterative penalized least squares approach to sparse canonical correlation analysis. Biometrics, 75(3), 734–744.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| latent_dimensions | int | Number of latent dimensions. Default is 1. | 1 |
| center | bool | Whether to subtract column means. Default True. | True |
| alpha | float \| list[float] | Elastic net penalty strength(s). Default is 0. | 0.0 |
| l1_ratio | float \| list[float] | Ratio of L1 to total penalty. 1 = lasso, 0 = ridge. Default is 1. | 1.0 |
| max_iter | int | Maximum ALS iterations. Default is 500. | 500 |
| tol | float | Convergence tolerance. Default is 1e-6. | 1e-06 |
| random_state | int \| None | Seed for reproducibility. | None |
|
Example

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.standard_normal((50, 10))
X2 = rng.standard_normal((50, 8))
model = SCCA_IPLS(alpha=0.1, random_state=0).fit([X1, X2])
```
Source code in cca_zoo/linear/_iterative.py
SCCA_Span ¶
SCCA_Span(latent_dimensions: int = 1, center: bool = True, span: int | list[int] | None = None, max_iter: int = 500, tol: float = 1e-06, random_state: int | None = None)
Bases: _BaseIterative
SpanCCA — sparse CCA via truncated power iteration.
Solves sparse CCA by a sparse power iteration where each weight update
retains only the span entries with the largest absolute values.
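The truncation step and the resulting alternating updates can be sketched as follows (illustrative NumPy; `truncate` is a hypothetical helper and the per-view sparsity levels are arbitrary):

```python
import numpy as np

def truncate(x, k):
    """Keep the k largest-magnitude entries, zero the rest, renormalise."""
    w = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    w[idx] = x[idx]
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)
X1 = rng.standard_normal((50, 10))
X2 = rng.standard_normal((50, 8))
C = X1.T @ X2 / 49   # sample cross-covariance

# Sparse power iteration with per-view sparsity levels 5 and 4
w1 = truncate(rng.standard_normal(10), 5)
w2 = truncate(rng.standard_normal(8), 4)
for _ in range(100):
    w1 = truncate(C @ w2, 5)
    w2 = truncate(C.T @ w1, 4)
```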
References
Asteris, M., Khanna, R., Kyrillidis, A., & Dimakis, A. G. (2016). Bilinear approaches for online learning over large feature spaces. NeurIPS 2016. (SpanCCA algorithm).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| latent_dimensions | int | Number of latent dimensions. Default is 1. | 1 |
| center | bool | Whether to subtract column means. Default True. | True |
| span | int \| list[int] \| None | Number of non-zero entries to retain per view. Either a single int or a list. Default is None (keep all — no sparsity). | None |
| max_iter | int | Maximum ALS iterations. Default is 500. | 500 |
| tol | float | Convergence tolerance. Default is 1e-6. | 1e-06 |
| random_state | int \| None | Seed for reproducibility. | None |
|
Example

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.standard_normal((50, 10))
X2 = rng.standard_normal((50, 8))
model = SCCA_Span(span=5, random_state=0).fit([X1, X2])
```
Source code in cca_zoo/linear/_iterative.py
ElasticCCA ¶
ElasticCCA(latent_dimensions: int = 1, center: bool = True, alpha: float | list[float] = 0.0, l1_ratio: float | list[float] = 0.5, max_iter: int = 500, tol: float = 1e-06, random_state: int | None = None)
Bases: _BaseIterative
Elastic net regularised CCA.
Alternates between elastic net regression sub-problems, regressing each view's score against the sum of all other views' scores:
.. math::
\hat{\mathbf{w}}_i = \arg\min_{\mathbf{w}}
\frac{1}{2n} \|X_i \mathbf{w} - \mathbf{s}_{\text{all}}\|_2^2
+ \alpha_i \Bigl(
l_1 \|\mathbf{w}\|_1
+ \tfrac{1 - l_1}{2} \|\mathbf{w}\|_2^2
\Bigr)
where :math:\mathbf{s}_{\text{all}} = \sum_j X_j \mathbf{w}_j / \|\cdot\|.
References
Waaijenborg, S., de Witt Hamer, P. C. V., & Zwinderman, A. H. (2008). Quantifying the association between gene expressions and DNA-markers by penalized canonical correlation analysis. Statistical Applications in Genetics and Molecular Biology, 7(1).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| latent_dimensions | int | Number of latent dimensions. Default is 1. | 1 |
| center | bool | Whether to subtract column means. Default True. | True |
| alpha | float \| list[float] | Elastic net regularisation strength. Default is 0. | 0.0 |
| l1_ratio | float \| list[float] | L1 / total penalty ratio. Default is 0.5. | 0.5 |
| max_iter | int | Maximum ALS iterations. Default is 500. | 500 |
| tol | float | Convergence tolerance. Default is 1e-6. | 1e-06 |
| random_state | int \| None | Seed for reproducibility. | None |
|
Example

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.standard_normal((50, 10))
X2 = rng.standard_normal((50, 8))
model = ElasticCCA(alpha=0.1, l1_ratio=0.5, random_state=0).fit([X1, X2])
```
Source code in cca_zoo/linear/_iterative.py
ParkhomenkoCCA ¶
ParkhomenkoCCA(latent_dimensions: int = 1, center: bool = True, tau: float | list[float] = 0.1, max_iter: int = 500, tol: float = 1e-06, random_state: int | None = None)
Bases: _BaseIterative
Sparse CCA via soft-thresholding power iteration (Parkhomenko 2009).
Uses a fixed soft-threshold :math:\tau_i rather than the adaptive
bisection search of :class:SCCA_PMD:
.. math::
\mathbf{w}_i \leftarrow
S_{\tau_i}(X_i^\top \bar{\mathbf{s}}_{\neg i})
where :math:S_\tau is the element-wise soft-threshold operator.
References
Parkhomenko, E., Tritchler, D., & Beyene, J. (2009). Sparse canonical correlation analysis with application to genomic data integration. Statistical Applications in Genetics and Molecular Biology, 8(1).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| latent_dimensions | int | Number of latent dimensions. Default is 1. | 1 |
| center | bool | Whether to subtract column means. Default True. | True |
| tau | float \| list[float] | Soft-threshold parameter(s). Default is 0.1. | 0.1 |
| max_iter | int | Maximum ALS iterations. Default is 500. | 500 |
| tol | float | Convergence tolerance. Default is 1e-6. | 1e-06 |
| random_state | int \| None | Seed for reproducibility. | None |
|
Example

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.standard_normal((50, 10))
X2 = rng.standard_normal((50, 8))
model = ParkhomenkoCCA(tau=0.1, random_state=0).fit([X1, X2])
```