
cca_zoo.linear

Linear CCA methods. All classes are sklearn.base.BaseEstimator subclasses.


Base class

BaseModel

BaseModel(latent_dimensions: int = 1, center: bool = True)

Bases: BaseEstimator, ABC

Abstract base class for all multiview CCA models.

Subclasses must implement :meth:fit. All other public methods (transform, fit_transform, score, pairwise_correlations, average_pairwise_correlations, get_factor_loadings) are provided here using the weights_ attribute set by fit.

This class inherits from :class:sklearn.base.BaseEstimator so that get_params / set_params round-trip correctly and sklearn model selection utilities work out of the box.

Parameters:

Name Type Description Default
latent_dimensions int

Number of latent dimensions to fit. Default is 1.

1
center bool

Whether to subtract per-view column means before fitting. The means are stored in means_ and applied in transform.

True
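The get_params / set_params round-trip mentioned above can be sketched with a toy estimator. `ToyModel` is a hypothetical stand-in that mirrors the constructor pattern of this class; it is not part of cca_zoo.

```python
from sklearn.base import BaseEstimator

class ToyModel(BaseEstimator):
    # BaseEstimator discovers parameters from __init__ signatures, so
    # each argument must be stored under its own name.
    def __init__(self, latent_dimensions: int = 1, center: bool = True):
        self.latent_dimensions = latent_dimensions
        self.center = center

model = ToyModel(latent_dimensions=3, center=False)
params = model.get_params()            # {'center': False, 'latent_dimensions': 3}
clone = ToyModel().set_params(**params)  # round-trips exactly
```

This is the contract that lets sklearn utilities such as GridSearchCV clone and reconfigure the estimator.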

weights property

weights: list[ndarray]

Weight matrices post-fit, one per view.

Shape is (n_features_i, latent_dimensions) for each view.

Raises:

Type Description
NotFittedError

If fit has not been called.

fit abstractmethod

fit(views: list[ArrayLike], y: None = None) -> BaseModel

Fit the model to multiview data.

Parameters:

Name Type Description Default
views list[ArrayLike]

List of arrays, each of shape (n_samples, n_features_i). All arrays must have the same number of rows.

required
y None

Ignored. Present for scikit-learn API compatibility.

None

Returns:

Name Type Description
self BaseModel

Fitted estimator.

Raises:

Type Description
ValueError

If fewer than 2 views are provided.

ValueError

If views have inconsistent numbers of samples.

transform

transform(views: list[ArrayLike]) -> list[np.ndarray]

Project views into the latent space using the fitted weights.

Parameters:

Name Type Description Default
views list[ArrayLike]

List of arrays, each of shape (n_samples, n_features_i).

required

Returns:

Type Description
list[ndarray]

List of arrays, each of shape (n_samples, latent_dimensions).

Raises:

Type Description
NotFittedError

If fit has not been called.

fit_transform

fit_transform(views: list[ArrayLike], y: None = None) -> list[np.ndarray]

Fit and then transform the training data.

Equivalent to self.fit(views).transform(views) but may be more efficient for some subclasses.

Parameters:

Name Type Description Default
views list[ArrayLike]

List of arrays, each of shape (n_samples, n_features_i).

required
y None

Ignored.

None

Returns:

Type Description
list[ndarray]

List of arrays, each of shape (n_samples, latent_dimensions).

score

score(views: list[ArrayLike], y: None = None) -> np.ndarray

Return average pairwise canonical correlations for each dimension.

Parameters:

Name Type Description Default
views list[ArrayLike]

List of arrays, each of shape (n_samples, n_features_i).

required
y None

Ignored.

None

Returns:

Type Description
ndarray

Array of shape (latent_dimensions,) with the average pairwise correlation for each canonical dimension.

pairwise_correlations

pairwise_correlations(views: list[ArrayLike]) -> np.ndarray

Compute the full pairwise correlation matrix per latent dimension.

Parameters:

Name Type Description Default
views list[ArrayLike]

List of arrays, each of shape (n_samples, n_features_i).

required

Returns:

Type Description
ndarray

Array of shape (n_views, n_views, latent_dimensions) where entry [i, j, d] is the Pearson correlation between the d-th canonical variate of view i and view j.
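The (n_views, n_views, latent_dimensions) layout can be sketched directly with NumPy. This is a minimal illustration of the quantity being computed, not cca_zoo's implementation; `scores` stands in for the output of transform (one (n_samples, latent_dimensions) array per view).

```python
import numpy as np

# Stand-in projected views: one (n_samples, k) array per view.
rng = np.random.default_rng(0)
scores = [rng.standard_normal((50, 2)) for _ in range(3)]
n_views, k = len(scores), scores[0].shape[1]

# Entry [i, j, d]: Pearson correlation between the d-th variate of
# view i and the d-th variate of view j.
corrs = np.zeros((n_views, n_views, k))
for i in range(n_views):
    for j in range(n_views):
        for d in range(k):
            corrs[i, j, d] = np.corrcoef(scores[i][:, d], scores[j][:, d])[0, 1]
```

The diagonal entries `corrs[i, i, :]` are 1 by construction, and the array is symmetric in its first two axes.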

average_pairwise_correlations

average_pairwise_correlations(views: list[ArrayLike]) -> np.ndarray

Return the mean off-diagonal pairwise correlation per dimension.

Parameters:

Name Type Description Default
views list[ArrayLike]

List of arrays, each of shape (n_samples, n_features_i).

required

Returns:

Type Description
ndarray

Array of shape (latent_dimensions,) with the average off-diagonal pairwise correlation for each canonical dimension.

get_factor_loadings

get_factor_loadings(views: list[ArrayLike]) -> list[np.ndarray]

Compute canonical factor loadings for each view.

A loading is the Pearson correlation between an original feature and a canonical variate. Loadings indicate which original variables drive each canonical direction.

Parameters:

Name Type Description Default
views list[ArrayLike]

List of arrays, each of shape (n_samples, n_features_i).

required

Returns:

Type Description
list[ndarray]

List of arrays, each of shape (n_features_i, latent_dimensions), where entry [j, d] is the correlation between feature j of view i and the d-th canonical variate of view i.
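The loading definition above can be written out for a single view. This is a sketch of the quantity, not cca_zoo's code; `W` is a stand-in for a fitted weight matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))   # one view: (n_samples, n_features)
W = rng.standard_normal((10, 2))    # stand-in for fitted weights
Z = X @ W                           # canonical variates: (n_samples, k)

# Loading [j, d]: correlation between feature j and variate d.
loadings = np.zeros((X.shape[1], Z.shape[1]))
for j in range(X.shape[1]):
    for d in range(Z.shape[1]):
        loadings[j, d] = np.corrcoef(X[:, j], Z[:, d])[0, 1]
```

Because each entry is a Pearson correlation, all loadings lie in [-1, 1].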


Two-view exact methods

CCA

CCA(latent_dimensions: int = 1, center: bool = True)

Bases: rCCA

Canonical Correlation Analysis.

Finds the pair of linear projections that maximise the Pearson correlation between two views subject to unit within-view variance constraints:

.. math::

\max_{\mathbf{w}_1, \mathbf{w}_2}
    \mathbf{w}_1^\top X_1^\top X_2 \mathbf{w}_2

\text{subject to }
\mathbf{w}_i^\top X_i^\top X_i \mathbf{w}_i = 1

This is a special case of :class:rCCA with c=0. The solution uses PCA whitening followed by an SVD of the cross-covariance matrix, which is numerically stable even for high-dimensional views.
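The whitening-plus-SVD solution can be sketched in a few lines of NumPy. This is an illustrative reimplementation under the c=0 case, not cca_zoo's exact code (which whitens via PCA rather than an explicit covariance inverse square root); the two are algebraically equivalent for full-rank views.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X1 = rng.standard_normal((n, 6))
X2 = rng.standard_normal((n, 5))
X1 -= X1.mean(0)
X2 -= X2.mean(0)

def inv_sqrt(C):
    # Symmetric inverse square root via eigendecomposition.
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

C11 = X1.T @ X1 / (n - 1)
C22 = X2.T @ X2 / (n - 1)
C12 = X1.T @ X2 / (n - 1)

# SVD of the whitened cross-covariance; singular values are the
# canonical correlations.
K = inv_sqrt(C11) @ C12 @ inv_sqrt(C22)
U, s, Vt = np.linalg.svd(K)
w1 = inv_sqrt(C11) @ U[:, :1]
w2 = inv_sqrt(C22) @ Vt.T[:, :1]
```

The in-sample correlation between `X1 @ w1` and `X2 @ w2` equals the leading singular value `s[0]`.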

References

Hotelling, H. (1936). Relations between two sets of variates. Biometrika, 28(3/4), 321–377.

Parameters:

Name Type Description Default
latent_dimensions int

Number of latent dimensions. Default is 1.

1
center bool

Whether to subtract column means before fitting. Default True.

True
Example

import numpy as np
rng = np.random.default_rng(0)
X1 = rng.standard_normal((50, 10))
X2 = rng.standard_normal((50, 8))
model = CCA(latent_dimensions=2).fit([X1, X2])
corrs = model.score([X1, X2])

Source code in cca_zoo/linear/_cca.py
def __init__(
    self,
    latent_dimensions: int = 1,
    center: bool = True,
) -> None:
    super().__init__(
        latent_dimensions=latent_dimensions,
        center=center,
        c=0.0,
    )

fit

fit(views: list[ArrayLike], y: None = None) -> CCA

Fit the CCA model.

Parameters:

Name Type Description Default
views list[ArrayLike]

List of exactly two arrays, each (n_samples, n_features_i).

required
y None

Ignored.

None

Returns:

Name Type Description
self CCA

Fitted estimator.

Raises:

Type Description
ValueError

If the number of views is not exactly 2.

ValueError

If views have inconsistent numbers of samples.

Example

import numpy as np
rng = np.random.default_rng(0)
X1 = rng.standard_normal((50, 10))
X2 = rng.standard_normal((50, 8))
model = CCA(latent_dimensions=2).fit([X1, X2])

Source code in cca_zoo/linear/_cca.py
def fit(self, views: list[ArrayLike], y: None = None) -> CCA:
    """Fit the CCA model.

    Args:
        views: List of exactly two arrays, each (n_samples, n_features_i).
        y: Ignored.

    Returns:
        self: Fitted estimator.

    Raises:
        ValueError: If the number of views is not exactly 2.
        ValueError: If views have inconsistent numbers of samples.

    Example:
        >>> import numpy as np
        >>> rng = np.random.default_rng(0)
        >>> X1 = rng.standard_normal((50, 10))
        >>> X2 = rng.standard_normal((50, 8))
        >>> model = CCA(latent_dimensions=2).fit([X1, X2])
    """
    return super().fit(views, y)

rCCA

rCCA(latent_dimensions: int = 1, center: bool = True, c: float | list[float] = 0.0)

Bases: BaseModel

Regularised Canonical Correlation Analysis (canonical ridge).

Finds the pair of linear projections of two views that maximise their correlation subject to regularised within-view variance constraints:

.. math::

\max_{\mathbf{w}_1, \mathbf{w}_2}
    \mathbf{w}_1^\top X_1^\top X_2 \mathbf{w}_2

\text{subject to }
\mathbf{w}_i^\top
\bigl((1 - c_i) X_i^\top X_i + c_i I\bigr) \mathbf{w}_i = 1

The solution is found by whitening each view with its regularised covariance matrix and computing the SVD of the resulting cross-covariance.

:class:CCA (c=0) and :class:PLS (c=1) are special cases.
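The regularised whitening step behind this solution can be sketched on its own. This is a minimal illustration of the canonical-ridge constraint, assuming the unnormalised Gram form used in the objective above; it is not cca_zoo's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, c = 100, 8, 0.5
X = rng.standard_normal((n, p))
X -= X.mean(0)

# Regularised within-view matrix from the rCCA constraint.
R = (1 - c) * X.T @ X + c * np.eye(p)

# Its inverse square root whitens the view: in the new basis the
# constraint matrix becomes the identity.
vals, vecs = np.linalg.eigh(R)
W = vecs @ np.diag(vals ** -0.5) @ vecs.T
# W @ R @ W is now the identity matrix.
```

With c between 0 and 1 this interpolates between CCA whitening (c=0) and no whitening beyond a rescaling (c=1, the PLS case).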

References

Vinod, H. D. (1976). Canonical ridge and econometrics of joint production. Journal of Econometrics, 4(2), 147–166.

Parameters:

Name Type Description Default
latent_dimensions int

Number of latent dimensions. Default is 1.

1
center bool

Whether to subtract column means before fitting. Default True.

True
c float | list[float]

Ridge regularisation parameter(s) in [0, 1]. A single float is applied to both views; a list [c1, c2] applies per-view regularisation. Default is 0 (standard CCA).

0.0
Example

import numpy as np
rng = np.random.default_rng(0)
X1 = rng.standard_normal((50, 10))
X2 = rng.standard_normal((50, 8))
model = rCCA(latent_dimensions=2, c=0.1).fit([X1, X2])
scores = model.transform([X1, X2])

Source code in cca_zoo/linear/_rcca.py
def __init__(
    self,
    latent_dimensions: int = 1,
    center: bool = True,
    c: float | list[float] = 0.0,
) -> None:
    super().__init__(latent_dimensions=latent_dimensions, center=center)
    self.c = c

fit

fit(views: list[ArrayLike], y: None = None) -> rCCA

Fit the rCCA model.

Parameters:

Name Type Description Default
views list[ArrayLike]

List of exactly two arrays, each (n_samples, n_features_i).

required
y None

Ignored.

None

Returns:

Name Type Description
self rCCA

Fitted estimator.

Raises:

Type Description
ValueError

If the number of views is not exactly 2.

ValueError

If views have inconsistent numbers of samples.

Source code in cca_zoo/linear/_rcca.py
def fit(self, views: list[ArrayLike], y: None = None) -> rCCA:
    """Fit the rCCA model.

    Args:
        views: List of exactly two arrays, each (n_samples, n_features_i).
        y: Ignored.

    Returns:
        self: Fitted estimator.

    Raises:
        ValueError: If the number of views is not exactly 2.
        ValueError: If views have inconsistent numbers of samples.
    """
    views_: list[np.ndarray] = self._setup_fit(views)
    if self.n_views_ != 2:
        raise ValueError(
            f"rCCA requires exactly 2 views, got {self.n_views_}. "
            "Use MCCA for more than 2 views."
        )
    c_ = perview_parameter("c", self.c, 0.0, 2)
    X1, X2 = views_
    # Whiten each view with its regularised covariance
    X1_w, W1 = svd_whiten(X1, c_[0])
    X2_w, W2 = svd_whiten(X2, c_[1])
    # SVD of the cross-covariance of whitened views
    k = min(self.latent_dimensions, X1_w.shape[1], X2_w.shape[1])
    cross_cov = X1_w.T @ X2_w / (X1.shape[0] - 1)
    U, _, Vt = np.linalg.svd(cross_cov, full_matrices=False)
    U = U[:, :k]
    Vt = Vt[:k, :]
    self.weights_: list[np.ndarray] = [W1 @ U, W2 @ Vt.T]
    return self

PLS

PLS(latent_dimensions: int = 1, center: bool = True)

Bases: rCCA

Partial Least Squares (two-view).

Finds the pair of unit-norm weight vectors that maximise the covariance between the projected views:

.. math::

\max_{\mathbf{w}_1, \mathbf{w}_2}
    \mathbf{w}_1^\top X_1^\top X_2 \mathbf{w}_2

\text{subject to }
\|\mathbf{w}_i\|_2 = 1

This is equivalent to the truncated SVD of the sample cross-covariance matrix :math:X_1^\top X_2 / (n - 1), and corresponds to :class:rCCA with c=1.
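The truncated-SVD characterisation can be checked directly. The following is an illustrative sketch, not cca_zoo's code: the leading singular vectors of the sample cross-covariance are unit-norm, and the covariance of the projected views equals the leading singular value.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
X1 = rng.standard_normal((n, 7))
X2 = rng.standard_normal((n, 5))
X1 -= X1.mean(0)
X2 -= X2.mean(0)

# Sample cross-covariance and its SVD.
C12 = X1.T @ X2 / (n - 1)
U, s, Vt = np.linalg.svd(C12, full_matrices=False)
w1, w2 = U[:, 0], Vt[0]   # unit-norm PLS weight vectors
```

Here `w1 @ C12 @ w2` recovers `s[0]`, the maximal covariance attainable under the unit-norm constraints.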

References

Wold, H. (1975). Soft modelling by latent variables: the nonlinear iterative partial least squares (NIPALS) approach. Perspectives in Probability and Statistics, 117–142.

Parameters:

Name Type Description Default
latent_dimensions int

Number of latent dimensions. Default is 1.

1
center bool

Whether to subtract column means before fitting. Default True.

True
Example

import numpy as np
rng = np.random.default_rng(0)
X1 = rng.standard_normal((50, 10))
X2 = rng.standard_normal((50, 8))
model = PLS(latent_dimensions=2).fit([X1, X2])
scores = model.transform([X1, X2])

Source code in cca_zoo/linear/_pls.py
def __init__(
    self,
    latent_dimensions: int = 1,
    center: bool = True,
) -> None:
    super().__init__(
        latent_dimensions=latent_dimensions,
        center=center,
        c=1.0,
    )

fit

fit(views: list[ArrayLike], y: None = None) -> PLS

Fit the PLS model.

Parameters:

Name Type Description Default
views list[ArrayLike]

List of exactly two arrays, each (n_samples, n_features_i).

required
y None

Ignored.

None

Returns:

Name Type Description
self PLS

Fitted estimator.

Raises:

Type Description
ValueError

If the number of views is not exactly 2.

ValueError

If views have inconsistent numbers of samples.

Example

import numpy as np
rng = np.random.default_rng(0)
X1 = rng.standard_normal((50, 10))
X2 = rng.standard_normal((50, 8))
model = PLS(latent_dimensions=2).fit([X1, X2])

Source code in cca_zoo/linear/_pls.py
def fit(self, views: list[ArrayLike], y: None = None) -> PLS:
    """Fit the PLS model.

    Args:
        views: List of exactly two arrays, each (n_samples, n_features_i).
        y: Ignored.

    Returns:
        self: Fitted estimator.

    Raises:
        ValueError: If the number of views is not exactly 2.
        ValueError: If views have inconsistent numbers of samples.

    Example:
        >>> import numpy as np
        >>> rng = np.random.default_rng(0)
        >>> X1 = rng.standard_normal((50, 10))
        >>> X2 = rng.standard_normal((50, 8))
        >>> model = PLS(latent_dimensions=2).fit([X1, X2])
    """
    return super().fit(views, y)

Multiview methods

MCCA

MCCA(latent_dimensions: int = 1, center: bool = True, c: float | list[float] = 0.0, pca: bool = True, eps: float = 1e-06)

Bases: BaseModel

Multiset Canonical Correlation Analysis.

Finds linear projections of multiple (>=2) views that maximise the sum of pairwise cross-view covariances subject to within-view variance constraints. A ridge regularisation parameter c controls the trade-off between correlation and variance explained.

The primal objective is:

.. math::

\max_{\mathbf{w}} \sum_{i \neq j} \mathbf{w}_i^\top X_i^\top X_j
\mathbf{w}_j

\text{subject to } \mathbf{w}_i^\top
\bigl((1-c_i) X_i^\top X_i + c_i I\bigr) \mathbf{w}_i = 1

This is solved as a generalised eigenvalue problem:

.. math::

A \mathbf{v} = \lambda B \mathbf{v}

where :math:A is the between-view block covariance matrix and :math:B is the block-diagonal regularised within-view covariance matrix.

When pca=True (default), each view is first reduced to its principal components, which makes the problem numerically stable for high-dimensional data and allows an efficient closed-form :math:B.
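The generalised eigenvalue formulation can be sketched without the PCA step. This is an illustrative construction, assuming A collects the between-view covariance blocks (within-view blocks zeroed) and B the block-diagonal within-view blocks with c=0; it is not cca_zoo's optimised implementation.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n = 200
views = [rng.standard_normal((n, p)) for p in (4, 3, 5)]
views = [v - v.mean(0) for v in views]

X = np.hstack(views)          # stacked views: (n, sum p_i)
C = X.T @ X / (n - 1)         # full block covariance matrix

# B: block-diagonal within-view covariance; A: the remaining
# between-view blocks.
B = np.zeros_like(C)
start = 0
for v in views:
    p = v.shape[1]
    B[start:start + p, start:start + p] = C[start:start + p, start:start + p]
    start += p
A = C - B

# Generalised eigenproblem A v = lambda B v; the top eigenvector
# stacks the leading canonical directions of all views.
eigvals, eigvecs = eigh(A, B)
v_top = eigvecs[:, -1]
```

Splitting `v_top` at the per-view feature boundaries recovers one weight vector per view, as in the `np.split` step of the source code below.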

References

Kettenring, J. R. (1971). Canonical analysis of several sets of variables. Biometrika, 58(3), 433–451.

Vinod, H. D. (1976). Canonical ridge and econometrics of joint production. Journal of Econometrics, 4(2), 147–166.

Parameters:

Name Type Description Default
latent_dimensions int

Number of latent dimensions. Default is 1.

1
center bool

Whether to subtract column means before fitting. Default True.

True
c float | list[float]

Ridge regularisation parameter(s). Either a single float applied to all views or a list of per-view floats in [0, 1]. c=0 gives standard CCA constraints; c=1 gives sphering (PLS-like). Default is 0.

0.0
pca bool

Whether to apply full PCA whitening as a pre-processing step before solving the eigenvalue problem. Highly recommended for high-dimensional data. Default is True.

True
eps float

Small constant added to the eigenvalues of B to ensure positive definiteness. Default is 1e-6.

1e-06
Example

import numpy as np
rng = np.random.default_rng(0)
X1 = rng.standard_normal((50, 10))
X2 = rng.standard_normal((50, 8))
X3 = rng.standard_normal((50, 6))
model = MCCA(latent_dimensions=2).fit([X1, X2, X3])
scores = model.transform([X1, X2, X3])

Source code in cca_zoo/linear/_mcca.py
def __init__(
    self,
    latent_dimensions: int = 1,
    center: bool = True,
    c: float | list[float] = 0.0,
    pca: bool = True,
    eps: float = 1e-6,
) -> None:
    super().__init__(latent_dimensions=latent_dimensions, center=center)
    self.c = c
    self.pca = pca
    self.eps = eps

fit

fit(views: list[ArrayLike], y: None = None) -> MCCA

Fit the MCCA model.

Parameters:

Name Type Description Default
views list[ArrayLike]

List of arrays, each (n_samples, n_features_i).

required
y None

Ignored.

None

Returns:

Name Type Description
self MCCA

Fitted estimator.

Raises:

Type Description
ValueError

If fewer than 2 views are provided.

ValueError

If views have inconsistent numbers of samples.

Source code in cca_zoo/linear/_mcca.py
def fit(self, views: list[ArrayLike], y: None = None) -> MCCA:
    """Fit the MCCA model.

    Args:
        views: List of arrays, each (n_samples, n_features_i).
        y: Ignored.

    Returns:
        self: Fitted estimator.

    Raises:
        ValueError: If fewer than 2 views are provided.
        ValueError: If views have inconsistent numbers of samples.
    """
    views_: list[np.ndarray] = self._setup_fit(views)
    c_ = perview_parameter("c", self.c, 0.0, self.n_views_)

    if self.pca:
        pca_models = [PCA().fit(v) for v in views_]
        views_pca = [m.transform(v) for m, v in zip(pca_models, views_)]
        A = self._build_A(views_pca)
        B = self._build_B_pca(pca_models, c_)
    else:
        A = self._build_A(views_)
        B = self._build_B(views_, c_)

    splits = np.cumsum([v.shape[1] for v in (views_pca if self.pca else views_)])
    _, eigvecs = gevp(A, B, self.latent_dimensions)

    raw_weights = np.split(eigvecs, splits[:-1], axis=0)
    if self.pca:
        self.weights_: list[np.ndarray] = [
            m.components_.T @ w for m, w in zip(pca_models, raw_weights)
        ]
    else:
        self.weights_ = raw_weights
    return self

GCCA

GCCA(latent_dimensions: int = 1, center: bool = True, c: float | list[float] = 0.0, view_weights: list[float] | None = None, eps: float = 1e-06)

Bases: BaseModel

Generalised Canonical Correlation Analysis.

Finds linear projections of multiple (>=2) views that maximise their joint correlation with a shared auxiliary latent vector:

.. math::

\max_{\mathbf{w}_i, T}
    \sum_{i=1}^M \mathbf{w}_i^\top X_i^\top T

\text{subject to }
T^\top T = I

The solution is obtained by constructing the weighted projection matrix:

.. math::

Q = \sum_{i=1}^M \mu_i X_i
    \bigl((1-c_i) X_i^\top X_i + c_i I\bigr)^{-1} X_i^\top

and computing its top-k eigenvectors :math:V, then recovering the per-view weights as :math:\mathbf{w}_i = X_i^+ V.
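The Q-matrix construction can be sketched as follows. This is an illustration of the formula above with equal view weights and c=0 (plus a small eigenvalue floor), not cca_zoo's exact code.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 2
views = [rng.standard_normal((n, p)) for p in (6, 4, 5)]
views = [v - v.mean(0) for v in views]

# Q = sum_i X_i (X_i^T X_i)^{-1} X_i^T, regularised for invertibility.
Q = np.zeros((n, n))
for X in views:
    cov = X.T @ X + 1e-6 * np.eye(X.shape[1])
    Q += X @ np.linalg.inv(cov) @ X.T

# Shared latent T: top-k eigenvectors of Q (orthonormal columns),
# then per-view weights via the pseudoinverse.
vals, vecs = np.linalg.eigh(Q)
T = vecs[:, -k:]
weights = [np.linalg.pinv(X) @ T for X in views]
```

Note that Q is n x n, so this formulation scales with the number of samples rather than the number of features.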

References

Tenenhaus, A., & Tenenhaus, M. (2011). Regularized generalized canonical correlation analysis. Psychometrika, 76(2), 257–284.

Parameters:

Name Type Description Default
latent_dimensions int

Number of latent dimensions. Default is 1.

1
center bool

Whether to subtract column means before fitting. Default True.

True
c float | list[float]

Ridge regularisation parameter(s) in [0, 1]. Default is 0.

0.0
view_weights list[float] | None

Per-view weights :math:\mu_i in the GCCA objective. Default is equal weights (1 for all views).

None
eps float

Regularisation floor for within-view matrices. Default is 1e-6.

1e-06
Example

import numpy as np
rng = np.random.default_rng(0)
X1 = rng.standard_normal((50, 10))
X2 = rng.standard_normal((50, 8))
X3 = rng.standard_normal((50, 6))
model = GCCA(latent_dimensions=2).fit([X1, X2, X3])
scores = model.transform([X1, X2, X3])

Source code in cca_zoo/linear/_gcca.py
def __init__(
    self,
    latent_dimensions: int = 1,
    center: bool = True,
    c: float | list[float] = 0.0,
    view_weights: list[float] | None = None,
    eps: float = 1e-6,
) -> None:
    super().__init__(latent_dimensions=latent_dimensions, center=center)
    self.c = c
    self.view_weights = view_weights
    self.eps = eps

fit

fit(views: list[ArrayLike], y: None = None) -> GCCA

Fit the GCCA model.

Parameters:

Name Type Description Default
views list[ArrayLike]

List of arrays, each (n_samples, n_features_i).

required
y None

Ignored.

None

Returns:

Name Type Description
self GCCA

Fitted estimator.

Raises:

Type Description
ValueError

If fewer than 2 views are provided.

ValueError

If views have inconsistent numbers of samples.

Source code in cca_zoo/linear/_gcca.py
def fit(self, views: list[ArrayLike], y: None = None) -> GCCA:
    """Fit the GCCA model.

    Args:
        views: List of arrays, each (n_samples, n_features_i).
        y: Ignored.

    Returns:
        self: Fitted estimator.

    Raises:
        ValueError: If fewer than 2 views are provided.
        ValueError: If views have inconsistent numbers of samples.
    """
    views_: list[np.ndarray] = self._setup_fit(views)
    c_ = perview_parameter("c", self.c, 0.0, self.n_views_)
    mu = perview_parameter("view_weights", self.view_weights, 1.0, self.n_views_)

    # Build Q = sum_i mu_i X_i (cov_i)^{-1} X_i^T
    Q = np.zeros((self.n_samples_, self.n_samples_))
    for i, (v, ci, mi) in enumerate(zip(views_, c_, mu)):
        cov_i = (1.0 - ci) * np.cov(v, rowvar=False) + ci * np.eye(v.shape[1])
        min_eig = np.linalg.eigvalsh(cov_i).min()
        if min_eig < self.eps:
            cov_i += (self.eps - min_eig) * np.eye(cov_i.shape[0])
        Q += mi * (v @ np.linalg.inv(cov_i) @ v.T)

    _, eigvecs = gevp(Q, None, self.latent_dimensions)
    T = eigvecs[:, : self.latent_dimensions]  # (n_samples, k)
    self.weights_: list[np.ndarray] = [np.linalg.pinv(v) @ T for v in views_]
    return self

TCCA

TCCA(latent_dimensions: int = 1, center: bool = True, c: float | list[float] = 0.0, eps: float = 1e-06, random_state: int | None = None)

Bases: BaseModel

Tensor Canonical Correlation Analysis.

Extends CCA to more than two views by exploiting higher-order cross-view correlations via a tensor product structure. The method constructs the order-M cross-moment tensor:

.. math::

\mathcal{M}_{p_1 p_2 \ldots p_M}
    = \frac{1}{n} \sum_{i=1}^n
        \tilde{x}_{1,i}^{(p_1)}
        \tilde{x}_{2,i}^{(p_2)}
        \cdots
        \tilde{x}_{M,i}^{(p_M)}

where :math:\tilde{X}_j = X_j \Sigma_j^{-1/2} are the whitened views, and then decomposes :math:\mathcal{M} using PARAFAC to recover the canonical directions.
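For three views, the cross-moment tensor can be built in one einsum call. This is a sketch of the tensor construction only (the source code below uses sequential outer products instead); `Z1`, `Z2`, `Z3` stand in for already-whitened views, and the PARAFAC step is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
Z1 = rng.standard_normal((n, 4))
Z2 = rng.standard_normal((n, 3))
Z3 = rng.standard_normal((n, 5))

# Order-3 cross-moment tensor, shape (4, 3, 5): entry [p, q, r] is
# the sample mean of Z1[:, p] * Z2[:, q] * Z3[:, r].
M = np.einsum("ip,iq,ir->pqr", Z1, Z2, Z3) / n
```

The tensor has one mode per view, so its size grows as the product of the per-view dimensions; in practice the views are reduced before fitting.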

References

Kim, T.-K., Wong, S.-F., & Cipolla, R. (2007). Tensor canonical correlation analysis for action classification. CVPR 2007. IEEE.

Parameters:

Name Type Description Default
latent_dimensions int

Number of latent dimensions. Default is 1.

1
center bool

Whether to subtract column means before fitting. Default True.

True
c float | list[float]

Ridge regularisation in [0, 1]. Default is 0.

0.0
eps float

Regularisation floor for within-view covariance matrices.

1e-06
random_state int | None

Seed for reproducibility (passed to PARAFAC).

None
Example

import numpy as np
rng = np.random.default_rng(0)
X1 = rng.standard_normal((50, 5))
X2 = rng.standard_normal((50, 5))
X3 = rng.standard_normal((50, 5))
model = TCCA(latent_dimensions=2, random_state=0).fit([X1, X2, X3])
scores = model.transform([X1, X2, X3])

Source code in cca_zoo/linear/_tcca.py
def __init__(
    self,
    latent_dimensions: int = 1,
    center: bool = True,
    c: float | list[float] = 0.0,
    eps: float = 1e-6,
    random_state: int | None = None,
) -> None:
    super().__init__(latent_dimensions=latent_dimensions, center=center)
    self.c = c
    self.eps = eps
    self.random_state = random_state

fit

fit(views: list[ArrayLike], y: None = None) -> TCCA

Fit the TCCA model.

Parameters:

Name Type Description Default
views list[ArrayLike]

List of arrays, each (n_samples, n_features_i).

required
y None

Ignored.

None

Returns:

Name Type Description
self TCCA

Fitted estimator.

Raises:

Type Description
ValueError

If fewer than 2 views are provided.

ValueError

If views have inconsistent numbers of samples.

Source code in cca_zoo/linear/_tcca.py
def fit(self, views: list[ArrayLike], y: None = None) -> TCCA:
    """Fit the TCCA model.

    Args:
        views: List of arrays, each (n_samples, n_features_i).
        y: Ignored.

    Returns:
        self: Fitted estimator.

    Raises:
        ValueError: If fewer than 2 views are provided.
        ValueError: If views have inconsistent numbers of samples.
    """
    views_: list[np.ndarray] = self._setup_fit(views)
    c_ = perview_parameter("c", self.c, 0.0, self.n_views_)
    whitened, cov_invsqrt = self._whiten_views(views_, c_)

    # Build cross-moment tensor via sequential outer products
    M: np.ndarray | None = None
    for i, wv in enumerate(whitened):
        if M is None:
            M = wv
        else:
            for _ in range(len(M.shape) - 1):
                wv = np.expand_dims(wv, 1)
            M = np.expand_dims(M, -1) @ wv
    assert M is not None
    M = np.mean(M, 0)

    tl.set_backend("numpy")
    parafac_result = parafac(
        M,
        self.latent_dimensions,
        verbose=False,
        random_state=self.random_state,
    )
    self.weights_: list[np.ndarray] = [
        cov_invsqrt[i] @ fac for i, fac in enumerate(parafac_result.factors)
    ]
    return self

Gradient-descent methods

PLS_EY

PLS_EY(latent_dimensions: int = 1, center: bool = True, learning_rate: float = 0.01, max_iter: int = 1000, batch_size: int | None = None, tol: float = 1e-06, random_state: int | None = None)

Bases: BaseModel

Stochastic Eckart-Young PLS for large-scale data.

Optimises the Eckart-Young (EY) objective for PLS by mini-batch Riemannian gradient descent on the Stiefel manifold:

.. math::

\min_{U, V \,:\, U^\top U = I,\, V^\top V = I}
    \left\| X_1 U - X_2 V \right\|_F^2

which is equivalent to maximising :math:\mathrm{tr}(U^\top X_1^\top X_2 V) (the PLS objective). At each step the Euclidean gradient is projected onto the tangent space of the Stiefel manifold, and the result is retracted back to the manifold via polar decomposition.

Suitable for high-dimensional or streaming data where forming the full (p × p) cross-covariance matrix is too expensive.
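The two manifold operations mentioned above can be sketched in isolation. This is an illustrative implementation of the standard (embedded) Stiefel tangent projection and polar retraction, not cca_zoo's internal `_stiefel_retract` helper.

```python
import numpy as np

rng = np.random.default_rng(0)
p, k = 10, 3
W, _ = np.linalg.qr(rng.standard_normal((p, k)))   # point on Stiefel: W^T W = I

def tangent_project(W, G):
    # Remove the component of G that leaves the manifold:
    # xi = G - W sym(W^T G) satisfies W^T xi + xi^T W = 0.
    sym = (W.T @ G + G.T @ W) / 2
    return G - W @ sym

def polar_retract(A):
    # Polar retraction: nearest matrix with orthonormal columns.
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ Vt

G = rng.standard_normal((p, k))                    # Euclidean gradient
step = polar_retract(W - 0.1 * tangent_project(W, G))
```

After the retraction, `step` again has orthonormal columns, so the Stiefel constraint is maintained at every iteration.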

References

Gemp, I., McWilliams, B., Vernade, C., & Graepel, T. (2022). EigenGame Unloaded: When playing games is better than optimizing. ICLR 2022.

Parameters:

Name Type Description Default
latent_dimensions int

Number of latent dimensions. Default is 1.

1
center bool

Whether to subtract column means. Default True.

True
learning_rate float

Riemannian gradient step size. Default is 1e-2.

0.01
max_iter int

Number of gradient iterations. Default is 1000.

1000
batch_size int | None

Mini-batch size. None uses the full dataset.

None
tol float

Convergence tolerance on the objective change. Default is 1e-6.

1e-06
random_state int | None

Seed for reproducibility.

None
Example

import numpy as np
rng = np.random.default_rng(0)
X1 = rng.standard_normal((200, 500))
X2 = rng.standard_normal((200, 400))
model = PLS_EY(latent_dimensions=4, batch_size=64, random_state=0)
model.fit([X1, X2])

Source code in cca_zoo/linear/_gradient.py
def __init__(
    self,
    latent_dimensions: int = 1,
    center: bool = True,
    learning_rate: float = 1e-2,
    max_iter: int = 1000,
    batch_size: int | None = None,
    tol: float = 1e-6,
    random_state: int | None = None,
) -> None:
    super().__init__(latent_dimensions=latent_dimensions, center=center)
    self.learning_rate = learning_rate
    self.max_iter = max_iter
    self.batch_size = batch_size
    self.tol = tol
    self.random_state = random_state

fit

fit(views: list[ArrayLike], y: None = None) -> PLS_EY

Fit PLS_EY by Riemannian gradient descent.

Parameters:

Name Type Description Default
views list[ArrayLike]

List of arrays, each (n_samples, n_features_i).

required
y None

Ignored.

None

Returns:

Name Type Description
self PLS_EY

Fitted estimator.

Raises:

Type Description
ValueError

If fewer than 2 views are provided.

ValueError

If views have inconsistent numbers of samples.

Source code in cca_zoo/linear/_gradient.py
def fit(self, views: list[ArrayLike], y: None = None) -> PLS_EY:
    """Fit PLS_EY by Riemannian gradient descent.

    Args:
        views: List of arrays, each (n_samples, n_features_i).
        y: Ignored.

    Returns:
        self: Fitted estimator.

    Raises:
        ValueError: If fewer than 2 views are provided.
        ValueError: If views have inconsistent numbers of samples.
    """
    views_: list[np.ndarray] = self._setup_fit(views)
    rng = np.random.default_rng(self.random_state)
    n = self.n_samples_
    bs = n if self.batch_size is None else min(self.batch_size, n)

    # Initialise weights on Stiefel manifold
    W = [
        _stiefel_retract(rng.standard_normal((p, self.latent_dimensions)))
        for p in self.n_features_in_
    ]
    prev_obj = np.inf
    for iteration in range(self.max_iter):
        idx = rng.choice(n, bs, replace=False)
        batch = [v[idx] for v in views_]
        obj, W = self._step(batch, W)
        if abs(prev_obj - obj) < self.tol:
            logger.debug("PLS_EY converged at iteration %d", iteration)
            break
        prev_obj = obj
    self.weights_ = W
    return self

CCA_EY

CCA_EY(latent_dimensions: int = 1, center: bool = True, c: float | list[float] = 0.0, learning_rate: float = 0.01, max_iter: int = 1000, batch_size: int | None = None, tol: float = 1e-06, random_state: int | None = None)

Bases: PLS_EY

Eckart-Young CCA for large-scale data.

Equivalent to :class:PLS_EY but applies per-view PCA whitening before the gradient updates, so the resulting objective is the CCA correlation rather than covariance. This makes the method applicable to views with very different scales.

The whitening pre-processing is computed once at the start of fit using the full data, then the gradient updates operate in the whitened space.

References

Gemp, I., McWilliams, B., Vernade, C., & Graepel, T. (2022). EigenGame Unloaded: When playing games is better than optimizing. ICLR 2022.

Parameters:

Name Type Description Default
latent_dimensions int

Number of latent dimensions. Default is 1.

1
center bool

Whether to subtract column means. Default True.

True
c float | list[float]

Ridge regularisation parameter(s) in [0, 1]. Default is 0 (standard CCA whitening); increase for noisy high-dimensional data.

0.0
learning_rate float

Riemannian gradient step size. Default is 1e-2.

0.01
max_iter int

Number of gradient iterations. Default is 1000.

1000
batch_size int | None

Mini-batch size. None uses the full dataset.

None
tol float

Convergence tolerance. Default is 1e-6.

1e-06
random_state int | None

Seed for reproducibility.

None
Example

import numpy as np
rng = np.random.default_rng(0)
X1 = rng.standard_normal((200, 500))
X2 = rng.standard_normal((200, 400))
model = CCA_EY(latent_dimensions=4, c=0.1, batch_size=64, random_state=0)
model.fit([X1, X2])

Source code in cca_zoo/linear/_gradient.py
def __init__(
    self,
    latent_dimensions: int = 1,
    center: bool = True,
    c: float | list[float] = 0.0,
    learning_rate: float = 1e-2,
    max_iter: int = 1000,
    batch_size: int | None = None,
    tol: float = 1e-6,
    random_state: int | None = None,
) -> None:
    super().__init__(
        latent_dimensions=latent_dimensions,
        center=center,
        learning_rate=learning_rate,
        max_iter=max_iter,
        batch_size=batch_size,
        tol=tol,
        random_state=random_state,
    )
    self.c = c

fit

fit(views: list[ArrayLike], y: None = None) -> CCA_EY

Fit CCA_EY with whitening pre-processing.

Parameters:

Name Type Description Default
views list[ArrayLike]

List of arrays, each (n_samples, n_features_i).

required
y None

Ignored.

None

Returns:

Name Type Description
self CCA_EY

Fitted estimator.

Raises:

Type Description
ValueError

If fewer than 2 views are provided.

ValueError

If views have inconsistent numbers of samples.

Source code in cca_zoo/linear/_gradient.py
def fit(self, views: list[ArrayLike], y: None = None) -> CCA_EY:
    """Fit CCA_EY with whitening pre-processing.

    Args:
        views: List of arrays, each (n_samples, n_features_i).
        y: Ignored.

    Returns:
        self: Fitted estimator.

    Raises:
        ValueError: If fewer than 2 views are provided.
        ValueError: If views have inconsistent numbers of samples.
    """
    views_: list[np.ndarray] = self._setup_fit(views)
    c_ = perview_parameter("c", self.c, 0.0, self.n_views_)
    # Whiten each view; store whitening matrices to back-project weights
    whitened = []
    self._whiten_mats: list[np.ndarray] = []
    for v, ci in zip(views_, c_):
        v_w, W_whiten = svd_whiten(v, ci)
        whitened.append(v_w)
        self._whiten_mats.append(W_whiten)

    rng = np.random.default_rng(self.random_state)
    n = self.n_samples_
    bs = n if self.batch_size is None else min(self.batch_size, n)
    latent_dims_clamped = min(
        self.latent_dimensions, *[w.shape[1] for w in whitened]
    )
    W_white = [
        _stiefel_retract(rng.standard_normal((w.shape[1], latent_dims_clamped)))
        for w in whitened
    ]
    prev_obj = np.inf
    for iteration in range(self.max_iter):
        idx = rng.choice(n, bs, replace=False)
        batch = [v[idx] for v in whitened]
        obj, W_white = self._step(batch, W_white)
        if abs(prev_obj - obj) < self.tol:
            logger.debug("CCA_EY converged at iteration %d", iteration)
            break
        prev_obj = obj
    # Back-project from whitened space to original space
    self.weights_ = [wm @ ww for wm, ww in zip(self._whiten_mats, W_white)]
    return self
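The helper svd_whiten is not shown on this page; the following is a minimal sketch of what ridge-regularised SVD whitening typically looks like. The name svd_whiten_sketch and the exact (1 - c) * s**2 + c interpolation are assumptions for illustration, not the library's code:

```python
import numpy as np

def svd_whiten_sketch(X, c):
    """Whiten X; c interpolates between CCA whitening (c=0) and PLS scaling (c=1)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_reg = np.sqrt((1.0 - c) * s**2 + c)  # ridge-regularised singular values
    X_white = U * (s / s_reg)              # whitened view, shape (n, rank)
    W_back = Vt.T / s_reg                  # maps whitened weights back to feature space
    return X_white, W_back

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
X -= X.mean(axis=0)
Xw, Wb = svd_whiten_sketch(X, c=0.0)
cov = Xw.T @ Xw  # with c=0 the whitened view has orthonormal columns
```

The returned W_back plays the role of the stored whitening matrices: projecting the raw view with it reproduces the whitened view, which is why the fitted weights can be back-projected with a single matrix product.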

MCCA_EY

MCCA_EY(latent_dimensions: int = 1, center: bool = True, c: float | list[float] = 0.0, learning_rate: float = 0.01, max_iter: int = 1000, batch_size: int | None = None, tol: float = 1e-06, random_state: int | None = None)

Bases: CCA_EY

Eckart-Young multiview CCA for large-scale data (>=2 views).

Extends :class:CCA_EY to handle more than two views by optimising the multiview EY loss:

.. math::

\min_{\{W_i\}} \sum_{i \neq j}
    \left\| \tilde{X}_i W_i - \tilde{X}_j W_j \right\|_F^2

where :math:\tilde{X}_i are the whitened views, and all weight matrices are constrained to lie on the Stiefel manifold.
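The loss above can be evaluated directly; a minimal sketch (not the estimator's internal implementation), with Stiefel points generated via QR:

```python
import numpy as np

def multiview_ey_loss(views, weights):
    """Sum of squared Frobenius distances over all ordered pairs of projections."""
    scores = [X @ W for X, W in zip(views, weights)]
    total = 0.0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if i != j:
                total += np.linalg.norm(scores[i] - scores[j]) ** 2
    return total

rng = np.random.default_rng(0)
views = [rng.standard_normal((50, d)) for d in (6, 5, 4)]
# One point on each Stiefel manifold: orthonormal columns via reduced QR.
weights = [np.linalg.qr(rng.standard_normal((v.shape[1], 2)))[0] for v in views]
loss = multiview_ey_loss(views, weights)
```

Identical views with identical weights give a loss of exactly zero, which is the sanity check the gradient updates drive towards.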

References

Gemp, I., McWilliams, B., Vernade, C., & Graepel, T. (2022). EigenGame Unloaded: When playing games is better than optimizing. ICLR 2022.

Parameters:

Name Type Description Default
latent_dimensions int

Number of latent dimensions. Default is 1.

1
center bool

Whether to subtract column means. Default True.

True
c float | list[float]

Ridge regularisation parameter(s) in [0, 1]. Default is 0.

0.0
learning_rate float

Riemannian gradient step size. Default is 1e-2.

0.01
max_iter int

Number of gradient iterations. Default is 1000.

1000
batch_size int | None

Mini-batch size. None uses the full dataset.

None
tol float

Convergence tolerance. Default is 1e-6.

1e-06
random_state int | None

Seed for reproducibility.

None
Example

import numpy as np
rng = np.random.default_rng(0)
X1 = rng.standard_normal((200, 500))
X2 = rng.standard_normal((200, 400))
X3 = rng.standard_normal((200, 300))
model = MCCA_EY(latent_dimensions=4, c=0.1, batch_size=64, random_state=0)
model.fit([X1, X2, X3])

Source code in cca_zoo/linear/_gradient.py
def __init__(
    self,
    latent_dimensions: int = 1,
    center: bool = True,
    c: float | list[float] = 0.0,
    learning_rate: float = 1e-2,
    max_iter: int = 1000,
    batch_size: int | None = None,
    tol: float = 1e-6,
    random_state: int | None = None,
) -> None:
    super().__init__(
        latent_dimensions=latent_dimensions,
        center=center,
        learning_rate=learning_rate,
        max_iter=max_iter,
        batch_size=batch_size,
        tol=tol,
        random_state=random_state,
    )
    self.c = c

fit

fit(views: list[ArrayLike], y: None = None) -> MCCA_EY

Fit MCCA_EY for 2 or more views.

Parameters:

Name Type Description Default
views list[ArrayLike]

List of arrays, each (n_samples, n_features_i).

required
y None

Ignored.

None

Returns:

Name Type Description
self MCCA_EY

Fitted estimator.

Raises:

Type Description
ValueError

If fewer than 2 views are provided.

ValueError

If views have inconsistent numbers of samples.

Source code in cca_zoo/linear/_gradient.py
def fit(self, views: list[ArrayLike], y: None = None) -> MCCA_EY:
    """Fit MCCA_EY for 2 or more views.

    Args:
        views: List of arrays, each (n_samples, n_features_i).
        y: Ignored.

    Returns:
        self: Fitted estimator.

    Raises:
        ValueError: If fewer than 2 views are provided.
        ValueError: If views have inconsistent numbers of samples.
    """
    super().fit(views, y)
    return self

Sparse / iterative methods

PLS_ALS

PLS_ALS(latent_dimensions: int = 1, center: bool = True, max_iter: int = 500, tol: float = 1e-06, random_state: int | None = None)

Bases: _BaseIterative

Alternating Least Squares variant of Partial Least Squares.

Maximises the sum of cross-view covariances using simple power-iteration updates, without regularisation:

.. math::

\mathbf{w}_i \leftarrow
    \frac{X_i^\top \bar{\mathbf{s}}_{\neg i}}
         {\|X_i^\top \bar{\mathbf{s}}_{\neg i}\|_2}

where :math:\bar{\mathbf{s}}_{\neg i} is the normalised sum of projected scores from all views except :math:i.
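One round of these updates can be sketched as follows (a simultaneous-update sketch for the first component only; the estimator's actual deflation and sequential details may differ):

```python
import numpy as np

def pls_als_update(views, weights):
    """One simultaneous round of power-iteration updates (first component only)."""
    scores = [X @ w for X, w in zip(views, weights)]
    new_weights = []
    for i, X in enumerate(views):
        # Normalised sum of the *other* views' projected scores.
        s = sum(scores[j] for j in range(len(views)) if j != i)
        s = s / np.linalg.norm(s)
        w = X.T @ s
        new_weights.append(w / np.linalg.norm(w))
    return new_weights

rng = np.random.default_rng(0)
X1 = rng.standard_normal((50, 10))
X2 = rng.standard_normal((50, 8))
w1, w2 = rng.standard_normal(10), rng.standard_normal(8)
for _ in range(500):
    w1, w2 = pls_als_update([X1, X2], [w1, w2])
```

For two views this is power iteration on the cross-covariance matrix, so the weights converge (up to sign) to its leading singular vectors.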

Parameters:

Name Type Description Default
latent_dimensions int

Number of latent dimensions. Default is 1.

1
center bool

Whether to subtract column means. Default True.

True
max_iter int

Maximum ALS iterations per dimension. Default is 500.

500
tol float

Convergence tolerance. Default is 1e-6.

1e-06
random_state int | None

Seed for reproducibility.

None
Example

import numpy as np
rng = np.random.default_rng(0)
X1 = rng.standard_normal((50, 10))
X2 = rng.standard_normal((50, 8))
model = PLS_ALS(latent_dimensions=2, random_state=0).fit([X1, X2])

Source code in cca_zoo/linear/_iterative.py
def __init__(
    self,
    latent_dimensions: int = 1,
    center: bool = True,
    max_iter: int = 500,
    tol: float = 1e-6,
    random_state: int | None = None,
) -> None:
    super().__init__(latent_dimensions=latent_dimensions, center=center)
    self.max_iter = max_iter
    self.tol = tol
    self.random_state = random_state

SCCA_PMD

SCCA_PMD(latent_dimensions: int = 1, center: bool = True, tau: float | list[float] = 1.0, max_iter: int = 500, tol: float = 1e-06, random_state: int | None = None)

Bases: _BaseIterative

Sparse CCA via Penalized Matrix Decomposition.

Maximises the cross-view covariance subject to L1 norm constraints on each weight vector:

.. math::

\max_{\mathbf{w}_1, \mathbf{w}_2}
    \mathbf{w}_1^\top X_1^\top X_2 \mathbf{w}_2
\quad \text{subject to} \quad
\|\mathbf{w}_i\|_1 \leq \tau_i \sqrt{p_i}, \quad
\|\mathbf{w}_i\|_2 = 1

The update for each view uses bisection to find the soft-threshold that satisfies the L1 constraint exactly.
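The bisection step can be sketched as follows (a sketch of the technique, not the library's code; l1_bounded_weight and soft_threshold are illustrative names):

```python
import numpy as np

def soft_threshold(a, t):
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def l1_bounded_weight(a, l1_bound, n_steps=50):
    """Return w = S_t(a)/||S_t(a)||_2 with ||w||_1 <= l1_bound, t found by bisection.

    l1_bound must be >= 1: unit L2 norm implies ||w||_1 >= 1.
    """
    w = a / np.linalg.norm(a)
    if np.linalg.norm(w, 1) <= l1_bound:
        return w  # constraint already satisfied without thresholding
    lo, hi = 0.0, np.max(np.abs(a))
    for _ in range(n_steps):
        t = (lo + hi) / 2
        w = soft_threshold(a, t)
        w = w / np.linalg.norm(w)
        if np.linalg.norm(w, 1) > l1_bound:
            lo = t  # still too dense: threshold harder
        else:
            hi = t  # feasible: try a smaller threshold
    w = soft_threshold(a, hi)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)
a = rng.standard_normal(20)
w = l1_bounded_weight(a, l1_bound=2.0)
```

Bisecting on the threshold rather than fixing it in advance is what lets PMD hit the L1 bound exactly, in contrast to the fixed threshold of ParkhomenkoCCA.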

References

Witten, D. M., Tibshirani, R., & Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, 10(3), 515–534.

Parameters:

Name Type Description Default
latent_dimensions int

Number of latent dimensions. Default is 1.

1
center bool

Whether to subtract column means. Default True.

True
tau float | list[float]

L1 bound scaling factor(s) in (0, 1]. The actual L1 bound is tau * sqrt(n_features_i). Default is 1 (no sparsity).

1.0
max_iter int

Maximum ALS iterations. Default is 500.

500
tol float

Convergence tolerance. Default is 1e-6.

1e-06
random_state int | None

Seed for reproducibility.

None
Example

import numpy as np
rng = np.random.default_rng(0)
X1 = rng.standard_normal((50, 10))
X2 = rng.standard_normal((50, 8))
model = SCCA_PMD(tau=0.5, random_state=0).fit([X1, X2])

Source code in cca_zoo/linear/_iterative.py
def __init__(
    self,
    latent_dimensions: int = 1,
    center: bool = True,
    tau: float | list[float] = 1.0,
    max_iter: int = 500,
    tol: float = 1e-6,
    random_state: int | None = None,
) -> None:
    super().__init__(
        latent_dimensions=latent_dimensions,
        center=center,
        max_iter=max_iter,
        tol=tol,
        random_state=random_state,
    )
    self.tau = tau

fit

fit(views: list[ArrayLike], y: None = None) -> SCCA_PMD

Fit the SCCA_PMD model.

Parameters:

Name Type Description Default
views list[ArrayLike]

List of arrays, each (n_samples, n_features_i).

required
y None

Ignored.

None

Returns:

Name Type Description
self SCCA_PMD

Fitted estimator.

Raises:

Type Description
ValueError

If fewer than 2 views are provided.

Source code in cca_zoo/linear/_iterative.py
def fit(self, views: list[ArrayLike], y: None = None) -> SCCA_PMD:
    """Fit the SCCA_PMD model.

    Args:
        views: List of arrays, each (n_samples, n_features_i).
        y: Ignored.

    Returns:
        self: Fitted estimator.

    Raises:
        ValueError: If fewer than 2 views are provided.
    """
    # Store processed tau for use in _update_weight
    self._tau: list[float] = []  # set in super().fit via _setup_fit
    super().fit(views, y)
    return self

SCCA_ADMM

SCCA_ADMM(latent_dimensions: int = 1, center: bool = True, tau: float | list[float] = 0.1, mu: float = 1.0, max_iter: int = 500, tol: float = 1e-06, random_state: int | None = None)

Bases: _BaseIterative

Sparse CCA via Alternating Direction Method of Multipliers.

Solves the sparse CCA problem using ADMM to enforce both the L1 sparsity constraint on weight vectors and the unit-norm constraint on the projected scores simultaneously.

For view :math:i the ADMM sub-problems are:

  • :math:\mathbf{w}_i update — proximal gradient step w.r.t. the data fidelity term.
  • Auxiliary variable :math:\mathbf{z}_i update — soft thresholding.
  • Dual variable update.
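The three sub-problems follow the standard scaled-ADMM pattern. A self-contained sketch on a generic lasso-penalised least-squares problem (illustrative only; the estimator's actual sub-problems also enforce the unit-norm score constraint):

```python
import numpy as np

def lasso_admm(X, y, tau=0.1, mu=1.0, n_iter=500):
    """Scaled ADMM for (1/2)||Xw - y||_2^2 + tau*||z||_1 subject to w = z."""
    n, p = X.shape
    Xty = X.T @ y
    # Factorise once: the w update is a ridge-like linear solve.
    L = np.linalg.cholesky(X.T @ X + mu * np.eye(p))
    w = np.zeros(p)
    z = np.zeros(p)
    u = np.zeros(p)  # scaled dual variable
    for _ in range(n_iter):
        # (1) w update: minimise data fidelity plus the quadratic coupling term.
        rhs = Xty + mu * (z - u)
        w = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        # (2) z update: soft thresholding, the prox of tau*||.||_1 with step tau/mu.
        a = w + u
        z = np.sign(a) * np.maximum(np.abs(a) - tau / mu, 0.0)
        # (3) dual update.
        u = u + w - z
    return z

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
y = rng.standard_normal(50)
w_dense = lasso_admm(X, y, tau=0.0)
w_sparse = lasso_admm(X, y, tau=20.0)
```

The penalty parameter mu plays the same role as the estimator's mu argument: it trades off how strongly the w and z iterates are pulled together at each round.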
References

Suo, X., Mineiro, P., & Anandkumar, A. (2017). Sparse canonical correlation analysis. arXiv:1705.10865.

Parameters:

Name Type Description Default
latent_dimensions int

Number of latent dimensions. Default is 1.

1
center bool

Whether to subtract column means. Default True.

True
tau float | list[float]

L1 regularisation weight(s). Default is 0.1.

0.1
mu float

ADMM penalty parameter (step size). Default is 1.0.

1.0
max_iter int

Maximum outer iterations. Default is 500.

500
tol float

Convergence tolerance. Default is 1e-6.

1e-06
random_state int | None

Seed for reproducibility.

None
Example

import numpy as np
rng = np.random.default_rng(0)
X1 = rng.standard_normal((50, 10))
X2 = rng.standard_normal((50, 8))
model = SCCA_ADMM(tau=0.1, random_state=0).fit([X1, X2])

Source code in cca_zoo/linear/_iterative.py
def __init__(
    self,
    latent_dimensions: int = 1,
    center: bool = True,
    tau: float | list[float] = 0.1,
    mu: float = 1.0,
    max_iter: int = 500,
    tol: float = 1e-6,
    random_state: int | None = None,
) -> None:
    super().__init__(
        latent_dimensions=latent_dimensions,
        center=center,
        max_iter=max_iter,
        tol=tol,
        random_state=random_state,
    )
    self.tau = tau
    self.mu = mu

SCCA_IPLS

SCCA_IPLS(latent_dimensions: int = 1, center: bool = True, alpha: float | list[float] = 0.0, l1_ratio: float | list[float] = 1.0, max_iter: int = 500, tol: float = 1e-06, random_state: int | None = None)

Bases: _BaseIterative

Iterative PLS with elastic net penalty on weight vectors.

Alternates between penalised regression sub-problems. For view :math:i:

.. math::

\hat{\mathbf{w}}_i = \arg\min_{\mathbf{w}}
    \frac{1}{2n} \|X_i \mathbf{w} - \bar{\mathbf{s}}_{\neg i}\|_2^2
    + \alpha_i \Bigl(
        l_1 \|\mathbf{w}\|_1
        + \tfrac{1-l_1}{2} \|\mathbf{w}\|_2^2
    \Bigr)

followed by a normalisation step to enforce unit variance of the score.

References

Mai, Q., & Zhang, X. (2019). An iterative penalized least squares approach to sparse canonical correlation analysis. Biometrics, 75(3), 734–744.

Parameters:

Name Type Description Default
latent_dimensions int

Number of latent dimensions. Default is 1.

1
center bool

Whether to subtract column means. Default True.

True
alpha float | list[float]

Elastic net penalty strength(s). Default is 0.

0.0
l1_ratio float | list[float]

Ratio of L1 to total penalty. 1 = lasso, 0 = ridge. Default is 1.

1.0
max_iter int

Maximum ALS iterations. Default is 500.

500
tol float

Convergence tolerance. Default is 1e-6.

1e-06
random_state int | None

Seed for reproducibility.

None
Example

import numpy as np
rng = np.random.default_rng(0)
X1 = rng.standard_normal((50, 10))
X2 = rng.standard_normal((50, 8))
model = SCCA_IPLS(alpha=0.1, random_state=0).fit([X1, X2])

Source code in cca_zoo/linear/_iterative.py
def __init__(
    self,
    latent_dimensions: int = 1,
    center: bool = True,
    alpha: float | list[float] = 0.0,
    l1_ratio: float | list[float] = 1.0,
    max_iter: int = 500,
    tol: float = 1e-6,
    random_state: int | None = None,
) -> None:
    super().__init__(
        latent_dimensions=latent_dimensions,
        center=center,
        max_iter=max_iter,
        tol=tol,
        random_state=random_state,
    )
    self.alpha = alpha
    self.l1_ratio = l1_ratio

SCCA_Span

SCCA_Span(latent_dimensions: int = 1, center: bool = True, span: int | list[int] | None = None, max_iter: int = 500, tol: float = 1e-06, random_state: int | None = None)

Bases: _BaseIterative

SpanCCA — sparse CCA via truncated power iteration.

Solves sparse CCA by a sparse power iteration where each weight update retains only the span entries with the largest absolute values.
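The truncation step itself is simple hard thresholding; a minimal sketch (truncate_to_span is an illustrative name, not the library's helper):

```python
import numpy as np

def truncate_to_span(a, span):
    """Keep only the `span` largest-magnitude entries of a, then renormalise."""
    w = np.zeros_like(a)
    keep = np.argsort(np.abs(a))[-span:]
    w[keep] = a[keep]
    return w / np.linalg.norm(w)

a = np.array([0.1, -3.0, 0.5, 2.0, -0.2])
w = truncate_to_span(a, span=2)  # keeps the -3.0 and 2.0 entries
```

Unlike soft thresholding, this gives exact control over the number of non-zero entries, which is why span is specified as an integer count rather than a penalty strength.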

References

Asteris, M., Khanna, R., Kyrillidis, A., & Dimakis, A. G. (2016). Bilinear approaches for online learning over large feature spaces. NeurIPS 2016. (SpanCCA algorithm).

Parameters:

Name Type Description Default
latent_dimensions int

Number of latent dimensions. Default is 1.

1
center bool

Whether to subtract column means. Default True.

True
span int | list[int] | None

Number of non-zero entries to retain per view. Either a single int or a list. Default is None (keep all — no sparsity).

None
max_iter int

Maximum ALS iterations. Default is 500.

500
tol float

Convergence tolerance. Default is 1e-6.

1e-06
random_state int | None

Seed for reproducibility.

None
Example

import numpy as np
rng = np.random.default_rng(0)
X1 = rng.standard_normal((50, 10))
X2 = rng.standard_normal((50, 8))
model = SCCA_Span(span=5, random_state=0).fit([X1, X2])

Source code in cca_zoo/linear/_iterative.py
def __init__(
    self,
    latent_dimensions: int = 1,
    center: bool = True,
    span: int | list[int] | None = None,
    max_iter: int = 500,
    tol: float = 1e-6,
    random_state: int | None = None,
) -> None:
    super().__init__(
        latent_dimensions=latent_dimensions,
        center=center,
        max_iter=max_iter,
        tol=tol,
        random_state=random_state,
    )
    self.span = span

ElasticCCA

ElasticCCA(latent_dimensions: int = 1, center: bool = True, alpha: float | list[float] = 0.0, l1_ratio: float | list[float] = 0.5, max_iter: int = 500, tol: float = 1e-06, random_state: int | None = None)

Bases: _BaseIterative

Elastic net regularised CCA.

Alternates between elastic net regression sub-problems, regressing each view's score against the sum of all other views' scores:

.. math::

\hat{\mathbf{w}}_i = \arg\min_{\mathbf{w}}
    \frac{1}{2n} \|X_i \mathbf{w} - \mathbf{s}_{\text{all}}\|_2^2
    + \alpha_i \Bigl(
        l_1 \|\mathbf{w}\|_1
        + \tfrac{1 - l_1}{2} \|\mathbf{w}\|_2^2
    \Bigr)

where :math:\mathbf{s}_{\text{all}} is the sum of projected scores across views, :math:\sum_j X_j \mathbf{w}_j, scaled to unit norm.
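Each sub-problem above is a standard elastic net regression. A self-contained proximal-gradient (ISTA) sketch of one such sub-problem, using the notation above (this is a sketch of the objective, not the estimator's actual solver):

```python
import numpy as np

def soft(a, t):
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def elastic_net_subproblem(X, s_all, alpha, l1_ratio, n_iter=2000):
    """ISTA on (1/2n)||Xw - s_all||^2 + alpha*(l1*||w||_1 + (1-l1)/2*||w||_2^2)."""
    n, p = X.shape
    w = np.zeros(p)
    # Step size from the Lipschitz constant of the smooth part.
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n + alpha * (1.0 - l1_ratio))
    for _ in range(n_iter):
        grad = X.T @ (X @ w - s_all) / n + alpha * (1.0 - l1_ratio) * w
        w = soft(w - step * grad, step * alpha * l1_ratio)
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
s_all = rng.standard_normal(50)
s_all = s_all / np.linalg.norm(s_all)
w = elastic_net_subproblem(X, s_all, alpha=0.1, l1_ratio=0.5)
w_ols = elastic_net_subproblem(X, s_all, alpha=0.0, l1_ratio=0.5)
```

With alpha set to 0 the sub-problem reduces to ordinary least squares, which provides a direct sanity check on the solver.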

References

Waaijenborg, S., de Witt Hamer, P. C. V., & Zwinderman, A. H. (2008). Quantifying the association between gene expressions and DNA-markers by penalized canonical correlation analysis. Statistical Applications in Genetics and Molecular Biology, 7(1).

Parameters:

Name Type Description Default
latent_dimensions int

Number of latent dimensions. Default is 1.

1
center bool

Whether to subtract column means. Default True.

True
alpha float | list[float]

Elastic net regularisation strength. Default is 0.

0.0
l1_ratio float | list[float]

L1 / total penalty ratio. Default is 0.5.

0.5
max_iter int

Maximum ALS iterations. Default is 500.

500
tol float

Convergence tolerance. Default is 1e-6.

1e-06
random_state int | None

Seed for reproducibility.

None
Example

import numpy as np
rng = np.random.default_rng(0)
X1 = rng.standard_normal((50, 10))
X2 = rng.standard_normal((50, 8))
model = ElasticCCA(alpha=0.1, l1_ratio=0.5, random_state=0).fit([X1, X2])

Source code in cca_zoo/linear/_iterative.py
def __init__(
    self,
    latent_dimensions: int = 1,
    center: bool = True,
    alpha: float | list[float] = 0.0,
    l1_ratio: float | list[float] = 0.5,
    max_iter: int = 500,
    tol: float = 1e-6,
    random_state: int | None = None,
) -> None:
    super().__init__(
        latent_dimensions=latent_dimensions,
        center=center,
        max_iter=max_iter,
        tol=tol,
        random_state=random_state,
    )
    self.alpha = alpha
    self.l1_ratio = l1_ratio

ParkhomenkoCCA

ParkhomenkoCCA(latent_dimensions: int = 1, center: bool = True, tau: float | list[float] = 0.1, max_iter: int = 500, tol: float = 1e-06, random_state: int | None = None)

Bases: _BaseIterative

Sparse CCA via soft-thresholding power iteration (Parkhomenko 2009).

Uses a fixed soft-threshold :math:\tau_i rather than the adaptive bisection search of :class:SCCA_PMD:

.. math::

\mathbf{w}_i \leftarrow
    S_{\tau_i}(X_i^\top \bar{\mathbf{s}}_{\neg i})

where :math:S_\tau is the element-wise soft-threshold operator.
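The update is a one-liner on top of the power-iteration step; a sketch (parkhomenko_update is an illustrative name, not the library's helper):

```python
import numpy as np

def parkhomenko_update(X, s_other, tau):
    """Fixed soft-threshold power-iteration update for one view's weight vector."""
    a = X.T @ (s_other / np.linalg.norm(s_other))
    w = np.sign(a) * np.maximum(np.abs(a) - tau, 0.0)  # element-wise S_tau
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
s = rng.standard_normal(50)
w = parkhomenko_update(X, s, tau=0.1)
w_sparse = parkhomenko_update(X, s, tau=0.5)
```

Because the threshold is fixed, a larger tau directly produces a sparser weight vector; SCCA_PMD instead searches for the threshold that meets an explicit L1 bound.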

References

Parkhomenko, E., Tritchler, D., & Beyene, J. (2009). Sparse canonical correlation analysis with application to genomic data integration. Statistical Applications in Genetics and Molecular Biology, 8(1).

Parameters:

Name Type Description Default
latent_dimensions int

Number of latent dimensions. Default is 1.

1
center bool

Whether to subtract column means. Default True.

True
tau float | list[float]

Soft-threshold parameter(s). Default is 0.1.

0.1
max_iter int

Maximum ALS iterations. Default is 500.

500
tol float

Convergence tolerance. Default is 1e-6.

1e-06
random_state int | None

Seed for reproducibility.

None
Example

import numpy as np
rng = np.random.default_rng(0)
X1 = rng.standard_normal((50, 10))
X2 = rng.standard_normal((50, 8))
model = ParkhomenkoCCA(tau=0.1, random_state=0).fit([X1, X2])

Source code in cca_zoo/linear/_iterative.py
def __init__(
    self,
    latent_dimensions: int = 1,
    center: bool = True,
    tau: float | list[float] = 0.1,
    max_iter: int = 500,
    tol: float = 1e-6,
    random_state: int | None = None,
) -> None:
    super().__init__(
        latent_dimensions=latent_dimensions,
        center=center,
        max_iter=max_iter,
        tol=tol,
        random_state=random_state,
    )
    self.tau = tau