CCA-Zoo
Multiview Canonical Correlation Analysis for Python — scikit-learn compatible, research-grade, batteries included.
What is CCA?
Given two or more views of the same observations — brain imaging and behavioural scores, gene expression and phenotypic data, audio and video features — Canonical Correlation Analysis finds projections that maximise correlation between the projected views.
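To make "projections that maximise correlation" concrete, here is a minimal NumPy sketch of classical two-view CCA, computed by whitening each view and taking an SVD of the whitened cross-covariance. It is an illustration of the textbook method, not CCA-Zoo's implementation; the names `inv_sqrt` and `classical_cca` and the synthetic data are assumptions made for this example.

```python
import numpy as np

def inv_sqrt(C):
    # Symmetric inverse square root via eigendecomposition (C must be positive definite).
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def classical_cca(X, Y, k):
    # Hypothetical helper: returns weights for each view and the top-k canonical correlations.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / (n - 1)
    Cyy = Y.T @ Y / (n - 1)
    Cxy = X.T @ Y / (n - 1)
    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    return Kx @ U[:, :k], Ky @ Vt.T[:, :k], s[:k]

# Synthetic views that share one latent signal (illustrative data only).
rng = np.random.default_rng(0)
n = 500
shared = rng.normal(size=(n, 1))
X = np.hstack([shared + 0.5 * rng.normal(size=(n, 1)), rng.normal(size=(n, 2))])
Y = np.hstack([rng.normal(size=(n, 3)), shared + 0.5 * rng.normal(size=(n, 1))])

Wx, Wy, corrs = classical_cca(X, Y, k=2)

# Project each centred view and measure the correlation of each pair of components.
zx = (X - X.mean(axis=0)) @ Wx
zy = (Y - Y.mean(axis=0)) @ Wy
empirical = np.array([np.corrcoef(zx[:, i], zy[:, i])[0, 1] for i in range(2)])
```

In this sketch the first component should recover the shared signal, while the second reflects only sampling noise. The singular values in `corrs` match the correlations measured directly on the projected data in `empirical`.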
CCA-Zoo extends classical CCA in several directions:
- Linear & regularised: Classical CCA, ridge-regularised rCCA, PLS, and seven sparse/elastic-net variants for high-dimensional settings.
- Kernel & nonparametric: KCCA, KGCCA, and KTCCA bring nonlinear relationships into reach via the kernel trick, with no explicit feature map needed.
- Deep learning: DCCA and variants (EY, NOI, SDL, DCCAE, DVCCA, DTCCA, BarlowTwins, VICReg) using your own nn.Module encoders with PyTorch Lightning.
- Probabilistic: Full Bayesian treatment of CCA via NUTS MCMC with NumPyro, with posterior inference over latent variables and loadings.
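As a rough illustration of why the regularised variants matter in high-dimensional settings: when a view has more features than observations, its sample covariance is singular and classical CCA is ill-posed. The sketch below uses one common ridge parameterisation, shrinking each covariance towards the identity with a parameter `c` (`c = 0` gives classical CCA, `c = 1` gives a PLS-like objective). Whether CCA-Zoo's rCCA uses exactly this form is an assumption here, and `ridge_cca` is a hypothetical helper, not a library function.

```python
import numpy as np

def ridge_cca(X, Y, k, c=0.1):
    # Hypothetical helper: ridge-regularised two-view CCA via whitening + SVD.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Shrink each within-view covariance towards the identity (assumed parameterisation).
    Cxx = (1 - c) * (X.T @ X) / (n - 1) + c * np.eye(X.shape[1])
    Cyy = (1 - c) * (Y.T @ Y) / (n - 1) + c * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)

    def inv_sqrt(C):
        # Symmetric inverse square root; eigenvalues are at least c > 0, so this is stable.
        vals, vecs = np.linalg.eigh(C)
        return (vecs / np.sqrt(vals)) @ vecs.T

    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky, full_matrices=False)
    return Kx @ U[:, :k], Ky @ Vt[:k].T, s[:k]

# Illustrative data: 40 observations, but 60 and 50 features per view.
rng = np.random.default_rng(0)
n, p, q = 40, 60, 50
latent = rng.normal(size=(n, 1))
X = latent @ rng.normal(size=(1, p)) + rng.normal(size=(n, p))
Y = latent @ rng.normal(size=(1, q)) + rng.normal(size=(n, q))

# The unregularised covariance cannot be inverted: its rank is below p.
rank_x = np.linalg.matrix_rank(np.cov(X, rowvar=False))

Wx, Wy, s = ridge_cca(X, Y, k=2, c=0.1)
```

With `c > 0` the regularised covariances are positive definite, so the weights are well defined even though `rank_x` is smaller than the number of features. In practice `c` would be chosen by cross-validation rather than fixed at 0.1.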
Unified API
Every model follows the same three-step scikit-learn pattern:
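import numpy as np
from cca_zoo.linear import CCA, rCCA, PLS
from cca_zoo.nonparametric import KCCA

# placeholder data: two views of the same 100 observations (rows must align)
rng = np.random.default_rng(0)
X1 = rng.normal(size=(100, 10))
X2 = rng.normal(size=(100, 8))

# 1. construct (the other imported models follow the same pattern)
model = CCA(latent_dimensions=2)
# 2. fit: views is a list of arrays, one per dataset
model.fit([X1, X2])
# 3. use
z1, z2 = model.transform([X1, X2])  # projected views, one array per input view
corrs = model.score([X1, X2])  # canonical correlations, shape (2,)
W1, W2 = model.weights  # weight matrices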
Models are sklearn.base.BaseEstimator subclasses, so they work directly with
GridSearchCV, Pipeline, and cross-validation utilities.
Navigate the docs
- Installation, quick start examples, and core concepts.
- In-depth explanations of each method family with usage guidance.
- Full class and method documentation auto-generated from source.
- Development setup, coding standards, and how to contribute.