Getting Started
Installation
Install the core package with pip:
pip install cca-zoo
The core package requires Python ≥ 3.10 and provides all linear and nonparametric methods.
Optional extras
Install extras for additional method families (the quotes stop shells such as zsh from expanding the square brackets):
pip install "cca-zoo[deep]" # DCCA variants (PyTorch + Lightning)
pip install "cca-zoo[probabilistic]" # Probabilistic CCA (NumPyro + JAX)
pip install "cca-zoo[all]" # All of the above
PyTorch installation
For GPU support or specific CUDA versions, follow the PyTorch installation guide before installing cca-zoo[deep].
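To check which optional backends are importable in the current environment, a small standard-library sketch can probe for the packages each extra pulls in. The module names below (torch, lightning, numpyro, jax) are assumptions read off the comments above, and `available_extras` is a hypothetical helper, not part of CCA-Zoo.

```python
import importlib.util

# Assumed import names for the packages each extra installs
EXTRAS = {
    "deep": ["torch", "lightning"],
    "probabilistic": ["numpyro", "jax"],
}


def available_extras():
    """Map each extra to True if all of its assumed modules can be found."""
    return {
        extra: all(importlib.util.find_spec(mod) is not None for mod in modules)
        for extra, modules in EXTRAS.items()
    }


print(available_extras())
```

find_spec only locates a module without importing it, so the check is cheap and does not initialise GPU libraries.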
Core concepts
Views
CCA-Zoo expects data as a list of arrays, one per view.
Each array has shape (n_samples, n_features_i). All views must share the same number of rows, and row j of every view must refer to the same sample.
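Because every view indexes the same samples, a quick shape check before fitting catches misaligned inputs early. The sketch below uses plain NumPy; `check_views` is a hypothetical helper for illustration, and CCA-Zoo performs its own validation.

```python
import numpy as np


def check_views(views):
    """Validate a list of 2-D arrays describing the same samples.

    Hypothetical helper; returns the shared number of samples.
    """
    if len(views) < 2:
        raise ValueError("CCA needs at least two views")
    n_samples = None
    for i, view in enumerate(views):
        arr = np.asarray(view)
        if arr.ndim != 2:
            raise ValueError(f"view {i} must be 2-D, got shape {arr.shape}")
        if n_samples is None:
            n_samples = arr.shape[0]
        elif arr.shape[0] != n_samples:
            raise ValueError(f"view {i} has {arr.shape[0]} rows, expected {n_samples}")
    return n_samples


rng = np.random.default_rng(0)
X1 = rng.standard_normal((100, 20))  # view 1: 100 samples, 20 features
X2 = rng.standard_normal((100, 5))   # view 2: the same 100 samples, 5 features
views = [X1, X2]
print(check_views(views))  # 100
```

The feature counts may differ between views; only the row count (and row order) must agree.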
The estimator API
Every model follows the same three-step pattern:
from cca_zoo.linear import CCA
model = CCA(latent_dimensions=2) # 1. construct
model.fit(views) # 2. fit
z = model.transform(views) # 3. transform (project into the latent space)
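Or equivalently, chain the calls, since fit returns the fitted model:
z = CCA(latent_dimensions=2).fit(views).transform(views)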
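To make the construct/fit/transform pattern concrete, here is a minimal NumPy sketch of a two-view CCA estimator written in the same style. `ToyCCA` is hypothetical and only illustrates the pattern: it solves classical CCA by whitening each view and taking an SVD of the whitened cross-covariance. The small ridge term `eps` is an assumption added for numerical stability, not a CCA-Zoo default, and the `weights` attribute simply mirrors the name used in this guide.

```python
import numpy as np


def _inv_sqrt(C):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


class ToyCCA:
    """Hypothetical two-view CCA estimator illustrating construct/fit/transform."""

    def __init__(self, latent_dimensions=1, eps=1e-6):
        self.latent_dimensions = latent_dimensions
        self.eps = eps  # small ridge so the covariances stay invertible

    def fit(self, views):
        X, Y = views
        self.means_ = [X.mean(axis=0), Y.mean(axis=0)]
        Xc, Yc = X - self.means_[0], Y - self.means_[1]
        n = X.shape[0]
        Cxx = Xc.T @ Xc / (n - 1) + self.eps * np.eye(X.shape[1])
        Cyy = Yc.T @ Yc / (n - 1) + self.eps * np.eye(Y.shape[1])
        Cxy = Xc.T @ Yc / (n - 1)
        Kx, Ky = _inv_sqrt(Cxx), _inv_sqrt(Cyy)
        # SVD of the whitened cross-covariance gives the canonical directions
        U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
        k = self.latent_dimensions
        self.weights = [Kx @ U[:, :k], Ky @ Vt.T[:, :k]]
        self.correlations_ = s[:k]
        return self  # returning self allows chained calls

    def transform(self, views):
        # Centre each view with the training means, then project it
        return [(v - m) @ w for v, m, w in zip(views, self.means_, self.weights)]


# Two views generated from a shared 2-D latent signal plus noise
rng = np.random.default_rng(0)
latent = rng.standard_normal((500, 2))
X = latent @ rng.standard_normal((2, 10)) + 0.5 * rng.standard_normal((500, 10))
Y = latent @ rng.standard_normal((2, 8)) + 0.5 * rng.standard_normal((500, 8))

model = ToyCCA(latent_dimensions=2)  # 1. construct
model.fit([X, Y])                    # 2. fit
z1, z2 = model.transform([X, Y])     # 3. transform
print(model.correlations_)
```

On the training data, the correlation between matching columns of z1 and z2 reproduces the fitted canonical correlations.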
Evaluating fit quality
model.score(views) returns, for each latent dimension, the canonical correlation averaged over all pairs of views:
corrs = model.score(views) # np.ndarray, shape (latent_dimensions,)
print(corrs) # e.g. [0.94, 0.87]
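The averaging can be sketched with NumPy. The helper `average_pairwise_correlation` is hypothetical: it assumes the latent projections arrive as a list of arrays of shape (n_samples, latent_dimensions), correlates dimension d of every pair of views, and averages over the pairs. CCA-Zoo's own score may differ in details.

```python
import itertools

import numpy as np


def average_pairwise_correlation(latents):
    """Mean Pearson correlation over all view pairs, one value per latent dimension."""
    n_dims = latents[0].shape[1]
    scores = np.zeros(n_dims)
    pairs = list(itertools.combinations(range(len(latents)), 2))
    for i, j in pairs:
        for d in range(n_dims):
            scores[d] += np.corrcoef(latents[i][:, d], latents[j][:, d])[0, 1]
    return scores / len(pairs)


rng = np.random.default_rng(0)
shared = rng.standard_normal((1000, 2))
# Three views whose latent scores are noisy copies of one signal;
# dimension 1 gets much more noise than dimension 0
latents = [shared + rng.standard_normal((1000, 2)) * [0.1, 1.0] for _ in range(3)]
print(average_pairwise_correlation(latents))  # first entry near 0.99, second near 0.5
```

With noise standard deviation s on each copy, the expected correlation between two copies is 1 / (1 + s^2), which is where the two reference values come from.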
Inspecting weights
After fitting, model.weights is a list of weight matrices, one per view.
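Assuming, as in classical CCA, that each view is centred and multiplied by its weight matrix to give its latent scores, each matrix has shape (n_features_i, latent_dimensions). Raw weights can be hard to read when features are correlated, so a common complement is the loadings: the correlation between each original feature and each latent score. The sketch below uses random stand-in weights; `project` and `loadings` are hypothetical helpers, not CCA-Zoo functions.

```python
import numpy as np


def project(view, weights):
    """Latent scores of one view: centred data times its weight matrix."""
    return (view - view.mean(axis=0)) @ weights


def loadings(view, scores):
    """Correlation of every feature with every latent score, shape (n_features, k)."""
    Xs = (view - view.mean(axis=0)) / view.std(axis=0)
    Zs = (scores - scores.mean(axis=0)) / scores.std(axis=0)
    return Xs.T @ Zs / view.shape[0]


rng = np.random.default_rng(0)
views = [rng.standard_normal((200, 6)), rng.standard_normal((200, 4))]
# Stand-in for model.weights: one (n_features_i, latent_dimensions) matrix per view
weights = [rng.standard_normal((6, 2)), rng.standard_normal((4, 2))]

for i, (X, W) in enumerate(zip(views, weights)):
    Z = project(X, W)
    L = loadings(X, Z)
    top = np.argmax(np.abs(L[:, 0]))  # feature most correlated with dimension 0
    print(f"view {i}: weights {W.shape}, scores {Z.shape}, top feature {top}")
```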
Quick start
Two-view CCA
import numpy as np
from cca_zoo.datasets import JointData
from cca_zoo.linear import CCA
# Simulate data from a linear latent variable model
data = JointData(
n_views=2,
n_samples=200,
n_features=[50, 50],
latent_dimensions=2,
signal_to_noise=2.0,
random_state=0,
)
train_views = data.sample()
test_views = data.sample()
# Fit and evaluate
model = CCA(latent_dimensions=2).fit(train_views)
print("Canonical correlations:", model.score(test_views))
# Project into the shared latent space
z1, z2 = model.transform(test_views)
print("Latent shape:", z1.shape) # (200, 2)
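JointData simulates views from a shared linear latent variable model. A hedged NumPy sketch of that idea: each view is a linear map of common latent factors plus Gaussian noise, with signal_to_noise read here as the ratio of signal variance to noise variance. The function `simulate_views` and that reading of the parameter are assumptions for illustration; CCA-Zoo's generator may be constructed differently.

```python
import numpy as np


def simulate_views(n_samples, n_features, latent_dimensions, signal_to_noise, seed=0):
    """Hypothetical generator: view_i = Z @ A_i + noise, with Z shared by all views."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n_samples, latent_dimensions))  # shared latent factors
    views = []
    for p in n_features:
        A = rng.standard_normal((latent_dimensions, p))  # view-specific mixing matrix
        signal = Z @ A
        # Scale the noise so that signal variance / noise variance ~= signal_to_noise
        noise_sd = np.sqrt(signal.var() / signal_to_noise)
        views.append(signal + noise_sd * rng.standard_normal((n_samples, p)))
    return views


X1, X2 = simulate_views(200, [50, 50], latent_dimensions=2, signal_to_noise=2.0)
print(X1.shape, X2.shape)  # (200, 50) (200, 50)
```

Raising signal_to_noise makes the shared factors dominate each view, so the recovered canonical correlations move towards 1.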
Multiview CCA (≥2 views)
from cca_zoo.datasets import JointData
from cca_zoo.linear import MCCA
data = JointData(n_views=3, n_samples=200, n_features=30, random_state=0)
views = data.sample()
model = MCCA(latent_dimensions=2).fit(views)
print(model.score(views))
Regularised CCA
from cca_zoo.linear import rCCA
# c controls the ridge penalty (0 = CCA, 1 = PLS)
model = rCCA(latent_dimensions=2, c=0.1).fit(train_views)
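The comment above describes an interpolation between CCA and PLS. A common way to express it, and the assumption made in this sketch, is to replace each within-view covariance C_ii by (1 - c) C_ii + c I before whitening: c = 0 recovers CCA, while c = 1 makes the whitening an identity, so the weights become the singular vectors of the cross-covariance, which is PLS. `ridge_cca` is a hypothetical helper, not CCA-Zoo's implementation.

```python
import numpy as np


def _inv_sqrt(C):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def ridge_cca(X, Y, c, latent_dimensions=1):
    """Hypothetical two-view regularised CCA with ridge parameter c in [0, 1]."""
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    n = X.shape[0]
    # Shrink each within-view covariance towards the identity
    Cxx = (1 - c) * Xc.T @ Xc / (n - 1) + c * np.eye(X.shape[1])
    Cyy = (1 - c) * Yc.T @ Yc / (n - 1) + c * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = _inv_sqrt(Cxx), _inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    k = latent_dimensions
    return Kx @ U[:, :k], Ky @ Vt.T[:, :k]


rng = np.random.default_rng(0)
shared = rng.standard_normal((100, 1))
X = shared @ rng.standard_normal((1, 5)) + rng.standard_normal((100, 5))
Y = shared @ rng.standard_normal((1, 4)) + rng.standard_normal((100, 4))
for c in (0.0, 0.1, 1.0):
    wx, wy = ridge_cca(X, Y, c)
    print(c, wx.ravel().round(2))
```

Intermediate values of c trade the scale invariance of CCA for the stability of PLS, which matters most when a view has more features than samples.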
Kernel CCA
from cca_zoo.nonparametric import KCCA
model = KCCA(latent_dimensions=2, kernel="rbf", gamma=0.01, c=0.1).fit(train_views)
z1, z2 = model.transform(test_views)
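Kernel CCA works with Gram matrices rather than raw features. For kernel="rbf", the usual convention (the one scikit-learn uses for its gamma parameter) is k(x, y) = exp(-gamma * ||x - y||^2); that CCA-Zoo follows the same convention is an assumption. The sketch below builds such a matrix and centres it in feature space, the first step of most KCCA solvers; `rbf_gram` and `centre_gram` are hypothetical helpers.

```python
import numpy as np


def rbf_gram(X, gamma):
    """RBF kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T
    # Clip tiny negative distances caused by floating-point rounding
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def centre_gram(K):
    """Double-centre K, equivalent to centring the implicit feature vectors."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
K = centre_gram(rbf_gram(X, gamma=0.01))
print(K.shape)  # (50, 50)
```

Because the Gram matrix is n_samples by n_samples, memory and solve time grow quadratically and cubically with the number of samples, which is worth keeping in mind before running a grid search over gamma.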
Hyperparameter search
CCA-Zoo's GridSearchCV wraps sklearn's grid search with a multiview interface:
from cca_zoo.model_selection import GridSearchCV
from cca_zoo.nonparametric import KCCA
param_grid = {"c": [0.01, 0.1, 1.0], "gamma": [0.01, 0.1]}
gs = GridSearchCV(KCCA(latent_dimensions=2, kernel="rbf"), param_grid, cv=5)
gs.fit(train_views)
print("Best params:", gs.best_params_)
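Cross-validation on multiview data has to split every view with the same row indices so that samples stay paired across views. A minimal NumPy sketch of that idea follows; `multiview_kfold` is a hypothetical helper, and CCA-Zoo's GridSearchCV is assumed to handle this splitting internally.

```python
import numpy as np


def multiview_kfold(views, n_splits=5, seed=0):
    """Yield (train_views, test_views) pairs that use the same rows in every view."""
    n = views[0].shape[0]
    order = np.random.default_rng(seed).permutation(n)
    for test_idx in np.array_split(order, n_splits):
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        yield [v[train_mask] for v in views], [v[test_idx] for v in views]


rng = np.random.default_rng(0)
views = [rng.standard_normal((200, 10)), rng.standard_normal((200, 8))]
for train, test in multiview_kfold(views, n_splits=5):
    # each fold: [(160, 10), (160, 8)] [(40, 10), (40, 8)]
    print([v.shape for v in train], [v.shape for v in test])
```

Splitting each view independently would silently break the pairing and make every score meaningless, which is why a multiview wrapper is needed at all.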
Next steps
- User Guide — Linear Methods — learn which linear model to choose
- User Guide — Nonparametric Methods — kernel CCA explained
- User Guide — Deep Methods — using neural network encoders
- API Reference — full class and method documentation