Model Selection¶
cca_zoo.model_selection provides cross-validated hyperparameter search for multiview models.
It wraps sklearn's GridSearchCV with a multiview-compatible interface.
GridSearchCV¶
GridSearchCV finds the hyperparameters that maximise the average canonical correlation on
held-out folds, using sklearn's cross-validation machinery under the hood.
from cca_zoo.model_selection import GridSearchCV
from cca_zoo.linear import rCCA
# X1, X2: paired views (arrays with the same number of rows)
param_grid = {"c": [0.001, 0.01, 0.1, 1.0]}
gs = GridSearchCV(rCCA(latent_dimensions=2), param_grid=param_grid, cv=5)
gs.fit([X1, X2])
print("Best c:", gs.best_params_["c"])
print("Best CV score:", gs.best_score_)
# Use the refitted best model directly
best_model = gs.best_estimator_
z1, z2 = best_model.transform([X1, X2])
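The CV score being maximised is the mean canonical correlation over latent dimensions. As an illustration of that metric (a minimal numpy sketch of per-dimension Pearson correlation averaged across dimensions, not cca_zoo's internal scorer), using hypothetical transformed views z1 and z2 of shape (n_samples, latent_dimensions):

```python
import numpy as np

def mean_canonical_correlation(z1, z2):
    """Pearson correlation per latent dimension, averaged over dimensions."""
    corrs = [np.corrcoef(z1[:, k], z2[:, k])[0, 1] for k in range(z1.shape[1])]
    return float(np.mean(corrs))

# Toy scores: z2 is a noisy copy of z1, so the correlation is high.
rng = np.random.default_rng(0)
z1 = rng.standard_normal((100, 2))
z2 = 0.8 * z1 + 0.2 * rng.standard_normal((100, 2))
print(mean_canonical_correlation(z1, z2))
```

With real models you would pass the outputs of best_model.transform in place of the toy arrays.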
Per-view parameters¶
Many CCA models accept per-view parameters as a scalar (broadcast to all views) or a list. In the parameter grid, use lists to specify per-view values:
from cca_zoo.model_selection import GridSearchCV
from cca_zoo.nonparametric import KCCA
# Scalar c applies to all views
param_grid = {"c": [0.01, 0.1, 1.0]}
# Per-view c: the outer list holds one list of candidates per view
# (here two inner lists for a two-view model)
param_grid = {"c": [[0.01, 0.1], [0.1, 1.0]]}
gs = GridSearchCV(
    KCCA(latent_dimensions=2, kernel="rbf", gamma=0.01),
    param_grid=param_grid,
    cv=5,
)
gs.fit([X1, X2])
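To see what the list-of-lists form expands to, the sketch below enumerates the candidate settings it implies (an illustration of the per-view expansion, not cca_zoo internals):

```python
from itertools import product

# Outer list: one inner list of candidate values per view.
per_view_grid = {"c": [[0.01, 0.1], [0.1, 1.0]]}

# Each candidate setting pairs one value per view; the grid is the product.
candidates = [list(combo) for combo in product(*per_view_grid["c"])]
print(candidates)
# [[0.01, 0.1], [0.01, 1.0], [0.1, 0.1], [0.1, 1.0]]
```

So a two-view grid with two candidates per view yields four settings to cross-validate.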
Accessing results¶
GridSearchCV exposes the standard sklearn attributes:
import pandas as pd
# Full CV results table
df = pd.DataFrame(gs.cv_results_)
print(df[["param_c", "mean_test_score", "std_test_score"]]
      .sort_values("mean_test_score", ascending=False))  # best first
# Best parameters and score
print(gs.best_params_)
print(gs.best_score_)
# Best estimator (already refitted on the full training set)
best = gs.best_estimator_
Full example: tuning kernel CCA¶
import numpy as np
from cca_zoo.datasets import JointData
from cca_zoo.model_selection import GridSearchCV
from cca_zoo.nonparametric import KCCA
# Simulate data
data = JointData(n_views=2, n_samples=200, n_features=[30, 30],
                 latent_dimensions=2, signal_to_noise=2.0, random_state=0)
views = data.sample()
# Grid search over kernel and regularisation
param_grid = {
    "kernel": ["rbf", "poly"],
    "c": [0.01, 0.1, 1.0],
    "gamma": [0.01, 0.1],
}
gs = GridSearchCV(
    KCCA(latent_dimensions=2),
    param_grid=param_grid,
    cv=5,
)
gs.fit(views)
print("Best params:", gs.best_params_)
print("Best score: ", gs.best_score_)
Tips¶
- The score is the mean canonical correlation across all latent_dimensions, averaged over all pairwise view combinations.
- Cross-validation is done on the full set of views passed to fit; train/test splits are row-wise (the same rows are held out across all views).
- For sparse CCA methods, tune tau or alpha just like any other hyperparameter.
- When the grid is large, prefer a coarse-to-fine search: run GridSearchCV on a coarse grid first, then refine around the best value.
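The coarse-to-fine step can be sketched with plain numpy: take the best value from a coarse log-spaced grid and build a finer log-spaced grid around it (the helper name and spacing factor here are illustrative choices, not part of cca_zoo):

```python
import numpy as np

def refine_grid(best, factor=3.0, num=5):
    """Log-spaced grid spanning best/factor .. best*factor, centred on best."""
    return np.geomspace(best / factor, best * factor, num=num)

coarse = [0.001, 0.01, 0.1, 1.0]
best_c = 0.01  # e.g. gs.best_params_["c"] from the coarse search
fine = refine_grid(best_c)
print(fine)
```

The refined values would then be passed as param_grid={"c": list(fine)} to a second GridSearchCV run.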