cca_zoo.probabilistic
Probabilistic CCA via MCMC. Requires `pip install cca-zoo[probabilistic]`.
ProbabilisticCCA
ProbabilisticCCA(latent_dimensions: int = 1, center: bool = True, num_warmup: int = 500, num_samples: int = 1000, random_state: int = 0)
Bases: BaseModel
Probabilistic Canonical Correlation Analysis via NUTS MCMC.
Fits a Bayesian latent variable model with the following generative process for V views:

    z ~ N(0, I)                         (latent variable)
    x_i | z ~ N(W_i z + mu_i, Psi_i)    (per-view likelihood)

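To make the generative process concrete, the sketch below draws synthetic views from it with NumPy. The parameter values, the view sizes, and the choice of a diagonal Psi_i are illustrative assumptions, not anything the library prescribes.

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples, latent_dims = 200, 2
view_dims = [4, 3]  # illustrative feature counts for V = 2 views

# Hypothetical "true" parameters, chosen only for illustration.
W = [rng.standard_normal((d, latent_dims)) for d in view_dims]
mu = [rng.standard_normal(d) for d in view_dims]
# Assumption: each Psi_i is diagonal, stored as a vector of variances.
psi_diag = [rng.uniform(0.1, 0.5, size=d) for d in view_dims]

# z ~ N(0, I): one shared latent vector per observation.
z = rng.standard_normal((n_samples, latent_dims))

# x_i | z ~ N(W_i z + mu_i, Psi_i): each view is a noisy linear map of the same z.
views = [
    z @ W_i.T + mu_i + rng.standard_normal((n_samples, d)) * np.sqrt(psi_i)
    for W_i, mu_i, psi_i, d in zip(W, mu, psi_diag, view_dims)
]
```

Because every view is generated from the same z, the views are correlated only through the latent variable, which is the structure the model tries to recover.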
MCMC sampling is performed with the No-U-Turn Sampler (NUTS) from numpyro. After fitting, `transform` returns the posterior mean of z conditioned on the observed views, computed analytically using the posterior mean formula for linear Gaussian models.
The `weights_` attribute is set to the posterior mean of each W_i matrix so that the scoring utilities of `cca_zoo._base.BaseModel` work without modification.
References
Bach, F. R. & Jordan, M. I. "A probabilistic interpretation of canonical correlation analysis." (2005).
Wang, C. "Variational Bayesian approach to canonical correlation analysis." IEEE Transactions on Neural Networks 18.3 (2007).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| latent_dimensions | int | Dimensionality of the latent space. Default is 1. | 1 |
| center | bool | Whether to center each view before fitting. Default is True. | True |
| num_warmup | int | Number of NUTS warm-up (burn-in) steps. Default is 500. | 500 |
| num_samples | int | Number of NUTS posterior samples to draw. Default is 1000. | 1000 |
| random_state | int | Integer seed for JAX PRNG. Default is 0. | 0 |
Example
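```python
import numpy as np

from cca_zoo.probabilistic import ProbabilisticCCA

rng = np.random.default_rng(0)
X1 = rng.standard_normal((50, 4))
X2 = rng.standard_normal((50, 3))

# Tiny warm-up and sample counts keep the example fast.
model = ProbabilisticCCA(
    latent_dimensions=2, num_warmup=10, num_samples=10
).fit([X1, X2])
```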
Source code in cca_zoo/probabilistic/_pcca.py
fit
Run NUTS MCMC to infer posterior over model parameters and latents.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| views | list[ArrayLike] | List of arrays, each of shape (n_samples, n_features_i). All arrays must have the same number of rows. | required |
| y | None | Ignored. Present for scikit-learn API compatibility. | None |
Returns:
| Name | Type | Description |
|---|---|---|
| self | ProbabilisticCCA | Fitted estimator. |
Raises:
| Type | Description |
|---|---|
| ValueError | If fewer than 2 views are provided. |
| ValueError | If views have inconsistent numbers of samples. |
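
The two documented error conditions can be illustrated with a small input check. This is a minimal sketch: `check_views` is a hypothetical helper written for this example, not a function from cca_zoo, and the error messages are invented.

```python
import numpy as np


def check_views(views):
    """Hypothetical helper mirroring the two documented ValueError cases."""
    arrays = [np.asarray(v) for v in views]

    # Case 1: CCA needs at least two views to relate to each other.
    if len(arrays) < 2:
        raise ValueError(f"Expected at least 2 views, got {len(arrays)}.")

    # Case 2: every view must describe the same observations (same row count).
    n_rows = {a.shape[0] for a in arrays}
    if len(n_rows) != 1:
        raise ValueError(
            f"Views have inconsistent numbers of samples: {sorted(n_rows)}."
        )
    return arrays
```

Under this sketch, `check_views([X1, X2])` returns the arrays unchanged when both have the same number of rows, and raises `ValueError` for a single view or for mismatched row counts.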
transform
Return the posterior mean of the shared latent variable z.
The posterior mean is computed analytically for a linear Gaussian model, using the posterior means of the W_i matrices:

    Sigma_z|x = (I + sum_i W_i^T Psi_i^{-1} W_i)^{-1}
    mu_z|x    = Sigma_z|x sum_i W_i^T Psi_i^{-1} (x_i - mu_i)

As an approximation, diagonal noise variances are estimated from the posterior samples.
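
The formula above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the library's implementation: `posterior_mean_z` is a hypothetical function name, each Psi_i is assumed diagonal and passed as a vector of variances, and the parameters are assumed to be already available as point estimates.

```python
import numpy as np


def posterior_mean_z(views, W, mu, psi_diag):
    """Posterior mean of z given all views, assuming diagonal Psi_i.

    views[i]: (n_samples, d_i), W[i]: (d_i, k), mu[i]: (d_i,), psi_diag[i]: (d_i,)
    """
    k = W[0].shape[1]
    n_samples = views[0].shape[0]

    precision = np.eye(k)           # accumulates I + sum_i W_i^T Psi_i^{-1} W_i
    rhs = np.zeros((n_samples, k))  # rows hold sum_i W_i^T Psi_i^{-1} (x_i - mu_i)

    for X_i, W_i, mu_i, psi_i in zip(views, W, mu, psi_diag):
        W_scaled = W_i / psi_i[:, None]  # Psi_i^{-1} W_i for diagonal Psi_i
        precision += W_i.T @ W_scaled
        rhs += (X_i - mu_i) @ W_scaled

    # Solve precision @ mu_z = rhs^T rather than forming Sigma_z|x explicitly.
    return np.linalg.solve(precision, rhs.T).T
```

Solving the linear system avoids an explicit matrix inverse; since the precision matrix does not depend on the observation, it is built once and shared across all rows.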
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| views | list[ArrayLike] | List of arrays, each of shape (n_samples, n_features_i). | required |
Returns:
| Type | Description |
|---|---|
| list[ndarray] | List with one numpy array of shape (n_samples, latent_dimensions) containing the posterior mean of z for each observation. |
Raises:
| Type | Description |
|---|---|
| NotFittedError | If |