Academic Publications
Efficient Algorithms for the CCA Family: Unconstrained Objectives with Unbiased Gradients
James Chapman, Ana Lawry Aguila, Lennie Wells
arXiv preprint arXiv:2310.01012, 2023
Fusilli: A Python package housing a collection of deep-learning multi-modal data fusion method pipelines!
Florence J Townend, James Chapman, James H Cole
2023
CodeCCA-Zoo: A collection of Regularized, Deep Learning based, Kernel, and Probabilistic CCA methods in a scikit-learn style framework
James Chapman, Hao-Ting Wang
Journal of Open Source Software, 2021
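Below is a minimal usage sketch of the scikit-learn-style interface the package describes. The import path and argument names (cca_zoo.models.CCA, latent_dims) are assumptions that may vary between versions; consult the package documentation for the current API.

```python
# A minimal usage sketch, assuming a scikit-learn-style fit/transform API.
# Import path and parameter names are assumptions and may differ by version.
import numpy as np
from cca_zoo.models import CCA  # assumed import path

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))  # view 1: 100 samples, 10 features
Y = rng.standard_normal((100, 8))   # view 2: 100 samples, 8 features

model = CCA(latent_dims=2)          # assumed parameter name
model.fit((X, Y))                   # views are passed together, sklearn-style
Zx, Zy = model.transform((X, Y))    # per-view latent projections
```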
Canonical correlation analysis and partial least squares for identifying brain-behaviour associations: a tutorial and a comparative study
Agoston Mihalik, James Chapman, Rick A Adams, Nils R Winter, Fabio S Ferreira, John Shawe-Taylor, Janaina Mourão-Miranda, Alzheimer’s Disease Neuroimaging Initiative
Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 2022
Canonical correlation analysis (CCA) and partial least squares (PLS) are powerful multivariate methods for capturing associations across 2 modalities of data (e.g., brain and behavior). However, when the sample size is similar to or smaller than the number of variables in the data, standard CCA and PLS models may overfit, i.e., find spurious associations that generalize poorly to new data. Dimensionality reduction and regularized extensions of CCA and PLS have been proposed to address this problem, yet most studies using these approaches have some limitations. This work gives a theoretical and practical introduction to the most common CCA/PLS models and their regularized variants. We examine the limitations of standard CCA and PLS when the sample size is similar to or smaller than the number of variables. We discuss how dimensionality reduction and regularization techniques address this problem and explain their main advantages and disadvantages. We highlight crucial aspects of the CCA/PLS analysis framework, including optimizing the hyperparameters of the model and testing the identified associations for statistical significance. We apply the described CCA/PLS models to simulated data and real data from the Human Connectome Project and Alzheimer’s Disease Neuroimaging Initiative (both of n > 500). We use both low- and high-dimensionality versions of these data (i.e., ratios between sample size and variables in the range of ∼1–10 and ∼0.1–0.01, respectively) to demonstrate the impact of data dimensionality on the models. Finally, we summarize the key lessons of the tutorial.
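As a companion to the tutorial's emphasis on testing identified associations for statistical significance, here is a minimal sketch of a permutation test on the first canonical correlation, using scikit-learn's CCA on synthetic stand-in data. The data and helper names are illustrative, not from the paper.

```python
# A minimal sketch of significance testing for a two-view association:
# fit a one-component CCA, then compare the observed canonical correlation
# against a null distribution built by permuting the X-Y pairing.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n, p, q = 500, 20, 15                              # samples, view-1 vars, view-2 vars
X = rng.standard_normal((n, p))                    # stand-in for brain data
Y = 0.3 * X[:, :q] + rng.standard_normal((n, q))   # stand-in for behaviour data

def first_canonical_corr(X, Y):
    """Fit a 1-component CCA and return the in-sample canonical correlation."""
    u, v = CCA(n_components=1).fit_transform(X, Y)
    return np.corrcoef(u[:, 0], v[:, 0])[0, 1]

observed = first_canonical_corr(X, Y)

# Permutation test: shuffling rows of Y breaks the pairing, giving a null.
n_perm = 1000
null = np.array([first_canonical_corr(X, Y[rng.permutation(n)])
                 for _ in range(n_perm)])
p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
print(f"r = {observed:.3f}, permutation p = {p_value:.4f}")
```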
A Generalized EigenGame with Extensions to Multiview Representation Learning
James Chapman, Ana Lawry Aguila, Lennie Wells
arXiv preprint arXiv:2211.11323, 2022
Generalized Eigenvalue Problems (GEPs) encompass a range of interesting dimensionality reduction methods. Development of efficient stochastic approaches to these problems would allow them to scale to larger datasets. Canonical Correlation Analysis (CCA) is one example of a GEP for dimensionality reduction which has found extensive use in problems with two or more views of the data. Deep learning extensions of CCA require large mini-batch sizes, and therefore large memory consumption, to achieve good performance in the stochastic setting, and this has limited their application in practice. Inspired by the Generalized Hebbian Algorithm, we develop an approach to solving stochastic GEPs in which all constraints are softly enforced by Lagrange multipliers. Then, by considering the integral of this Lagrangian function, its pseudo-utility, and inspired by recent formulations of Principal Components Analysis and GEPs as games with differentiable utilities, we develop a game-theory-inspired approach to solving GEPs. We show that, in the linear case, our approaches share much of the theoretical grounding of the previous Hebbian and game-theoretic approaches, but our method permits extension to general function approximators such as neural networks for certain GEPs for dimensionality reduction, including CCA, which means our method can be used for deep multiview representation learning. We demonstrate the effectiveness of our method for solving GEPs in the stochastic setting using canonical multiview datasets and demonstrate state-of-the-art performance for optimizing Deep CCA.
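To illustrate the kind of unconstrained, minibatch-friendly objective the paper builds towards, here is a hedged PyTorch sketch: cross-view covariance is rewarded while the whitening constraints of CCA are softly penalised rather than enforced exactly. The exact loss and its game-theoretic formulation in the paper differ; all names below are illustrative.

```python
# An illustrative minibatch CCA-style objective with soft constraints.
# This is a sketch of the general idea, not the paper's exact loss.
import torch

def soft_cca_loss(zx: torch.Tensor, zy: torch.Tensor) -> torch.Tensor:
    """zx, zy: (batch, k) projections of the two views from any encoder."""
    n = zx.shape[0]
    zx = zx - zx.mean(0)                       # centre each view's projections
    zy = zy - zy.mean(0)
    cross = (zx * zy).sum() / n                # reward cross-view covariance
    vx = zx.T @ zx / n                         # within-view covariance, view 1
    vy = zy.T @ zy / n                         # within-view covariance, view 2
    eye = torch.eye(zx.shape[1], device=zx.device)
    penalty = ((vx - eye) ** 2).sum() + ((vy - eye) ** 2).sum()
    return -2 * cross + penalty                # unconstrained surrogate to minimise
```

Because the loss is unconstrained, the encoders producing zx and zy can be arbitrary neural networks trained with standard stochastic gradient methods, which is what enables the deep multiview setting.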
Conditional VAEs for Confound Removal and Normative Modelling of Neurodegenerative Diseases
Ana Lawry Aguila, James Chapman, Mohammed Janahi, Andre Altmann
2022
Understanding pathological mechanisms for heterogeneous brain disorders is a difficult challenge. Normative modelling provides a statistical description of the ‘normal’ range that can be used at subject level to detect deviations, which relate to disease presence, disease severity or disease subtype. Here we trained a conditional Variational Autoencoder (cVAE) on structural MRI data from healthy controls to create a normative model conditioned on confounding variables such as age. The cVAE allows us to use deep learning to identify complex relationships that are independent of these confounds, which might otherwise inflate pathological effects. We propose a latent deviation metric and use it to quantify deviations in individual subjects with neurological disorders and, in an independent Alzheimer’s disease dataset, subjects with varying degrees of pathological ageing. Our model is able to identify these disease cohorts as deviations from the normal brain in a way that reflects disease severity.
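A minimal PyTorch sketch of the conditioning idea described above: both encoder and decoder receive the confounds (e.g., age) alongside the data, so the latent space is encouraged to encode variation independent of them. Layer sizes and names are illustrative assumptions, not the paper's architecture.

```python
# A minimal conditional VAE sketch: confounds c are concatenated to the
# inputs of both encoder and decoder. Dimensions are illustrative.
import torch
import torch.nn as nn

class ConditionalVAE(nn.Module):
    def __init__(self, x_dim: int, c_dim: int, z_dim: int, h_dim: int = 128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim + c_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)
        self.logvar = nn.Linear(h_dim, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim + c_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))

    def forward(self, x, c):
        h = self.enc(torch.cat([x, c], dim=-1))          # encoder sees confounds
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterisation
        x_hat = self.dec(torch.cat([z, c], dim=-1))      # decoder sees confounds too
        return x_hat, mu, logvar
```

A subject-level deviation can then be scored from the encoder's latent statistics, for example by the multivariate distance sketched under the following entry.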
Multi-modal Variational Autoencoders for normative modelling across multiple imaging modalities
Ana Lawry Aguila, James Chapman, Andre Altmann
arXiv e-prints, 2023
One of the challenges of studying common neurological disorders is disease heterogeneity, including differences in causes, neuroimaging characteristics, comorbidities, or genetic variation. Normative modelling has become a popular method for studying such cohorts, where the 'normal' behaviour of a physiological system is modelled and can be used at subject level to detect deviations relating to disease pathology. For many heterogeneous diseases, we expect to observe abnormalities across a range of neuroimaging and biological variables. However, thus far, normative models have largely been developed for studying a single imaging modality. We aim to develop a multi-modal normative modelling framework that aggregates abnormality across variables of multiple modalities and is better able to detect deviations than uni-modal baselines. We propose two multi-modal VAE normative models to detect subject-level deviations across T1 and DTI data. Our proposed models were better able to detect diseased individuals, capture disease severity, and correlate with patient cognition than baseline approaches. We also propose a multivariate latent deviation metric, measuring deviations from the joint latent space, which outperformed feature-based metrics.
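One plausible form of a multivariate latent deviation metric is a Mahalanobis distance from the healthy-control latent distribution, sketched below. The paper's exact metric may differ, and the function name is ours.

```python
# A sketch of a multivariate latent deviation metric: Mahalanobis distance
# of a subject's latent mean from the healthy-control latent distribution.
# Illustrative only; the paper's exact metric may differ.
import numpy as np

def latent_deviation(z_controls: np.ndarray, z_subject: np.ndarray) -> float:
    """z_controls: (n_controls, k) latent means; z_subject: (k,) latent mean."""
    mu = z_controls.mean(axis=0)
    cov = np.cov(z_controls, rowvar=False)    # (k, k) control covariance
    diff = z_subject - mu
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))
```

Unlike per-feature z-scores, this distance accounts for correlations between latent dimensions, which is one reason a joint-latent-space metric can outperform feature-based metrics.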
Teaching Experience
- Foundations of AI: Lectured to 50 students, covering topics including kernel methods.
- Supervised Learning: Teaching Assistant.
- Numerical Optimisation: Teaching Assistant.
- Machine Learning for Domain Specialists: Teaching Assistant.