Background
We revisit the classical problem of selecting a subset of participants from a larger cohort such that a statistical model estimated on the subset alone yields parameter estimates similar to those of a model fit to all participants. This setup is important in Alzheimer's disease (AD) studies: budget or logistic constraints may make it necessary to select a subset of “representative” participants (using baseline imaging, clinical, and cognitive data) from a larger cohort for longitudinal measurements. We present the first known algorithm for the regime where the baseline predictors are high-dimensional (e.g., imaging or genetic data) and we must select a specific subset of individuals that maximizes power for estimating the parameters of a sparse linear model under a pre-specified sample size restriction.
Methods
The selection procedure has access only to data available at baseline. We must select a statistically diverse set of participants that best represents the underlying distribution of the cohort from which recruitment is performed. Enumerating all subsets, however, is combinatorially intractable. Based on geometric observations related to D-optimality in statistics, we give an algorithm for subject selection. We perform evaluations using ADNI2 data in which the dependent variables are longitudinal cognitive outcomes and the predictors are image-based regions of interest (ROIs) available at baseline.
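To make the D-optimality criterion concrete, the following is a minimal sketch of a textbook greedy heuristic, not the algorithm proposed in this work. It assumes a baseline predictor matrix `X` with one row per participant and picks rows that maximize the log-determinant of a ridge-regularized information matrix. The function name `greedy_d_optimal` and the `ridge` parameter are hypothetical, introduced only for illustration; the ridge term is an assumption that keeps the matrix invertible when predictors outnumber participants.

```python
import numpy as np

def greedy_d_optimal(X, budget, ridge=1e-3):
    """Greedily pick `budget` rows of X to maximize
    log det(ridge * I + X_S^T X_S), a regularized D-optimality score.

    Illustrative heuristic only; not the method described in the abstract.
    """
    n, p = X.shape
    # Inverse of the regularized information matrix for the empty selection.
    M_inv = np.eye(p) / ridge
    selected = []
    available = np.ones(n, dtype=bool)
    for _ in range(budget):
        # Matrix determinant lemma: adding row x multiplies the determinant
        # by (1 + x^T M^{-1} x), so the best row maximizes x^T M^{-1} x.
        gains = np.einsum("ij,jk,ik->i", X, M_inv, X)
        gains[~available] = -np.inf
        best = int(np.argmax(gains))
        selected.append(best)
        available[best] = False
        # Sherman-Morrison rank-one update of the inverse.
        v = M_inv @ X[best]
        M_inv -= np.outer(v, v) / (1.0 + X[best] @ v)
    return selected
```

Calling `greedy_d_optimal(X, budget=50)` on standardized baseline features would return 50 participant indices. The greedy rule avoids enumerating subsets, at the cost of offering only an approximate solution.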
Results
Using cognitive scores and risk factors for decline, we demonstrate that the subset selected by our algorithm indeed approximates the full cohort. Figure 1 shows the error in the covariates picked by the full-cohort model versus the selected-subset model. Errors decrease gradually as the budget or the number of allowed predictors (in the sparse model) increases. Figures 2-3 show the goodness of fit obtained with the selected subset (e.g., using a linear model). Figures 4-5 show the ratios of the first through fourth moments of the ADAS and CDR samples generated from the full cohort and the subset, indicating that the subset approximates the full cohort well.
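The moment comparison could be computed along the following lines. This is a sketch under assumptions: it uses raw (uncentered) sample moments, which the abstract does not specify, and the function name `moment_ratios` is hypothetical. Ratios close to 1 for every order would indicate that the subset sample matches the full-cohort sample in its low-order moments.

```python
import numpy as np

def moment_ratios(full, subset, max_order=4):
    """Return the ratio (subset / full) of the k-th raw sample moment
    for k = 1..max_order. Raw moments are an assumption made here."""
    full = np.asarray(full, dtype=float)
    subset = np.asarray(subset, dtype=float)
    return [
        float(np.mean(subset ** k) / np.mean(full ** k))
        for k in range(1, max_order + 1)
    ]
```

For example, `moment_ratios(scores_full, scores_subset)` with two arrays of cognitive scores (hypothetical variable names) returns four ratios, one per moment order.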
Conclusions
We proposed machine learning algorithms for conducting AD-focused longitudinal neuroimaging studies on a budget. Our experiments show that an optimized selection can maintain power while reducing costs in longitudinal studies, including clinical trials.