Optimizing PiB-PET change-over-time measurement by analysis of longitudinal reliability, plausibility, and separability


      Automatic measurement of Standardized Uptake Value Ratio (SUVR) from PiB-PET images is complicated by many methodological and implementation choices, such as reference region, gray matter (GM) target segmentation, and use of partial volume correction (PVC). These variations directly influence measurement reproducibility and confound cross-site comparability. Such choices have been hotly debated in the literature, but few studies have examined the reliability, plausibility, or separability of change-over-time values, despite their potential importance for amyloid-modifying clinical trials.


      We studied 68 participants in the Mayo Clinic Study of Aging and Alzheimer’s Disease Research Center studies, each with 3 serial PiB-PET scans. Pseudo-steady-state (late uptake) PiB scans were registered to the corresponding 3T T1-weighted MP-RAGE MRI, and an in-house standard template/atlas was warped to each MRI using ANTs software. Cortical GM was segmented using SPM12b and Longitudinal FreeSurfer 5.3. PiB scans were optionally partial-volume corrected using a two-compartment (Meltzer) model. We calculated SUVR values for each PiB scan using 180 method variations, each a different combination of reference region, target segmentation, and PVC. Results were analyzed using a linear mixed-effects model. We compared methods on three independent criteria: longitudinal reliability (R² of serial values), longitudinal plausibility (frequency of implausible, apparently decreasing within-subject trajectories), and group separability (AUROC for predicting unlikely vs. likely PiB accumulators from slope values).
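
      As a rough illustration of the quantities named above, the following minimal Python sketch computes an SUVR, a per-subject slope, the fraction of decreasing trajectories, and an AUROC. It is not the study's pipeline: the study used a linear mixed-effects model, whereas this sketch swaps in a per-subject ordinary least-squares slope, and it assumes that "apparently decreasing" means a negative fitted slope. All function names and input values are hypothetical.

```python
from statistics import mean


def suvr(target_uptake, reference_uptake):
    """SUVR as mean target-region uptake divided by mean reference-region uptake."""
    return mean(target_uptake) / mean(reference_uptake)


def ols_slope(times, values):
    """Ordinary least-squares slope of values over times.

    Assumption: a simple per-subject stand-in for the mixed-effects
    model used in the study.
    """
    t_bar, v_bar = mean(times), mean(values)
    num = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    den = sum((t - t_bar) ** 2 for t in times)
    return num / den


def fraction_decreasing(slopes):
    """Share of subjects with a negative slope (assumed plausibility proxy)."""
    return sum(s < 0 for s in slopes) / len(slopes)


def auroc(unlikely_slopes, likely_slopes):
    """AUROC via pairwise comparison (Mann-Whitney form); ties count as half."""
    wins = sum(
        (p > n) + 0.5 * (p == n)
        for p in likely_slopes
        for n in unlikely_slopes
    )
    return wins / (len(likely_slopes) * len(unlikely_slopes))


if __name__ == "__main__":
    # Made-up values for illustration only; not data from the study.
    years = [0.0, 1.3, 2.6]
    serial_suvr = [1.40, 1.46, 1.51]
    print(ols_slope(years, serial_suvr))
```

      Under these assumptions, a method's plausibility score could be taken as one minus `fraction_decreasing`, and its separability as `auroc` applied to the slopes of the two groups.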


      Of the 180 methods, 82 achieved at least 0.90 performance on all criteria; differences between the best-performing methods were not significant. In general, sharply segmented GM targets outperformed broader ones. SPM and FreeSurfer segmentations showed mixed tradeoffs. Reference regions using supratentorial WM were highly reliable but performed poorly on plausibility criteria. Cerebellar GM was outperformed by cerebellar WM, whole cerebellum, crus, and pons, which were all roughly equivalent. Methods with PVC were each better than, or not significantly worse than, their counterparts without PVC.


      Our results support the use of PVC, narrow GM-segmentation targets, and whole-cerebellum, cerebellum-WM, crus, or pons reference regions for SUVR calculations.