canlab_help_7_multivariate_prediction

Load dataset with images and print descriptives
Collect some variables we need
Plot the ratings
Prediction
Run the Base model
Plot the results
Summarize within-person classification accuracy
Visualize weight map
Bootstrap weights: Get most reliable weights and p-values for voxels
Try normalizing/scaling data
Try selecting an a priori 'network of interest'
re-run prediction and plot
Other ideas

Dependencies:

% Matlab statistics toolbox
    % Matlab signal processing toolbox
    % Statistical Parametric Mapping (SPM) software https://www.fil.ion.ucl.ac.uk/spm/
    % For full functionality, the full suite of CANlab toolboxes is recommended. See here: Installing Tools

% About this dataset
% -------------------------------------------------------------------------
%
% This dataset includes 33 participants, with brain responses to six levels
% of heat (non-painful and painful).
%
% Aspects of this data appear in these papers:
% Wager, T.D., Atlas, L.T., Lindquist, M.A., Roy, M., Choong-Wan, W., Kross, E. (2013).
% An fMRI-Based Neurologic Signature of Physical Pain. The New England Journal of Medicine. 368:1388-1397
% (Study 2)
%
% Woo, C. -W., Roy, M., Buhle, J. T. & Wager, T. D. (2015). Distinct brain systems
% mediate the effects of nociceptive input and self-regulation on pain. PLOS Biology. 13(1):
% e1002036. doi:10.1371/journal.pbio.1002036
%
% Lindquist, Martin A., Anjali Krishnan, Marina López-Solà, Marieke Jepma, Choong-Wan Woo,
% Leonie Koban, Mathieu Roy, et al. 2015. ?Group-Regularized Individual Prediction:
% Theory and Application to Pain.? NeuroImage.
% http://www.sciencedirect.com/science/article/pii/S1053811915009982.
%

Load dataset with images and print descriptives

This dataset is shared on figshare.com, under this link: https://figshare.com/s/ca23e5974a310c44ca93

Here is a direct link to the dataset file with the fmri_data object: https://ndownloader.figshare.com/files/12708989

The key variable is image_obj This is an fmri_data object from the CANlab Core Tools repository for neuroimaging data analysis. See https://canlab.github.io/

image_obj.dat contains brain data for each image (average across trials) image_obj.Y contains pain ratings (one average rating per image)

image_obj.additional_info.subject_id contains integers coding for which Load the data file, downloading from figshare if needed

fmri_data_file = which('bmrk3_6levels_pain_dataset.mat');

if isempty(fmri_data_file)

    % attempt to download
    disp('Did not find data locally...downloading data file from figshare.com')

    fmri_data_file = websave('bmrk3_6levels_pain_dataset.mat', 'https://ndownloader.figshare.com/files/12708989');

end

load(fmri_data_file);

descriptives(image_obj);

 
Source: BMRK3 dataset from CANlab, PI Tor Wager
Data: .dat contains 6 images per participant, activation estimates during heat on L arm from level 1(44.3 degrees C) to level 6(49.3), in 1 degree increments.
 

____________________________________________________________________________________________________________________________________________
 Wager, T.D., Atlas, L.T., Lindquist, M.A., Roy, M., Choong-Wan, W., Kross, E. (2013). An fMRI-Based Neurologic Signature of Physical Pain. 
The New England Journal of Medicine. 368:1388-1397 (Study 2)                                                                                
                                                                                                                                            
 Woo, C. -W., Roy, M., Buhle, J. T. & Wager, T. D. (2015). Distinct brain systems mediate the effects of nociceptive input and 
self-regulation on pain. PLOS Biology. 13(1): e1002036. doi:10.1371/journal.pbio.1002036                                       
                                                                                                                               
Lindquist, Martin A., Anjali Krishnan, Marina López-Solà, Marieke Jepma, Choong-Wan Woo, Leonie Koban, Mathieu Roy, et al. 2015. 
“Group-Regularized Individual Prediction: Theory and Application to Pain.” NeuroImage.                                           
http://www.sciencedirect.com/science/article/pii/S1053811915009982.                                                              
                                                                                                                                 
____________________________________________________________________________________________________________________________________________
 
Summary of dataset
______________________________________________________
Images: 198	Nonempty: 198	Complete: 198
Voxels: 223707	Nonempty: 223707	Complete: 223707
Unique data values: 35583281
 
Min: -11.529	Max: 7.862	Mean: -0.004	Std: 0.227
 
    Percentiles      Values  
    ___________    __________

        0.1            -1.529
        0.5          -0.88055
          1          -0.67632
          5          -0.31891
         25         -0.087519
         50        0.00034864
         75          0.087461
         95           0.29757
         99           0.60449
       99.5           0.77676
       99.9            1.3177

Collect some variables we need

% subject_id is useful for cross-validation
%
% this used as a custom holdout set in fmri_data.predict() below will
% implement leave-one-subject-out cross-validation

subject_id = image_obj.additional_info.subject_id;

% ratings: reconstruct a subjects x temperatures matrix
% so we can plot it
%
% The command below does this only because we have exactly 6 conditions
% nested within 33 subjects and no missing data.

ratings = reshape(image_obj.Y, 6, 33)';

% temperatures are 44 - 49 (actually 44.3 - 49.3) in order for each person.

temperatures = image_obj.additional_info.temperatures;

Plot the ratings

-------------------------------------------------------------------------

create_figure('ratings');
lineplot_columns(ratings, 'color', [.7 .3 .3], 'markerfacecolor', [1 .5 0]);
xlabel('Temperature');
ylabel('Rating');

Prediction

There are many options

% Base model: Linear, whole brain prediction
%
% Outcome distribution:
% Continuous: Regression (LASSO-PCR) Categorical/2 choice: SVM, logistic regression
%
% Model Options:
% - Feature selection (part of the brain)
% - Feature integration (new features)
%
% Testing options:
% - Input data (what level)
% - Cross-validation
% - What testing metric (classification accuracy? RMSE?) and at what level
% (single-trial, condition averages, one-map-per-subject)
%
% Inference options:
% - Component maps, voxel-wise maps
% - Thresholding (Bootstrapping, Permutation)
% - Global null hypothesis testing / sanity checks (Permutation)

Run the Base model

% relevant functions:
% predict method (fmri_data)
% predict_test_suite method (fmri_data)

% Define custom holdout set.  If we use subject_id, which is a vector of
% integers with a unique integer per subject, then we are doing
% leave-one-subject-out cross-validation.

% let's build five-fold cross-validation set that leaves out ALL the images
% from a subject together. That way, we are always predicting out-of-sample
% (new individuals). If not, dependence across images from the same
% subjects may invalidate the estimate of predictive accuracy.

holdout_set = zeros(size(subject_id));      % holdout set membership for each image
n_subjects = length(unique(subject_id));
C = cvpartition(n_subjects, 'KFold', 5);
for i = 1:5
    teidx = test(C, i);                     % which subjects to leave out
    imgidx = ismember(subject_id, find(teidx)); % all images for these subjects
    holdout_set(imgidx) = i;
end

algoname = 'cv_lassopcr'; % cross-validated penalized regression. Predict pain ratings

[cverr, stats, optout] = predict(image_obj, 'algorithm_name', algoname, 'nfolds', holdout_set);

Cross-validated prediction with algorithm cv_lassopcr,   5 folds

Completed fit for all data in:   0 hours   0 min  5 secs 
Fold 1/5 done in:   0 hours   0 min  4 sec
Fold 2/5 done in:   0 hours   0 min  4 sec
Fold 3/5 done in:   0 hours   0 min  3 sec
Fold 4/5 done in:   0 hours   0 min  3 sec
Fold 5/5 done in:   0 hours   0 min  4 sec

Total Elapsed Time =   0 hours   0 min 22 sec

Plot the results

% Plot cross-validated predicted outcomes vs. actual

% Critical fields in stats output structure:
% stats.Y = actual outcomes
% stats.yfit = cross-val predicted outcomes
% pred_outcome_r: Correlation between .yfit and .Y
% weight_obj: Weight map used for prediction (across all subjects).

% Continuous outcomes:

create_figure('scatterplots', 1, 2);
plot(stats.yfit, stats.Y, 'o')
axis tight
refline
xlabel('Predicted pain');
ylabel('Observed pain');

% Consider that each subject has their own line:

for i = 1:n_subjects
    YY{i} = stats.Y(subject_id == i);
    Yfit{i} = stats.yfit(subject_id == i);
end

subplot(1, 2, 2);
line_plot_multisubject(Yfit, YY, 'nofigure');
xlabel('Predicted pain');
ylabel('Observed pain');
axis tight

% Yfit is a cell array, one cell per subject, with fitted values for that subject

average within-person r = 0.60, t( 32) = 7.93, p = 0.000000, num. missing:   0

Summarize within-person classification accuracy

Turn into classification:

% If the stim intensity increases by 1 degree, how often do we correctly
% predict an increase?
diffs = cellfun(@diff, Yfit, 'UniformOutput', false); % successive increases in predicted values with increases in stim intensity

% Subjects (rows) x comparisons (cols), each comparison 1 degree apart
diffs = cat(2, diffs{:})';

acc_per_subject = sum(diffs > 0, 2) ./ size(diffs, 2);
acc_per_comparison = (sum(diffs > 0) ./ size(diffs, 1))';
cohens_d_per_comparison = (mean(diffs) ./ std(diffs))';

fprintf('Mean acc is : %3.2f%% across all successive 1 degree increments\n', 100*mean(acc_per_subject));

% Create and display a table:

comparison = {'45-44' '46-45' '47-46' '48-47' '49-48'}';
results_table = table(comparison, acc_per_comparison, cohens_d_per_comparison);
disp(results_table)

% 45-44 degrees, 46-45, 47-46, etc.

Mean acc is : 73.94% across all successive 1 degree increments
    comparison    acc_per_comparison    cohens_d_per_comparison
    __________    __________________    _______________________

     '45-44'           0.48485                  -0.0587        
     '46-45'           0.81818                  0.76512        
     '47-46'           0.84848                  0.93114        
     '48-47'            0.9697                    1.432        
     '49-48'           0.57576                  0.37981

Visualize weight map

w = stats.weight_obj;  % This is an image object that we can view, etc.

orthviews(w)

create_figure('montage');
o2 = montage(w);
o2 = title_montage(o2, 5, 'Predictive model weights');

SPM12: spm_check_registration (v6245)              17:48:58 - 03/08/2018
========================================================================
Display <a href="matlab:spm_image('display','/Users/torwager/Documents/GitHub/CanlabCore/CanlabCore/canlab_canonical_brains/Canonical_brains_surfaces/keuken_2014_enhanced_for_underlay.img,1');">/Users/torwager/Documents/GitHub/CanlabCore/CanlabCore/canlab_canonical_brains/Canonical_brains_surfaces/keuken_2014_enhanced_for_underlay.img,1</a>
Grouping contiguous voxels:   3 regions

ans =

  1×1 cell array

    {1×3 region}

Setting up fmridisplay objects
This takes a lot of memory, and can hang if you have too little.
Grouping contiguous voxels:   3 regions
sagittal montage: 4350 voxels displayed, 219357 not displayed on these slices
sagittal montage: 4292 voxels displayed, 219415 not displayed on these slices
sagittal montage: 3990 voxels displayed, 219717 not displayed on these slices
axial montage: 32223 voxels displayed, 191484 not displayed on these slices
axial montage: 34538 voxels displayed, 189169 not displayed on these slices

Bootstrap weights: Get most reliable weights and p-values for voxels

% Re-run prediction, adding a 'bootstrap' flag.
% For a final analysis, at least 5K bootstrap samples is a good idea

% (This is not run in the example; see help fmri_data.predict for how to
% run bootstrapping)

Try normalizing/scaling data

% z-score every image, removing the mean and dividing by the std.
% Remove some scaling noise, but also some signal...
 image_obj = rescale(image_obj, 'zscoreimages');

% re-run prediction and plot

[cverr, stats, optout] = predict(image_obj, 'algorithm_name', algoname, 'nfolds', holdout_set);

create_figure('scatterplots');

for i = 1:n_subjects
    YY{i} = stats.Y(subject_id == i);
    Yfit{i} = stats.yfit(subject_id == i);
end

line_plot_multisubject(Yfit, YY, 'nofigure');
xlabel('Predicted pain');
ylabel('Observed pain');
axis tight

fprintf('Overall prediction-outcome correlation is %3.2f\n', stats.pred_outcome_r)

figure; o2 = montage(stats.weight_obj);
o2 = title_montage(o2, 5, 'Predictive model weights');

Cross-validated prediction with algorithm cv_lassopcr,   5 folds

Completed fit for all data in:   0 hours   0 min  5 secs 
Fold 1/5 done in:   0 hours   0 min  4 sec
Fold 2/5 done in:   0 hours   0 min  3 sec
Fold 3/5 done in:   0 hours   0 min  3 sec
Fold 4/5 done in:   0 hours   0 min  3 sec
Fold 5/5 done in:   0 hours   0 min  4 sec

Total Elapsed Time =   0 hours   0 min 22 sec
average within-person r = 0.59, t( 32) = 13.18, p = 0.000000, num. missing:   0
Overall prediction-outcome correlation is 0.59
Setting up fmridisplay objects
This takes a lot of memory, and can hang if you have too little.
Grouping contiguous voxels:   3 regions
sagittal montage: 4350 voxels displayed, 219357 not displayed on these slices
sagittal montage: 4292 voxels displayed, 219415 not displayed on these slices
sagittal montage: 3990 voxels displayed, 219717 not displayed on these slices
axial montage: 32223 voxels displayed, 191484 not displayed on these slices
axial montage: 34538 voxels displayed, 189169 not displayed on these slices

Try selecting an a priori 'network of interest'

% re-load
load('bmrk3_6levels_pain_dataset.mat', 'image_obj')

% load an a priori pain-related mask and apply it

mask = fmri_data(which('pain_2s_z_val_FDR_05.img.gz')); % in Neuroimaging_Pattern_Masks repository

image_obj = apply_mask(image_obj, mask);

% show mean data with mask
m = mean(image_obj);
figure; o2 = montage(m);
o2 = title_montage(o2, 5, 'Mean image data, masked');

Using default mask: /Users/torwager/Documents/GitHub/CanlabCore/CanlabCore/canlab_canonical_brains/Canonical_brains_surfaces/brainmask.nii
loading mask. mapping volumes. 
checking that dimensions and voxel sizes of volumes are the same. 
Pre-allocating data array. Needed: 1409312 bytes
Loading image number:     1
Elapsed time is 0.019081 seconds.
Image names entered, but fullpath attribute is empty. Getting path info.
Setting up fmridisplay objects
This takes a lot of memory, and can hang if you have too little.
Grouping contiguous voxels:  27 regions
sagittal montage: 1088 voxels displayed, 47895 not displayed on these slices
sagittal montage: 1171 voxels displayed, 47812 not displayed on these slices
sagittal montage: 980 voxels displayed, 48003 not displayed on these slices
axial montage: 7496 voxels displayed, 41487 not displayed on these slices
axial montage: 7950 voxels displayed, 41033 not displayed on these slices

re-run prediction and plot

[cverr, stats, optout] = predict(image_obj, 'algorithm_name', algoname, 'nfolds', holdout_set);

create_figure('scatterplots');

for i = 1:n
    YY{i} = stats.Y(subject_id == i);
    Yfit{i} = stats.yfit(subject_id == i);
end

line_plot_multisubject(Yfit, YY, 'nofigure');
xlabel('Predicted pain');
ylabel('Observed pain');
axis tight

fprintf('Overall prediction-outcome correlation is %3.2f\n', stats.pred_outcome_r)

figure; o2 = montage(stats.weight_obj);
o2 = title_montage(o2, 5, 'Predictive model weights');

Cross-validated prediction with algorithm cv_lassopcr,   5 folds

Completed fit for all data in:   0 hours   0 min  1 secs 
Fold 1/5 done in:   0 hours   0 min  1 sec
Fold 2/5 done in:   0 hours   0 min  1 sec
Fold 3/5 done in:   0 hours   0 min  1 sec
Fold 4/5 done in:   0 hours   0 min  1 sec
Fold 5/5 done in:   0 hours   0 min  1 sec

Total Elapsed Time =   0 hours   0 min  4 sec
average within-person r = 0.60, t( 32) = 6.59, p = 0.000000, num. missing:   0
Overall prediction-outcome correlation is 0.60
Setting up fmridisplay objects
This takes a lot of memory, and can hang if you have too little.
Grouping contiguous voxels:  27 regions
sagittal montage: 1088 voxels displayed, 47895 not displayed on these slices
sagittal montage: 1171 voxels displayed, 47812 not displayed on these slices
sagittal montage: 980 voxels displayed, 48003 not displayed on these slices
axial montage: 7496 voxels displayed, 41487 not displayed on these slices
axial montage: 7950 voxels displayed, 41033 not displayed on these slices

Other ideas

There are many ways to build from this basic analysis.

% Data
% ------------
% Examine/clean outliers
% Transformations of data: Scaling/centering to remove global mean,
% ventricle mean
% Identify unaccounted sources of variation/subgroups (experimenter sex,
% etc.)

% Features
% ------------
% A priori regions / meta-analysis
% Feature selection in cross-val loop: Univariate, recursive feature elim

Contents