Tuesday Morning Analysis
Timeline: who will analyze these datasets, and when?
To begin with, every site should emphasize collecting 5 patients and 5 controls completely for the initial analysis.
Which intersite calibrations should be done first?
Which specific hypotheses need to be addressed?
Which Regions of Interest need to be identified and how?
The analysis group will break off and continue the discussion of the analysis of the Phase II dataset.
SIRP Phase II task: it includes fixation, encode epochs, and probe epochs. Usually the probe epochs are what get compared to baseline/fixation. Do we want to model the encode epochs as a separate condition, since they account for some of the variance?
Do we want the "Learn" event (1.5 s) modeled separately from the encode epochs? Consensus appears to be yes--Learn is a single condition regardless of which condition it precedes (since subjects don't know). The encode epoch is 6 s; there are only 18 presentations across all loads. Probes have 14 per block, with 6 blocks per scan. That can be analyzed as a block or event-related.
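For concreteness, a minimal sketch of what those numbers imply for the design matrix: a brief "Learn" regressor and a block-style probe regressor, each convolved with a canonical HRF. The TR, number of volumes, onsets, and per-probe duration below are invented for illustration; the real timings come from the stimulus/E-prime logs.

```python
import numpy as np
from scipy.stats import gamma

TR = 2.0        # assumed repetition time (s)
n_scans = 200   # assumed number of volumes
t = np.arange(n_scans) * TR

def hrf(tt):
    # Canonical double-gamma HRF (SPM-style shape parameters).
    h = gamma.pdf(tt, 6) - gamma.pdf(tt, 16) / 6.0
    return h / h.sum()

def boxcar(onsets, duration):
    x = np.zeros_like(t)
    for o in onsets:
        x[(t >= o) & (t < o + duration)] = 1.0
    return x

# Hypothetical onsets (s): six blocks per scan, a Learn event before each.
learn_onsets = [10, 75, 140, 205, 270, 335]
probe_onsets = [20, 85, 150, 215, 280, 345]

h = hrf(np.arange(0.0, 32.0, TR))
learn_reg = np.convolve(boxcar(learn_onsets, 1.5), h)[:n_scans]
probe_reg = np.convolve(boxcar(probe_onsets, 42.0), h)[:n_scans]  # 14 probes x ~3 s (assumed)
design = np.column_stack([learn_reg, probe_reg, np.ones(n_scans)])
```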
Probes can be targets (in the memory set) or foils (out of the memory set). In the past these have been lumped together as a single "working memory" condition, and the errors are few. Do we want to analyze them separately?
Target/foil distinctions can be made prospectively; correct/incorrect distinctions can only be made retrospectively. (MGH has an Excel macro that will pull out the behavioral analysis of the Phase I and Phase II SIRP E-prime data.) Patients are going to make more errors, but not enough to give you the power to analyze them separately. But even if we don't analyze this, some may want to model it separately.
Do we want to do this event-related or as a block? For a large study with other sources of variance, the block analysis is more robust to timing errors. The timing errors between the scanner and the stimulus data will differ across sites. There is also no random jitter in the stimuli, and we don't have the HRF for each subject. If the event-related design models the response well we'd be better off, but if we don't model it well we'll be worse off.
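A rough toy illustration of that robustness argument: shift the onsets by 1 s and see how much the regressor changes. All timings are invented, and the regressors are not HRF-convolved (which exaggerates the event-related drop), but the direction of the effect is the point.

```python
import numpy as np

TR, n_scans = 2.0, 150
t = np.arange(n_scans) * TR

def regressor(onsets, duration, shift=0.0):
    x = np.zeros(n_scans)
    for o in np.asarray(onsets, dtype=float):
        x[(t >= o + shift) & (t < o + shift + duration)] = 1.0
    return x

block_onsets = [20.0, 90.0, 160.0, 230.0]
event_onsets = np.arange(20.0, 280.0, 12.0)

# Correlation between the intended regressor and a 1 s mistimed version.
print(np.corrcoef(regressor(block_onsets, 30.0),
                  regressor(block_onsets, 30.0, shift=1.0))[0, 1])  # near 1
print(np.corrcoef(regressor(event_onsets, 1.5),
                  regressor(event_onsets, 1.5, shift=1.0))[0, 1])   # much lower
```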
The error trials are more randomly spaced; we might be able to pull out error-related activity. We can also include errors/behavior as a correlate.
Prioritize: Block design first to work out the other issues (pragmatics, pipeline, intersite calibrations). Event-related design could come second, using the ROIs found in the block analysis. We can look at the effects of 5t vs 1t and 5t vs fixation, etc.
We also need a secondary analysis that includes encode and probe together, to extract the waveform (e.g. similar to Duke's analysis of SIRP data, to combine the probe effect and encode effect).
Do we need to include a time derivative? It is not necessary for a block design.
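For reference, a minimal sketch of what including the derivative would mean: one extra design-matrix column per condition holding the regressor's numerical derivative. The regressor here is a toy stand-in, not an actual SIRP regressor.

```python
import numpy as np

task_reg = np.sin(np.linspace(0, 6 * np.pi, 200)) ** 2  # stand-in for a convolved regressor
X = np.column_stack([task_reg, np.gradient(task_reg), np.ones_like(task_reg)])
```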
On to the 2nd-level, group analysis: usually the betas are what get passed to the 2nd-level analysis. Betas from different scanners have different ranges (particularly 3 T vs 1.5 T). Shouldn't grand mean scaling take care of that? It will scale the mean but not the dynamic range within the beta map. Passing beta weights is going to show site effects regardless of subject/task differences. But if you include both the betas and their standard deviations, this can be handled (according to Mark Vangel and Hal).
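One way to read that suggestion is a precision-weighted (inverse-variance) combination at the 2nd level, which down-weights betas that come with large standard errors. This is a sketch of the idea with toy numbers, not a description of the actual FBIRN pipeline.

```python
import numpy as np

# Toy 1st-level outputs: one beta and one standard error per subject.
betas = np.array([0.8, 1.1, 0.4, 0.9, 1.3])
ses   = np.array([0.2, 0.3, 0.5, 0.2, 0.6])

w = 1.0 / ses**2                              # precision weights
group_beta = np.sum(w * betas) / np.sum(w)    # weighted group estimate
group_se = np.sqrt(1.0 / np.sum(w))
print(group_beta, group_beta / group_se)      # estimate and its z-like ratio
```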
List of ROIs mentioned: Motor, DLPFC, IPS
Lunchtime discussion: the intensity of the raw images from each site, without scaling--the ranges are very different. As is, this raises concerns about the beta weights. However, it appears that if the means are scaled, the variances/ranges should scale similarly. Lee can look at the Phase I dataset analyzed by Doug, which includes grand mean scaling to 1000, to see whether that removes the issue.
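For reference, a minimal sketch of grand-mean scaling to 1000, assuming a 4D numpy array and a crude intensity-threshold mask (the actual pipeline's masking and scaling details may differ):

```python
import numpy as np

def grand_mean_scale(data, target=1000.0):
    """data: 4D array (x, y, z, time). Rescale so the mean over a crude
    in-brain mask equals `target`; the masking rule here is a placeholder."""
    mask = data.mean(axis=-1) > 0.2 * data.max()
    return data * (target / data[mask].mean())
```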
Greg B's reliability analysis also found that betas were the least reliable; percent signal change was better, for just this reason.
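A sketch of the percent-signal-change conversion this points to: dividing the task beta by the voxel's baseline removes the scanner-dependent intensity scale that makes raw betas differ across sites. Names and the scaling convention are illustrative; packages differ in the exact definition.

```python
def percent_signal_change(beta, baseline_beta, regressor_height=1.0):
    # beta: fitted task beta; baseline_beta: constant-term (mean-signal) beta.
    return 100.0 * beta * regressor_height / baseline_beta
```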
Doug wants to work this out using phantom data rather than human data.
What about reliability weighting, i.e., weighting more robust/reliable sites more strongly than less reliable ones? Does this fall out of hierarchical modeling?
The QA data does give a hardware performance measure; could that be used? We have SNR or SFNR measures for each site and can use the ratios. This captures only the system noise and does not include physiological noise; the physiological noise is hopefully larger than the system noise, so the agar phantom might not be the best measure.
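For concreteness, one common way an SFNR number is computed from a phantom ROI time series (a sketch; the QA scripts may differ in detrending details):

```python
import numpy as np

def sfnr(ts):
    """ts: 1D phantom ROI time series. SFNR = temporal mean / std of the
    detrended residuals; a 2nd-order polynomial stands in for drift removal."""
    x = np.arange(len(ts))
    trend = np.polyval(np.polyfit(x, ts, 2), x)
    return ts.mean() / (ts - trend).std()
```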
Weighting each subject by the residual error might be more reflective of the differences in noise.
Effect sizes: are there statistical reasons for preferring effect sizes over beta weights for the higher-order analysis? Everyone prefers effect sizes. Effect sizes might not be normally distributed, but there are methods for handling that.
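As one concrete option (several definitions are in use), a per-subject effect size can be formed by scaling the beta by the 1st-level residual standard deviation; this is a sketch, not a committed choice.

```python
import numpy as np

def effect_size(beta, residuals):
    # beta: fitted task beta; residuals: 1st-level GLM residual time series.
    return beta / residuals.std(ddof=1)
```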
MIND has been collecting and analyzing the same SIRP design: the Sternberg is still not a very powerful design. The effect sizes are quite small, which might lead to reliability issues. The FBIRN Auditory Oddball is extremely robust, and we are collecting more than enough trials.