FBIRN:NeuroInfStatsTuesAM2006
Data Mining
- History of Data Mining
- 1960s: Tukey (exploratory data analysis)
- Predictive Modeling (CART)
- Machine Learning (CS)
- Relational Databases
- Web
- Hardware
- Convergence of the above technologies has led to data mining
- There is a lot of hype regarding data mining
- FBIRN has a small N in the data mining sense
- Clustering: look for subgroups in the schizophrenia patients (see the sketch after this list)
- FBIRN data has a rich structure
- Focused on data analysis right now; do algorithm comparisons
- Make sure that the raw spatial data and timecourse data are available
- Standardized data for data analyses
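A minimal sketch of the clustering idea above, assuming a subjects-by-features matrix of beta values; scikit-learn, the shapes, and k=2 are illustrative assumptions, not meeting decisions:

 # Hypothetical sketch: look for patient subgroups by clustering
 # subject-level features (e.g., per-ROI beta values). All names,
 # shapes, and the choice of k=2 are illustrative assumptions.
 import numpy as np
 from sklearn.cluster import KMeans
 from sklearn.preprocessing import StandardScaler

 rng = np.random.default_rng(0)
 betas = rng.normal(size=(40, 20))   # stand-in for a subjects x features matrix

 X = StandardScaler().fit_transform(betas)   # put each feature on a common scale
 labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
 print(labels)   # candidate subgroup assignments, one label per subject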
- MRI Machine --> "raw data" --> intermediate data --> Beta-maps, etc
- inter-subject issues in preprocessing
- data analysis recommendations
- give something standard and consistent across sites
- should data vetting be performed, with runs or subjects tossed based on the results of the first-level analysis?
- time series of variance and auto-correlation (see the sketch below)
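A minimal sketch of such vetting series, assuming a 4-D run array (x, y, z, t); all shapes and names are illustrative stand-ins:

 # Hypothetical sketch of per-run vetting series: per-volume variance and
 # a voxel's lag-1 auto-correlation, on random stand-in data.
 import numpy as np

 def lag1_autocorr(ts):
     """Lag-1 auto-correlation of a 1-D time series."""
     ts = ts - ts.mean()
     return float((ts[:-1] * ts[1:]).sum() / (ts ** 2).sum())

 run = np.random.default_rng(1).normal(size=(32, 32, 16, 120))   # x, y, z, t
 flat = run.reshape(-1, run.shape[-1])                           # voxels x time
 var_series = flat.var(axis=0)             # variance across voxels, per volume
 ac1 = lag1_autocorr(run[16, 16, 8, :])    # lag-1 auto-correlation at one voxel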
- having standard datasets available so other disciplines (e.g., computer scientists, statisticians) can look at the data
- proposal: FIPS 1.0 produces standard results (beta-maps) on Phase II, then spend 6 months improving the pipeline
- make data available after each major preprocessing step
- after pre-whitening/smoothing
- after motion correction (would need to modify FIPS/FEAT)
- after slice timing correction (would need to modify FIPS/FEAT)
- currently the final FIPS products are the outputs of FEAT
- Phase II
- uncompressed raw data is just over 1 GB/subject visit
- intermediate time series: 20 GB/subject visit
- may be problematic to store all of the intermediate steps, due to disk space and download times
- what can we do for every subject?
- what can we do on a subset of subjects?
- FIAC
- standard data products
- 1st level
- raw
- motion corrected
- motion parameters
- slice-time corrected
- meta-data, details of the paradigm
- best practices, what we think the design matrix should be
- what happened at what time, how long the blocks were, etc.
- 2nd level
- contrast copes (contrasts of parameter estimates) in standard space
- varcopes (variances of the copes) in standard space
- meta-data:
- what effect
- degrees of freedom
- smoothness
- T image (see the sketch after this list)
- threshold
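A minimal sketch of how these second-level products relate (T = cope / sqrt(varcope), with the degrees of freedom in the meta-data governing the threshold), assuming nibabel and stand-in arrays on the MNI 2 mm grid; in practice the arrays would come from cope/varcope NIfTI files:

 # Hypothetical sketch: derive a T image from a cope/varcope pair in
 # standard space. The arrays here are random stand-ins.
 import numpy as np
 import nibabel as nib

 rng = np.random.default_rng(3)
 affine = np.eye(4)                       # stand-in standard-space affine
 c = rng.normal(size=(91, 109, 91))       # stand-in cope (contrast estimate)
 v = rng.uniform(0.5, 2.0, size=c.shape)  # stand-in varcope (its variance)

 # T = cope / sqrt(varcope); the degrees of freedom in the meta-data then
 # determine the significance threshold for this image.
 t = np.divide(c, np.sqrt(v), out=np.zeros_like(c), where=v > 0)

 nib.save(nib.Nifti1Image(t, affine), "tstat1.nii.gz")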
- Standard Data Products Plan
- Establish a working group
- Establish requirements for what is released besides raw data, and when
- QA reports on raw and on intermediate data products
- Data QA (see the sketch after this list)
- Variance of image over time series
- Variance of (X_t - X_{t+1}) over time series
- scale median to 100
- Good for residuals
- (Outlier count per image / Expected outlier count ) * 100 for each image yields a time series
- Normality test at each voxel; look at how many p-values are smaller than 0.05
- (Number of significant voxels / Expected significant voxels) * 100
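A minimal sketch of the Data QA series above; the +/-3 SD outlier rule and Shapiro-Wilk as the normality test are assumptions, as are all shapes and names:

 # Hypothetical sketch of the Data QA series listed above, on stand-in data.
 import numpy as np
 from scipy import stats

 run = np.random.default_rng(2).normal(size=(16, 16, 8, 120))
 flat = run.reshape(-1, run.shape[-1])                 # voxels x time

 def median_scaled(series):
     """Scale a QA series so its median is 100."""
     return 100.0 * series / np.median(series)

 var_series = median_scaled(flat.var(axis=0))                    # variance per volume
 diff_series = median_scaled(np.diff(flat, axis=1).var(axis=0))  # variance of (X_t - X_{t+1})

 # Outlier series: voxels beyond +/-3 SD in each volume, relative to the
 # count expected under normality (~0.27% of voxels), times 100.
 z = (flat - flat.mean(axis=1, keepdims=True)) / flat.std(axis=1, keepdims=True)
 expected = 2 * (1 - stats.norm.cdf(3)) * flat.shape[0]
 outlier_series = 100.0 * (np.abs(z) > 3).sum(axis=0) / expected

 # Normality at each voxel: share of p-values below 0.05, relative to the
 # 5% expected by chance, times 100 (on a subset of voxels for speed).
 pvals = np.array([stats.shapiro(ts)[1] for ts in flat[:200]])
 normality_score = 100.0 * (pvals < 0.05).mean() / 0.05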