Mbirn: Proof that defacing algorithm leaves brain untouched
- Goals
- To share human data in compliance with federal, state and local regulations including the recently enacted Health Insurance Portability and Accountability Act of 1996 (HIPAA), it is crucial to have in place robust practices and procedures that protect the welfare of the individuals who participate in the research. These practices must include measures that ensure the privacy of the individual. One of the HIPAA-defined 18 identifiers that must be removed from any data in order for it to qualify as sharable under the “safe harbor” regulations is “Full face photographic images and any comparable images.” With the increasing resolution of morphometric scans, it has become increasingly possible to reconstruct detailed images showing facial anatomy. Thus, automated techniques to obscure or remove an individual’s facial features from structural images have become an important part of the data sharing process for large-scale, multi-site projects such as the Biomedical Informatics Research Network (BIRN). In addition to rendering a subject unidentifiable, a method must be robust, removing only nonbrain tissue, yet leaving the brain tissue intact. Further, it must have no debilitating effects on later data processing and analysis.
- Methods
- In the present study we investigated the performance of an automated defacing algorithm on image sets which differed by age and diagnosis. To help quantify the outcome, we compared images by applying MRI Watershed (HWA: Ségonne, 2004, in FreeSurfer) or Brain Surface Extractor (BSE: Sandor, 1997; Shattuck, 2001) to both defaced and nondefaced images. These were compared to two gold standards (i.e., manual stripping by two anatomists) to determine 1) similarity of results across methods; 2) sensitivity to classification of tissue as brain; and 3) ability to specify tissue as nonbrain. These skull-stripping algorithms were selected based upon their performance in a previous analysis (Fennema-Notestine, accepted; see Mbirn:_Recent_Publications for accepted version) as being fairly robust across diagnoses.
- Data
- 16 prospective datasets (4 YNC, 4 EC, 4 AD, 4 DEP)
- Analyses
- The 16 contemporary datasets selected for statistical analysis were each processed in four ways: 1) normalization of the original images with Non-parametric Non-uniform intensity Normalization (N3), followed by skull-stripping with HWA; 2) N3 normalization of the original images, followed by skull-stripping with BSE; 3) defacing of the original images, followed by N3 normalization and skull-stripping with HWA; 4) defacing, followed by N3 normalization and skull-stripping with BSE. Therefore, for each initial dataset, there were four processed datasets for subsequent analysis. Six sagittal slices, known to be problematic in skull-stripping, were then selected from the HWA-stripped and the defaced + HWA-stripped datasets and compared statistically to the same slices manually stripped by two trained anatomists (i.e., the gold standards) to determine similarity across methods as well as the ability to correctly classify voxels as brain or nonbrain. Finally, whole brain analyses were conducted by using all sagittal slices in which each subject’s representative slice contained brain tissue (total slices per dataset = 86). Within this analysis, comparisons were made between skull-stripped and defaced + skull-stripped images, using either BSE or HWA as the skull-stripping method.
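The four processing variants described above amount to a 2 x 2 design (with or without defacing, crossed with HWA or BSE). A minimal sketch of that enumeration follows; the function name and the step labels are hypothetical plain strings for illustration, not commands from FreeSurfer, BSE, or the defacing tool.

```python
from itertools import product


def processing_variants():
    """Return the four step sequences: with/without defacing x HWA/BSE.

    Step labels are illustrative strings only (an assumption of this
    sketch), not real command names.
    """
    variants = []
    for defaced, stripper in product((False, True), ("HWA", "BSE")):
        steps = ["deface"] if defaced else []
        # N3 normalization always comes immediately before skull-stripping.
        steps += ["N3", stripper]
        variants.append(tuple(steps))
    return variants


for steps in processing_variants():
    print(" -> ".join(steps))
```

Applied to 16 datasets, this design yields 64 processed datasets in total.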
- Defaced images were visually inspected to determine whether the defacing mask encroached upon brain tissue. Visualization was conducted using AFNI (Cox, 1996, http://afni.nimh.nih.gov/afni/) to examine each subject’s structural image across all three planes. After visual inspection to determine whether any brain tissue had been lost, the image was rendered in three dimensions and further inspected to determine whether facial features were adequately removed.
- The four statistical methods chosen for both the six-slice and whole-brain analyses were as follows:
- Set-difference. This technique examined the number of voxels retained by skull-stripping alone that were removed by defacing.
- Jaccard similarity index. The Jaccard measures the degree of correspondence, or overlap, for each image slice.
- Hausdorff distance comparison. The Hausdorff examines the degree of mismatch between the contours of two image sets.
- Expectation-maximization algorithm. This algorithm calculates the maximum likelihood estimate of the underlying agreements among all methods. There are two main outcomes of this method:
- Sensitivity: This metric determines the relative frequency of correct brain classification by one method relative to all methods.
- Specificity: This determines the relative frequency of correct nonbrain classification by one method relative to other methods.
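The four measures listed above can be illustrated with a short, pure-Python sketch. Several things here are assumptions rather than details from the study: the function names are hypothetical, each mask is represented as a set of (row, col) voxel coordinates, the Hausdorff distance is computed by brute force on whatever point sets are passed in (pass contour voxels to match the description above), and the EM routine is a STAPLE-style estimator with an assumed prior and assumed starting values, which may differ from the implementation actually used.

```python
from math import dist


def set_difference_percent(stripped, defaced_stripped):
    """Percent of voxels kept by skull-stripping alone that are missing
    from the defaced + skull-stripped result."""
    if not stripped:
        return 0.0
    return 100.0 * len(stripped - defaced_stripped) / len(stripped)


def jaccard(a, b):
    """Overlap of two voxel sets: |A & B| / |A | B| (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(src, dst):
        return max(min(dist(p, q) for q in dst) for p in src)
    return max(directed(a, b), directed(b, a))


def em_sensitivity_specificity(decisions, n_iter=50):
    """STAPLE-style EM for binary labels (a sketch, not the study's code).

    decisions[j][i] is 1 if method j called voxel i brain, else 0.
    Returns (sensitivity, specificity), one value per method.
    """
    n_methods, n_voxels = len(decisions), len(decisions[0])
    # Assumed prior: overall fraction of "brain" calls across all methods.
    prior = sum(map(sum, decisions)) / (n_methods * n_voxels)
    sens = [0.9] * n_methods  # assumed starting values
    spec = [0.9] * n_methods
    for _ in range(n_iter):
        # E-step: posterior probability that each voxel is truly brain.
        w = []
        for i in range(n_voxels):
            a, b = prior, 1.0 - prior
            for j in range(n_methods):
                if decisions[j][i]:
                    a *= sens[j]
                    b *= 1.0 - spec[j]
                else:
                    a *= 1.0 - sens[j]
                    b *= spec[j]
            w.append(a / (a + b) if a + b > 0 else 0.5)
        # M-step: re-estimate each method's sensitivity and specificity.
        brain = sum(w)
        nonbrain = sum(1.0 - wi for wi in w)
        for j in range(n_methods):
            if brain > 0:
                sens[j] = sum(wi for wi, d in zip(w, decisions[j]) if d) / brain
            if nonbrain > 0:
                spec[j] = sum(1.0 - wi for wi, d in zip(w, decisions[j]) if not d) / nonbrain
    return sens, spec


# Toy 2 x 2 slice: defacing removed one voxel that stripping had kept.
stripped = {(0, 0), (0, 1), (1, 0), (1, 1)}
defaced_stripped = {(0, 0), (0, 1), (1, 0)}
print(set_difference_percent(stripped, defaced_stripped))  # 25.0
print(jaccard(stripped, defaced_stripped))                 # 0.75
print(hausdorff(stripped, defaced_stripped))               # 1.0
```

For the EM routine, passing three label lists in which the third method alone marks one extra voxel as brain yields a lower estimated specificity for that method than for the other two, which is the kind of per-method contrast the sensitivity and specificity outcomes are meant to expose.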
- Results
- Visual Inspection. None of the contemporary defaced datasets had brain tissue removed. Three-dimensional renderings of the defaced images indicated that identifying facial features (eyes, nose, mouth, chin) were removed from, on average, the nasion downward.
- Six slice statistical comparison. Using the set difference comparison, 2.538 % (SD 2.572) of the voxels retained by skull-stripping were removed by defacing. Visual examination showed these retained voxels were nonbrain tissue. Descriptive analyses of the Jaccard Similarity and Hausdorff Distance methods suggest the results were similar both across methods (HWA with and without prior defacing) and across anatomists. The descriptive analyses for EM Sensitivity and Specificity were at or near ceiling for the two methods. Thus, defacing did not appreciably influence HWA in its ability to correctly classify tissue as brain or nonbrain.
- Whole brain statistical comparison. Here, the set difference comparison indicated that 2.612 % (SD 2.433) of the voxels retained by skull stripping with HWA were removed by defacing. Visual examination indicated that the retained voxels were nonbrain tissue. The subsequent analyses using the Jaccard Similarity, Hausdorff Distance, and E-M methods were conducted by comparing the automated (HWA or BSE) stripped datasets with and without prior defacing. The mean descriptives for the Hausdorff Distance and Jaccard Similarity were not appreciably different. As with the six slice analysis, the EM Sensitivity and Specificity for both HWA and BSE were at or near ceiling, indicating that in either case, defacing did not appreciably interfere with the ability of HWA or BSE to differentiate brain from nonbrain tissue.