FBIRN:Ontology Notes March 2005
Ontology experiences and status for the FBIRN: J. Turner, March 2005
The HID is designed with “concept ID” and “ontology” fields in several tables. Notably, the clinical assessment table has them, and if desired each item within an assessment can be tagged with its own concept. The fields also exist at the level of the fMRI session (e.g., sensorimotor, working memory scan) and for each condition or event that occurs within a scan. We will also need the demographic data to be tagged.
The main goal of this, from my point of view, is for a user to be able to sit in front of the Mediator interface and enter a query, and for the Mediator to know whether and where in each database the relevant information can be found. All characteristics that might be relevant to understanding the data need to be interpreted at some level. An even bigger challenge is finding the best way to store derived results in the database, and deciding how they might be interpreted for a query (e.g., find me all single-subject datasets that showed activation in the STG).
Clinical assessments: For the Phase I data, for example, we need to identify the relevant ontology and concept ID from that ontology for the following clinical assessments:
- Beck Depression Inventory (BDI)
- North American Adult Reading Test (NAART)
- Self Administered Anxiety Scale (SAS)
- Structured Clinical Interview for Diagnosis, Non-patient (SCID-NP)
Issues this raises: Should the BDI itself be tagged with the same concept ID as all depression scales? Or should there be a special concept ID for the BDI which is linked to the concept ID for depression scales (and similarly for all the other scales)? How will the Mediator search? Can a user ask either for the BDI specifically or for depression scales of any sort?
Similarly for versions: In the database we have a way to store the version of the scale, since different experiments will use different versions—does that need to be represented at the level of the ontology?
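One way to picture the second option (a specific concept linked to a more general one) is a small parent-link table that a search can walk upward. The sketch below is only an illustration under stated assumptions: every concept ID, the `PARENT` table, the `ASSESSMENTS` rows, and the version strings are hypothetical and do not come from any real ontology or from the HID schema. The version is kept as an ordinary column rather than as a concept, which is just one possible answer to the question above.

```python
# Hypothetical concept IDs; none of these come from a real ontology.
PARENT = {
    "C:BDI": "C:DepressionScale",
    "C:OtherDepressionScale": "C:DepressionScale",
    "C:DepressionScale": "C:ClinicalAssessment",
    "C:NAART": "C:ClinicalAssessment",
}


def ancestors(concept_id):
    """Return the concept itself followed by every concept above it."""
    chain = [concept_id]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def matches(tagged_concept, query_concept):
    """True if a row tagged with tagged_concept answers a query for query_concept."""
    return query_concept in ancestors(tagged_concept)


# Hypothetical assessment rows; the version is a plain column, not a concept.
ASSESSMENTS = [
    {"subject": "s01", "concept": "C:BDI", "version": "v2"},
    {"subject": "s02", "concept": "C:OtherDepressionScale", "version": "v1"},
    {"subject": "s03", "concept": "C:NAART", "version": "v1"},
]


def find_assessments(query_concept):
    """Return the subjects whose assessment falls under query_concept."""
    return [
        row["subject"]
        for row in ASSESSMENTS
        if matches(row["concept"], query_concept)
    ]


if __name__ == "__main__":
    print(find_assessments("C:BDI"))              # BDI only
    print(find_assessments("C:DepressionScale"))  # any depression scale
```

With this layout a user can ask for the BDI specifically or for depression scales of any sort, and the same stored tag serves both queries.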
Demographics: We need to be able to find the tables in the database which contain the subject demographic data:
- Age
- Gender
- Handedness
- Diagnosis
- Assessments acquired (?? Maybe not, but this might speed up the search)
That way we can create a query to find all male left-handed schizophrenics under the age of 23 who completed the SANS scale, for example. (Where in the HID does it need to be encoded that Gender = 1 means male or female, though, or that Age is in months or years?)
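The coding question in the parenthesis can be made concrete with a per-site codebook that is applied before the query runs. This is a minimal sketch, assuming each site publishes how it codes gender and which unit it uses for age; the site names, the `CODEBOOKS` structure, the field names, and the sample rows are all hypothetical and are not part of the HID.

```python
# Hypothetical per-site codebooks: how raw values should be interpreted.
CODEBOOKS = {
    "siteA": {"gender": {1: "male", 2: "female"}, "age_unit": "years"},
    "siteB": {"gender": {0: "female", 1: "male"}, "age_unit": "months"},
}
MONTHS_PER_YEAR = 12


def normalize(site, row):
    """Decode one raw demographic row into shared terms and units."""
    book = CODEBOOKS[site]
    age = row["age"]
    if book["age_unit"] == "months":
        age = age / MONTHS_PER_YEAR
    return {
        "subject": row["subject"],
        "gender": book["gender"][row["gender"]],
        "age_years": age,
        "handedness": row["handedness"],
        "diagnosis": row["diagnosis"],
        "assessments": set(row["assessments"]),
    }


# Made-up rows as two sites might store them.
RAW = [
    ("siteA", {"subject": "a1", "gender": 1, "age": 21, "handedness": "left",
               "diagnosis": "schizophrenia", "assessments": ["SANS", "BDI"]}),
    ("siteA", {"subject": "a2", "gender": 2, "age": 22, "handedness": "left",
               "diagnosis": "schizophrenia", "assessments": ["SANS"]}),
    ("siteB", {"subject": "b1", "gender": 1, "age": 264, "handedness": "left",
               "diagnosis": "schizophrenia", "assessments": ["SANS"]}),
    ("siteB", {"subject": "b2", "gender": 1, "age": 300, "handedness": "left",
               "diagnosis": "schizophrenia", "assessments": ["SANS"]}),
]


def query(raw_rows, gender, handedness, diagnosis, max_age_years, assessment):
    """Return subjects matching all criteria after decoding each site's rows."""
    hits = []
    for site, row in raw_rows:
        rec = normalize(site, row)
        if (rec["gender"] == gender
                and rec["handedness"] == handedness
                and rec["diagnosis"] == diagnosis
                and rec["age_years"] < max_age_years
                and assessment in rec["assessments"]):
            hits.append(rec["subject"])
    return hits


if __name__ == "__main__":
    print(query(RAW, "male", "left", "schizophrenia", 23, "SANS"))
```

Whether such a codebook belongs in the HID itself, in the ontology, or in the Mediator is exactly the open question; the sketch only shows that the query cannot be answered consistently without it.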
Cognitive task descriptions: For the imaging data, we need to be able to find in all databases the following information:
The description of the experiment:
- The kind of functional cognitive task it was (sensorimotor, working memory, etc.)
- The conditions used (auditory tones? What frequency? Visual stimuli? What sort? Rest condition? With or without responses?)
Issues this raises: Again there’s the question of the level of description. How much of this is already in the XML header for the data? If it is there, does that mean some of the detail doesn’t need to be explicitly identified in the DB?
Will calling it a “visual working memory” scan do, or do we need to have a concept just for visual working memory scans with a number as the item to be remembered? Do we need to distinguish between “sensorimotor tasks” and “multi-modal sensorimotor tasks”, e.g., to distinguish between the sensorimotor task we used in Phase I and the version we are using in Phase II (which does not have an auditory component)?
Do we want to somehow link the concept to the concepts for the cortical or subcortical regions we think are involved with the task? (How would that benefit the user doing the mediated query?)
Much of this will build on what comes out of the BrainMap/fMRIDC/FBIRN interactions first, since they have already put a lot of thought into this level, and into the basic description of subject groups (age range, gender, handedness, etc.). They also have had a lot of experience with people searching their databases and with the usability of their filters. Both tend to describe experiments at the experiment level rather than subject by subject, but the variables they identify as useful will be good for us to use as well. Either they already have, or we can build as a larger group, something of a taxonomy (if not an ontology) of these terms and of the relationships among cognitive tasks and conditions. Linking these taxonomies appropriately into the HID, however, may be another issue.
Imaging Parameters: The above will help the user of the federated database system find the datasets they might be interested in, e.g., working memory fMRI data on Alzheimer’s patients of particular sorts. Once those datasets are identified, the assumption at the moment is that they will be downloaded for offline analysis (or analyzed over the grid). Either way, we need a complete description of the scanning methods and parameters for each dataset.
The parameters listed below are also in the XML header, so do they need to be part of the DB querying process? Or can we leverage the XML standards? If we limit users so they can’t search on TR, for example, or distinguish spiral from linear EPI scans in their queries, then it would seem these parameters do not need to be included in the database per se: once the data are retrieved, the parameters will be present in the XML header and the datasets can be filtered at that point…?
The description of the scanning parameters:
- Type of scan: structural, functional
- If structural, what kind of scan: SPGR? Other?
- If functional:
  - Traversal of k-space: Linear? Spiral? Other?
  - TR
  - TE
  - Number of slices? (whole brain or single-slab?)
  - Slice thickness/gap thickness
  - Slice acquisition order (interleaved or serial)
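The option of filtering on the XML header after retrieval, instead of querying these parameters in the DB, might look roughly like the following. This is a sketch under stated assumptions: the `<scan>` element and its child tags are made up for illustration and are not the actual FBIRN XML header format, and the dataset names and values are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical headers for three retrieved datasets (not the real format).
HEADERS = {
    "run1": "<scan><type>functional</type><trajectory>spiral</trajectory>"
            "<TR>2000</TR><TE>30</TE></scan>",
    "run2": "<scan><type>functional</type><trajectory>linear</trajectory>"
            "<TR>2000</TR><TE>30</TE></scan>",
    "run3": "<scan><type>functional</type><trajectory>spiral</trajectory>"
            "<TR>3000</TR><TE>40</TE></scan>",
}


def scan_params(xml_text):
    """Parse one header into a {tag: text} dictionary of parameters."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


def filter_datasets(headers, **wanted):
    """Keep the datasets whose header matches every requested parameter.

    Values are compared as strings, exactly as they appear in the header.
    """
    keep = []
    for name, xml_text in headers.items():
        params = scan_params(xml_text)
        if all(params.get(key) == value for key, value in wanted.items()):
            keep.append(name)
    return keep


if __name__ == "__main__":
    print(filter_datasets(HEADERS, trajectory="spiral", TR="2000"))
```

The trade-off is that every candidate dataset has to be retrieved before it can be excluded, which is acceptable only if scanning parameters are rarely used as search criteria.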
Derived fMRI data: Given the way the FBIRN, the Transdisciplinary Imaging Genetics Center, and the National Alliance for Medical Image Computing are working together, it is most likely that mean activations or similar summary data for various cortical regions (ROIs) will be stored as a result of single-subject analysis. That way, the activation in various cortical areas can be summarized (which will be great for the large and consistently analyzed datasets that FBIRN is collecting). Data mining and other techniques can then be applied to look for differences in activation across pre-defined regions (such as the DLPFC, angular gyrus, etc.).
This is going to be a big task. The pipeline for consistent analysis is in good shape, but automatically identifying these regions on a subject-by-subject basis is still in development. In the short term (the next 6-8 months), users will probably download the data or analyses and extract the ROI results using their preferred methods, for clustering or other searches. In the long term, however, that will become infeasible and the databases will have to be made interoperable with standard data mining software.
This is where the neuroanatomy ontologies come in, for the FBIRN. We will need to know what the ROI is and which naming scheme it came from (e.g., a Brodmann’s area, or a sulcal/gyral area, etc.). We’ll need to know how it was defined (Talairach atlas? MNI atlas? LONI atlas? Or subject-specific regions?) and what the statistic is. Again, much of that will be captured by the data provenance methods (I hope) but whether that is explicitly in the DB or in XML headers somewhere is another question.
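The pieces of information listed above (which ROI, which naming scheme, how it was defined, which statistic) can be pictured as one record per subject and region. The sketch below is an assumption-laden illustration, not a proposed HID table: the `RoiResult` fields, the sample values, and the threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoiResult:
    """One derived result for one subject and one region (hypothetical layout)."""
    subject: str
    roi: str            # region label, e.g. "STG"
    naming_scheme: str  # e.g. "Brodmann" or "sulcal/gyral"
    defined_by: str     # e.g. "Talairach atlas", "subject-specific"
    statistic: str      # what the stored number is
    value: float


# Made-up records for illustration only.
RESULTS = [
    RoiResult("s01", "STG", "sulcal/gyral", "Talairach atlas", "mean t", 4.2),
    RoiResult("s02", "STG", "sulcal/gyral", "MNI atlas", "mean t", 1.1),
    RoiResult("s02", "DLPFC", "sulcal/gyral", "MNI atlas", "mean t", 3.5),
    RoiResult("s03", "BA 22", "Brodmann", "Talairach atlas", "mean t", 3.8),
]


def subjects_with_activation(results, roi, naming_scheme, statistic, threshold):
    """Return sorted subjects whose statistic for the ROI meets the threshold."""
    return sorted({
        r.subject
        for r in results
        if r.roi == roi
        and r.naming_scheme == naming_scheme
        and r.statistic == statistic
        and r.value >= threshold
    })


if __name__ == "__main__":
    print(subjects_with_activation(RESULTS, "STG", "sulcal/gyral", "mean t", 3.0))
```

Note that the record labeled under the Brodmann scheme is never matched by a query phrased in sulcal/gyral terms. Relating regions across naming schemes is the kind of link a neuroanatomy ontology would have to supply.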
I have undoubtedly blurred the lines between the HID and the SRB, the query and the ontology, but this is how I’m thinking of it at the moment.
I do not know what other ontologies are out there that might be relevant to this, other than what BrainMap and the fMRIDC have been working on.