Comments from Bill Bug


This is a posting (Peter Lyster) which captures an email thread from Bill Bug.

Top page of SDIWG web site

From: William Bug <William.Bug@DrexelMed.edu>

What's up with this discussion? NCBC Working Group on Scientific Ontology <http://na-mic.org/Wiki/index.php/SDIWG:_NCBC_Scientific_Ontologies>

What's this about MeSH? Oy.

MeSH is for the literature and really mustn't be thought of as something extending beyond that, unless one devotes a lot of resources to the process and even then the semantic quality of the resulting Knowledge Representation (KR) will not be very useful to scientists at the bleeding edge of knowledge in their field.

Heaven (or Leibniz) help you if you hope to take what you produce via MeSH and integrate it with other Knowledge Maps, despite the SEEMING utility of the UMLS Metathesaurus for this purpose. That is not the intended purpose of the Metathesaurus (http://www.nlm.nih.gov/pubs/factsheets/umlsmeta.html):

"(T)he Metathesaurus does not represent a comprehensive NLM-authored ontology of biomedicine or a single consistent view of the world (except at the high level of the semantic types assigned to all its concepts). The Metathesaurus preserves the many views of the world present in its source vocabularies because these different views may be useful for different tasks."

This statement implies the only "shared" view of the world resides in the UMLS Semantic Network. Given the coarse granularity of the SN, and its lack of ontological formality and domain orthogonality, it cannot be relied on to support Knowledge Map integration, again, unless one develops custom algorithms and has a means to provide QA on the results. This can be done with a modicum of success for isolated applications of limited scope (both breadth & depth of semantic specification), but not for the sort of field-wide, large-scale integration which is becoming an ever-increasing need in all branches of biomedical informatics.

The process by which NLM indexers apply MeSH terms (or perform QA on algorithmically applied MeSH terms) is quite complex. Using all the qualifiers and other detailed "contextual" subtleties MeSH can provide is not for the faint of heart. It is also full of granularity unevenness, cycles, and, even when applied by a deft expert who knows how to use it, can lead to irresolvable ambiguities - at least irresolvable by an algorithm.

I feel like we keep going over this same argument.

After Olivier Bodenreider gave a nice general talk describing the UMLS for the W3C Semantic Web HCLS Interest Group BioRDF Tcon, he - and I in follow-up discussions - had to repeatedly stress (and this goes for MeSH, too, of course) that UMLS had/has particular Use Cases and related design goals. "Buyer beware" should you try to use it as an ontology or for other purposes for which it was not intended.

Many seem not to understand the fundamental differences between these knowledge frameworks, let alone recognize how what Barry Smith, Suzi Lewis, Michael Ashburner, and others are trying to do via the OBO Foundry is distinctly different from what has typically been done in the past.

I keep trying to stress that, by fixing on a single set of relations and reference ontologies and strongly encouraging their use (e.g., the OBO Foundry Principles [1]), the task of creating more complex coordinations (cross-products, as the GO folk refer to them) for higher-level domains such as neurodevelopment, immunology, or disease will be much more tractable, as the foundational universals will be explicitly identical and clearly defined.

The same can be said for the task of integrating knowledge maps/association files created for disparate data repositories - e.g., inter-relating knowledge maps we create for data in BIRN with those we'd like to link to within the NCI caBIG initiative.
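As a rough illustration of why identical term identifiers make this kind of integration tractable, here is a minimal sketch. Everything in it is hypothetical: the two repositories, the record names, the `REF:` term IDs, and the functions `index_by_term` and `link_records` are invented for the example and are not taken from BIRN, caBIG, or any OBO ontology. It assumes each association file can be reduced to a mapping from a record to the set of reference-ontology terms annotating it.

```python
from collections import defaultdict

# Hypothetical association files: record ID -> set of term IDs drawn from
# the same reference ontology. All identifiers are made up for illustration.
repo_a = {
    "a:img-001": {"REF:0001", "REF:0002"},
    "a:img-002": {"REF:0003"},
}
repo_b = {
    "b:sample-17": {"REF:0002"},
    "b:sample-42": {"REF:0003", "REF:0004"},
}


def index_by_term(associations):
    """Invert record -> terms into term -> set of records."""
    index = defaultdict(set)
    for record, terms in associations.items():
        for term in terms:
            index[term].add(record)
    return index


def link_records(left, right):
    """Group records from two repositories under each term ID they share."""
    left_index = index_by_term(left)
    right_index = index_by_term(right)
    shared_terms = left_index.keys() & right_index.keys()
    return {
        term: (sorted(left_index[term]), sorted(right_index[term]))
        for term in sorted(shared_terms)
    }


links = link_records(repo_a, repo_b)
for term, (a_records, b_records) in links.items():
    print(term, a_records, b_records)
```

Because both repositories annotate with the very same identifiers, linking reduces to a set intersection on term IDs. With vocabularies that merely overlap in meaning, that step would instead need a custom mapping and QA on the results.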

Am I wrong - or being too simplistic - when I make this assertion?

This is not to imply distinct modalities of knowledge extraction (KE) & KR – e.g., formal ontological frameworks, domain-specific lexicons, NLP, voxel-based object segmentation (for bioimages), set theory-based classification methods, statistical analysis of pooled data in general – can’t be combined synergistically to great effect. What I think is needed in order to productively pursue such a holistic approach is to be very clear about the proper use and limits of each approach and to clearly define the points of effective overlap amongst the techniques.

I certainly hope the NCBC Ontology WG doesn't end up wasting time plodding through these same arguments yet again.

I do think it would be very valuable for this WG to establish a standard operating procedure for handling issues related to use of biomedical knowledge resources. Even doing so for the limited scope of NCBC collaboration would likely be valuable and extend to the larger community. I know many are looking to NCBO for this sort of support.

To my mind, the 25 slides prepared by Suzi Lewis & Michael Ashburner (slides 11 - 37) in the following presentation, summing up the hard-won lessons of the GO Consortium, do a better job of establishing such guidelines than anything else I've seen anywhere:

Principles for Building Biomedical Ontologies http://www.geneontology.org/teaching_resources/presentations/2005-10_ISMB_Ontology-Building_various.ppt

I'd specifically draw attention to the procedural flowchart on slide 19 as being an excellent foundation from which to build more specific guidelines, some of which can be culled from other portions of the presentation and further refined through follow-on discussions within this working group.

Comment by Zak

I consider myself a card-carrying formal ontology builder from the hoary old 1980s, so I would like to agree with the above. However, I see pragmatics that would suggest that, for those domains where we do not have the richness of annotation of a MeSH, waiting for a 'proper' ontology to be built with that domain coverage may be the wrong strategy for the NCBCs. Also, there are plenty of representational problems even using "modern" representation languages (see a classical review by Szolovits et al.). As we discussed in our TCON, there are three categories of Terminologies and Ontologies that we wish to identify:


1. Those that we wish to endorse today for general NCBC use (the "States 8")

2. Those that we wish were better but will 'hold our nose' and use because they are what is available now.

3. Those that are in progress, being built by others that we should cheer on (e.g. GO) or that are needed and have no current advocates/builders.


MeSH could very well fall into the second category unless we have a good alternative for the concepts covered by MeSH in the very short term (i.e. for use by NCBCs today).