Projects:NonParametricClustering
Non-Parametric Clustering for Biomolecular Structural Analysis
High-accuracy imaging and image processing techniques make it possible to collect structural information on biomolecules with atomic-level accuracy. Physical models that directly interpret the dynamics and function of these structures have yet to be developed. Clustering molecular conformations into classes appears to be the first step toward understanding how these molecules form and function. Because prior knowledge such as the number and shape of the clusters in the data space is lacking, non-parametric clustering methods are the most efficient way to address the problem. We are currently developing an application for biomolecular structures based on the Potts model method proposed by Blatt, Wiseman, and Domany.
Description
The purpose of this work is to adapt a non-parametric clustering algorithm for data mining of RNA structures. One of the main challenges of bioinformatics is to develop data mining tools for the RNA structures available in data banks in order to establish structure-function relationships. Doing so requires a coherent, objective classification method. To test such methods, we are currently analyzing the conformational data space of single and double nucleotides only (Figure 1).
Our method of choice for clustering the data space is based on a physical Potts model. The N points of our dataset are treated as magnetic sites and are assigned Potts spins, each of which takes one of q integer values. An interaction term whose strength depends on the distance between nearest-neighbor data points (stronger for closer points) is added to the model. The spin configuration of the model depends on a parameter T that physically corresponds to a temperature. Such Potts systems are known to form a phase with islands of sites in the same Potts state (the same magnetic state). Revealing the clusters in the data space is thus converted into a Monte Carlo search for the magnetic islands in the equivalent physical model. Although this method is slow compared to other non-parametric hierarchical methods, it is far more robust, and its classification is more coherent because of its physical interpretation.
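The Monte Carlo search for magnetic islands can be illustrated with a minimal sketch. It is not the project's implementation: it assumes a k-nearest-neighbor graph, a Gaussian coupling J_ij = exp(-d_ij^2 / (2 a^2)) with a set to the mean neighbor distance, Swendsen-Wang cluster updates, and a spin-spin correlation threshold of 0.5 for linking neighbors. The function names (`knn_edges`, `potts_clusters`) and all default parameter values are hypothetical choices made for illustration.

```python
import math
import random


def knn_edges(points, k):
    """Undirected edges linking each point to its k nearest neighbors."""
    edges = set()
    for i, p in enumerate(points):
        nearest = sorted((math.dist(p, r), j) for j, r in enumerate(points) if j != i)[:k]
        for _, j in nearest:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def potts_clusters(points, k=4, q=10, T=0.1, sweeps=300, burn_in=100, seed=0):
    """Return one cluster label per point at temperature T (illustrative sketch)."""
    rng = random.Random(seed)
    n = len(points)
    edges = knn_edges(points, k)
    dists = {e: math.dist(points[e[0]], points[e[1]]) for e in edges}
    a = sum(dists.values()) / len(dists)  # assumed length scale: mean neighbor distance
    coupling = {e: math.exp(-d * d / (2 * a * a)) for e, d in dists.items()}

    spins = [rng.randrange(q) for _ in range(n)]
    together = {e: 0 for e in edges}  # sweeps in which both ends share a frozen cluster

    for sweep in range(sweeps):
        # Swendsen-Wang step: freeze bonds between equal spins with
        # probability 1 - exp(-J_ij / T), then flip each frozen cluster as a unit.
        parent = list(range(n))
        for i, j in edges:
            if spins[i] == spins[j] and rng.random() < 1 - math.exp(-coupling[(i, j)] / T):
                parent[find(parent, i)] = find(parent, j)
        roots = [find(parent, i) for i in range(n)]
        new_spin = {r: rng.randrange(q) for r in set(roots)}
        spins = [new_spin[r] for r in roots]
        if sweep >= burn_in:
            for i, j in edges:
                if roots[i] == roots[j]:
                    together[(i, j)] += 1

    # Link neighbors whose estimated spin-spin correlation exceeds 0.5;
    # the connected components of these links are the reported clusters.
    parent = list(range(n))
    kept = sweeps - burn_in
    for i, j in edges:
        c = together[(i, j)] / kept
        correlation = ((q - 1) * c + 1) / q
        if correlation > 0.5:
            parent[find(parent, i)] = find(parent, j)
    return [find(parent, i) for i in range(n)]
```

In this sketch, points with equal labels belong to the same cluster. Running it over a range of T values shows the expected behavior: at low T each connected group of points forms a single island, and at high T the islands break up into isolated sites. The intermediate range is where the cluster structure of the data appears.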
Project status
Thus far we have applied the method to classify single-nucleotide conformations. Comparison of the resulting clustering with a previous K-means classification based on prior knowledge reveals an excellent match (Fig. 2). We have also reconstructed the consensus base-pair classification with high fidelity (Fig. 3). At the current stage we are developing a classification nomenclature for base stacking, an interaction that has neither been given an adequate physical model nor been classified.
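The agreement between two clusterings of the same data, such as a Potts model result and a K-means result, can be quantified in several ways. The sketch below uses the Rand index, the fraction of point pairs on which the two labelings agree. This measure is an assumption chosen for illustration, not necessarily the comparison used in this project, and the function name `rand_index` is hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of point pairs that both labelings treat the same way.

    A pair counts as an agreement when both labelings put the two points
    in the same cluster, or both put them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same points")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two points")
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

The index depends only on which points are grouped together, not on the label values, so `rand_index([0, 0, 1, 1], [5, 5, 7, 7])` is 1.0 even though the two methods name their clusters differently.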
Project aim
A variant of the Potts model classification can be used to find clusters in a network of interactions between molecules. We plan to combine the Potts model with results from our ongoing projects on polymer adsorption to develop a model for docking interactions between polymers and between polymers and surfaces.
Key Investigators
- Georgia Tech: E. Hershkovits, X. Le Faucheur, R. Tannenbaum and A. Tannenbaum
Publications
In press
- X. Le Faucheur, E. Hershkovits, R. Tannenbaum and A. Tannenbaum. Non-Parametric Clustering for studying RNA conformation. Publication in submission.
- E. Hershkovits, A. Tannenbaum, and R. Tannenbaum. Adsorption of Block Copolymers from Selective Solvents on Curved Surfaces. To be published in Macromolecules. 2008.
- E. Hershkovits, A. Tannenbaum, and R. Tannenbaum. Scaling Aspects of Block Co-Polymer Adsorption on Curved Surfaces from Nonselective Solvents. To be published in Phys Chem. 2008.
- E. Hershkovits, A. Tannenbaum, and R. Tannenbaum. Polymers Adsorption on Curved Surfaces: A Geometric Approach. J. Phys. Chem. B. 2007. 111 12369-12375.
- E. Hershkovits, G. Sapiro, A. Tannenbaum and L. Williams. Statistical Analysis of RNA Backbone. IEEE/ACM Trans. Comp. Biol. 2006 3 33-46.