Difference between revisions of "SDIWG:NCBC Software Classification MAGNet Examples"

From NAMIC Wiki
Latest revision as of 04:52, 15 May 2007


DelPhi

  • Description: DelPhi provides numerical solutions to the Poisson-Boltzmann equation (both linear and nonlinear form) for molecules of arbitrary shape and charge distribution. The current version is fast (the best relaxation parameter is estimated at run time), accurate (calculation of the electrostatic free energy is less dependent on the resolution of the lattice) and can handle extremely high lattice dimensions. It also includes flexible features for assigning different dielectric constants to different regions of space and treating systems containing mixed salt solutions.
  • Data Input: a coordinate file for a molecule, or equivalent data describing geometrical objects and/or charge distributions
  • Data Output: electrostatic potential in and around the system
  • Implementation Language: Fortran and C
  • Version, Date, Stage: Stable public release
  • Authors: E. Alexov, R. Fine, M.K. Gilson, A. Nicholls, W. Rocchia, K. Sharp, and B. Honig.
  • Platforms Tested: Unix (SGI IRIX), Linux, PC (requires Fortran and C compilers), IBM AIX, and Mac.
  • License: Freely available to academia; pay model for commercial users.
  • Keywords: Finite Difference Poisson-Boltzmann Solver
  • URL: http://trantor.bioc.columbia.edu/delphi
  • Organization: MAGNet
  • NCBC Ontology Classification: Atomic --> SoftwareFunction --> Protein Modeling and Classification --> Numerical Calculation of Electrostatic Potential
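
DelPhi's approach, a finite-difference solution of the Poisson-Boltzmann equation on a lattice refined by an iterative relaxation scheme, can be illustrated on a much smaller problem. The following is a minimal sketch, not DelPhi's code. It assumes a 1D grid, a single uniform dielectric constant, the linearized equation phi'' - kappa^2 phi = -rho/eps, and zero potential at both boundaries, and it solves the system by successive over-relaxation (SOR). The function `solve_linear_pb_1d` and its parameters are hypothetical; `omega` stands in for the relaxation parameter that DelPhi estimates at run time.

```python
def solve_linear_pb_1d(rho, eps, kappa, h, omega=1.8, tol=1e-10, max_iter=100000):
    """Hypothetical sketch: solve phi'' - kappa**2 * phi = -rho / eps on a uniform 1D grid.

    rho   -- charge density at each grid point (list of floats)
    eps   -- uniform dielectric constant (assumption: no dielectric boundaries)
    kappa -- inverse Debye screening length
    h     -- grid spacing
    omega -- SOR relaxation parameter (1 < omega < 2 over-relaxes)
    The potential is held at 0 on both boundary points.
    """
    n = len(rho)
    phi = [0.0] * n
    # Discretized equation: phi[i-1] + phi[i+1] - (2 + (kappa*h)**2) * phi[i] = -h*h*rho[i]/eps
    denom = 2.0 + (kappa * h) ** 2
    for _ in range(max_iter):
        max_change = 0.0
        for i in range(1, n - 1):
            # Gauss-Seidel value for this point, using neighbors already updated in this sweep
            gs = (phi[i - 1] + phi[i + 1] + h * h * rho[i] / eps) / denom
            new = (1.0 - omega) * phi[i] + omega * gs  # over-relaxed update
            max_change = max(max_change, abs(new - phi[i]))
            phi[i] = new
        if max_change < tol:
            break
    return phi


# Usage: a single point charge in the middle of a 41-point grid
rho = [0.0] * 41
rho[20] = 1.0
phi = solve_linear_pb_1d(rho, eps=1.0, kappa=0.5, h=0.1)
```

A 3D version follows the same pattern with six neighbors per lattice point; the nonlinear form would replace the linear screening term with a hyperbolic sine.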


GRASP

  • Description: A molecular visualization and analysis program. It is particularly useful for the display and manipulation of the surfaces of molecules and their electrostatic properties.
  • Data Input: PDB files, potential maps from DelPhi
  • Data Output: molecular graphics.
  • Implementation Language: Fortran
  • Version, Date, Stage: v1.3.6, stable public release.
  • Authors: Anthony Nicholls and Barry Honig.
  • Platforms Tested: SGI machines running IRIX 5.x and 6.x (Indy, Indigo including Impact, Octane and O2 systems).
  • License: Freely available to academia.
  • Keywords: molecular visualization
  • URL: http://trantor.bioc.columbia.edu/grasp
  • Organization: MAGNet
  • NCBC Ontology Classification: Atomic --> SoftwareFunction --> Protein Modeling and Classification --> Molecular Visualization Package

Nest

  • Description: Nest models protein structure from a sequence-template alignment. The current server supports modeling with a single template only. Nest is part of the JACKAL package, which can be downloaded.
  • Data Input: PIR and PDB files
  • Data Output:
  • Implementation Language: C++
  • Version, Date, Stage: Stable public release.
  • Authors: Xiang, Z. and Honig, B.
  • Platforms Tested: platform independent (web based tool)
  • License: Freely available to academia.
  • Keywords: modeling, protein structure, sequence-template alignment.
  • URL: http://honiglab.cpmc.columbia.edu/cgi-bin/jackal/nest.cgi
  • Organization: MAGNet
  • NCBC Ontology Classification: Atomic --> SoftwareFunction --> Protein Modeling and Classification --> Homology Modeling


JACKAL

  • Description: JACKAL is a collection of programs designed for the modeling and analysis of protein structures. Its core program is Nest, a versatile homology modeling package. JACKAL has the following capabilities: 1) comparative modeling based on single, composite or multiple templates; 2) side-chain prediction; 3) modeling of residue mutations, insertions or deletions; 4) loop prediction; 5) structure refinement; 6) reconstruction of missing protein atoms; 7) reconstruction of missing protein residues; 8) prediction of hydrogen atoms; 9) fast calculation of solvent accessible surface area; 10) structure superimposition.
  • Data Input:
  • Data Output:
  • Implementation Language: C++
  • Version, Date, Stage: Version 1.5, Oct 20, 2002, stable public release.
  • Authors: Z. Xiang and B. Honig
  • Platforms Tested: SGI 6.5, Intel Linux and Sun Solaris
  • License: Freely available to academia.
  • Keywords: Protein Structure Modeling
  • URL: http://trantor.bioc.columbia.edu/programs/jackal
  • Organization: MAGNet
  • NCBC Ontology Classification: Atomic --> SoftwareFunction --> Protein Modeling and Classification --> Prediction of Side-chain Conformations


GRASP2

  • Description: GRASP2 is an updated version of the GRASP program for macromolecular structure and surface visualization. It contains a large number of new features and scientific tools: an enhanced GUI; structure alignment and domain database scanning; a Gaussian surface generator and new surface coloring schemes; sequence visualization and alignment; storage of completed work in "project files", which can hold objects such as views of the structure, defined subsets and surfaces; and direct printing at full printer resolution.
  • Data Input: PDB files, potential maps from DelPhi, sequence alignments.
  • Data Output: molecular graphics, structural alignments.
  • Implementation Language: C++
  • Version, Date, Stage: Stable public release
  • Authors: Donald Petrey and Barry Honig.
  • Platforms Tested: Windows, Linux
  • License: Freely available to academia.
  • Keywords: molecular visualization
  • URL: http://trantor.bioc.columbia.edu/grasp2
  • Organization: MAGNet
  • NCBC Ontology Classification: Atomic --> SoftwareFunction --> Protein Modeling and Classification --> Molecular Visualization Package


PrISM

  • Description: PrISM is an integrated computational system where computational tools are implemented for protein sequence and structure analysis and modeling.
  • Data Input:
  • Data Output:
  • Implementation Language: Fortran
  • Version, Date, Stage: Stable public release
  • Authors: Wang, L, Yang, A. S. & Honig, B.
  • Platforms Tested: SGI IRIX, Intel Linux
  • License: Freely available to academia.
  • Keywords: protein analysis/modeling
  • URL: http://trantor.bioc.columbia.edu/programs/PrISM/
  • Organization: MAGNet
  • NCBC Ontology Classification: Atomic --> SoftwareFunction --> Protein Modeling and Classification --> Homology Modeling


Protein-DNA interface alignment

  • Description: The protein-DNA alignment software allows one to align the interfacial amino acids from two protein-DNA complexes based on the geometric relationship of each amino acid to its local DNA.
  • Data Input: two PDB files that both contain protein-DNA complexes
  • Data Output: The programs output the aligned residues and their corresponding residue-residue similarity scores, s(i,j).
  • Implementation Language: C++ and Perl
  • Version, Date, Stage: Stable public release.
  • Authors: Siggers, T.W., Silkov, A & Honig, B.
  • Platforms Tested: Linux
  • License: Freely available to academia
  • Keywords: protein-DNA interface
  • URL: http://trantor.bioc.columbia.edu/programs/intfc_aln
  • Organization: MAGNet
  • NCBC Ontology Classification: Atomic --> SoftwareFunction --> Protein Modeling and Classification --> Prediction of Side-chain Conformations

SURFace

  • Description: SURFace is a set of programs that calculate solvent accessible surface area and curvature-corrected solvent accessible surface area.
  • Data Input:
  • Data Output:
  • Implementation Language:
  • Version, Date, Stage: Stable public release.
  • Authors: Nicholls, A., Sharp, K., Sridharan, S. and Honig, B.
  • Platforms Tested: SGI
  • License: Freely available to academia.
  • Keywords: solvent accessible surface area
  • URL: http://trantor.bioc.columbia.edu/surf/
  • Organization: MAGNet
  • NCBC Ontology Classification: Atomic --> SoftwareFunction --> Protein Modeling and Classification --> Calculation of Solvent Accessible Area
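
Solvent accessible surface area is the area traced by the center of a probe sphere rolled over the van der Waals surface. The SURFace programs' own algorithm is not described here, so the sketch below uses a different, well-known numerical method, Shrake-Rupley point sampling. Each atom's radius is expanded by the probe radius, points are spread over the expanded sphere, and any point inside a neighboring expanded sphere counts as buried. The function names, the 1.4 Å default probe radius and the point count are assumptions, and no curvature correction is applied.

```python
import math


def sphere_points(n):
    """Roughly uniform points on the unit sphere (golden-spiral construction)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n  # evenly spaced heights give equal-area bands
        r = math.sqrt(1.0 - z * z)
        theta = golden_angle * k
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points


def shrake_rupley_sasa(atoms, probe=1.4, n_points=960):
    """Per-atom solvent accessible area (Å^2) for atoms given as (x, y, z, vdw_radius)."""
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_ri = ri + probe  # radius of the surface traced by the probe center
        accessible = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + big_ri * ux, yi + big_ri * uy, zi + big_ri * uz
            buried = False
            for j, (xj, yj, zj, rj) in enumerate(atoms):
                if j == i:
                    continue
                big_rj = rj + probe
                if (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < big_rj * big_rj:
                    buried = True
                    break
            if not buried:
                accessible += 1
        # Accessible fraction of points times the area of the expanded sphere
        areas.append(4.0 * math.pi * big_ri * big_ri * accessible / n_points)
    return areas


# Usage: two overlapping atoms 3 Å apart
pair = shrake_rupley_sasa([(0.0, 0.0, 0.0, 1.6), (0.0, 0.0, 3.0, 1.6)])
```

The brute-force neighbor loop is quadratic in the number of atoms; practical implementations restrict the inner loop to nearby atoms.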


Target Explorer

  • Description: Target Explorer automates the prediction of complex regulatory elements for a specified set of transcription factors (TFs) in the Drosophila melanogaster genome. Its user-friendly, self-explanatory web interface allows users to: 1) create a customized library of TF binding-site matrices based on user-defined sets of training sequences; 2) search for new clusters of binding sites for a specified set of TFs; 3) extract annotation for potential target genes.
  • Data Input: genomic sequences
  • Data Output: clusters of known binding sites
  • Implementation Language: Perl (CGI)
  • Version, Date, Stage: Stable public release.
  • Authors: Sosinsky A, Bonin CP, Mann RS, Honig B.
  • Platforms Tested: platform independent (web based tool)
  • License: Freely available to academia.
  • Keywords: prediction of binding sites for transcription factors
  • URL: http://trantor.bioc.columbia.edu/Target_Explorer/
  • Organization: MAGNet
  • NCBC Ontology Classification: Atomic --> SoftwareFunction --> Genomic & Phenotypic Analysis --> Sequence Annotation
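
The workflow in Target Explorer's description, building binding-site matrices from training sequences and then searching for clusters of sites, can be sketched in three steps: a log-odds position weight matrix built from aligned training-site counts, a sliding-window scan of a sequence, and a search for windows containing several hits. All function names, the uniform background, the pseudocount and the cluster criteria are assumptions for illustration. Target Explorer's actual scoring and clustering rules are not described here, and only the forward strand is scanned.

```python
import math


def log_odds_pwm(site_counts, background=0.25, pseudocount=0.5):
    """Build a log-odds PWM from per-position base counts of aligned training sites.

    site_counts -- one dict per motif position mapping base ('A','C','G','T') -> count
    """
    pwm = []
    for column in site_counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({base: math.log(((column.get(base, 0) + pseudocount) / total) / background)
                    for base in "ACGT"})
    return pwm


def scan_sequence(seq, pwm, threshold):
    """Start positions of windows whose summed log-odds score reaches `threshold`.

    seq must contain only uppercase A, C, G, T.
    """
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        score = sum(pwm[j][seq[start + j]] for j in range(width))
        if score >= threshold:
            hits.append(start)
    return hits


def find_clusters(hit_positions, window, min_sites):
    """(first, last) hit positions for every run of >= min_sites hits spanning <= window bp."""
    hits = sorted(hit_positions)
    clusters = []
    start = 0
    for end in range(len(hits)):
        while hits[end] - hits[start] > window:  # shrink the window from the left
            start += 1
        if end - start + 1 >= min_sites:
            clusters.append((hits[start], hits[end]))
    return clusters


# Usage: a toy "TGA" motif learned from four identical training sites
pwm = log_odds_pwm([{"T": 4}, {"G": 4}, {"A": 4}])
hits = scan_sequence("AATGAATTGACC", pwm, threshold=2.0)
clusters = find_clusters(hits, window=10, min_sites=2)
```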


MEDUSA and Gorgon

  • Description: MEDUSA is an algorithm for learning predictive models of transcriptional gene regulation from gene expression and promoter sequence data. By using a statistical learning approach based on boosting, MEDUSA learns cis-regulatory motifs, condition-specific regulators, and regulatory programs that predict the differential expression of target genes. The regulatory program is specified as an alternating decision tree (ADT). The Java implementation of MEDUSA supports several visualizations of the regulatory program and other inferred regulatory information through the accompanying Gorgon tool, including hits of significant and condition-specific motifs along the promoter sequences of target genes, and regulatory network figures viewable in Cytoscape.
  • Data Input: Discretized (up/down/baseline) gene expression data in plain text format, promoter sequences in FASTA format, list of candidate transcriptional regulators and signal transducers in plain text format.
  • Data Output: Regulatory program represented as a Java serialized object file readable by Gorgon and as a human readable XML file. Gorgon currently generates views of learned PSSMs, positional hits along promoter sequences, and views of the ADT as HTML files, and generates network figures as Cytoscape format files.
  • Implementation Language: Java (prototyped in MATLAB)
  • Version, Date, Stage: Version 2.0, July 2006, pre-release beta version; Version 1.0 (MATLAB), April 2005, stable public release
  • Authors: David Quigley, Manuel Middendorf, Steve Lianoglou, Anshul Kundaje, Yoav Freund, Chris Wiggins, Christina Leslie
  • Platforms Tested: Windows, Linux, Mac OS X
  • License: Open source
  • Keywords:
  • URL: http://www.cs.columbia.edu/compbio/medusa (MATLAB), http://compbio.sytes.net:8090/medusa (Java beta version)
  • Organization: MAGNet
  • NCBC Ontology Classification: Atomic --> SoftwareFunction --> Genomic & Phenotypic Analysis --> Regulatory/Signaling network reconstruction

String kernel package

  • Description: The string kernel package contains implementations of the mismatch and profile string kernels for use with support vector machine (SVM) classifiers for protein sequence classification. Both kernels compute similarity between protein sequences based on shared occurrences of k-length subsequences ("k-mers"), counted with substitutions. Kernel functions for protein sequence data enable the training of SVMs for a range of prediction problems, in particular protein structural class prediction and remote homology detection. A version of the Spider MATLAB machine learning package is bundled with the code, which allows users to train SVMs and evaluate performance on test sets with the packaged software.
  • Data Input: The mismatch kernel requires sequence data in FASTA format. The profile string kernel uses probabilistic profiles, such as those produced by PSI-BLAST, in place of the original sequences. The Spider SVM implementation requires both the kernel matrix and a label file of binary or multi-class labels for the training data; these data must be loaded into MATLAB variables before using the Spider routines.
  • Data Output: The kernel code produces a kernel matrix for the input data in tab-delimited text format. The Spider package trains SVMs and stores the learned classifier, and the results of applying it to test data, as MATLAB objects.
  • Implementation Language: String kernel code is implemented in C. Spider is a set of object-oriented MATLAB routines.
  • Version, Date, Stage: Version 1.2, September 2004, stable public release
  • Authors: Eleazar Eskin, Rui Kuang, Eugene Ie, Ke Wang, Jason Weston, Bill Noble, Christina Leslie
  • Platforms Tested: Windows, Linux
  • License: Open source
  • Keywords:
  • URL: http://www.cs.columbia.edu/compbio/string-kernels
  • Organization: MAGNet
  • NCBC Ontology Classification: Atomic --> SoftwareFunction --> Protein Modeling and Classification
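
The mismatch kernel can be written as an inner product of feature vectors indexed by all possible k-mers: each k-mer occurring in a sequence adds one count to every k-mer within m substitutions of it. The brute-force sketch below computes that feature map directly over the 20 standard amino acids. It is not the package's C implementation, covers only the mismatch kernel (not the profile kernel), and its function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations, product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mismatch_neighborhood(kmer, m, alphabet=AMINO_ACIDS):
    """All strings over `alphabet` within Hamming distance m of `kmer`."""
    neighbors = set()
    for positions in combinations(range(len(kmer)), m):
        for replacements in product(alphabet, repeat=m):
            chars = list(kmer)
            for pos, c in zip(positions, replacements):
                chars[pos] = c
            neighbors.add("".join(chars))
    return neighbors


def mismatch_features(seq, k, m, alphabet=AMINO_ACIDS):
    """Feature map: each k-mer of seq adds 1 to every k-mer in its m-mismatch neighborhood."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        for neighbor in mismatch_neighborhood(seq[i:i + k], m, alphabet):
            counts[neighbor] += 1
    return counts


def mismatch_kernel(x, y, k=3, m=1, alphabet=AMINO_ACIDS):
    """Inner product of the (k, m)-mismatch feature vectors of x and y."""
    fx = mismatch_features(x, k, m, alphabet)
    fy = mismatch_features(y, k, m, alphabet)
    return sum(count * fy[kmer] for kmer, count in fx.items())


def normalized_mismatch_kernel(x, y, k=3, m=1):
    """Kernel scaled so that every sequence has similarity 1 with itself."""
    return mismatch_kernel(x, y, k, m) / math.sqrt(
        mismatch_kernel(x, x, k, m) * mismatch_kernel(y, y, k, m))


# Usage: similarity between two short peptides that differ at one position
score = normalized_mismatch_kernel("MKVLAAG", "MKVIAAG")
```

With m = 0 the feature map reduces to plain k-mer counts (the spectrum kernel). Normalizing by sqrt(K(x,x) K(y,y)) is a common preprocessing choice before SVM training.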

MatrixREDUCE

  • Description: Regulation of gene expression by a transcription factor requires physical interaction between the factor and the DNA, which can be described by a statistical mechanical model. Based on this model, the MatrixREDUCE algorithm uses genome-wide occupancy data for a transcription factor (e.g., ChIP-chip or mRNA expression data) and the associated nucleotide sequences to discover the sequence-specific binding affinity of the transcription factor. The sequence specificity of the transcription factor's DNA-binding domain is modeled using a position-specific affinity matrix (PSAM), which represents the change in binding affinity (Kd) whenever a specific position within a reference binding sequence is mutated. The PSAM can be transformed into an affinity logo for visualization using the utility program AffinityLogo, and a MatrixREDUCE run can be summarized in an easy-to-navigate web page using HTMLSummary.
  • Data Input: sequence file in FASTA format; and expression data file in tab-delimited text format.
  • Data Output: PSAMs in numeric and graphical format, parameters of the fitted model, and an HTML summary page.
  • Implementation Language: ANSI C, making use of Numerical Recipes routines.
  • Version, Date, Stage: Version 1.0, July 10, 2006, extensively tested in lab.
  • Authors: Barrett Foat, Xiang-Jun Lu, Harmen J. Bussemaker
  • Platforms Tested: Linux, Cygwin (Windows), Mac OS X
  • License:
  • Keywords: position-specific affinity matrix, binding affinity, cis-regulatory element, expression data, ChIP-chip, transcription factor
  • URL: http://www.bussemakerlab.org/software/MatrixREDUCE
  • Organization: MAGNet
  • NCBC Ontology Classification: Atomic --> SoftwareFunction --> Genomic & Phenotypic Analysis --> Regulatory/Signaling network reconstruction
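
Under the PSAM description, each position contributes a multiplicative factor to the relative affinity. The affinity of one binding window is therefore the product of its per-position factors, and a sequence's total affinity is the sum over all windows. The sketch below evaluates a given PSAM on a sequence and then fits a simple linear relation between expression and total affinity by least squares. It assumes independent positions and scans the forward strand only, and it does not learn the PSAM itself as MatrixREDUCE does. The function names and the linear expression model are illustrative assumptions rather than the program's exact formulation.

```python
def psam_affinity(seq, psam):
    """Total relative affinity of `seq` for a factor described by a PSAM.

    psam -- one dict per motif position mapping base -> relative affinity
            (1.0 for the reference base, < 1.0 for weaker-binding substitutions)
    Only the forward strand is scanned (simplifying assumption).
    """
    width = len(psam)
    total = 0.0
    for start in range(len(seq) - width + 1):
        window = 1.0
        for offset, weights in enumerate(psam):
            window *= weights[seq[start + offset]]  # positions treated as independent
        total += window
    return total


def fit_linear_model(affinities, expression):
    """Ordinary least squares fit of expression ~ F0 + F * affinity; returns (F0, F)."""
    n = len(affinities)
    mean_a = sum(affinities) / n
    mean_e = sum(expression) / n
    cov = sum((a - mean_a) * (e - mean_e) for a, e in zip(affinities, expression))
    var = sum((a - mean_a) ** 2 for a in affinities)
    slope = cov / var
    return mean_e - slope * mean_a, slope


# Usage: a two-position PSAM whose reference site is "AC"
psam = [{"A": 1.0, "C": 0.1, "G": 0.1, "T": 0.1},
        {"A": 0.5, "C": 1.0, "G": 0.2, "T": 0.2}]
total = psam_affinity("ACAC", psam)  # windows AC, CA, AC
```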


T-profiler

  • Description: T-profiler is a web-based tool that uses the t-test to score changes in the average activity of pre-defined groups of genes. The gene groups are defined based on Gene Ontology categorization, ChIP-chip experiments, upstream matches to a consensus transcription factor binding motif, and location on the same chromosome. If desired, an iterative procedure can be used to select a single, optimal representative from sets of overlapping gene groups. A jack-knife procedure makes the calculations more robust against outliers. T-profiler makes it possible to interpret microarray data in a way that is both intuitive and statistically rigorous, without the need to combine experiments or choose parameters.
  • Data Input: Currently, gene expression data from Saccharomyces cerevisiae and Candida albicans are supported.
  • Data Output:
  • Implementation Language: T-profiler is written in PHP; data are managed by a MySQL database server
  • Version, Date, Stage:
  • Authors: André Boorsma, Barrett C. Foat, Daniel Vis, Frans Klis, Harmen J. Bussemaker
  • Platforms Tested: Web-based application
  • License:
  • Keywords: gene expression, transcriptome, ChIP-chip, Gene Ontology
  • URL: http://www.t-profiler.org
  • Organization: MAGNet
  • NCBC Ontology Classification: Atomic --> SoftwareFunction --> Genomic & Phenotypic Analysis --> Network characterization
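
The core of T-profiler's scoring is a t-test comparing the mean expression log-ratio of the genes inside a group with that of all other genes. The sketch below assumes the standard two-sample t statistic with pooled variance. T-profiler's exact variant, its jack-knife step and its iterative group selection are not reproduced, and the function name is hypothetical.

```python
import math
from statistics import mean, variance


def group_t_score(log_ratios, group):
    """Two-sample t statistic (pooled variance) for genes in `group` versus all other genes.

    log_ratios -- dict mapping gene name to expression log-ratio
    group      -- set of gene names; both sides need at least two genes
    """
    inside = [v for gene, v in log_ratios.items() if gene in group]
    outside = [v for gene, v in log_ratios.items() if gene not in group]
    n1, n2 = len(inside), len(outside)
    # statistics.variance is the sample variance (n - 1 denominator)
    pooled = ((n1 - 1) * variance(inside) + (n2 - 1) * variance(outside)) / (n1 + n2 - 2)
    return (mean(inside) - mean(outside)) / math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))


# Usage: a three-gene group that is induced relative to the remaining genes
ratios = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 0.0, "e": 1.0}
score = group_t_score(ratios, {"a", "b", "c"})
```

A positive score means the group's average log-ratio is higher than that of the rest of the genome; a negative score means it is lower.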

TranscriptionDetector

  • Description: A tool for finding probes that measure significantly expressed loci in a genomic array experiment. Given expression data from a tiling array experiment, TranscriptionDetector estimates the likelihood that a probe is detecting transcription from the locus in which it resides. Probabilities are assigned using a background signal intensity distribution obtained from a set of negative control probes. Because it allows novel transcriptional units to be discovered independently of any genomic annotation, the tool is useful for the functional annotation of genomes.
  • Data Input: Expression data (GEO or other platforms) and designation of which probes represent negative controls and which are data probes.
  • Data Output: A text file with a list of probes corresponding to significantly expressed loci.
  • Implementation Language: ANSI C, making use of GSL.
  • Version, Date, Stage:
  • Authors: Xiang-Jun Lu, Gabor Halasz, Marinus F. van Batenburg
  • Platforms Tested: Linux, Cygwin (Windows), Mac OS X
  • License:
  • Keywords: tiling arrays, expression, transcriptome
  • URL: http://www.bussemakerlab.org/software/TranscriptionDetector/
  • Organization: MAGNet
  • NCBC Ontology Classification:
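
The description says probabilities are assigned from the background intensity distribution of negative control probes. One simple way to do this, sketched below as an assumption rather than TranscriptionDetector's actual model, is an empirical p-value: the fraction of negative controls at least as bright as a given probe. The function names and the 0.05 cutoff are hypothetical, and no multiple-testing correction is applied.

```python
from bisect import bisect_left


def empirical_p_values(probe_intensities, negative_controls):
    """Return {probe: p}, where p estimates P(background intensity >= observed intensity).

    The +1 terms keep p-values strictly positive for probes brighter than every control.
    """
    background = sorted(negative_controls)
    n = len(background)
    p_values = {}
    for probe, intensity in probe_intensities.items():
        n_at_least = n - bisect_left(background, intensity)  # controls at least as bright
        p_values[probe] = (n_at_least + 1) / (n + 1)
    return p_values


def expressed_probes(p_values, alpha=0.05):
    """Probes whose empirical p-value is at or below alpha, sorted by name."""
    return sorted(probe for probe, p in p_values.items() if p <= alpha)


# Usage: 100 negative-control intensities and two data probes
controls = [float(i) for i in range(100)]
p = empirical_p_values({"bright": 150.0, "mid": 50.0}, controls)
calls = expressed_probes(p)
```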

PhenoGO

  • Description: PhenoGO adds phenotypic contextual information to existing associations between gene products and Gene Ontology (GO) terms as specified in GO Annotations (GOA). PhenoGO uses an existing Natural Language Processing (NLP) system, BioMedLEE, and an existing knowledge-based phenotype organizer system (PhenOS), in conjunction with MeSH indexing and established biomedical ontologies. The system also encodes the context using identifiers from different biomedical ontologies, including the UMLS, Cell Ontology, Mouse Anatomy, NCBI taxonomy, GO, and Mammalian Phenotype Ontology. PhenoGO was evaluated for coding anatomical and cellular information and assigning the coded phenotypes to the correct GOA; it achieved a precision of 91% and a recall of 92%, demonstrating that the PhenoGO NLP system can accurately encode a large number of anatomical and cellular ontologies to GO annotations. The PhenoGO database may be accessed at http://www.phenogo.org.
  • Data Input: Gene Ontology Annotations Files and Medline Abstracts
  • Data Output: XML file and www.phenogo.org Web Portal
  • Implementation Language: A variety of modules: the web portal is in Java and MySQL; the computational terminology component (PhenOS) is written as Perl scripts that query tables in IBM DB2; the natural language processing component is written in Prolog.
  • Version, Date, Stage: Version 2, Feb 2006
  • Authors: Yves Lussier and Carol Friedman are the principal investigators. The programmers are Jianrong Li, Lee Sam, and Tara Borlawsky
  • Platforms Tested: n/a
  • License: n/a
  • Keywords: Phenotypic integration, computational phenotypes
  • URL: http://www.phenogo.org
  • Organization: MAGNet
  • NCBC Ontology Classification: Biotool --> Data Management --> Information retrieval, traversal and querying; Atomic --> SoftwareFunction --> Natural Language Processing

MINDY

  • Description: Given a transcription factor of interest, MINDY uses a large set of gene expression profile data to identify potential post-transcriptional modulators of the transcription factor's activity. MINDY is based on a three-way statistical interaction model that captures the post-transcriptional regulatory event where the ability of a transcription factor to activate/repress its target genes is monotonically controlled by a potential modulator gene.
  • Data Input: Gene expression data in the EXP format, and a user-specified transcription factor of interest
  • Data Output: Lists of the putative modulators and target genes of the transcription factor, and the modulatory interactions involving them
  • Implementation Language: C++, MATLAB and Java
  • Version, Date, Stage: Stable release, April 2007
  • Authors: Kai Wang, Ilya Nemenman, Adam Margolin, Riccardo Dalla-Favera, Andrea Califano
  • Platforms Tested: Linux, Cygwin
  • License: n/a
  • Keywords: gene expression, transcriptional interaction, modulator
  • URL: n/a
  • Organization: MAGNet
  • NCBC Ontology Classification: Atomic --> SoftwareFunction --> Genomic & Phenotypic Analysis --> Regulatory/Signaling network reconstruction
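
MINDY's three-way model asks whether the dependence between a transcription factor and a target gene changes with the expression of a candidate modulator. A minimal sketch of that idea ranks samples by modulator expression, takes the lowest and highest fractions, estimates the TF-target mutual information in each subset, and reports the difference. Here mutual information is approximated from the Pearson correlation under a bivariate Gaussian assumption, I = -1/2 log(1 - r^2). MINDY's own estimator, subset sizes and significance tests are not reproduced; the 35% fraction and all function names are illustrative assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def gaussian_mi(x, y):
    """MI (nats) of a bivariate Gaussian with the sample correlation of x and y."""
    r = pearson(x, y)
    r = max(min(r, 0.999999), -0.999999)  # keep log() finite for (near-)perfect correlation
    return -0.5 * math.log(1.0 - r * r)


def modulation_score(tf, target, modulator, fraction=0.35):
    """MI(TF, target) in high-modulator samples minus MI(TF, target) in low-modulator samples."""
    order = sorted(range(len(modulator)), key=lambda i: modulator[i])
    size = max(3, int(len(modulator) * fraction))
    low, high = order[:size], order[-size:]
    mi_low = gaussian_mi([tf[i] for i in low], [target[i] for i in low])
    mi_high = gaussian_mi([tf[i] for i in high], [target[i] for i in high])
    return mi_high - mi_low


# Usage: the target tracks the TF only when the modulator is highly expressed
tf = [float(i % 5) for i in range(20)]
target = [tf[i] if i >= 10 else float((i + 1) % 2) for i in range(20)]
modulator = [float(i) for i in range(20)]
delta = modulation_score(tf, target, modulator)
```

A positive score suggests the modulator enhances the TF-target dependence; a negative score suggests it attenuates it.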

B Cell Interactome

  • Description: The B cell interactome (BCI) is a network of protein-protein, protein-DNA and modulatory interactions in human B cells. The network contains known interactions (reported in public databases) and interactions predicted by a Bayesian evidence integration framework, which integrates a variety of generic and context-specific experimental clues about protein-protein and protein-DNA interactions (such as a large collection of B cell expression profiles) with inferences from different reverse engineering algorithms, such as GeneWays and ARACNE. Modulatory interactions are predicted by MINDY, an algorithm for the prediction of modulators of transcriptional interactions.
  • Data Input: n/a
  • Data Output: text file of binary interactions, each associated with a probability.
  • Implementation Language: Perl
  • Version, Date, Stage: Version 2, March 2007
  • Authors: Lefebvre C, Lim WK, Basso K, Dalla Favera R, and Califano A.
  • Platforms Tested: n/a
  • License:
  • Keywords: Naive Bayes, Mixed-Interaction Network, human B cells.
  • URL: http://amdec-bioinfo.cu-genome.org/html/BCellInteractome.html
  • Organization: MAGNet
  • NCBC Ontology Classification: Atomic --> SoftwareFunction --> Interaction Modeling
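
The Bayesian integration step can be illustrated with the naive Bayes rule named in the keywords. If each evidence source is assumed conditionally independent given the interaction status, the posterior odds of an interaction equal the prior odds multiplied by one likelihood ratio per source. The sketch below estimates a likelihood ratio from hypothetical gold-standard counts and combines several ratios into a posterior probability. The function names, inputs and binning of evidence are assumptions, not the BCI pipeline.

```python
def likelihood_ratio(pos_with_evidence, pos_total, neg_with_evidence, neg_total):
    """LR = P(evidence | interaction) / P(evidence | no interaction), from gold-standard counts."""
    return (pos_with_evidence / pos_total) / (neg_with_evidence / neg_total)


def naive_bayes_posterior(prior, likelihood_ratios):
    """Posterior probability: posterior odds = prior odds * product of likelihood ratios."""
    odds = prior / (1.0 - prior)
    for lr in likelihood_ratios:
        odds *= lr  # valid only if evidence sources are conditionally independent
    return odds / (1.0 + odds)


# Usage: two independent evidence sources, each three times more common among true interactions
lr = likelihood_ratio(30, 100, 10, 100)
posterior = naive_bayes_posterior(0.1, [lr, lr])
```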

ARACNE

  • Description: ARACNE is an algorithm for inferring gene regulatory networks from a set of microarray experiments. The method uses mutual information to identify genes that are co-expressed and then applies the data processing inequality to filter out interactions that are likely to be indirect.
  • Data Input: Text file containing measurements from a set of microarray experiments.
  • Data Output: Text file containing predicted interactions.
  • Implementation Language: C++, Java
  • Version, Date, Stage: Version 1, June 2006
  • Authors: Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A.
  • Platforms Tested: Windows, Linux
  • License: Open source
  • Keywords: Reverse engineering, mutual information, genetic networks, microarray
  • URL: http://amdec-bioinfo.cu-genome.org/html/ARACNE.htm
  • Organization: MAGNet
  • NCBC Ontology Classification: Atomic --> SoftwareFunction --> Genomic & Phenotypic Analysis --> Regulatory/Signaling network reconstruction
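
The two steps in ARACNE's description can be sketched as a plug-in mutual information estimate for discretized expression values, followed by a data processing inequality (DPI) pruning step. In every fully connected gene triplet, the pruning step marks the weakest edge as likely indirect. This is a minimal sketch under stated assumptions: ARACNE's own MI estimator and parameter settings are not reproduced, and the `threshold` and `tolerance` parameters and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(x, y):
    """Plug-in MI estimate (nats) for two equal-length lists of discrete values."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    return sum((c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in joint.items())


def aracne_dpi(mi, threshold=0.0, tolerance=0.0):
    """Apply an MI threshold and DPI-style pruning to a gene network.

    mi -- dict mapping frozenset({gene1, gene2}) to an MI value
    Returns the set of edges that survive both steps.
    """
    edges = {e: v for e, v in mi.items() if v > threshold}
    genes = sorted({g for e in edges for g in e})
    to_remove = set()
    for a, b, c in combinations(genes, 3):
        triangle = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(e in edges for e in triangle):
            weakest = min(triangle, key=lambda e: edges[e])
            other_min = min(edges[e] for e in triangle if e != weakest)
            # Mark (do not yet delete) the weakest edge if it falls below the other two
            if edges[weakest] < other_min * (1.0 - tolerance):
                to_remove.add(weakest)
    return set(edges) - to_remove


# Usage: a chain A -> B -> C where the A-C dependence is indirect
mi = {frozenset(("A", "B")): 0.9, frozenset(("B", "C")): 0.8, frozenset(("A", "C")): 0.5}
network = aracne_dpi(mi)
```

Edges are marked against the original MI values and removed together, so the result does not depend on the order in which triplets are visited.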


geWorkbench

  • Description: geWorkbench is a Java application that provides users with an integrated suite of genomics tools. It is built on an open-source, extensible architecture that promotes interoperability and simplifies the development of new as well as the incorporation of pre-existing components. The resulting system provides seamless access to a multitude of both local and remote data and computational services through an integrated environment that offers a unified user experience. Over 50 data analysis and visualization components have been developed for the framework, covering a wide range of genomics domains including gene expression, sequence, structure and network data.
  • Data Input: Gene expression data (Affy, GenePix, RMA), Sequence (FASTA), Structure (PDB).
  • Data Output: Analysis results (multiple formats).
  • Implementation Language: Java
  • Version, Date, Stage: 1.0.5, 3/23/07, stable production release
  • Authors: A. Califano, A. Floratos, M. Kustagi, K. Smith, J. Watkinson, M. Hall, K. Keshav, X. Zhang, K. Kushal, B. Jagla, E. Daly, M. VanGinhoven, P. Morozov.
  • Platforms Tested: Windows XP, Linux, Mac OS 10.x.
  • License: Free.
  • Keywords: Analysis suite, gene expression analysis, sequence analysis, network reconstruction, structure prediction, visualization.
  • URL: http://www.geworkbench.org
  • Organization: MAGNet
  • NCBC Ontology Classification: Atomic --> SoftwareFunction --> Genomic & Phenotypic Analysis; Atomic --> SoftwareFunction --> Interaction Modeling; Atomic --> SoftwareFunction --> Protein Modeling and Classification; Atomic --> SoftwareFunction --> Software Engineering and Development Tool --> Integration --> Resource Integration Components; Atomic --> SoftwareFunction --> Software Engineering and Development Tool --> Integration --> Grid Computing Resources; Atomic --> SoftwareFunction --> Visualization