Mbirn: UCSD Workflow Development Retreat (July 17-18, 2005)
Action Items (site responsible/target date)
- LONI Pipeline version 3 beta 2.0 to be deployed on the staging server at BIRN-CC (BIRN-CC, UCLA/as soon as BIRN rack upgrade is complete, ~2 weeks)
- Development of a LONI Pipeline module for the newest version of Freesurfer (UCLA, MGH/TBD at next conference call)
- SRB integration with LONI Pipeline (BIRN-CC, UCLA/ Sept. 1)
- XNAT launch of LONI Pipeline deployed on the BIRN staging server to do any of the currently available processing modules (Wash U, UCLA, BIRN-CC/ Oct AHM)
- Optional: HID launch of LONI Pipeline deployed on the BIRN staging server to do any of the currently available processing modules (UCI, UCLA, BIRN-CC/ Oct AHM)
- SASHA utilization of Condor scheduling services (JHU, BIRN-CC/Sept.1)
- Kepler instance of Freesurfer pipeline (MGH, BIRN-CC/ Spring 2006)
- XNAT launch of Kepler Workflow (Freesurfer?) (Wash U, MGH, BIRN-CC/ Spring 2006)
Mechanism for coordinated development of BIRN Workflows/Pipelines:
- Post these meeting notes (to be corrected by all present) and other documents shared at meeting on Wiki
- Regularly updated Wiki pages
- Monthly teleconference calls (August 4th and September 1st, at 1 PM PST/4 PM EST)
- Dedicated 4-hour session at October AHM
See the bottom of the page for the full meeting report.
Logistics
- When?
- Sunday July 17: Kick off dinner
- Monday, July 18: Full-day meeting
- Where? BIRN CC Conference Center, UCSD
- Homework?
- Read S. Murphy's report Mbirn: Workflow Updates (May 17, 2005 updates)
- Each testbed should prepare use cases of its expected workflow needs, covering both short-term and long-term goals, and post its report here in advance of the meeting.
- Participants
MGH: S. Murphy, D. Kennedy, R. Gollub, M. Mendis
BWH: S. Pieper
JHU: A. Kolasny
WashU: D. Marcus
UCLA: A. Toga, M. Pan
UCI: D. Keator, J. Turner
UCSD: A. Dale
UCSD (Mouse): J. Tran
BIRN CC: J. Grethe, A. Lin, M. Ellisman, M. James
SDSC: Ilkay Altintas (altintas@sdsc.edu, http://users.sdsc.edu/~altintas)
Note: All testbeds are represented.
Agenda
Sunday, July 17
- 7pm: Dinner reservation at Japengos
- The Aventine Center, 8960 University Centre Lane, La Jolla, California 92037, United States
- Tel: 858 450 3355
- Reservation for 10 people under the name of Jorge Jovicich
Please write your name here if you'll be available: A. Kolasny, S. Pieper, D. Marcus, J. Grethe
Monday, July 18
9:00-10:30am:
What is a BIRN Workflow? Collect list of use cases
- Some are interested in focusing on analysis workflows – keeping track of which data processing steps have been done
- A more advanced form of data provenance
- Workflows as a management system, i.e., a procedure that keeps track of what steps need to be executed next
- Active systems that keep track of progress
- Clinical and experiment management systems
- Used to order, schedule, and keep track of patients as they progress through a study
Agree on the ones on which we will focus our development efforts
10:30-11:00am: Coffee break
11:00-12:30pm:
Detailed discussion of workflow issues
12:30-1:30pm: Lunch break
1:30-3:00pm:
Discussion of possible first step in developing workflow
3:00-3:30pm: Coffee break
3:30-5:00pm:
Wrap-up: delegate who will do what; plans for AHM and milestones
UCLA report
- The Web Ontology Language for Services (OWL-S) is a good starting point for describing the knowledge constructs needed to represent workflows. The OWL-S submission to the W3C (proposed as a W3C recommendation) can be found at http://www.w3.org/Submission/2004/07/
- A working draft of our paper on scientific workflow issues and how the Pipeline addresses them (Michael J Pan, David E Rex, Arthur W Toga. Coordination of scientific workflows for collaborative research) is available on the Pipeline publications page.
Large Scale Distributed Processing
Instructions from BIRN-CC for distributed computation via Condor are available as part of the NAMIC pages: Engineering:Project:Condor_Job_Submission
trying out the GraphViz methods
MEETING NOTES
Action Items (site responsible/target date):
1- LONI Pipeline version 3 beta 2.0 to be deployed on the staging server at BIRN-CC (BIRN-CC, UCLA/as soon as BIRN rack upgrade is complete, ~2 weeks)
2- Development of a LONI Pipeline module for the newest version of Freesurfer (UCLA, MGH/TBD at next conference call)
3- SRB integration with LONI Pipeline (BIRN-CC, UCLA/ Sept. 1)
4- XNAT launch of LONI Pipeline deployed on the BIRN staging server to do any of the currently available processing modules (Wash U, UCLA, BIRN-CC/ Oct AHM)
5- Optional: HID launch of LONI Pipeline deployed on the BIRN staging server to do any of the currently available processing modules (UCI, UCLA, BIRN-CC/ Oct AHM)
6- SASHA utilization of Condor scheduling services (JHU, BIRN-CC/Sept.1)
7- Kepler instance of Freesurfer pipeline (MGH, BIRN-CC/ Spring 2006)
8- XNAT launch of Kepler Workflow (Freesurfer?) (Wash U, MGH, BIRN-CC/ Spring 2006)
Mechanism for coordinated development of BIRN Workflows/Pipelines:
1- Post these meeting notes (to be corrected by all present) and other documents shared at meeting on Wiki
2- Regularly updated Wiki pages
3- Monthly teleconference calls (August 4th and September 1st, at 1 PM PST/4 PM EST)
4- Dedicated 4-hour session at October AHM
A consensus vision for BIRN Workflows/Pipelines emerged from the discussions. A successful BIRN Workflow/Pipeline will enable “Low Tech” users to process data like a “High Tech” user. As the parts of this plan are developed and implemented, new capabilities will emerge, e.g. the workflow engine will be able to suggest appropriate analysis modules, such as which segmentation algorithm to use for a given scan acquisition sequence.
We will:
1- Focus primarily on image data analysis workflows, not on data management solutions or on tasks such as data upload or download. The Kepler expert, Ilkay Altintas, raised the counterpoint that once we develop the essential features of a Kepler workflow that is interoperable with our databases, building data management workflows is not conceptually different from building data analysis workflows.
2- Develop solutions that span the range of implementations, from a local analysis pipeline that a single investigator can use to analyze their own data with tools at their own site, to a web-based pipeline that can access data and analysis capabilities distributed across multiple sites.
3- Include the capability to initiate and complete large batch jobs on grid computing resources.
4- Develop over time the capability to “launch” BIRN workflow pipelines from within a DBMS, from a workflow engine, and from a web page. This is expected to require many years of development and to proceed sequentially in the order listed here.
5- Support multiple DBMS (beginning with XNAT, HID, and LONI DB, with others to be added as the need arises).
6- Support multiple workflow engines (including the LONI Pipeline and a Kepler/Taverna one to be newly developed).
To accommodate these design decisions we will:
8- Develop specifications for the data types to be supported and for other required programming standards.
9- Develop a consensus workflow language layer (XSLT) to permit interoperability. This will be a limited universal BIRN workflow representation.
10- Develop or adapt analysis modules (pipelets) as needed to create a repository of analysis capabilities.
11- Develop APIs (application programming interfaces) to allow this interoperability.
12- Develop the WSDL specs needed to run the workflows from a web page.
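As a rough illustration of what a limited, engine-neutral workflow representation might contain, the following Python sketch describes a workflow as named steps with explicit dependencies and serializes it to XML. Everything in it is hypothetical: the `Step` and `Workflow` classes, the XML element and attribute names, and the example module names are assumptions made for illustration, not a format agreed at the meeting.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class Step:
    """One analysis module invocation (hypothetical structure)."""
    name: str
    module: str
    depends_on: list = field(default_factory=list)


@dataclass
class Workflow:
    """A named collection of steps (hypothetical structure)."""
    name: str
    steps: list = field(default_factory=list)

    def ready_steps(self, done):
        """Return names of unfinished steps whose dependencies are all done."""
        done = set(done)
        return [s.name for s in self.steps
                if s.name not in done and all(d in done for d in s.depends_on)]

    def to_xml(self):
        """Serialize to an XML string; element names are assumptions."""
        root = ET.Element("workflow", name=self.name)
        for s in self.steps:
            el = ET.SubElement(root, "step", name=s.name, module=s.module)
            for d in s.depends_on:
                ET.SubElement(el, "dependsOn", step=d)
        return ET.tostring(root, encoding="unicode")


# Hypothetical three-step structural MR workflow.
wf = Workflow("example", [
    Step("convert", "dicom_to_volume"),
    Step("register", "register_to_atlas", ["convert"]),
    Step("segment", "segment_volume", ["register"]),
])
```

Calling `wf.ready_steps(["convert"])` reports which step can run next, and `wf.to_xml()` yields a document that an engine-specific translator (for the LONI Pipeline or Kepler, say) could read. That translation step is deliberately left out of this sketch.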
The key collaborating projects to drive the pipeline development include two of our prototype projects, SASHA and MAD, and the ADNI, VETSA, and F-BIRN projects.
- SASHA will drive the development of improvements in SRB performance, especially in the domain of data transfer. SASHA will also drive the development of improved grid computing scheduling and staging through the use of Condor tools. Anthony Kolasny will work on this with BIRN-CC contacts Jeff Grethe, Abel Lin, and Vicky Rowley.
- MAD will drive the development of the LONI and Kepler BIRN workflows because it consists of multi-site image data (n ≈ 400 healthy and AD subjects in total, with another ~37 MCI subjects available). The data come from UCSD, Wash U and MGH/BWH, so they can be analyzed from a single site now (all of it is currently at MGH) or across multiple sites later if we need to test that aspect. The data set is ready to be loaded into the newest version of the XNAT DBMS at MGH this summer. It has already been analyzed with Freesurfer, but many research questions require re-analysis of the same data with systematic changes in analysis parameters (e.g. exploring the impact of different atlases, different software versions, fully automated versus manually tweaked processing, and others).
- VETSA (n ≈ 700 subjects, 350 twin pairs) needs improved automated Freesurfer outputs as well as integration with a large volume of non-imaging metadata. It has multiple structural MR acquisition parameters (MPRAGE and ME FLASH) that could be studied systematically, and it also includes DTI data. It needs improved registration modules.
- ADNI has two phases of data acquisition. The Prep Phase, already collected, has 60 healthy and 60 AD subjects (the healthy ones scanned twice within 2 weeks), representing 6 sites, two field strengths (1.5T and 3T), three scan platforms (Siemens, GE, and Philips), and multiple acquisition sequences (MPRAGE, ME-FLASH, IR-SPGR, T2FSE). Anders Dale is eager to collaborate with BIRN on the analysis of the Prep Phase data. One idea being implemented is to use the MAD data for methods development that will then be applied to this data. Cooper Roddey, a new post-doc working on this project, will integrate with the MAD work group. The Clinical phase has either not yet started or only recently begun. A key feature of this data set is its longitudinal component.
- F-BIRN has plans for large-scale data acquisition over the proposed 5 years of funding (~450 subjects/year) that they would like to have analyzed with the Freesurfer tools.
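The systematic re-analysis described for MAD above amounts to a parameter sweep over analysis settings. A minimal Python sketch of enumerating such a sweep follows. The atlas names, version labels, subject IDs, and the `build_jobs` helper are placeholders invented for illustration, and no actual Freesurfer command line is implied.

```python
from itertools import product

# Placeholder parameter values; the real atlases and software versions
# are not specified in these notes.
ATLASES = ["atlas_A", "atlas_B"]
VERSIONS = ["version_1", "version_2"]
MODES = ["fully_automated", "manually_tweaked"]


def build_jobs(subject_ids, atlases=ATLASES, versions=VERSIONS, modes=MODES):
    """Return one job description (a dict) per subject and parameter combination."""
    return [
        {"subject": subj, "atlas": atlas, "version": version, "mode": mode}
        for subj, atlas, version, mode in product(subject_ids, atlases, versions, modes)
    ]


jobs = build_jobs(["subj001", "subj002"])
print(len(jobs))  # 2 subjects x 2 atlases x 2 versions x 2 modes = 16
```

Each job description could be handed to a batch scheduler as a separate run. With roughly 400 subjects, even this small 2 x 2 x 2 grid would produce about 3,200 runs, which suggests why grid scheduling and staging feature in the plans above.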
Development plans:
1- The initial BIRN module will wrap the Freesurfer analysis software. It was selected because it is required by most of our major clinical collaborating projects (ADNI, VETSA, and F-BIRN) and by many individual collaborating investigators, and because its workflow programming demands are extensive but can be met in smaller sub-modules. For example, manual checking and/or editing of interim analysis results is required, which necessitates the development of robust communication workflows between the analysis software and the user.
2- The other initial development effort will be led by Anthony Kolasny and the BIRN-CC folks
Documents circulated at meeting and to be attached here:
1- FreeSurfer Surface Reconstruction Workflow Design Document (a Java-based workflow) by Burak Ozyurt
2- FreeSurfer Kepler Workflow by Michael Mendis