Slicer3:Remote Data Handling
Contents
- 1 Current status of Slicer's (local) data handling
- 2 Goal for how Slicer would upload/download from remote data stores
- 3 Suggested first pass implementation driven by two use cases
- 4 ITK-based mechanism for handling remote data (for command line modules, batch processing, and grid processing) --Nicole
- 5 vtkMRMLStorageNode methods for handling remote data (for loading and saving data for interactive use) --Wendy
- 6 Asynchronous I/O Manager --Wendy
Current status of Slicer's (local) data handling
Currently, MRML files, Xcede catalog files, XNAT archives, and individual datasets are all loaded from local disk. All remote datasets are downloaded (via a web interface or the command line) outside of Slicer.
Goal for how Slicer would upload/download from remote data stores
Eventually, we would like to download data from remote or local uris within the application itself, and have the option of uploading to remote stores as well (here's a sketch -- does this look right?).
Suggested first pass implementation driven by two use cases
For BIRN, we'd like to demonstrate two use cases:
- First is loading a combined FIPS/FreeSurfer analysis, specified in an Xcede catalog (.xcat) file, and viewing it with Slicer's QueryAtlas.
- Second is running a batch job in Slicer that processes a set of remotely held datasets. Each iteration would take as arguments the XML file parameterizing the EMSegmenter, the uri for the remote dataset, and a uri for storing back the segmentation results (a sketch of this loop appears at the end of this section).
The subset of functionality we'd need is shown below.
... Our approach in the first use case would be to manually upload a test Xcede catalog file and its constituent datasets to some place on the SRB. We'll keep a copy of the catalog file locally, read it, and query the SRB for each uri in the .xcat file.
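Here's a rough sketch of what one pass of the batch loop from the second use case might look like. DownloadToCache, RunEMSegment, and UploadFromCache are made-up placeholder helpers for illustration -- none of them is an existing Slicer or EMSegmenter function:

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helpers: fetch a remote uri into the local cache, run one
// EMSegmenter iteration, and store a local result back to a remote uri.
std::string DownloadToCache(const std::string &uri);
std::string RunEMSegment(const std::string &parametersXML,
                         const std::string &localInputPath);
void UploadFromCache(const std::string &localPath, const std::string &destUri);

// Each job pairs a remote input dataset uri with a remote uri for
// storing back the segmentation results.
void RunBatch(const std::string &emsParametersXML,
              const std::vector<std::pair<std::string, std::string> > &jobs)
{
  for (std::size_t i = 0; i < jobs.size(); ++i)
    {
    std::string localInput  = DownloadToCache(jobs[i].first);
    std::string localResult = RunEMSegment(emsParametersXML, localInput);
    UploadFromCache(localResult, jobs[i].second);
    }
}
```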
ITK-based mechanism for handling remote data (for command line modules, batch processing, and grid processing) --Nicole
vtkMRMLStorageNode methods for handling remote data (for loading and saving data for interactive use) --Wendy
The first goal is to figure out which workflows to support, and a good implementation approach.
Currently, the Load Scene, Import Scene, and Add Data options in Slicer all encapsulate two steps:
- locating a dataset, usually accomplished through a file browser, and
- selecting a dataset to initiate loading.
Then MRML files, Xcede catalog files, or individual datasets are loaded from local disk.
For loading remote datasets, the following options are available:
- break these two steps apart explicitly (easiest option),
- bind them together under the hood,
- or support both of these paradigms.
Breaking apart "find data" and "load data":
Possible workflow A
- User downloads .xcat or .xml (MRML) file to disk using the HID or an XNAT web interface
- From the Load Scene file browser, the user selects the .xcat or .xml archive. If no locally cached versions exist, each remote file listed in the archive is downloaded to the /tmp directory (always locally cached) by the Download Manager, and then loaded into Slicer via a vtkMRMLStorageNode method when the download is complete.
Possible workflow B
- User downloads .xcat or .xml (MRML) file to disk using the HID or an XNAT web interface
- From the Load Scene file browser, the user selects the .xcat or .xml archive. If no locally cached versions exist, each remote file in the archive is downloaded to /tmp (only if a flag is set) by the Download Manager, and loaded directly into Slicer via a vtkMRMLStorageNode method when the download is complete. (How does load work if we don't save to disk first?)
Possible workflow C
- User locates a MRML file, .xcat archive, or individual dataset on the HID or an XNAT web interface
- User types the uri into the Load Scene, Import Scene, or Add Data interfaces.
- If no locally cached versions exist, each remote file in the archive is cached to /tmp by the Download Manager, and loaded directly into Slicer via a vtkMRMLStorageNode method when the download is complete.
In each workflow, the data gets saved to disk first and then loaded into Slicer. Here's a first pass at how things might work -- we can discuss at the meeting.
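As one concrete starting point for that discussion: all three workflows boil down to the same resolve-then-load step -- given a uri, produce a local file path, downloading into the cache only when needed. A minimal sketch, assuming made-up names (DownloadManager, IsRemote, GetCachedPath, ResolveURI -- none of these is an existing Slicer class or method):

```cpp
#include <string>

// Hypothetical download manager: blocking download of a remote uri into
// the local cache (/tmp here); returns the path of the cached copy.
class DownloadManager
{
public:
  std::string Download(const std::string &uri);
};

// Treat anything with a scheme other than file:// as remote.
bool IsRemote(const std::string &uri)
{
  return uri.find("://") != std::string::npos &&
         uri.compare(0, 7, "file://") != 0;
}

// Returns "" if the uri has no locally cached copy yet.
std::string GetCachedPath(const std::string &uri);

// Resolve a uri to a local path that the existing vtkMRMLStorageNode
// readers can load unchanged.
std::string ResolveURI(const std::string &uri, DownloadManager &manager)
{
  if (!IsRemote(uri))
    {
    return uri;                  // already a local file
    }
  std::string cached = GetCachedPath(uri);
  if (!cached.empty())
    {
    return cached;               // a locally cached version exists
    }
  return manager.Download(uri);  // download to /tmp, then load from there
}
```

Funneling everything through one resolve step like this would mean the existing local loading code doesn't have to change at all.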
Or, bundling together "find data" and "load data":
Possible workflow D
In this workflow, Slicer would make calls to HID or XNAT webservices to determine what data of interest is available... Questions:
- How might this work?
- Do we really want to re-implement functionality in the HID web interface?
- Maybe Slicer can implement one of workflows A-C but also offer a simplified BIRN query interface that has functionality like the following (see the interface sketch after this list):
- Request BIRNIDs for all subjects who have a complete FIPS/FreeSurfer analysis
- Request an xcat for one of these BIRNIDs
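Sketched as an interface, that simplified query functionality might look like the following. HIDQueryClient and both method names are invented for discussion -- they are not an existing HID webservices API:

```cpp
#include <string>
#include <vector>

// Hypothetical client for the simplified BIRN query interface.
class HIDQueryClient
{
public:
  // Request BIRNIDs for all subjects who have a complete
  // FIPS/FreeSurfer analysis.
  std::vector<std::string> GetSubjectsWithCompleteAnalysis();

  // Request an Xcede catalog for one of these BIRNIDs; returns the
  // local path of the downloaded .xcat file.
  std::string GetXcedeCatalog(const std::string &birnID);
};
```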
Saving data back to a remote site
- Since we have no plan for where to save MRML files on the HID, can we have a webservices function we can call from Slicer that writes a file to /dev/null on the HID in the meantime?
What data do we need in an .xcat file?
For the fBIRN QueryAtlas use case, we need a combination of a FreeSurfer morphology analysis and a FIPS analysis of the same subject. With the combined data in Slicer, we can view activation overlays co-registered to and overlaid onto the high-resolution structural MRI using the FIPS analysis, and determine the names of brain regions where activations occur using the co-registered morphology analysis.
The required analyses, including all derived data, are in two standard directory structures on local disk, and *hopefully* somewhere on the HID within a standard structure (check with Burak). These directory trees contain a LOT of files we don't need... Below are the files we *do* need for the fBIRN QueryAtlas use case.
FIPS analysis (.feat) directory and required data
For instance, the FIPS output directory in our example dataset from Doug Greve at MGH is called sirp-hp65-stc-to7-gam.feat. Under this directory, QueryAtlas needs the following datasets:
- sirp-hp65-stc-to7-gam.feat/reg/example_func.nii
- sirp-hp65-stc-to7-gam.feat/reg/freesurfer/anat2exf.register.dat
- sirp-hp65-stc-to7-gam.feat/stats/(all statistics files of interest)
- sirp-hp65-stc-to7-gam.feat/design.gif (this image relates statistics files to experimental conditions)
FreeSurfer analysis directory and required data
For instance, the FreeSurfer morphology analysis directory in our example dataset from Doug Greve at MGH is called fbph2-000670986943. Under this directory, QueryAtlas needs the following datasets:
- fbph2-000670986943/mri/brain.mgz
- fbph2-000670986943/mri/aparc+aseg.mgz
- fbph2-000670986943/surf/lh.pial
- fbph2-000670986943/surf/rh.pial
- fbph2-000670986943/label/lh.aparc.annot
- fbph2-000670986943/label/rh.aparc.annot
What do we want HID webservices to provide?
- Question: are FIPS and FreeSurfer analyses (including the QueryAtlas-required files listed above) available for subjects on the HID yet?
- The BIRN HID webservices shouldn't really need to know the subset of data that QueryAtlas needs... maybe the web interface can take a BIRN ID, gather all uris (http://....) in the FIPS and FreeSurfer directories, and package these into an Xcede catalog.
- The catalog could be requested and downloaded from the HID web GUI, with a name like .xcat or .xcat.gzip or whatever. QueryAtlas could then open this file (or unzip and open) and filter for the relevant uris for an fBIRN or Qdec QueryAtlas session.
- Maybe later, this catalog could be requested programmatically from a Slicer webservices client that supplies a particular BIRN ID. (For now, it's reasonable to go through the HID GUI.)
- Then, for each uri in a catalog (or .xml MRML file), we'll use curl to download, so we need all datasets to be publicly readable (see the libcurl sketch after this list).
- Can we create a directory (even a temporary one) on the HID for Slicer scene uploads?
- We need some kind of upload service: a function call that takes a dataset and a BIRNID and uploads the data to the appropriate directory on the HID.
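Since the plan is to fetch each uri with curl, here's a small, runnable libcurl sketch of the download step. The uri and cache path are placeholders:

```cpp
#include <cstdio>
#include <curl/curl.h>

int main()
{
  const char *uri = "http://example.org/some-dataset.mgz"; // placeholder
  const char *cachePath = "/tmp/some-dataset.mgz";         // local cache copy

  curl_global_init(CURL_GLOBAL_ALL);
  CURL *handle = curl_easy_init();
  FILE *out = std::fopen(cachePath, "wb");
  if (handle && out)
    {
    curl_easy_setopt(handle, CURLOPT_URL, uri);
    curl_easy_setopt(handle, CURLOPT_FOLLOWLOCATION, 1L);
    // libcurl's default write callback fwrites to the FILE* in WRITEDATA.
    curl_easy_setopt(handle, CURLOPT_WRITEDATA, out);
    CURLcode result = curl_easy_perform(handle);
    if (result != CURLE_OK)
      {
      std::fprintf(stderr, "download failed: %s\n",
                   curl_easy_strerror(result));
      }
    }
  if (out)    { std::fclose(out); }
  if (handle) { curl_easy_cleanup(handle); }
  curl_global_cleanup();
  return 0;
}
```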
See this page for more discussion of QueryAtlas's current use of Xcede catalogs, and assumptions...
Asynchronous I/O Manager --Wendy
The vtkMRMLStorageNode superclass needs to have methods that handle remote or local data loading, whether the uris are contained in an .xcat or MRML file. Kind of like this:
- Each subclass of vtkMRMLStorageNode will call the superclass method first.
- The superclass method will look at the uri and decide whether the dataset is local or remote.
- If local, the subclass will load the data and return.
- If remote, the superclass will check whether the data is cached on disk (in /tmp or wherever).
- If the data is cached, the subclass method will load that dataset from disk and return.
- If the data is not cached, the subclass method will spawn an independent thread of control that will interact with the Asynchronous I/O Manager, passing it the type of storage node required for the dataset:
- The thread will create a new download entry and observe the cancel button,
- make whatever call it needs to download (http),
- display progress on a progress meter,
- and when complete, call a method on the vtkMRMLStorageNode subclass to load the dataset from the local cache.
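Put together, the superclass logic above might be shaped roughly like this. Every name in the sketch (vtkMRMLStorageNodeSketch, AsynchronousIOManager, ScheduleDownload, and the rest) is an assumption for discussion, not actual Slicer3 API:

```cpp
#include <string>

class vtkMRMLStorageNodeSketch;

// Hypothetical manager: queues a download on its own thread, shows progress,
// watches the cancel button, and calls back into the storage node when done.
class AsynchronousIOManager
{
public:
  void ScheduleDownload(const std::string &uri, vtkMRMLStorageNodeSketch *node);
};

class vtkMRMLStorageNodeSketch
{
public:
  // Called by each subclass before it does its own (local) loading.
  bool ResolveAndLoad(const std::string &uri)
  {
    // Look at the uri and decide whether the dataset is local or remote.
    if (!this->IsURIRemote(uri))
      {
      return this->LoadFromDisk(uri);    // local: subclass loads and returns
      }
    // Remote: check whether the data is already cached on disk.
    std::string cached = this->GetCachedPath(uri);
    if (!cached.empty())
      {
      return this->LoadFromDisk(cached); // cached: load from disk and return
      }
    // Not cached: hand off to the Asynchronous I/O Manager, which will call
    // LoadFromDisk() on this node once the download thread completes.
    this->IOManager->ScheduleDownload(uri, this);
    return true; // load is pending, not failed
  }

protected:
  // Implemented by each subclass (volume, model, ...) for its file format.
  virtual bool LoadFromDisk(const std::string &path) = 0;

  bool IsURIRemote(const std::string &uri);          // declared only: a sketch
  std::string GetCachedPath(const std::string &uri); // "" if not cached
  AsynchronousIOManager *IOManager;
};
```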