Slicer3:Remote Data Handling

From NAMIC Wiki


ITK-based mechanism for handling remote data (for command line modules, batch processing, and grid processing) --Nicole

vtkMRMLStorageNode methods for handling remote data (for loading and saving data for interactive use) --Wendy

The first goal is to decide which workflows to support and to find a good implementation approach.

Currently, the Load Scene, Import Scene, and Add Data options in Slicer all encapsulate two steps:

  • locating a dataset, usually accomplished through a file browser, and
  • selecting a dataset to initiate loading.

For loading remote datasets, the following options are available:

  • break these two steps apart explicitly,
  • bind them together under the hood,
  • or support both of these paradigms.

Breaking apart "find data" and "load data":

(this seems the more reasonable solution).

Possible workflow A

  • User downloads .xcat or .xml (MRML) file to disk using the HID or an XNAT web interface
  • From the Load Scene file browser, user selects the .xcat or .xml archive. If no locally cached versions exist, the Download Manager downloads each remote file listed in the archive to the /tmp directory (files are always cached locally). Each file is then loaded into Slicer via a vtkMRMLStorageNode method when its download is complete.
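Workflow A depends on a deterministic way to find a remote file's local copy in /tmp. The following is a minimal sketch, assuming a simple naming scheme: the uri's scheme is dropped and the remaining host and path are flattened into one file name. CachePathForUri and IsCached are hypothetical helpers, not existing Slicer functions.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper: map a remote uri to a file name inside the local
// cache directory (e.g. /tmp). This flattening is simple but not
// collision-free; a real cache might hash the uri instead.
std::filesystem::path CachePathForUri(const std::string& uri,
                                      const std::filesystem::path& cacheDir)
{
  std::string name = uri;
  // Drop the scheme ("http://", "ftp://", ...) if present.
  const std::string::size_type sep = name.find("://");
  if (sep != std::string::npos) {
    name = name.substr(sep + 3);
  }
  // Flatten host, port, path, and query into a single file name.
  for (char& c : name) {
    if (c == '/' || c == ':' || c == '?' || c == '&') {
      c = '_';
    }
  }
  return cacheDir / name;
}

// Hypothetical helper: true if a previously downloaded copy of the uri
// exists in the cache directory.
bool IsCached(const std::string& uri, const std::filesystem::path& cacheDir)
{
  return std::filesystem::exists(CachePathForUri(uri, cacheDir));
}
```

For example, http://example.org/data/brain.mgz would be cached as /tmp/example.org_data_brain.mgz.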

Possible workflow B

  • User downloads .xcat or .xml (MRML) file to disk using the HID or an XNAT web interface
  • From the Load Scene file browser, user selects the .xcat or .xml archive. If no locally cached versions exist, the Download Manager downloads each remote file in the archive and saves a copy to /tmp only if a flag is set. Each file is loaded directly into Slicer via a vtkMRMLStorageNode method when its download is complete.
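The only difference between workflows A and B is the caching policy. The sketch below shows that policy as a small decision function. FetchAction and ChooseFetchAction are hypothetical names. The sketch assumes workflow A corresponds to the flag always being set.

```cpp
// Possible actions for one remote file listed in a scene or catalog.
enum class FetchAction {
  LoadFromCache,     // a cached copy exists; load it directly
  DownloadAndCache,  // download into the cache directory, then load from there
  DownloadDirect     // download and load without keeping a cached copy
};

// Hypothetical policy: workflow A always caches (cacheFlag == true);
// workflow B caches only when the user-controlled flag is set.
FetchAction ChooseFetchAction(bool isCached, bool cacheFlag)
{
  if (isCached) {
    return FetchAction::LoadFromCache;
  }
  return cacheFlag ? FetchAction::DownloadAndCache : FetchAction::DownloadDirect;
}
```

Many readers expect a file on disk. In practice, "DownloadDirect" may still need to write a temporary file, which is deleted after loading.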

Possible workflow C

  • User locates a MRML file, .xcat archive, or individual dataset on the HID or an XNAT web interface
  • User types the uri into the Load Scene, Import Scene, or Add Data interfaces.
  • If no locally cached versions exist, the Download Manager caches each remote file (the typed uri itself, or each file listed in the archive it points to) to /tmp. Each file is then loaded into Slicer via a vtkMRMLStorageNode method when its download is complete.
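In workflow C, Slicer must decide what a typed uri points to before it can pick a loader. This is a minimal sketch that classifies the uri by extension only. UriKind and ClassifyUri are hypothetical names. The ".xcat.gz" extension is an assumption about how compressed catalogs would be named.

```cpp
#include <string>

enum class UriKind { Scene, Catalog, Dataset };

// True if s ends with suffix.
static bool EndsWith(const std::string& s, const std::string& suffix)
{
  return s.size() >= suffix.size() &&
         s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Decide how a uri typed into Load Scene / Import Scene / Add Data should be
// handled, based only on its (lowercase) extension. A real implementation
// might also strip query strings or inspect the content type.
UriKind ClassifyUri(const std::string& uri)
{
  if (EndsWith(uri, ".mrml") || EndsWith(uri, ".xml")) {
    return UriKind::Scene;    // MRML scene: load every node it references
  }
  if (EndsWith(uri, ".xcat") || EndsWith(uri, ".xcat.gz")) {
    return UriKind::Catalog;  // XCEDE catalog: select and fetch listed uris
  }
  return UriKind::Dataset;    // single dataset: hand to a storage node
}
```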

Or, bundling together "find data" and "load data":

Possible workflow D

In this workflow, Slicer would make calls to HID or XNAT webservices to determine what data of interest is available... Questions:

  • How might this work?
  • Do we really want to re-implement functionality in the HID web interface?
  • Maybe Slicer can implement a workflow (A-C) but also offer a simplified BIRN query interface that has functionality like:
    • Request BIRNIDs for all subjects who have a complete FIPS/FreeSurfer analysis
    • Request an xcat for one of these BIRNIDs

Saving Data back to remote site

  • Since we have no plan yet for where to save MRML files on the HID, could we have a webservices function, callable from Slicer, that writes a file to /dev/null on the HID in the meantime?

What data do we need in an .xcat file?

For the fBIRN QueryAtlas use case, we need a combination of a FreeSurfer morphology analysis and a FIPS analysis of the same subject. With the combined data in Slicer, we can use the FIPS analysis to view activation overlays co-registered to and overlaid on the high-resolution structural MRI. We can also use the co-registered morphology analysis to determine the names of the brain regions where activations occur.

The required analyses, including all derived data, are stored in two standard directory structures on local disk, and *hopefully* somewhere on the HID within a standard structure (check with Burak). These directory trees contain a LOT of files we don't need; below are the files we *do* need for the fBIRN QueryAtlas use case.

FIPS analysis (.feat) directory and required data

For instance, the FIPS output directory in our example dataset is called sirp-hp65-stc-to7-gam.feat. Under this directory, QueryAtlas needs the following datasets:

  • sirp-hp65-stc-to7-gam.feat/reg/example_func.nii
  • sirp-hp65-stc-to7-gam.feat/reg/freesurfer/anat2exf.register.dat
  • sirp-hp65-stc-to7-gam.feat/stats/(all statistics files of interest)
  • sirp-hp65-stc-to7-gam.feat/design.gif (this image relates statistics files to experimental conditions)
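The FIPS requirements above can be expressed as a predicate over paths relative to the .feat directory. This is a sketch only. IsRequiredFipsFile is a hypothetical name. Treating every file under stats/ as "of interest" is an assumption, because the list leaves that choice open.

```cpp
#include <string>

// Returns true if relPath (relative to a .feat directory such as
// sirp-hp65-stc-to7-gam.feat) is one of the files QueryAtlas needs.
// Assumption: every file under stats/ counts as a statistics file of interest.
bool IsRequiredFipsFile(const std::string& relPath)
{
  if (relPath == "reg/example_func.nii" ||
      relPath == "reg/freesurfer/anat2exf.register.dat" ||
      relPath == "design.gif") {
    return true;
  }
  const std::string statsPrefix = "stats/";
  return relPath.size() > statsPrefix.size() &&
         relPath.compare(0, statsPrefix.size(), statsPrefix) == 0;
}
```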

FreeSurfer analysis directory, and required data

For instance, the FreeSurfer morphology analysis directory in our example dataset is called fbph2-000670986943. Under this directory, QueryAtlas needs the following datasets:

  • fbph2-000670986943/mri/brain.mgz
  • fbph2-000670986943/mri/aparc+aseg.mgz
  • fbph2-000670986943/surf/lh.pial
  • fbph2-000670986943/surf/rh.pial
  • fbph2-000670986943/label/lh.aparc.annot
  • fbph2-000670986943/label/rh.aparc.annot
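The FreeSurfer requirements, by contrast, form a fixed list that differs only by subject directory. A minimal sketch follows. RequiredFreeSurferFiles is a hypothetical helper that rebuilds the six paths above for any subject directory name.

```cpp
#include <string>
#include <vector>

// Build the paths QueryAtlas needs from one FreeSurfer subject directory,
// e.g. subjectDir = "fbph2-000670986943".
std::vector<std::string> RequiredFreeSurferFiles(const std::string& subjectDir)
{
  const std::vector<std::string> relPaths = {
      "mri/brain.mgz",        "mri/aparc+aseg.mgz",
      "surf/lh.pial",         "surf/rh.pial",
      "label/lh.aparc.annot", "label/rh.aparc.annot"};
  std::vector<std::string> paths;
  paths.reserve(relPaths.size());
  for (const std::string& rel : relPaths) {
    paths.push_back(subjectDir + "/" + rel);
  }
  return paths;
}
```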

What do we want HID webservices to provide?

  • The BIRN HID webservices shouldn't really need to know which subset of the data QueryAtlas needs. Maybe the web interface can take a BIRN ID and create an XCEDE catalog listing all uris in that subject's FIPS and FreeSurfer directories.
  • The catalog could be requested and downloaded from the HID web GUI, with a name like .xcat or .xcat.gzip. QueryAtlas could then open this file (unzipping it first if needed) and select only the uris relevant to an fBIRN QueryAtlas session.
  • Then we need to be able to download each uri in a catalog (or .xml MRML file) programmatically, using http://, ftp://, etc. We are assuming all datasets are publicly readable.
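If the catalog lists every uri in both directories, QueryAtlas has to filter it on the client side. Here is a sketch of that selection step, assuming catalog uris end in the same relative paths shown above. SelectRelevantUris is a hypothetical name. Suffix matching alone would accept matching files from any subject in a multi-subject catalog.

```cpp
#include <string>
#include <vector>

// True if s ends with suffix.
static bool EndsWithPath(const std::string& s, const std::string& suffix)
{
  return s.size() >= suffix.size() &&
         s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Keep only the catalog uris whose path ends with one of the required
// relative paths (e.g. "mri/brain.mgz"). Matching on "/" + suffix keeps
// ".../oldmri/brain.mgz" from matching "mri/brain.mgz".
std::vector<std::string> SelectRelevantUris(
    const std::vector<std::string>& catalogUris,
    const std::vector<std::string>& requiredSuffixes)
{
  std::vector<std::string> selected;
  for (const std::string& uri : catalogUris) {
    for (const std::string& suffix : requiredSuffixes) {
      if (EndsWithPath(uri, "/" + suffix)) {
        selected.push_back(uri);
        break;  // one match is enough; avoid duplicates
      }
    }
  }
  return selected;
}
```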

See this page for more discussion of QueryAtlas's current use of Xcede catalogs, and assumptions...

Asynchronous I/O Manager --Wendy

The vtkMRMLStorageNode superclass needs methods that handle both remote and local data loading, whether the uris come from an xcat or a MRML file. Roughly:

  • Each subclass of vtkMRMLStorageNode will call the superclass method first.
  • Superclass method will look at uri, and decide if dataset is local or remote.
  • If local, the subclass will load the data and return.
  • If remote, the superclass will check to see if the data is cached on disk (in /tmp or wherever).
  • If data is cached, subclass method will load that dataset from disk and return.
  • If data is not cached, subclass method will spawn an independent thread of control that will interact with the Asynchronous I/O Manager, passing it the type of storage node required for the dataset:
    • The thread will create a new download entry and observe the cancel button,
    • make whatever call is needed to download the data (e.g. over http),
    • display progress on a progress meter,
    • and, when the download is complete, call a method on the vtkMRMLStorageNode subclass to load the dataset from the local cache.
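The control flow above can be sketched outside VTK with standard C++ threads. This is not the vtkMRMLStorageNode API. StorageNodeSketch, Downloader, LoadFromFile, and RecordingStorageNode are all hypothetical. The download step is injected as a function, because no real Download Manager interface is specified here.

```cpp
#include <atomic>
#include <filesystem>
#include <functional>
#include <string>
#include <thread>
#include <utility>

// Hypothetical download step: fetch uri into localPath, report progress in
// [0,1], stop early if cancel is set. Returns true on success.
using Downloader = std::function<bool(const std::string& uri,
                                      const std::filesystem::path& localPath,
                                      const std::function<void(double)>& progress,
                                      const std::atomic<bool>& cancel)>;

class StorageNodeSketch {
public:
  StorageNodeSketch(std::filesystem::path cacheDir, Downloader download)
      : CacheDir(std::move(cacheDir)), Download(std::move(download)) {}
  // Subclasses should also call Wait() in their own destructor, because the
  // worker thread calls the subclass's LoadFromFile.
  virtual ~StorageNodeSketch() { Wait(); }

  // Superclass entry point: local vs. remote, then cached vs. not cached.
  bool ReadData(const std::string& uri) {
    if (!IsRemote(uri)) {
      return LoadFromFile(uri);              // local: subclass loads now
    }
    const std::filesystem::path cached = CachePath(uri);
    if (std::filesystem::exists(cached)) {
      return LoadFromFile(cached.string());  // remote, already cached
    }
    Wait();  // this sketch allows one download per node at a time
    Cancel = false;
    Progress = 0.0;
    // Remote and not cached: download on a separate thread, then load.
    Worker = std::thread([this, uri, cached] {
      const bool ok = Download(uri, cached, [this](double f) { Progress = f; }, Cancel);
      if (ok && !Cancel) {
        LoadFromFile(cached.string());
      }
    });
    return true;  // download started
  }

  void RequestCancel() { Cancel = true; }  // e.g. wired to a cancel button
  void Wait() { if (Worker.joinable()) Worker.join(); }
  double GetProgress() const { return Progress; }

protected:
  // Each subclass (volume, model, ...) loads its data from a local file.
  virtual bool LoadFromFile(const std::string& path) = 0;

private:
  static bool IsRemote(const std::string& uri) {
    return uri.rfind("http://", 0) == 0 || uri.rfind("https://", 0) == 0 ||
           uri.rfind("ftp://", 0) == 0;
  }
  std::filesystem::path CachePath(const std::string& uri) const {
    std::string name = uri.substr(uri.find("://") + 3);
    for (char& c : name) {
      if (c == '/' || c == ':') c = '_';
    }
    return CacheDir / name;
  }

  std::filesystem::path CacheDir;
  Downloader Download;
  std::thread Worker;
  std::atomic<bool> Cancel{false};
  std::atomic<double> Progress{0.0};
};

// Trivial subclass for illustration: records which file it was asked to load.
class RecordingStorageNode : public StorageNodeSketch {
public:
  using StorageNodeSketch::StorageNodeSketch;
  ~RecordingStorageNode() override { Wait(); }
  std::string LoadedPath;

protected:
  bool LoadFromFile(const std::string& path) override {
    LoadedPath = path;
    return true;
  }
};
```

In a real VTK/KWWidgets application, MRML and GUI objects are generally not thread-safe. So the progress updates and the final load call would most likely need to be posted back to the main thread rather than run on the worker as shown here.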