Difference between revisions of "Slicer3:Remote Data Handling"
Revision as of 21:27, 12 February 2008
Contents
- 1 Current status of Slicer's (local) data handling
- 2 Goal for how Slicer would upload/download from remote data stores
- 3 TWO Use CASES can drive a first pass implementation
- 4 Draft implementation design
- 5 ITK-based mechanism handling remote data (for command line modules, batch processing, and grid processing) (Nicole)
- 6 Workflows to support
- 7 What do we want HID webservices to provide?
- 8 Slicer's Data I/O Manager
Current status of Slicer's (local) data handling
Currently, MRML files, XCEDE catalog files, XNAT archives and individual datasets are all loaded from local disk. All remote datasets are downloaded (via web interface or command line) outside of Slicer. In the BIRN 2007 AHM we demonstrated downloading .xar files from a remote database, and loading .xar and .xcat files into Slicer from local disk using Slicer's XNAT archive reader and XCEDE2.0 catalog reader. Slicer's current scheme for data handling is shown below:
Goal for how Slicer would upload/download from remote data stores
Eventually, we would like to download uris remotely or locally from the Application itself, and have the option of uploading to remote stores as well. A draft sketch is below (feedback please!), which features a new Remote Reference Handler, owned by the MRMLScene, and an (asynchronous) Data I/O and caching mechanism:
TWO Use CASES can drive a first pass implementation
For BIRN, we'd like to demonstrate two use cases:
- The first is loading a combined FIPS/FreeSurfer analysis, specified in an Xcede catalog (.xcat) file that contains uris pointing to remote datasets, and viewing it with Slicer's QueryAtlas. (...if we cannot get an .xcat via the HID web GUI, our approach would be to manually upload a test Xcede catalog file and its constituent datasets to some place on the SRB. We'll keep a copy of the catalog file locally, read it and query SRB for each uri in the .xcat file.)
- The second is running a batch job in Slicer that processes a set of remotely held datasets. Each iteration would take as arguments the XML file parameterizing the EMSegmenter, the uri for the remote dataset, and a uri for storing back the segmentation results.
The subset of functionality we'd need is shown below. This is a sketch proposed for discussion and refinement:
Draft implementation design
Initial class design that extends MRML and the Slicer3 Application Settings, defines a vtkURIHandler (or vtkMRMLURIHandler?), and an interface to Data IO Manager:
New Application Interface Settings
Slicer3 Application Interface Settings:
- CacheDirectory
- Enable/Disable asynchronous I/O
- Instance & Register URI Handlers (?)
URIs that we’d want to handle
- ftp://user:pw@host:port/path/to/volume.nrrd (read and write)
- http://host/file.utp (read only)
We want this to work with or without the asynchronous read/write turned on, and with or without the dataIOManager GUI interface.
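As a rough illustration of what a handler would need to pull out of these URIs, here is a minimal sketch of splitting a URI into scheme, credentials, host, port and path. The ParsedURI struct and ParseURI function are hypothetical names used only for this example (they are not part of the proposed classes), and the parser ignores percent-encoding, IPv6 literals, query strings and fragments.

```cpp
#include <string>

// Hypothetical helper type: the pieces of a URI such as
// ftp://user:pw@host:port/path/to/volume.nrrd
struct ParsedURI {
  std::string scheme, user, password, host, port, path;
  bool valid = false;
};

// Splits "scheme://[user[:pw]@]host[:port][/path]" into its parts.
// Returns a ParsedURI with valid == false if no "scheme://" prefix
// or no host is found.
ParsedURI ParseURI(const std::string& uri) {
  const std::string::size_type npos = std::string::npos;
  ParsedURI out;

  const std::string::size_type schemeEnd = uri.find("://");
  if (schemeEnd == npos || schemeEnd == 0) {
    return out;  // not in a form this sketch recognizes
  }
  out.scheme = uri.substr(0, schemeEnd);

  // The authority runs from after "://" to the first '/' (or the end).
  const std::string::size_type authStart = schemeEnd + 3;
  const std::string::size_type pathStart = uri.find('/', authStart);
  std::string authority =
      uri.substr(authStart, pathStart == npos ? npos : pathStart - authStart);
  if (pathStart != npos) {
    out.path = uri.substr(pathStart);
  }

  // Optional "user[:password]@" prefix.
  const std::string::size_type at = authority.rfind('@');
  if (at != npos) {
    const std::string credentials = authority.substr(0, at);
    authority = authority.substr(at + 1);
    const std::string::size_type sep = credentials.find(':');
    out.user = credentials.substr(0, sep);
    if (sep != npos) {
      out.password = credentials.substr(sep + 1);
    }
  }

  // Optional ":port" suffix.
  const std::string::size_type colon = authority.rfind(':');
  out.host = authority.substr(0, colon);
  if (colon != npos) {
    out.port = authority.substr(colon + 1);
  }

  out.valid = !out.host.empty();
  return out;
}
```

If the password field comes back empty for a scheme that requires credentials, that would be the point where a handler calls its password prompter.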
How it works for reading remote data (rough sketch)
- a vtkMRML<DataType>StorageNode is created for a new uri
- its URI is set
- its URI and the MRMLScene->CacheDirectory are used to set its CacheFileName.
- vtkMRML<DataType>StorageNode calls its ReadData(*uri) method.
- vtkMRML<DataType>StorageNode's ReadData(uri) method calls Superclass::StageReadData( CacheFileName, caller) method, where caller = vtkMRML<DataType>StorageNode's *this ptr (?)
- vtkMRMLStorageNode's StageReadData():
- calls Scene's FindURIHandler( uri) method,
- vtkMRMLScene's FindURIHandler(uri) method:
- loops through collection of registered vtkURIHandlers
- calls each registered vtkURIHandler's CanHandleURI() method until one returns true
- returns ptr to vtkURIHandler that works, or NULL
- If urihandler != NULL, StageReadData() method calls urihandler->SetURI(uri) method
- calls urihandler->SetCacheFile(CacheFileName) method
- calls urihandler->SetCaller() method.
- calls urihandler-> Download() method.
- urihandler's Download():
- determines if prompting is required; calls its PasswordPrompt() method if so
- if urihandler->IOManager->Asynchronous == false, and urihandler->RemoteFlag == 1:
- downloads data
- records the datatransfer in the log via LogDataTransfer( caller, urihandler, transferID, source, destination) method
- calls urihandler->Notify ( caller, event ) method
- which calls caller->ReadData(uri).
- OR if urihandler->IOManager->Asynchronous == false and urihandler->RemoteFlag == 0:
- loads data and returns.
- if urihandler->IOManager->Asynchronous == true,
- creates new thread of control to download data using curl and returns some unique pID (?how does it work)
- records the datatransfer in the log via LogDataTransfer( caller, urihandler, pID, transferType, source, destination) method. transferType could be: upload, download, fromdisk, todisk
- gets the vtkSlicerDataIOManager (who owns it and how does uri handler get it?)
- creates new control thread which calls vtkSlicerDataIOManager->Manage(pID)
- vtkSlicerDataIOManager's Manage(pID) method:
- Loops, watching the pid and observing for finish and cancel, maybe timeout events
- If event occurs, gets uriHandler for pID from DataLog
- and somehow gets the data back thru uriHandler to node...
...
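The synchronous part of the sequence above (find a handler that accepts the uri, then ask it to download into the cache file) can be sketched as follows. This is only a sketch under stated assumptions: the classes are plain C++ stand-ins rather than vtkObject subclasses, PrefixHandler is a hypothetical handler that matches on a uri prefix, and its Download() just records its arguments instead of transferring data.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the proposed vtkURIHandler interface.
class URIHandler {
public:
  virtual ~URIHandler() = default;
  virtual bool CanHandleURI(const std::string& uri) const = 0;
  // Returns true on success. A real handler would fetch the data here.
  virtual bool Download(const std::string& uri,
                        const std::string& cacheFile) = 0;
};

// Hypothetical handler that accepts any uri starting with a fixed prefix,
// for example "ftp://" or "http://".
class PrefixHandler : public URIHandler {
public:
  explicit PrefixHandler(std::string prefix) : Prefix(std::move(prefix)) {}

  bool CanHandleURI(const std::string& uri) const override {
    return uri.compare(0, Prefix.size(), Prefix) == 0;
  }

  bool Download(const std::string& uri,
                const std::string& cacheFile) override {
    // Placeholder: remember what was requested instead of transferring it.
    LastURI = uri;
    LastCacheFile = cacheFile;
    return true;
  }

  std::string LastURI;
  std::string LastCacheFile;

private:
  std::string Prefix;
};

// Stand-in for the handler-related part of vtkMRMLScene.
class Scene {
public:
  void RegisterURIHandler(std::shared_ptr<URIHandler> handler) {
    Handlers.push_back(std::move(handler));
  }

  // Loops through the registered handlers and returns the first one whose
  // CanHandleURI() returns true, or nullptr if none does.
  URIHandler* FindURIHandler(const std::string& uri) const {
    for (const auto& handler : Handlers) {
      if (handler->CanHandleURI(uri)) {
        return handler.get();
      }
    }
    return nullptr;
  }

private:
  std::vector<std::shared_ptr<URIHandler>> Handlers;
};

// Stand-in for vtkMRMLStorageNode::StageReadData(): look up a handler and
// start the download. Returns false if no handler accepts the uri.
bool StageReadData(const Scene& scene, const std::string& uri,
                   const std::string& cacheFile) {
  URIHandler* handler = scene.FindURIHandler(uri);
  if (handler == nullptr) {
    return false;
  }
  return handler->Download(uri, cacheFile);
}
```

Handlers are tried in registration order, so if two handlers could accept the same uri, the one registered first wins.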
Class architecture and dependencies sketch
Text below is work in progress... description is not final spec
RemoteIO library classes, depend on vtk and libcmcurl
New /Libs/RemoteIO/vtkDataIOManager class:
Slicer main application creates a vtkDataIOManager class and each registered URIHandler gets its IOManager member set to this address. The IOManager keeps a collection of data transfers and methods to inspect them, query them and cancel active transfers by ID. The ID is associated with the separate thread of control created by the URIHandler's Download() or Upload() methods... how to monitor status and terminate these?
vtkDataIOManager members:
- DataTransferCollection
vtkDataIOManager methods:
- Get/Set DataTransferCollection
- AddDataTransfer ( transferID , *srcURI, *destURI, *vtkMRMLStorageNode)
- CancelDataTransfer ( transferID )
- QueryTransferStatus ( transferID )
- gets some return value that says "going" or "finished"?
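A minimal sketch of the bookkeeping side of these methods, assuming integer transfer IDs and a small status enum for the "going"/"finished" return value. The TransferStatus enum and the MarkFinished() method are assumptions added for the example, the vtkMRMLStorageNode argument of AddDataTransfer is left out, and no threads are involved: cancelling here only flips the recorded status.

```cpp
#include <map>
#include <string>

// Hypothetical status values for QueryTransferStatus().
enum class TransferStatus { Unknown, Running, Finished, Cancelled };

// Stand-in for the proposed vtkDataTransfer record.
struct DataTransfer {
  std::string SourceURI;
  std::string DestinationURI;
  TransferStatus Status = TransferStatus::Running;
};

// Stand-in for the transfer collection kept by vtkDataIOManager.
class DataIOManager {
public:
  // Records a new transfer. Returns false if the ID is already in use.
  bool AddDataTransfer(int transferID, const std::string& srcURI,
                       const std::string& destURI) {
    DataTransfer transfer;
    transfer.SourceURI = srcURI;
    transfer.DestinationURI = destURI;
    return Transfers.emplace(transferID, transfer).second;
  }

  // Marks a running transfer as cancelled. Returns false if the ID is
  // unknown or the transfer is no longer running.
  bool CancelDataTransfer(int transferID) {
    return SetStatusIfRunning(transferID, TransferStatus::Cancelled);
  }

  // Hypothetical hook a handler would call when its transfer completes.
  bool MarkFinished(int transferID) {
    return SetStatusIfRunning(transferID, TransferStatus::Finished);
  }

  TransferStatus QueryTransferStatus(int transferID) const {
    const auto it = Transfers.find(transferID);
    return it == Transfers.end() ? TransferStatus::Unknown
                                 : it->second.Status;
  }

private:
  bool SetStatusIfRunning(int transferID, TransferStatus status) {
    const auto it = Transfers.find(transferID);
    if (it == Transfers.end() ||
        it->second.Status != TransferStatus::Running) {
      return false;
    }
    it->second.Status = status;
    return true;
  }

  std::map<int, DataTransfer> Transfers;
};
```

With asynchronous I/O turned on, access to the collection would additionally need a lock, since handlers would report completion from their own threads.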
New /Libs/RemoteIO/vtkDataTransfer class:
- SourceURI
- DestinationURI
- Cancelled
- Finished
- TransferID
New /Libs/RemoteIO vtkPasswordPrompter class:
vtkPasswordPrompter members:
- Title
- Prompt
- UserName
- Password
- HelpString
- Service
- SavePasswordService (?)
vtkPasswordPrompter methods:
- Get/Set on members
- Prompt () ( ?stdio text prompting?)
New /Libs/RemoteIO/vtkURIHandler class
vtkURIHandler members:
- URI
- CacheDir
- LocalFile
- vtkPasswordPrompter *PasswordPrompter
- vtkDataIOManager *IOManager
- ErrorString
- FileSize
- ThreadID (?)
vtkURIHandler (virtual) methods:
- Get/Set on members ...
- int CanHandleURI ()
- QueryFileSize ()
- Download (filter watcher?)
- Upload (filter watcher?)
- QueryProgress ()
- First pass, returns value representing either "active" or "finished"
- Maybe later returns percent finished.
- (virtual) FinishTransfer ( )
New /Libs/RemoteIO classes, derived from vtkURIHandler
- vtkFileHandler
- int CanHandleURI()
- PasswordPrompter->Prompt() if required
- Download ( ) (filter watcher?)
- Separate thread of control for asynchronous download
- Returns some kind of threadID (not sure how to do this?)
- Upload ( ) (filter watcher?)
- Separate thread of control for asynchronous upload
- Returns some kind of threadID (not sure how to do this?)
- FinishTransfer()
- Returns some value to the caller saying transfer is complete
- QueryProgress ()
- vtkHttpHandler
- vtkFtpHandler
- vtkSRBHandler
- vtkXNATHandler
- vtkS3Handler
- vtkSTDIOHandler
Extending MRML
vtkMRMLScene methods (and members)
vtkMRMLScene members:
- CacheDirectory
- URIHandlerCollection
- vtkMRMLFileHandler (this subclass of vtkFileHandler contains a pointer to a vtkMRMLStorageNode and methods to Get/Set it)
- vtkMRMLHttpHandler (this subclass of vtkHttpHandler contains a pointer to a vtkMRMLStorageNode and methods to Get/Set it)
- vtkMRMLFtpHandler (this subclass of vtkFtpHandler contains a pointer to a vtkMRMLStorageNode and methods to Get/Set it)
- ...
vtkMRMLScene methods:
- Get/Set on members...
- RegisterURIHandler ( *vtkURIHandler )
- Adds a URIHandler to the collection
New /Libs/MRML/vtkMRML<uri>Handler class, derived from vtk<uri>Handler
This includes vtkMRMLFileHandler, vtkMRMLHttpHandler, vtkMRMLFtpHandler, etc.
vtkMRMLFileHandler members:
- vtkMRMLStorageNode *Caller;
vtkMRMLFileHandler methods:
- Get/Set on members
New vtkMRMLStorageNode members and methods:
vtkMRMLStorageNode members:
- URI
- ErrorString?
vtkMRMLStorageNode methods:
- Get/Set URI
- Get/Set ErrorString
- void StageReadData ( const char *cacheFile, vtkMRMLNode *refNode )
- Clear error status
- Gets registered uri handlers from the scene
- Checks each handler in sequence to see which can handle the vtkMRMLStorageNode::URI
- If appropriate handler is found:
- handler->SetURI ( *uri)
- handler->SetCacheFile ( *cacheFile )
- handler->SetDestinationNode ( *refNode )
- handler->Download()
- void StageWriteData ( const char *cacheFile, vtkMRMLNode *refNode )
- Clear error status
- Gets registered uri handlers from the scene
- Checks each handler in sequence to see which can handle the vtkMRMLStorageNode::URI
- If appropriate handler is found:
- handler->SetURI ( *uri)
- handler->SetCacheFile ( *cacheFile )
- handler->SetDestinationNode ( *refNode )
- handler->Upload().
New vtkMRML<DataType>StorageNode members and methods:
vtkMRML<DataType>StorageNode members:
- CacheFileName
vtkMRML<DataType>StorageNode methods:
- Get/Set CacheFileName()
- ReadData( uri )
- Superclass::StageReadData ( *CacheFileName, *this )
- WriteData( uri )
- Superclass::StageWriteData ( *CacheFileName, *this )
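How CacheFileName gets derived from the uri and the scene's CacheDirectory is not specified yet. One simple possibility, sketched below, is to append the last path component of the uri to the cache directory. MakeCacheFileName is a hypothetical helper name, and the scheme is an assumption: it does not guard against two different uris that end in the same file name, and a uri with no path at all (such as http://host) would yield the host name.

```cpp
#include <string>

// Builds a local cache path from a cache directory and a uri by reusing the
// uri's last path component as the file name. Returns an empty string if the
// uri ends in '/' and therefore names no file.
std::string MakeCacheFileName(const std::string& cacheDir,
                              const std::string& uri) {
  const std::string::size_type slash = uri.find_last_of('/');
  const std::string name =
      (slash == std::string::npos) ? uri : uri.substr(slash + 1);
  if (name.empty()) {
    return std::string();
  }
  if (cacheDir.empty()) {
    return name;
  }
  // Avoid a doubled separator when the directory already ends in '/'.
  if (cacheDir.back() == '/') {
    return cacheDir + name;
  }
  return cacheDir + "/" + name;
}
```

For example, with a cache directory of /tmp, the uri ftp://user:pw@host:port/path/to/volume.nrrd maps to /tmp/volume.nrrd.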
Changes to /Base/GUI
New /Base/GUI/vtkSlicerDataIOManager class, derived from vtkDataIOManager:
vtkSlicerDataIOManager members:
- vtkKWTopLevel
- ...
vtkSlicerDataIOManager methods:
New /Base/GUI/vtkSlicerDataTransferWidget class, derived from vtkKWWidget
vtkSlicerDataTransferWidget members:
- ...(fills out interface to interact with one data transfer)
vtkSlicerDataTransferWidget methods:
- ...
New /Base/GUI/vtkSlicerPasswordPrompter class, derived from /Libs/RemoteIO/vtkPasswordPrompter
vtkSlicerPasswordPrompter members:
- vtkKWDialog *PromptWindow
- ...
vtkSlicerPasswordPrompter methods:
- DisplayPrompt () (KW dialog-based prompt?)
ITK-based mechanism handling remote data (for command line modules, batch processing, and grid processing) (Nicole)
This one is tentatively on hold for now.
Workflows to support
The first goal is to figure out what workflows to support, and a good implementation approach.
Currently, Load Scene, Import Scene, and Add Data options in Slicer all encapsulate two steps:
- locating a dataset, usually accomplished through a file browser, and
- selecting a dataset to initiate loading.
Then MRML files, Xcede catalog files, or individual datasets are loaded from local disk.
For loading remote datasets, the following options are available:
- break these two steps apart explicitly (easiest option),
- bind them together under the hood,
- or support both of these paradigms.
For now, we choose the first option.
Breaking apart "find data" and "load data":
Possible workflow A
- User downloads .xcat or .xml (MRML) file to disk using the HID or an XNAT web interface
- From the Load Scene file browser, user selects the .xcat or .xml archive. If no locally cached versions exist, each remote file listed in the archive is downloaded to the /tmp directory (always locally cached) by the Data I/O Manager, and the cached (local) uri is then passed to a vtkMRMLStorageNode method when the download is complete.
Possible workflow B
- User downloads .xcat or .xml (MRML) file to disk using the HID or an XNAT web interface
- From the Load Scene file browser, user selects the .xcat or .xml archive. If no locally cached versions exist, each remote file in the archive is downloaded to /tmp (only if a flag is set) by the Data IO Manager, and loaded directly into Slicer via a vtkMRMLStorageNode method when download is complete. (How does load work if we don't save to disk first?)
Workflow C
- describe batch processing example here, which includes saving to local or remote.
In each workflow, the data gets saved to disk first and then loaded into StorageNode or uploaded to remote location from cache.
What data do we need in an .xcat file?
For the fBIRN QueryAtlas use case, we need a combination of FreeSurfer morphology analysis and a FIPS analysis of the same subject. With the combined data in Slicer, we can view activation overlays co-registered to and overlaid onto the high resolution structural MRI using the FIPS analysis, and determine the names of brain regions where activations occur using the co-registered morphology analysis.
The required analyses including all derived data are in two standard directory structures on local disk, and *hopefully* somewhere on the HID within a standard structure (check with Burak). These directory trees contain a LOT of files we don't need... Below are the files we *do* need for fBIRN QueryAtlas use case.
FIPS analysis (.feat) directory and required data
For instance, the FIPS output directory in our example dataset from Doug Greve at MGH is called sirp-hp65-stc-to7-gam.feat. Under this directory, QueryAtlas needs the following datasets:
- sirp-hp65-stc-to7-gam.feat/reg/example_func.nii
- sirp-hp65-stc-to7-gam.feat/reg/freesurfer/anat2exf.register.dat
- sirp-hp65-stc-to7-gam.feat/stats/(all statistics files of interest)
- sirp-hp65-stc-to7-gam.feat/design.gif (this image relates statistics files to experimental conditions)
FreeSurfer analysis directory, and required data
For instance, the FreeSurfer morphology analysis directory in our example dataset from Doug Greve at MGH is called fbph2-000670986943. Under this directory, QueryAtlas needs the following datasets:
- fbph2-000670986943/mri/brain.mgz
- fbph2-000670986943/mri/aparc+aseg.mgz
- fbph2-000670986943/surf/lh.pial
- fbph2-000670986943/surf/rh.pial
- fbph2-000670986943/label/lh.aparc.annot
- fbph2-000670986943/label/rh.aparc.annot
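If a catalog ends up listing every file in the two directory trees, QueryAtlas would have to pick out the files listed above. A minimal sketch of such a filter, assuming the uris keep the directory layout shown here, is to match on path suffixes and on the .feat/stats/ directory. The function names are hypothetical and the suffix list is copied from the lists above.

```cpp
#include <string>
#include <vector>

// True if s ends with the given suffix.
bool EndsWith(const std::string& s, const std::string& suffix) {
  return s.size() >= suffix.size() &&
         s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// True if the uri points at one of the files QueryAtlas needs from a FIPS
// (.feat) or FreeSurfer analysis directory.
bool IsQueryAtlasFile(const std::string& uri) {
  static const char* const suffixes[] = {
      ".feat/reg/example_func.nii",
      ".feat/reg/freesurfer/anat2exf.register.dat",
      ".feat/design.gif",
      "/mri/brain.mgz",
      "/mri/aparc+aseg.mgz",
      "/surf/lh.pial",
      "/surf/rh.pial",
      "/label/lh.aparc.annot",
      "/label/rh.aparc.annot",
  };
  for (const char* suffix : suffixes) {
    if (EndsWith(uri, suffix)) {
      return true;
    }
  }
  // Everything under the FIPS stats directory is kept; narrowing this to
  // the "statistics files of interest" is left open.
  return uri.find(".feat/stats/") != std::string::npos;
}

// Returns the subset of uris that QueryAtlas needs, in their original order.
std::vector<std::string> FilterQueryAtlasURIs(
    const std::vector<std::string>& uris) {
  std::vector<std::string> kept;
  for (const std::string& uri : uris) {
    if (IsQueryAtlasFile(uri)) {
      kept.push_back(uri);
    }
  }
  return kept;
}
```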
What do we want HID webservices to provide?
- Question: are FIPS and FreeSurfer analyses (including QueryAtlas required files listed above) for subjects available on the HID yet? --Burak says not yet.
- Given that, can we manually upload an example .xcat and the datasets it points to onto the SRB, and download each dataset from the HID in Slicer, using some helper application (like curl)?
- (Eventually.) The BIRN HID webservices shouldn't really need to know the subset of data that QueryAtlas needs... maybe the web interface can take a BIRN ID and create a FIPS/FreeSurfer xcede catalog with all uris (http://....) in the FIPS and FreeSurfer directories, and package these into an Xcede catalog.
- (Eventually.) The catalog could be requested and downloaded from the HID web GUI, with a name like .xcat or .xcat.gzip or whatever. QueryAtlas could then open this file (or unzip and open) and filter for the relevant uris for an fBIRN or Qdec QueryAtlas session.
- Then, for each uri in a catalog (or .xml MRML file), we'll use (curl?) to download; so we need all datasets to be publicly readable.
- Can we create a directory (even a temporary one) on the SRB/BWH HID for Slicer data uploads?
- We need some kind of upload service, a function call that takes a dataset and a BIRNID and uploads data to appropriate remote directory.