Difference between revisions of "CTSC Ellen Grant, CHB"
Revision as of 14:53, 3 September 2009
Mission
Use-Case Goals
We will approach this use-case in three distinct steps: Basic Data Management, Query Formulation, and Processing Support.
- Step 1: Data Management
- Step 1a: Describe and upload retrospective datasets (roughly 1 terabyte) onto the CHB XNAT instance and confirm appropriate organization and naming scheme via web GUI.
- Step 1b: Describe and upload new acquisitions as part of data management process.
- Step 2: Query Formulation
- making specific queries using XNAT web services,
- data download conforming to specific naming convention and directory structure, using XNAT web services
- ensure all queries required to support processing workflow are working.
- Step 3: Data Processing
- Implement & execute the script-driven tractography workflow using web services,
- describe and upload results.
- ensure results are appropriately structured and named in the repository, and queryable via web GUI and web services.
Participants
- sites involved: MGH NMR center, MGH Radiology, CHB Radiology
- number of users: ~10
- PI: Ellen Grant
- staff: Rudolph Pienaar
- clinicians
- IT staff
Outcome Metrics
Step 1: Data Management
- Visual confirmation (via web GUI) that all data is present, organized and named appropriately
- other?
Step 2: Query Formulation
- Tests confirm that XNAT queries for all MRIDs with a given protocol name return the same results as the search currently used on the local filesystem.
- Query/Response should be efficient
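The Step 2 comparison could be checked with a small set difference. This is only a sketch: the function name and the sample MRIDs are illustrative, and how the two lists are actually obtained (XNAT query versus filesystem search) is left out.

```python
def compare_mrid_sets(xnat_mrids, filesystem_mrids):
    """Compare MRIDs returned by an XNAT query with those from the local search."""
    xnat = set(xnat_mrids)
    local = set(filesystem_mrids)
    return {
        "match": xnat == local,
        # MRIDs the filesystem search found but the XNAT query did not return
        "missing_from_xnat": sorted(local - xnat),
        # MRIDs the XNAT query returned but the filesystem search did not find
        "missing_from_filesystem": sorted(xnat - local),
    }


# Illustrative values only, not real query results.
result = compare_mrid_sets(
    ["26039", "3_S_658300"],
    ["26039", "3_S_658300", "000000000000234"],
)
print(result)
```

An empty list on both sides together with "match" set to True would count as a passing test for that protocol name.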
Step 3: Data Processing
- Pipeline executes correctly
- Pipeline execution not substantially longer than when all data is housed locally
- other?
Overall
- Local disk space saved?
- Data management more efficient?
- Data management errors reduced?
- Barriers to sharing data lowered?
- Processing time reduced?
- User experience improved?
Fundamental Requirements
- System must be accessible 24/7
- System must be redundant (no data loss)
- Need a better client than current web GUI provides:
- faster
- PACS-like interface.
- image viewer should open in the SAME window (not pop up a new one)
- number of clicks to get to image view should be as few as possible.
Outstanding Questions
Plans for improving web GUI?
Data
Retrospective data consists of ~1787 studies, ~1TB total. The data consists of:
- MR data, DICOM format
- Demographics from DICOM headers
- Subsequent processing generates ".trk" files
- ascii text files ".txt"
- files that contain protocol information
Workflows
Current Data Management Process
Raw DICOM images are produced at the radiology PACS at MGH and are manually pushed to the PACS hosted on KAOS at the MGH NMR center. A set of Perl scripts then renames and reorganizes the images into a file structure in which all images for a study are saved in a directory named for that study. DICOM images are currently viewed with Osiris on Macs.
Target Data Management Process (Step 1)
Step 1: Develop an Image Management System for BDC (IMS4BDC) with which at least the following can be done:
- Move images from MGH (KAOS) to a BDC machine at Children's
- Step 1a: Import legacy data into IMS4BDC from existing file structure and CDs
- Step 1b: Write scripts to execute upload of newly acquired data.
Target Query Formulation (Step 2)
Step 2: Develop query capabilities using scripted client calls to XNAT web services, such as:
- Show all subjectIDs scanned with protocol_name = ProtocolName
- Show all diffusion studies where patients' ages are < 6
- Scripting capabilities: Scripts need to query and download data into appropriate directory structure, and support appropriate naming scheme to be compatible with existing processing workflow.
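To make the two example queries concrete, here is a minimal sketch that runs them over in-memory records. It does not call XNAT. The Study record and its field names are hypothetical, and the age threshold is assumed to be in years since the page does not give a unit.

```python
from dataclasses import dataclass


@dataclass
class Study:
    # Hypothetical record layout; the real fields would come from XNAT.
    subject_id: str
    protocol_name: str
    is_diffusion: bool
    age_years: float  # assumption: "< 6" is interpreted as years


def subjects_with_protocol(studies, protocol_name):
    """Show all subjectIDs scanned with the given protocol name."""
    return sorted({s.subject_id for s in studies if s.protocol_name == protocol_name})


def diffusion_studies_under_age(studies, max_age_years):
    """Show all diffusion studies where the patient's age is below the limit."""
    return [s for s in studies if s.is_diffusion and s.age_years < max_age_years]


# Illustrative records only.
studies = [
    Study("26039", "ProtocolName", True, 2.5),
    Study("3_S_658300", "ProtocolName", False, 7.0),
    Study("000000000000234", "OtherProtocol", True, 9.0),
]
print(subjects_with_protocol(studies, "ProtocolName"))
print([s.subject_id for s in diffusion_studies_under_age(studies, 6)])
```

A scripted client would replace the in-memory list with responses from the XNAT web services, then download the matching data into the directory structure the processing workflow expects.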
Target Processing Workflow (Step 3)
Step 3:
- Execute query/download scripts
- Run processing locally, on cluster, etc.
- Describe & upload processing results
- (eventually want to) Share images with clinical physicians
- (eventually want to) Export post-processed data back to clinical PACS
Fitting Data to XNAT Data Model
Test data from Rudolph
I think we have:
- MRID = SubjectID (1687 subjects?)
- each subject may have a single experiment, but multiple MRSessions within that experiment
- each session directory = MRSessionID
- each scan listed = Scan in the MRSession
- important metadata contained in DICOM headers, and in the .toc file in each session directory.
As regards anonymization
Rudolph doesn't specifically need XNAT to do the anonymization. He wants XNAT to contain all relevant data and, where necessary, to export or transmit DICOM data in anonymized form.
Rudolph has his own MATLAB script that can do batch anonymization, but if possible the XNAT package should probably provide a means for that.
Draft approach to uploading data for Rudolph
- Create a Project on the webGUI
- Write a webservices-client script that will batch:
- create subject
- create experiment for subject
- create mrsession for subject
- for each scan in mrsession
- do dicom markup
- add other metadata from toc.txt and *.log files
- upload scan data into db
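The batch steps above can be sketched as a function that builds an ordered plan before anything is sent to the server. This is a sketch under assumptions: the step names are labels rather than XNAT calls, the experiment ID and scan names are made up, and the three steps after "for each scan" are assumed to run once per scan.

```python
def build_upload_plan(mrid, experiment_id, sessions):
    """Return the ordered batch steps for one subject.

    sessions maps an MRSession name to the list of scan names in it.
    """
    plan = [
        ("create_subject", mrid),
        ("create_experiment", experiment_id),
    ]
    for session_name, scans in sessions.items():
        plan.append(("create_mrsession", session_name))
        for scan in scans:
            # Assumption: markup, metadata and upload happen per scan.
            plan.append(("dicom_markup", scan))
            plan.append(("add_metadata", scan))
            plan.append(("upload_scan", scan))
    return plan


# Illustrative input; "exp1", "scan1" and "scan2" are hypothetical names.
plan = build_upload_plan(
    "26039",
    "exp1",
    {"Avanto-26039-20080130-134825-078000": ["scan1", "scan2"]},
)
for step in plan:
    print(step)
```

Building the plan first makes it possible to review or log the full batch before the webservices client executes it.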
The top level directory contains
- a dcm_MRID.log file that contains a mapping between MRIDs (PatientIDs?) and unique MRSessionNames
- a dcm_MRID_age.log file that maps MRIDs to ages in months and years
- a dcm_MRID_age_days.log file that maps MRIDs to ages in days
- subdirectories named for MRSessions.
- each subdirectory contains a toc.txt file that includes patient and session information and a list of scans and scan types.
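A script could take inventory of this layout before uploading. The sketch below assumes only what is listed above: log files whose names start with dcm_MRID at the top level, and session subdirectories that each hold a toc.txt. The function names are illustrative, and the demo builds a throwaway directory instead of reading real data.

```python
import tempfile
from pathlib import Path


def find_log_files(top_level):
    """Names of the top-level dcm_MRID*.log files."""
    return sorted(p.name for p in Path(top_level).glob("dcm_MRID*.log"))


def find_session_dirs(top_level):
    """Names of subdirectories that contain a toc.txt file."""
    return sorted(
        p.name
        for p in Path(top_level).iterdir()
        if p.is_dir() and (p / "toc.txt").is_file()
    )


# Demo on a temporary directory that mimics the described layout.
with tempfile.TemporaryDirectory() as tmp:
    top = Path(tmp)
    (top / "dcm_MRID.log").write_text("")
    (top / "dcm_MRID_age.log").write_text("")
    session = top / "Avanto-26039-20080130-134825-078000"
    session.mkdir()
    (session / "toc.txt").write_text("")
    (top / "scratch").mkdir()  # no toc.txt, so it is not treated as a session
    logs = find_log_files(top)
    sessions = find_session_dirs(top)

print(logs)
print(sessions)
```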
See examples below:
Questions sent to Rudolph about test data:
First, in the top-level dir, there are three log files: dcm_MRID.log dcm_MRID_age.log dcm_MRID_ageDays.log
1. do these files contain the MRIDs for *all* subjects in the entire retrospective study?
2. Some MRIDs appear to be purely numerical, and some alphanumerical. (3_S_658300). Is that correct?
3. Two age files, one contains age in months or years (1687 entries) -- the other contains age in days (1687 entries):
- does this mean there are 1687 MRIDs in total?
- what does age (days) = -1 mean?
4. And the two dataset directories you shared:
Avanto-26039-20080130-134825-078000/
GENESIS_SIGNA-000000000000234-20041122-211850/
Each directory contains data and a .toc file that includes:
- the "PatientID": is this equivalent to MRID?
- and some other info including age, scan date, etc.
- the filenames and scan types of a *set* of scans:
- collected in one MRsession on that scan date?
- or in the entire retrospective project?
- and is all the data for the set of scans listed contained in this directory?
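The two directory names in question 4 look like scanner, ID, date (YYYYMMDD), time (HHMMSS), and an optional trailing numeric field, joined by hyphens. That reading is a guess from just these two examples, and whether the ID field is the MRID is exactly what is being asked above. Under that assumption, a minimal parsing sketch:

```python
import re
from datetime import datetime

# Assumed pattern: scanner-ID-YYYYMMDD-HHMMSS[-digits]
SESSION_DIR = re.compile(
    r"^(?P<scanner>[^-]+)-(?P<id>.+?)-(?P<date>\d{8})-(?P<time>\d{6})(?:-(?P<extra>\d+))?$"
)


def parse_session_dir(name):
    """Split a session directory name into its assumed fields, or return None."""
    match = SESSION_DIR.match(name.rstrip("/"))
    if match is None:
        return None
    fields = match.groupdict()
    fields["acquired"] = datetime.strptime(
        fields["date"] + fields["time"], "%Y%m%d%H%M%S"
    )
    return fields


for name in (
    "Avanto-26039-20080130-134825-078000/",
    "GENESIS_SIGNA-000000000000234-20041122-211850/",
):
    print(parse_session_dir(name))
```

Names that do not fit the assumed pattern return None, which gives a quick way to list directories that need a closer look.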
Other Information
Rudolph has worked with the XNAT support group at Harvard.
- 7/25/09 - Rudolph and Wendy are beginning experiments to upload representative data and metadata to CHB's XNAT instance.