AHM 2006:ProjectsSlicerDataModel
Contents
- 1 Project
- 2 Open Questions from Programmer's Week Discussions
- 3 Goals
- 4 Requirements
- 5 DataModel API
- 6 XML versus SQL
- 7 SQL Options
- 8 API Issues: Strawman Answers
- 9 Path-based MRML3 proposal
- 10 Current Status
- 11 Current Tools
- 12 Test Data
- 13 Team Members
- 14 Slides
Project
Designing a Data Centric Model for Slicer 3.
See also Feature set description.
Open Questions from Programmer's Week Discussions
- Syntax of the factory for itk (is the extra layer needed?) - Jim Miller
- Keeping the VTK and ITK factory syntax parallel
- How can developers add new data types to mrml? - Lauren O'Donnell
- Current slicer supports the idea of modules having their own data types
- Implementation is difficult and not well documented.
- Who will be doing what? (Alex, Xiaodong, Mathieu to allocate time and effort)
- Syntax and tie-in to Mike's MRML Path
Goals
Design and Implement a prototype of a Data Model server for Slicer 3
- It should represent a scene graph
- It should compute and return Transforms between objects in the scene graph.
- It should be suitable for Image Guided Surgery. This may require it to be compatible with Real-Time OS
- It should return datasets (image data)
- It should return surface models (vtkPolydata?)
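The transform goal above can be illustrated with a minimal sketch. It assumes rigid transforms only (rotation plus translation) and a tree in which every node stores its transform to its parent. `Rigid`, `SceneNode`, and `SceneGraph` are hypothetical names chosen for illustration, not part of any existing MRML or Slicer API.

```cpp
#include <array>
#include <string>
#include <vector>

// Rigid transform: x_parent = R * x_local + t (R stored row-major).
struct Rigid {
    std::array<double, 9> R{{1, 0, 0, 0, 1, 0, 0, 0, 1}};
    std::array<double, 3> t{{0, 0, 0}};
};

// compose(a, b) applies b first, then a.
Rigid compose(const Rigid& a, const Rigid& b) {
    Rigid out;
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j) {
            double sum = 0.0;
            for (int k = 0; k < 3; ++k) sum += a.R[3 * i + k] * b.R[3 * k + j];
            out.R[3 * i + j] = sum;
        }
        out.t[i] = a.t[i];
        for (int k = 0; k < 3; ++k) out.t[i] += a.R[3 * i + k] * b.t[k];
    }
    return out;
}

// Inverse of a rigid transform: (R^T, -R^T t).
Rigid inverse(const Rigid& a) {
    Rigid out;
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) out.R[3 * i + j] = a.R[3 * j + i];
    for (int i = 0; i < 3; ++i) {
        out.t[i] = 0.0;
        for (int k = 0; k < 3; ++k) out.t[i] -= out.R[3 * i + k] * a.t[k];
    }
    return out;
}

// Apply a transform to a point.
std::array<double, 3> apply(const Rigid& a, const std::array<double, 3>& p) {
    std::array<double, 3> out = a.t;
    for (int i = 0; i < 3; ++i)
        for (int k = 0; k < 3; ++k) out[i] += a.R[3 * i + k] * p[k];
    return out;
}

struct SceneNode {
    std::string name;
    int parent;      // index of the parent node, -1 for the root
    Rigid toParent;  // maps this node's coordinates into the parent's
};

class SceneGraph {
public:
    int addNode(const std::string& name, int parent, const Rigid& toParent) {
        nodes_.push_back({name, parent, toParent});
        return static_cast<int>(nodes_.size()) - 1;
    }
    // Compose transforms from the node up to the root.
    Rigid toWorld(int node) const {
        Rigid world;
        for (int n = node; n != -1; n = nodes_.at(n).parent)
            world = compose(nodes_.at(n).toParent, world);
        return world;
    }
    // Transform that maps coordinates of node `from` into node `to`.
    Rigid relative(int from, int to) const {
        return compose(inverse(toWorld(to)), toWorld(from));
    }
private:
    std::vector<SceneNode> nodes_;
};
```

Here `relative(probe, ct)` would give the transform that expresses a point known in the probe's frame in the CT frame. A real implementation would also need non-rigid transforms and caching.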
Requirements
- It must work as a service
- It must be accessible from Batch programs as well as GUI programs
- It must be computationally efficient
- It must be multi-platform
- It must be memory efficient
Use Cases
Some of these use cases were taken from the Slicer requirements for IGS applications
Slicer 3 and IGSTK integration (Nobuhiko Hata, Luis Ibanez, Patrick Cheng)
The Data Model will act as a service that offers to clients the options of
- Storing data along with tags (MetaData Identifiers)
- Retrieving data using the Identifiers
- Modifying data in place
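The three service operations listed above could look roughly like the following in-memory sketch. `TaggedStore` and `Blob` are hypothetical names. A real service would sit behind a network or database layer; this only shows the shape of the calls.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Blob = std::vector<std::uint8_t>;

// Hypothetical in-memory stand-in for the Data Model service.
class TaggedStore {
public:
    // Store data under a tag (metadata identifier), replacing any old value.
    void store(const std::string& tag, Blob data) {
        items_[tag] = std::move(data);
    }
    // Retrieve by identifier; returns nullptr if the tag is unknown.
    const Blob* retrieve(const std::string& tag) const {
        std::map<std::string, Blob>::const_iterator it = items_.find(tag);
        return it == items_.end() ? nullptr : &it->second;
    }
    // Modify in place, without copying the data out and back in.
    bool modify(const std::string& tag, const std::function<void(Blob&)>& edit) {
        std::map<std::string, Blob>::iterator it = items_.find(tag);
        if (it == items_.end()) return false;
        edit(it->second);
        return true;
    }
private:
    std::map<std::string, Blob> items_;
};
```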
Current Use Cases
The basic Data Model in Slicer supports instances such as:
- Volumes
- Scalar Types
- Label Maps (segmentation result)
- Reference to Lookup Table
- Models
- Named Field Data (scalars, vectors, labels) at points and cells (FreeSurferReaders)
- Color, Clipping State, Visibility, Scalar Visibility, LookupTable
- Transforms*
- Fiducials, Fiducial Lists
- Name, Label#, Diffuse/Ambient/Specular.
The Data Model API in Slicer allows adding, deleting, reading, and modifying medical image data types (Volumes, Models, Transforms, Fiducials, etc).
Use Cases to Add
In addition to the Data Model provided by Slicer, we will develop additional instances required uniquely for RFA.
- State information
- Transformation matrix for CT-to-patient registration in the tracker’s coordinate system
- Predicted error from the CT-to-patient registration
- Locations of tracker attached to the RFA applicator and US transducer
- Transformation matrix from calibration of tracker to the US image coordinate system.
- Magnitude and gain of the US imager in the last state of the imaging.
- Location of fiducial markers
Strawman "Hello MRML" programs
No client server, just read an xml file
    int main()
    {
      vtkMrmlTree *mrml = vtkMrmlTree::New();
      mrml->Connect("file://data.xml");
      mrml->PrintSelf();
      mrml->Delete();
      return 0;
    }
Connect to a server, modify, commit
    int main()
    {
      vtkMrmlTree *mrml = vtkMrmlTree::New();
      mrml->Connect("mrml://mrml.na-mic.org/data");

      vtkMrmlTransformNode *trans = vtkMrmlTransformNode::New();
      mrml->AddNode(trans);
      mrml->Commit();

      trans->Delete();
      mrml->Delete();
      return 0;
    }
Open a mrml file, run a vtk filter, save a new file. This example uses separate mrml and vtkmrml libraries.
    #include "mrml.h"
    #include "vtkmrml.h"
    #include "vtkImageData.h"
    #include "vtkImageGaussianSmooth.h"

    int main()
    {
      // get mrml tree
      mrml::Tree *mrml = mrml::Tree::New();
      mrml->Connect("file://data.xml");

      // get input image in vtk format
      mrml::VolumeNode *volNode = mrml->GetNthVolume(0);
      vtkmrml::VolumeData *inData = vtkmrml::VolumeData::New();
      inData->SetSourceNode(volNode);
      vtkImageData *imgData = inData->GetImageData(); // converts data from internal format to vtk

      // vtk pipeline
      vtkImageGaussianSmooth *igs = vtkImageGaussianSmooth::New();
      igs->SetInput(imgData);
      igs->GetOutput()->Update();

      // put output volume in a new mrml volume node
      mrml::VolumeNode *volNodeOut = mrml::VolumeNode::New();
      vtkmrml::VolumeData *outData = vtkmrml::VolumeData::New();
      outData->SetTargetNode(volNodeOut);
      outData->SetSourceImage(igs->GetOutput());
      outData->Update(); // converts data from vtkImage into internal format

      // add node to the mrml tree (the original draft passed an undeclared "vol")
      mrml->AddNode(volNodeOut);

      // save new file
      mrml->Save("file://data1.xml");

      igs->Delete();
      mrml->Delete();       // Do we need this? vtk style or smartPointers?
      inData->Delete();     // Do we need this? vtk style or smartPointers?
      outData->Delete();    // Do we need this? vtk style or smartPointers?
      volNodeOut->Delete(); // Do we need this? vtk style or smartPointers?
      return 0;
    }
Connect to a server, run an ITK filter, commit
    int main()
    {
      vtkMrmlTree *mrml = vtkMrmlTree::New();
      mrml->Connect("mrml://mrml.na-mic.org/data");
      vtkMrmlVolumeNode *vol = mrml->GetNthVolume(0);

      typedef itk::Image<float,3> ImageType;
      typedef itk::NormalizeImageFilter<ImageType,ImageType> ImageFilterType;
      ImageFilterType::Pointer norm = ImageFilterType::New();
      norm->SetInput(vol->GetITKDataF());
      norm->GetOutput()->Update();

      // the output node needs its own name; the original draft redeclared "vol"
      vtkMrmlVolumeNode *volOut = vtkMrmlVolumeNode::New();
      volOut->SetITKDataF(norm->GetOutput());
      mrml->AddNode(volOut);
      mrml->Commit();

      volOut->Delete();
      mrml->Delete();
      return 0;
    }
ITK Style
using namespaces and ITK idiom
    int main()
    {
      mrml::Tree::Pointer mrml = mrml::Tree::New();
      mrml->Connect("mrml://mrml.na-mic.org/data");

      typedef itk::Image<float,3> ImageType;

      // itkmrml knows about both itk and mrml
      typedef itkmrml::VolumeData<ImageType> VolumeDataFactoryType;
      VolumeDataFactoryType::Pointer factory = VolumeDataFactoryType::New();
      factory->SetSource(mrml->GetNthVolume(0));
      if ( !factory->CanTranslate() )
        {
        return 1;
        }

      typedef itk::NormalizeImageFilter<ImageType,ImageType> ImageFilterType;
      ImageFilterType::Pointer norm = ImageFilterType::New();
      norm->SetInput(factory->GetImage());
      norm->GetOutput()->Update(); // this pulls mrml data into itk::Image

      VolumeDataFactoryType::Pointer outfactory = VolumeDataFactoryType::New();
      mrml::VolumeNode::Pointer outvol = mrml::VolumeNode::New();
      outfactory->SetImage(norm->GetOutput());
      outfactory->SetTarget(outvol);
      outfactory->Update(); // this pushes itk::Image data into mrml

      mrml->AddNode(outvol);
      mrml->Commit();
      return 0;
    }
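The strawman programs above leave open whether explicit `Delete()` calls or smart pointers should be used. The following sketch shows how a small RAII handle could wrap a VTK-style reference-counted object so that `Delete()` is never called by hand. `RefCounted` and `Handle` are hypothetical stand-ins written for this illustration; they are not the real `vtkObject`, `vtkSmartPointer`, or `itk::SmartPointer` classes.

```cpp
#include <utility>

// Hypothetical VTK-style object: created with New(), released with Delete().
class RefCounted {
public:
    static RefCounted* New() {
        ++liveCount();
        return new RefCounted;
    }
    void Register() { ++refs_; }
    void Delete() {
        if (--refs_ == 0) {
            --liveCount();
            delete this;
        }
    }
    // Number of objects currently alive (for demonstration only).
    static int& liveCount() {
        static int n = 0;
        return n;
    }
private:
    RefCounted() : refs_(1) {}
    ~RefCounted() {}
    int refs_;
};

// RAII handle: takes over the reference returned by New() and
// releases it automatically when the handle goes out of scope.
template <class T>
class Handle {
public:
    explicit Handle(T* p = nullptr) : p_(p) {}
    Handle(const Handle& other) : p_(other.p_) {
        if (p_) p_->Register();
    }
    Handle& operator=(Handle other) {  // copy-and-swap
        std::swap(p_, other.p_);
        return *this;
    }
    ~Handle() {
        if (p_) p_->Delete();
    }
    T* operator->() const { return p_; }
    T* get() const { return p_; }
private:
    T* p_;
};
```

With such a handle, the trailing block of `Delete()` calls in the VTK-style examples would disappear, which is the behavior the ITK-style example already relies on.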
DataModel API
This is an initial draft of interactions with the DataModel. Most of the entries were taken from the vtkMrmlTree class.
- dm->Connect("filename");
- dm->Connect("URL");
- dm->Commit();
- dm->Close();
- dm->InsertNode( node, "parent name", "node name");
- dm->GetNode("node name");
- dm->HasNode("node name");
- dm->GetNextNode(); ?? // shouldn't we rather have iterators ?
- dm->GetNthItem(); // useful for blind IO...?
- dm->Get...() by class:
- GetVolume()
- GetTransform()
- GetMatrix() ?? are these matrices representing Transforms ?
- GetColor()
- dm->ComputeTransforms();
- dm->ComputeRelativeTransform("node1 name","node2 name");
- dm->DeleteNode( node );
- dm->DeleteNode( "node name" );
- dm->Delete(); // Let's use vtkSmartPointers and avoid the need for Delete()...
Node name stands for any type of Identification, it may be implemented in the form of an integer Id, or in the form of a string.
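As a rough illustration of the draft API above, the sketch below implements the name-based calls with string identifiers and replaces `GetNextNode()` with standard iterators, as the note in the list suggests. The `Node` and `DataModel` types here are hypothetical simplifications, with no storage back end and no transforms.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical minimal node: just a name and the name of its parent.
struct Node {
    std::string name;
    std::string parent;  // empty for top-level nodes
};

class DataModel {
public:
    typedef std::shared_ptr<Node> NodePtr;
    typedef std::map<std::string, NodePtr>::const_iterator const_iterator;

    // Fails if the name is taken or the named parent does not exist.
    bool InsertNode(NodePtr node, const std::string& parentName,
                    const std::string& nodeName) {
        if (!node || nodes_.count(nodeName)) return false;
        if (!parentName.empty() && !nodes_.count(parentName)) return false;
        node->name = nodeName;
        node->parent = parentName;
        nodes_[nodeName] = std::move(node);
        return true;
    }
    bool HasNode(const std::string& nodeName) const {
        return nodes_.count(nodeName) > 0;
    }
    // Returns an empty pointer if the node does not exist.
    NodePtr GetNode(const std::string& nodeName) const {
        const_iterator it = nodes_.find(nodeName);
        return it == nodes_.end() ? NodePtr() : it->second;
    }
    // Note: children of a deleted node are left in place; a real
    // implementation would need a policy for them.
    bool DeleteNode(const std::string& nodeName) {
        return nodes_.erase(nodeName) > 0;
    }
    // Iterators instead of GetNextNode().
    const_iterator begin() const { return nodes_.begin(); }
    const_iterator end() const { return nodes_.end(); }
private:
    std::map<std::string, NodePtr> nodes_;
};
```

Because nodes are held by `std::shared_ptr`, no explicit `Delete()` is needed on either the nodes or the model.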
XML versus SQL
Our analysis seems to indicate that SQL and XML are both possible solutions for the storage of the data on disk. We intend to implement an API that will talk to the storage implementation and hide it from the Slicer applications. In other words, Slicer developers and Slicer users should not need to know that there is an XML file or a SQL database underneath.
The following table summarizes the advantages and disadvantages of using XML versus SQL. There is also the option of combining both, if we find that each one alone does not provide all the features that we want for Slicer applications.
Feature | XML | SQL |
---|---|---|
Get element by an identifier | natural, but needs to be hierarchical | natural |
Insert element with an identifier | natural | natural |
Hierarchy navigation | natural | must be implemented with auxiliary table |
Resistant to power-down | No | Yes |
Support for large datasets | Yes | Yes |
Speed for access | to be measured | to be measured |
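The "auxiliary table" entry for SQL hierarchy navigation can be pictured as a table in which each row records the identifier of its parent. The sketch below mimics such a table in memory and walks it to rebuild a path or list children. The `Row` layout, the use of 0 for "no parent", and the function names are assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One row of a hypothetical (id, parent_id, name) table; parentId 0 = no parent.
struct Row {
    int id;
    int parentId;
    std::string name;
};

const Row* findRow(const std::vector<Row>& table, int id) {
    for (std::size_t i = 0; i < table.size(); ++i)
        if (table[i].id == id) return &table[i];
    return nullptr;
}

// Equivalent of: SELECT id FROM nodes WHERE parent_id = ?
std::vector<int> childrenOf(const std::vector<Row>& table, int id) {
    std::vector<int> out;
    for (std::size_t i = 0; i < table.size(); ++i)
        if (table[i].parentId == id) out.push_back(table[i].id);
    return out;
}

// Walk parent links up to the root and build a path such as "/scene/ct".
// Returns an empty string for unknown ids or if a cycle is detected.
std::string pathOf(const std::vector<Row>& table, int id) {
    std::string path;
    std::size_t steps = 0;
    while (id != 0) {
        const Row* row = findRow(table, id);
        if (!row || ++steps > table.size()) return "";
        path = "/" + row->name + path;
        id = row->parentId;
    }
    return path;
}
```

Each step up the hierarchy costs one lookup, which is the extra work that XML gets for free from its nesting.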
SQL Options
- Use a model similar to CORBA but with a customized minimal implementation
- Use a model similar to Microsoft Windows DataSet Features
- Use an SQL Database Server model
- Microsoft: http://msdn.microsoft.com/sql/
- PostgreSQL: http://www.postgresql.org/
- MySQL: http://dev.mysql.com/
- SQLite: http://www.sqlite.org/
- MetaKit: http://www.equi4.com/
The implementation could be done using a unified approach for all the platforms, or it could be done by creating a common API, that then wraps to different local libraries in different platforms. For example, it could use MS-SQL in Windows, and MySQL in Unix, wrapping both of them in a common C++ API customized for the types for objects that Slicer would manage.
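One way to picture the "common API that wraps different local libraries" option is an abstract back-end interface plus a factory that picks an implementation by name. In the sketch below only an in-memory back end is provided. `StorageBackend`, `InMemoryBackend`, and `makeBackend` are hypothetical names, and a real version would add wrappers around SQLite, MySQL, and so on.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Common interface that hides the storage library from Slicer code.
class StorageBackend {
public:
    virtual ~StorageBackend() {}
    virtual void put(const std::string& key, const std::string& value) = 0;
    virtual bool get(const std::string& key, std::string& value) const = 0;
    virtual std::string name() const = 0;
};

// Stand-in implementation; a platform-specific wrapper would go here.
class InMemoryBackend : public StorageBackend {
public:
    void put(const std::string& key, const std::string& value) override {
        data_[key] = value;
    }
    bool get(const std::string& key, std::string& value) const override {
        std::map<std::string, std::string>::const_iterator it = data_.find(key);
        if (it == data_.end()) return false;
        value = it->second;
        return true;
    }
    std::string name() const override { return "memory"; }
private:
    std::map<std::string, std::string> data_;
};

// Factory: the only place that knows which concrete back ends exist.
std::unique_ptr<StorageBackend> makeBackend(const std::string& kind) {
    if (kind == "memory")
        return std::unique_ptr<StorageBackend>(new InMemoryBackend);
    throw std::invalid_argument("unsupported backend: " + kind);
}
```

Application code would only ever see `StorageBackend`, so swapping the library per platform would not touch the callers.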
Matrix of current options
Option | MS-Windows | Linux | Cygwin | Macintosh | Sun | SGI | License | Installation Burden |
---|---|---|---|---|---|---|---|---|
MS-SQL | yes | no | no | no | no | no | EULA? | Medium (only Windows) |
PostgreSQL | yes | yes | yes | yes | yes | yes | BSD (see details) | Medium (requires root or home user build) |
MySQL | yes | yes | yes | yes | yes | yes | GPL / Commercial (see details) (see some issues) | Medium (requires root or home user build) |
CORBA* | yes | yes | yes | yes | yes | yes | ? | High (requires root and network setup) |
SQLite | yes | yes | yes | yes | yes | yes | Public Domain (see) | Low (built into the application) |
MetaKit | yes | yes | yes | yes | yes | yes | X/MIT Style (see) | Low (built into the application) |

\* CORBA would actually require a specific package to be tested per platform.
Current Option
SQLite
Features include (from the SQLite web site):
- Transactions are atomic, consistent, isolated, and durable (ACID) even after system crashes and power failures.
- Zero-configuration - no setup or administration needed.
- Implements most of SQL92. (Features not supported)
- A complete database is stored in a single disk file.
- Database files can be freely shared between machines with different byte orders.
- Supports databases up to 2 terabytes (2^41 bytes) in size.
- Sizes of strings and BLOBs limited only by available memory.
- Small code footprint: less than 250KiB fully configured or less than 150KiB with optional features omitted.
- Faster than popular client/server database engines for most common operations.
- Simple, easy to use API.
- TCL bindings included. Bindings for many other languages available separately.
- Well-commented source code with over 95% test coverage.
- Self-contained: no external dependencies.
- Sources are in the public domain. Use for any purpose.
Second Option
PostgreSQL DataBase
Features
- Allows connections via unix domain sockets and TCP/IP connections
- Has binding to PHP, C, Python, Perl, Tcl
- Size Limitations (taken from the PostgreSQL FAQ)
- Maximum size for a database? unlimited (32 TB databases exist)
- Maximum size for a table? 32 TB
- Maximum size for a row? 1.6TB
- Maximum size for a field? 1 GB (This is what we will map to one Image. If it becomes a limit we could store the image in Slices per field)
- Maximum number of rows in a table? unlimited
- Maximum number of columns in a table? 250-1600 depending on column types
- Maximum number of indexes on a table? unlimited
- Object Oriented Database: Fields can be customized object data structures.
- Supports Inheritance: one table can inherit properties from another one (details).
- Database server can be a remote machine or the local one
- This will naturally support a client/server approach such as the one in ParaView
- Client applications can be very diverse in nature. A client could be:
- a text-oriented tool
- a graphical application
- a web server that accesses the database to display web pages
- a specialized database maintenance tool
- The PostgreSQL server can handle multiple concurrent connections from clients.
- For that purpose it starts ("forks") a new process for each connection. From that point on, the client and the new server process communicate without intervention by the original postmaster process.
- Supported platforms (see)
- Native support for using SSL connections to encrypt client/server communications for increased security. This requires that OpenSSL is installed on both client and server systems and that support in PostgreSQL is enabled at build time
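The size limits above note that one image would map to one field, and that an image larger than the field limit could be stored as groups of slices. The sketch below works out how many slices fit in one field and how many fields a volume would need. The function name, the `ChunkPlan` struct, and the choice to pass the limit as a parameter are assumptions for illustration.

```cpp
#include <cstdint>
#include <stdexcept>

struct ChunkPlan {
    std::uint64_t slicesPerField;  // whole slices stored in one field
    std::uint64_t fieldCount;      // number of fields needed for the volume
};

// Split an nx * ny * nz volume into fields of at most maxFieldBytes each,
// never cutting through a slice.
ChunkPlan planFields(std::uint64_t nx, std::uint64_t ny, std::uint64_t nz,
                     std::uint64_t bytesPerVoxel, std::uint64_t maxFieldBytes) {
    const std::uint64_t sliceBytes = nx * ny * bytesPerVoxel;
    if (sliceBytes == 0 || nz == 0)
        throw std::invalid_argument("empty volume");
    if (sliceBytes > maxFieldBytes)
        throw std::length_error("a single slice exceeds the field limit");

    std::uint64_t perField = maxFieldBytes / sliceBytes;
    if (perField > nz) perField = nz;
    const std::uint64_t count = (nz + perField - 1) / perField;  // round up

    ChunkPlan plan = {perField, count};
    return plan;
}
```

For example, a 512 x 512 x 2000 volume of 4-byte voxels has 1 MiB slices, so with a limit of 2^30 bytes it would be stored as two fields of up to 1024 slices each.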
Third Option
MetaKit DataBase
http://www.equi4.com/mkoverview.html
Features
- Use your data on any platform. Both the code and datafiles are portable. All byte-ordering managed by the library.
- Complex datastructures in one file. Store multiple nested data structures, to create document-centric applications.
- Restructure datafiles, instantly. It restructures files on-the-fly, while open.
- Serialize all data for transport. Complementing commit/rollback of changes, data can also be serialized.
- Recover from system-failures. The use of Stable Storage ensures that files cannot be corrupted by crashes.
- Load on-demand, quick startup. Files are opened without reading data. Memory-mapped files if O/S supports it.
- Behaves like containers. The API mimics container classes. Quickly get sizes and iterate over rows.
- Wide range of operators built-in. Sorting, relational join / group by, set operations, permutations, hashing.
- 1-32 bits per int (or 64), variable-sized data. The largest int defines storage format. String/binary data is stored as var-sized.
- Create fully self-contained applications. Can be linked shared or statically, for hassle-free deployment of components.
- Tiny code (125 Kb as Win32 DLL). The library is extremely small, unused functions are stripped off in static links.
- Simple API, just 6 core classes. Only a small interface is exposed. One header file lists all the classes you need.
- Also use from Python and Tcl. These language bindings are coded to take advantage of the respective idioms.
API Issues: Strawman Answers
Bold: updates after tcon
- MRML Tree:
- Is MrmlTree a true hierarchy or a list of nodes (as currently)?
- Should it be a real scene tree? Right now it is an in-memory image of the XML file, which combines scene hierarchy and data persistence. No.
- If it's xml file image, do we use DOM, XPath for internal representation of xml file? No, use existing MRML hierarchy.
- Do we use SQL database to persist MRML tree and data? Do we use database to provide remote access to MRML trees and data in the client/server mode. No. Implementation of the internals of the data model will be hidden behind the API
- MRML nodes:
- How is the data accessed from Mrml Node, can we make it independent from vtk/itk types like vtkImageData and itk::Image<>?
- Metadata and vtk data should be separated to avoid redundancy. What metadata is stored in the new MrmlVolume, MrmlModel, etc. Can we use delegation from vtkImageData SetSpacing() etc. methods to avoid duplication? Explicit synchronize metadata methods between MRML node and vtk data. The metadata in the MRML nodes is the definitive version -- any platform-specific metadata is filled out by the factory that generates the structure
- What subset of general vtk vtkImageData and vtkPolyData is supported, multicomponent, tensors etc. Do we create special MRML nodes for tensors. Allow full vtk API for creating and manipulating vtkDataSet and vtkFieldData. Define a specific set of functionality to be represented by MRML -- do not rely on vtk
- How are transformations represented? Do we use the new Coordinate System Manager? Yes. Yes. Do we use a MrmlGroup node instead of MrmlTransform, with the new coordinate system manager defining transformations? Yes. Need a way to serialize coordinate systems.
- Support ITK Volumes in API? Only 3D vtk volumes. Define the volume types that MRML needs to support for NA-MIC needs (with ability to extend) ITK and VTK factories will be responsible for mapping them to the specific system.
- Coordinate Systems:
- How are Slicer internal coordinate systems (RAS, LPS, ijk, vtk) represented by the new coordinate system manager? Do we store RAStoIJK as part of MRML node metadata? Each RAStoRAS Transform is a MRML node. Coordinate System Manager has pointers to those transforms and internal RAStoIJK transforms of volumes and models.
- How are non-linear transforms represented? Do we support displacement fields, BSplines? Between which coordinate systems do they transform? How are vectors, normals and tensors treated? All non-linear transforms are RAStoRAS. Need new vtk Transforms similar to ITK transforms. vtk transforms can be implemented with vtkITK wrappers
- Execution model:
- Need C++ classes for Application, Modules, Viewers. Move tcl global arrays to vtk collections. Move all visualization and application state and logic into their own classes. Need C++ API for update loop, observers.
- Client/Server:
- Do we support the entire mrml API between client and server? Initially we support only ITK style ImageIO. Later full API?
- Do we use CORBA? Need to have stream based serialization for both Mrml nodes and Mrml data
- Do we use SQL database? Database has to support client/server mode for simultaneous read-write operations.
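The coordinate-system items in the list above mention RAS, LPS, and ijk. As a small worked example, the sketch below converts a voxel index to RAS with a 4x4 matrix and then to LPS, which differs from RAS only in the sign of the first two axes. The helper names are hypothetical, and the matrix builder assumes an axis-aligned volume defined by spacing and origin only. The list speaks of RAStoIJK; this sketch uses the inverse direction, IJKtoRAS.

```cpp
#include <array>

typedef std::array<double, 3> Vec3;
typedef std::array<std::array<double, 4>, 4> Mat4;

// RAS (right, anterior, superior) and LPS (left, posterior, superior)
// differ by a sign flip on the first two axes.
Vec3 rasToLps(const Vec3& p) {
    Vec3 out = {{-p[0], -p[1], p[2]}};
    return out;
}
Vec3 lpsToRas(const Vec3& p) {
    Vec3 out = {{-p[0], -p[1], p[2]}};
    return out;
}

// Apply a homogeneous 4x4 transform to a 3D point (w = 1).
Vec3 transformPoint(const Mat4& m, const Vec3& p) {
    Vec3 out = {{0.0, 0.0, 0.0}};
    for (int r = 0; r < 3; ++r)
        out[r] = m[r][0] * p[0] + m[r][1] * p[1] + m[r][2] * p[2] + m[r][3];
    return out;
}

// IJK-to-RAS matrix for an axis-aligned volume: scale by the voxel
// spacing, then translate by the RAS position of voxel (0, 0, 0).
Mat4 makeIjkToRas(const Vec3& spacing, const Vec3& origin) {
    Mat4 m = Mat4();  // zero-initialized
    for (int i = 0; i < 3; ++i) {
        m[i][i] = spacing[i];
        m[i][3] = origin[i];
    }
    m[3][3] = 1.0;
    return m;
}
```

With spacing (0.5, 0.5, 2) and origin (-100, -120, 30), voxel (10, 20, 3) maps to RAS (-95, -110, 36), which is LPS (95, 110, 36).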
Path-based MRML3 proposal
Here's a proposal for a path-based MRML3 implementation, using ideas from the Coordinate Space Manager.
Current Status
- First C++ draft is in the sandbox
Current Tools
- Python prototype by Mike of the path-based XML layer:
- parses XML elements and overlays semantics of "path" and "ref" tags.
- remote resource cache implemented
- mechanism for handling namespaced elements and attributes implemented
- reading files implemented
- writing 70% implemented (remaining issue: renaming resources while maintaining links)
- renaming resources 50% implemented (what happens when you want to move a remote resource like a URL into a local file?)
- C++ prototype by Luis
Test Data
N/A.
Team Members
- Mike Halle - BWH
- Alex Yarmarkovich - Isomics
- Xiaodong Tao - GE
- Luis Ibanez - Kitware
- Steve Pieper - Isomics