ITK Registration Optimization
From NAMIC Wiki
Revision as of 23:13, 24 October 2007
Summary
Goals
There are two components to this research:
- Identify registration algorithms that are suitable for non-rigid registration problems that are endemic to NA-MIC
- Develop implementations of those algorithms that take advantage of multi-core and multi-processor hardware
Steps involved
- Modify ITK's registration framework to support oriented images
- Modify ITK's registration framework to be thread safe
- Develop multi-threaded versions of select registration modules
- Make everything backward compatible with ITK's existing registration methods and framework
- Deliver in ITK
Target date for these deliverables: Jan 1, 2008
Follow-on work
- Deliver B-spline deformable registration, using the LBFGSB optimizer and the Mattes MI metric, as multi-threaded Slicer modules
Target date for the follow-on work: Jan 15, 2008
Status and News
- Have developed multi-threaded registration metrics in ITK
- This led to the discovery that ITK's registration framework was not thread safe.
- Making ITK's registration framework thread safe is conceptually a bug fix for ITK.
- The incomplete implementation of oriented images in ITK has greatly extended the time and effort needed for this project.
- Ultimately, all of ITK and the ITK community will benefit.
- Weekly tcons: Mondays, 10am
- Luis Ibanez, Matt Turek, Stephen Aylward
- Active proposal to the ITK community:
- Project plan
- Insight Journal (IJ) article on oriented images and registration in ITK
- http://www.insight-journal.org/dspace/bitstream/1926/1293/2/Brooks_Arbel_FastOrientedImage_V1.pdf
- Solution presented by the authors is closely related to the changes being made in ITK
Publications
- Aylward, Stephen; Jomier, Julien; Barre, Sebastien; Davis, Brad; Ibanez, Luis, "Optimizing ITK’s Registration Methods for Multi-processor, Shared-Memory Systems." MICCAI Open Source and Open Data Workshop, 2007 (Download PDF)
Quick Links
- Dashboard for this project
- Dashboard for BatchMake
- Batchboard (nightly experiment results) for this project
- BWH Neuroimaging Analysis Center (NAC), 2007-2008: Grid Enabled ITK
Algorithmic Requirements and Use Cases
- Requirements
- relatively robust, with few parameters to tweak
- runs on grayscale images
- has already been published
- relatively fast (ideally a few minutes for a volume-to-volume registration)
- not patented
- can be implemented in ITK and parallelized
- Use-cases
- Intersubject mapping
- Example data set (Kilian)
- fMRI to hi-res brain morphology mapping
- Example data set (Steve Pieper)
- DTI: components of the diffusion tensor
- Example data (Sylvain)
Hardware Platform Requirements and Use Cases
- Requirements
- Shared memory
- Single and multi-core machines
- Single and multi-processor machines
- AMD and Intel - Windows, Linux, and SunOS
- Use-cases
- Intel Core2Duo
- Intel quad-core Xeon processors, Visual Studio 8, Windows Vista (Kitware: redwall)
- 6 CPU Sun, Solaris 8 (SPL: vision)
- 12 CPU Sun, Solaris 8 (SPL: forest and ocean)
- 16-core Opteron (SPL: john, ringo, paul, george)
- 16-core Sun Fire, AMD Opteron (UNC: Styner)
Data
- Now distributed with CVS
Workplan
Establish testing and reporting infrastructure
- Identify timing tools
- Cross platform and multi-threaded
- Timing and profiling
- Develop performance dashboard for collecting results
- Each test will report time and accuracy to a central server
- The performance of a test, over time, for a given platform can be viewed on one page
- The performance of a set of tests, at one point in time, for all platforms can be viewed on one page
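A minimal sketch of how a single test could time itself, assuming a plain C++ harness rather than any particular profiling tool. It uses wall-clock time from `std::chrono::steady_clock`, because CPU-time measures such as `std::clock()` can accumulate time from all threads on some platforms and so hide a multi-threaded speedup. The `TimingResult` record, `TimeTest`, `FormatResult`, and the output line format are hypothetical; they are not the project's actual dashboard schema.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <functional>
#include <string>

// Hypothetical record for one timed test run (illustrative field names only).
struct TimingResult {
  std::string testName;
  double seconds;   // wall-clock duration of the run
  double accuracy;  // test-specific accuracy value returned by the test
};

// Runs `test` once and measures it with a monotonic wall clock.
// Wall time, not CPU time, is what drops when work is spread across threads.
TimingResult TimeTest(const std::string& name, const std::function<double()>& test) {
  assert(test);  // the test callable must not be empty
  const auto start = std::chrono::steady_clock::now();
  const double accuracy = test();
  const auto stop = std::chrono::steady_clock::now();
  const std::chrono::duration<double> elapsed = stop - start;
  return TimingResult{name, elapsed.count(), accuracy};
}

// Formats a result as one text line that a hypothetical dashboard client could submit.
std::string FormatResult(const TimingResult& r) {
  char buffer[256];
  std::snprintf(buffer, sizeof(buffer), "%s time=%.6f accuracy=%.6f",
                r.testName.c_str(), r.seconds, r.accuracy);
  return buffer;
}
```

Keeping time and accuracy in the same record mirrors the plan above: a faster run is only useful if its accuracy did not regress.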
Develop tests
- Develop modular tests
- Develop complete registration solutions for use cases
ITK Optimization
- Target bottlenecks
- Multi-thread metric calculation
- Initial target is MattesMutualInformationImageToImageMetric
- Optimize code
- Sacrifice some memory and initialization speed to gain speed during algorithm operation
- Call multi-threaded functions when possible
- Integrate metrics with transforms and interpolators for tailored performance
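The "sacrifice memory and initialization speed" item can be illustrated with a small sketch. Assume, for illustration only, that the fixed-image value of each sample does not change during optimization. Under that assumption, its histogram bin can be computed once at initialization and looked up in every metric evaluation. `CachedFixedSamples` is a hypothetical helper, not an ITK class, and simple uniform binning stands in for whatever binning the real metric uses.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical cache: one int per sample (extra memory, paid at initialization)
// so that each metric evaluation does a lookup instead of recomputing the bin.
class CachedFixedSamples {
 public:
  CachedFixedSamples(const std::vector<double>& fixedValues, double minValue,
                     double maxValue, int numBins) {
    assert(numBins > 0 && maxValue > minValue);
    bins_.reserve(fixedValues.size());
    const double binWidth = (maxValue - minValue) / numBins;
    for (double v : fixedValues) {
      int bin = static_cast<int>(std::floor((v - minValue) / binWidth));
      if (bin < 0) bin = 0;                   // clamp values below the range
      if (bin >= numBins) bin = numBins - 1;  // maxValue lands in the last bin
      bins_.push_back(bin);
    }
  }

  // Called inside every evaluation: constant-time lookup of the cached bin.
  int Bin(std::size_t sampleIndex) const { return bins_[sampleIndex]; }
  std::size_t Size() const { return bins_.size(); }

 private:
  std::vector<int> bins_;
};
```

The cost is paid once per registration, while the savings recur on every optimizer iteration.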
Example Results: MattesMutualInformationImageToImageMetric
Examples of Optimizations Employed
- GetValue
- Added multi-threading to GetValue function
- Partitions the samples across threads, thereby distributing the computation of the transforms and interpolations
- Added pre-computation of the FixedImageMarginalPDF for the sample set, to reduce the need for the thread mutex lock
- This required the concept of an AdjustedFixedImageMarginalPDF, which is updated when a fixed-image voxel does not map into the moving image and is therefore not valid for the current computation. Because this cross-thread data structure is updated only when a sample is missed, its mutex lock is needed less often.
- Each thread now has its own copy of the joint PDF. After the threads complete, the joint PDFs from all threads are summed. This eliminates the mutex from the main loop over samples.
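The GetValue optimizations above can be sketched in simplified C++. This is not ITK's implementation, and several parts of it are assumptions:

- `Sample`, `HistogramResult`, and `ComputeJointHistogram` are hypothetical names.
- Each sample's fixed and moving bins are assumed to be already computed and in range.
- Plain bin counting replaces the Parzen-window estimate the actual metric uses.
- The adjusted fixed marginal is assumed to be corrected by decrementing the missed sample's bin.

The sketch shows three things: contiguous partitioning of samples across threads, a private joint histogram per thread with no lock in the main loop, and a mutex taken only for missed samples.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical sample: its fixed-image bin and the moving-image bin it maps to.
struct Sample {
  int fixedBin;
  int movingBin;  // -1 means "does not map into the moving image"
};

struct HistogramResult {
  std::vector<double> joint;          // numBins x numBins, row-major [fixed][moving]
  std::vector<double> fixedMarginal;  // adjusted fixed-image marginal (counts)
};

HistogramResult ComputeJointHistogram(const std::vector<Sample>& samples,
                                      int numBins, unsigned numThreads) {
  assert(numBins > 0 && numThreads > 0);
  const std::size_t jointSize = static_cast<std::size_t>(numBins) * numBins;

  // Pre-computed once for all samples, before any threads start.
  std::vector<double> fixedMarginal(numBins, 0.0);
  for (const Sample& s : samples) fixedMarginal[s.fixedBin] += 1.0;

  std::mutex marginalMutex;  // guards fixedMarginal during the threaded loop
  // One private joint histogram per thread: memory traded for lock-free accumulation.
  std::vector<std::vector<double>> perThreadJoint(numThreads,
                                                  std::vector<double>(jointSize, 0.0));

  auto worker = [&](unsigned t) {
    // Contiguous partition of the samples for thread t.
    const std::size_t begin = samples.size() * t / numThreads;
    const std::size_t end = samples.size() * (t + 1) / numThreads;
    for (std::size_t i = begin; i < end; ++i) {
      const Sample& s = samples[i];
      if (s.movingBin < 0) {
        // Missed sample: lock only here to adjust the shared fixed marginal.
        std::lock_guard<std::mutex> lock(marginalMutex);
        fixedMarginal[s.fixedBin] -= 1.0;
        continue;
      }
      perThreadJoint[t][static_cast<std::size_t>(s.fixedBin) * numBins + s.movingBin] += 1.0;
    }
  };

  std::vector<std::thread> threads;
  for (unsigned t = 0; t < numThreads; ++t) threads.emplace_back(worker, t);
  for (std::thread& th : threads) th.join();

  // After all threads complete, sum the per-thread joint histograms.
  HistogramResult result{std::vector<double>(jointSize, 0.0), fixedMarginal};
  for (const std::vector<double>& h : perThreadJoint)
    for (std::size_t k = 0; k < jointSize; ++k) result.joint[k] += h[k];
  return result;
}
```

The design choice is that each thread writes only to its own histogram, so the hot loop never contends for a lock. The price is one extra histogram per thread plus a final summation pass.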
Results
- Speedup on a dual-core system: computation time is reduced by about 30% when using a linear transform and linear interpolation, and by about 45% when using a B-spline transform and B-spline interpolation.
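The results are stated as reductions in computation time. Reading them that way, they convert to conventional speedup factors as shown below. Here T_1 is the single-threaded run time, T_2 is the dual-core run time, and r is the fractional reduction. For comparison, perfectly parallel work on two cores would give r = 0.5 and S = 2.

```latex
S = \frac{T_1}{T_2} = \frac{1}{1 - r}, \qquad
r = 0.30 \;\Rightarrow\; S \approx 1.43, \qquad
r = 0.45 \;\Rightarrow\; S \approx 1.82
```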
Events
- April 6, 2007: TCon
- April 12, 2007: TCon
- April 18, 2007: TCon
- May 1, 2007: TCon
- June 27, 2007: NAMIC Programmers' Week
Related Pages
- Non Rigid Registration
- Slicer3:Performance_Analysis
- User:Barre/ITK Registration Optimization
- Testing and ITK Backward Forward Compatibility
Performance Measurement
- LTProf: simple profiler for Windows (shareware)
- Intel's VTune for Linux ($)
- TAU
- Threadmon: Thread usage/blockage
- TotalView ($)
- PerfSuite (POSIX Threads)
- GProf work-around for multi-threaded apps
- References on multi-threaded profiling and code optimization