ITK Registration Optimization
From NAMIC Wiki
Revision as of 20:45, 1 April 2007
Goals
There are two components to this research:
- Identify registration algorithms that are suitable for the non-rigid registration problems that are endemic to NA-MIC
- Develop implementations of those algorithms that take advantage of multi-core and multi-processor hardware.
Algorithmic Requirements and Use Cases
- Requirements
- relatively robust, with few parameters to tweak
- runs on grey scale images
- has already been published
- relatively fast (ideally a few minutes for a volume-to-volume registration)
- not patented
- can be implemented in ITK and parallelized.
- Use-cases
- Intersubject mapping
- Example data set (Kilian)
- fMRI to hi-res brain morphology mapping
- Example data set (Steve Pieper)
- DTI: components of the diffusion tensor
- Example data (Sylvain)
Hardware Platform Requirements and Use Cases
- Requirements
- Shared memory
- Single and multi-core machines
- Single and multi-processor machines
- AMD and Intel - Windows, Linux, and SunOS
- Use-cases
- Intel Core2Duo
- Intel quad-core Xeon processors (?)
- 6 CPU Sun, Solaris 8 (SPL: vision)
- 12 CPU Sun, Solaris 8 (SPL: forest and ocean)
- 16 core Opteron (SPL: john, ringo, paul, george)
- 16 core, Sun Fire, AMD Opteron (UNC: Styner)
Data
Workplan
Establish testing and reporting infrastructure
- Identify timing tools
- Cross platform and multi-threaded
- Timing and profiling
- Status
- Instrumenting modular tests
- Extending ITK's cross-platform high-precision timer
- Adding thread affinity to ensure valid timings
- Adding method for increasing process priority
- Profiling complete registration solutions for use cases
- Using CacheGrind on single and multi-core Linux systems
- Develop performance dashboard for collecting results
- Each test will report time and accuracy to a central server
- The performance of a test, over time, for a given platform can be viewed on one page
- The performance of a set of tests, at one point in time, for all platforms can be viewed on one page
- Status
- BatchMake database communication code being isolated
- Performance dashboard web pages being designed
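The timing work listed above could start from something like the following minimal sketch. It uses only the C++ standard library rather than ITK's own timer, and `MedianSeconds` is a hypothetical helper name, not an existing ITK function. Thread affinity and process priority are platform-specific and are left out here.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of ITK): runs `work` `repeats` times and
// returns the median wall-clock time of one run, in seconds.
// For an even number of repeats the upper of the two middle samples is used.
template <typename Work>
double MedianSeconds(Work work, std::size_t repeats)
{
  if (repeats == 0)
  {
    return 0.0;
  }
  std::vector<double> samples;
  samples.reserve(repeats);
  for (std::size_t i = 0; i < repeats; ++i)
  {
    // steady_clock is monotonic, so the difference is never negative.
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    samples.push_back(std::chrono::duration<double>(stop - start).count());
  }
  std::sort(samples.begin(), samples.end());
  return samples[samples.size() / 2];
}
```

Reporting the median rather than the mean is a design choice made for this sketch: it reduces the influence of a single run that was disturbed by other processes.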
Develop tests
- Develop modular tests
- Status
- Developed itkCheckerboardImageSource so no IO required
- Developing tests as listed in the "Modular Tests" section below
- Develop C-style tests
- Tests should represent the non-ITK way of doing image analysis
- Use standard C/C++ arrays and pointers to access blocks of memory as images
- Develop complete registration solutions for use cases
- Status
- Centralized data and provide easy access
- Identified relevant registration algorithms
- rigid, affine, B-spline, multi-level B-spline, and Demons
- normalized mutual information, mean squared difference, and cross correlation
- Developing traditional ITK-style implementations
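As an illustration of what a C-style test might look like, the sketch below fills a raw buffer with a checkerboard (so no file IO is needed) and interpolates it linearly using plain index arithmetic. The function names, the `squareSize` parameter, and the memory layout (x fastest, then y, then z) are assumptions made for this example; they are not taken from itkCheckerboardImageSource or from the project's actual tests.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fill a cubic volume of side `dim` with a 3-D checkerboard of 0/1 values.
// `squareSize` (assumed parameter) is the side of one checker cell in voxels.
void FillCheckerboard(float* volume, std::size_t dim, std::size_t squareSize)
{
  for (std::size_t z = 0; z < dim; ++z)
    for (std::size_t y = 0; y < dim; ++y)
      for (std::size_t x = 0; x < dim; ++x)
      {
        const std::size_t parity =
          (x / squareSize + y / squareSize + z / squareSize) % 2;
        volume[(z * dim + y) * dim + x] = static_cast<float>(parity);
      }
}

// Trilinear interpolation at a continuous index (x, y, z).
// The caller must keep every coordinate inside [0, dim - 1].
float InterpolateLinear(const float* volume, std::size_t dim,
                        double x, double y, double z)
{
  // Truncation equals floor() for the non-negative coordinates allowed here.
  const std::size_t x0 = static_cast<std::size_t>(x);
  const std::size_t y0 = static_cast<std::size_t>(y);
  const std::size_t z0 = static_cast<std::size_t>(z);
  // Clamp the upper neighbour so the last voxel can be sampled safely.
  const std::size_t x1 = (x0 + 1 < dim) ? x0 + 1 : x0;
  const std::size_t y1 = (y0 + 1 < dim) ? y0 + 1 : y0;
  const std::size_t z1 = (z0 + 1 < dim) ? z0 + 1 : z0;
  const double fx = x - static_cast<double>(x0);
  const double fy = y - static_cast<double>(y0);
  const double fz = z - static_cast<double>(z0);

  auto at = [&](std::size_t xi, std::size_t yi, std::size_t zi) -> double {
    return volume[(zi * dim + yi) * dim + xi];
  };

  // Interpolate along x, then y, then z.
  const double c00 = at(x0, y0, z0) * (1.0 - fx) + at(x1, y0, z0) * fx;
  const double c10 = at(x0, y1, z0) * (1.0 - fx) + at(x1, y1, z0) * fx;
  const double c01 = at(x0, y0, z1) * (1.0 - fx) + at(x1, y0, z1) * fx;
  const double c11 = at(x0, y1, z1) * (1.0 - fx) + at(x1, y1, z1) * fx;
  const double c0 = c00 * (1.0 - fy) + c10 * fy;
  const double c1 = c01 * (1.0 - fy) + c11 * fy;
  return static_cast<float>(c0 * (1.0 - fz) + c1 * fz);
}
```

A test built this way can be timed against the equivalent ITK pipeline to separate the cost of the algorithm from the cost of the framework.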
Compute performance on target platforms
- Ongoing
Optimize bottlenecks
- Target bottlenecks
- Use random, sub-sampling iterator in mean squared difference and cross correlation
- Multi-thread metric calculation
- Integrate metrics with transforms and interpolators for tailored performance
- MattesMutualInformationImageToImageMetric
Time in self | Time in subfuncs | Function |
---|---|---|
0.00 | 60.54 | __tmainCRTStartup |
0.00 | 34.04 | main |
0.00 | 21.39 | itk::CheckerBoardImageSource<itk::Image<float,3> >::GenerateData |
16.52 | 16.57 | floor ? |
0.00 | 13.55 | itk::OptMattesMutualInformationImageToImageMetric<itk::Image<float,3>,itk::Image<float,3> >::GetDerivative |
0.00 | 12.95 | itk::ImageSource<itk::Image<float,3> >::ThreaderCallback |
0.00 | 12.95 | itk::MattesMutualInformationImageToImageMetric<itk::Image<float,3>,itk::Image<float,3> >::GetDerivative |
0.30 | 11.45 | itk::CentralDifferenceImageFunction<itk::Image<float,3>,double>::Evaluate |
8.71 | 8.73 | itk::CentralDifferenceImageFunction<itk::Image<float,3>,double>::EvaluateAtIndex |
2.70 | 8.43 | itk::OptMattesMutualInformationImageToImageMetric<itk::Image<float,3>,itk::Image<float,3> >::GetValueAndDerivative |
7.51 | 7.53 | itk::BSplineKernelFunction<3>::Evaluate |
3.30 | 7.53 | itk::MattesMutualInformationImageToImageMetric<itk::Image<float,3>,itk::Image<float,3> >::GetValueAndDerivative |
0.00 | 6.63 | endthreadex ? |
6.61 | 6.63 | itk::StatisticsImageFilter<itk::Image<float,3> >::ThreadedGenerateData |
3.30 | 4.82 | itk::CheckerBoardSpatialFunction<double,3,itk::Point<double,3> >::Evaluate |
4.50 | 4.52 | itk::OptMattesMutualInformationImageToImageMetric<itk::Image<float,3>,itk::Image<float,3> >::ComputePDFDerivatives |
3.90 | 3.92 | itk::NearestNeighborInterpolateImageFunction<itk::Image<float,3>,double>::EvaluateAtContinuousIndex |
3.60 | 3.61 | _ftol2_pentium4 |
3.60 | 3.61 | itk::BSplineKernelFunction<2>::Evaluate [1] |
1.80 | 3.01 | itk::OptMattesMutualInformationImageToImageMetric<itk::Image<float,3>,itk::Image<float,3> >::GetValue |
0.90 | 2.41 | itk::MattesMutualInformationImageToImageMetric<itk::Image<float,3>,itk::Image<float,3> >::GetValue |
2.40 | 2.41 | itk::ProgressReporter::CompletedPixel |
2.40 | 2.41 | itk::ShiftScaleImageFilter<itk::Image<float,3>,itk::Image<float,3> >::ThreadedGenerateData |
2.10 | 2.11 | itk::ImageFunction<itk::Image<float,3>,double,double>::IsInsideBuffer |
1.80 | 1.81 | itk::BSplineDerivativeKernelFunction<3>::Evaluate |
1.20 | 1.81 | itk::ImageFunction<itk::Image<float,3>,itk::CovariantVector<double,3>,double>::ConvertContinuousIndexToNearestIndex |
1.50 | 1.51 | itk::BSplineKernelFunction<2>::Evaluate |
0.00 | 1.51 | thunk@40316b ? |
1.20 | 1.20 | itk::InterpolateImageFunction<itk::Image<float,3>,double>::Evaluate |
0.90 | 1.20 | itk::MattesMutualInformationImageToImageMetric<itk::Image<float,3>,itk::Image<float,3> >::Initialize |
0.60 | 1.20 | itk::OptMattesMutualInformationImageToImageMetric<itk::Image<float,3>,itk::Image<float,3> >::Initialize |
0.90 | 0.90 | itk::ImageBase<3>::GetSpacing |
0.90 | 0.90 | itk::ImageFunction<itk::Image<float,3>,double,double>::ConvertContinuousIndexToNearestIndex |
0.90 | 0.90 | itk::MattesMutualInformationImageToImageMetric<itk::Image<float,3>,itk::Image<float,3> >::ComputePDFDerivatives |
0.90 | 0.90 | itk::MattesMutualInformationImageToImageMetric<itk::Image<float,3>,itk::Image<float,3> >::TransformPoint |
0.90 | 0.90 | itk::OptMattesMutualInformationImageToImageMetric<itk::Image<float,3>,itk::Image<float,3> >::TransformPoint |
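Two of the target bottlenecks, random sub-sampling and multi-threaded metric calculation, can be illustrated together for the mean squared difference metric. The sketch below works on plain buffers and uses `std::thread`; it is not the ITK implementation, and `SampledMeanSquares` with its parameters is a hypothetical name chosen for this example. It assumes the two images are already aligned voxel for voxel, so no transform or interpolator is involved.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <thread>
#include <vector>

// Hypothetical helper (not ITK API): mean squared difference between two
// equally sized buffers, estimated from `numSamples` randomly chosen voxels
// and accumulated by `numThreads` worker threads.
double SampledMeanSquares(const std::vector<float>& fixed,
                          const std::vector<float>& moving,
                          std::size_t numSamples,
                          unsigned numThreads,
                          unsigned seed)
{
  if (fixed.empty() || fixed.size() != moving.size() ||
      numSamples == 0 || numThreads == 0)
  {
    return 0.0;
  }

  // Draw the sample indices once, before the threads start, so that all
  // threads only read shared data.
  std::mt19937 rng(seed);
  std::uniform_int_distribution<std::size_t> pick(0, fixed.size() - 1);
  std::vector<std::size_t> samples(numSamples);
  for (auto& s : samples)
  {
    s = pick(rng);
  }

  // Each thread sums its own slice of the samples into its own slot,
  // so no locking is needed.
  std::vector<double> partial(numThreads, 0.0);
  std::vector<std::thread> workers;
  const std::size_t chunk = (numSamples + numThreads - 1) / numThreads;
  for (unsigned t = 0; t < numThreads; ++t)
  {
    const std::size_t begin = std::min(numSamples, t * chunk);
    const std::size_t end = std::min(numSamples, begin + chunk);
    workers.emplace_back([&, t, begin, end] {
      double sum = 0.0;
      for (std::size_t i = begin; i < end; ++i)
      {
        const double d = static_cast<double>(fixed[samples[i]]) -
                         static_cast<double>(moving[samples[i]]);
        sum += d * d;
      }
      partial[t] = sum;
    });
  }
  for (auto& w : workers)
  {
    w.join();
  }

  double total = 0.0;
  for (double p : partial)
  {
    total += p;
  }
  return total / static_cast<double>(numSamples);
}
```

Fixing the seed keeps the sample set, and therefore the metric value, repeatable between runs, which matters when timings and errors are compared over time.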
Modular tests
All tests send two values to performance dashboards
- the time required
- a measure of the error (0 = no error; 1 = 100% error)
Tests being developed and their parameter spaces
- LinearInterpTest <numThreads> <dimSize> <factor> [<outputImage>]
- NumThreads = 1, 2, 4, and #OfCoresIf>4
- DimSize = 100, 200 (i.e., 100^3 and 200^3 images)
- Factor = 2, 3 (i.e., producing up to 600^3 images)
- = 16 tests (approx time on Core2Duo for these tests = 1 minute)
- BSplineInterpTest <numThreads> <dimSize> <factor> <bSplineOrder> [<outputImage>]
- NumThreads = 1, 2, 4, and #OfCoresIf>4 (for every platform)
- DimSize = 100, 200 (i.e., 100^3 and 200^3 images)
- Factor = 2, 3 (i.e., producing up to 600^3 images)
- bSplineOrder = 3
- = 16 tests (approx time on Core2Duo for these tests = 10 minutes)
- SincInterpTest <numThreads> <dimSize> <factor> [<outputImage>]
- Uses the Welch window function
- NumThreads = 1, 2, 4, and #OfCoresIf>4 (for every platform)
- DimSize = 100, 200 (i.e., 100^3 and 200^3 images)
- Factor = 2, 3 (i.e., producing up to 600^3 images)
- = 16 tests (approx time on Core2Duo for these tests = 30 minutes)
- BSplineTransformLinearInterpTest <numThreads> <dimSize> <numNodesPerDim> <bSplineOrder> [<outputImage>]
- 3 nodes are also added outside of the image for interpolation
- MeanReciprocalSquaredDifferenceMetricTest
- MeanSquaresMetricTest
- NormalizedCorrelationMetricTest
- GradientDifferenceMetricTest
- MattesMutualInformationMetricTest
- MutualInformationMetricTest
- NormalizedMutualInformationMetricTest
- MutualInformationHistogramMetricTest
- NormalizedMutualInformationHistogramMetricTest
Notes
- MattesMutualInformationMetric defaults to a BSpline interpolator; the tests above override this to use nearest neighbor interpolation instead
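The count of 16 tests for LinearInterpTest follows from 4 thread counts x 2 image sizes x 2 factors. The sketch below enumerates that parameter space as command lines. It reads "#OfCoresIf>4" as "also test with the machine's core count when that count exceeds 4", which is an interpretation rather than something the page states; on a machine with 4 or fewer cores this reading yields 12 tests. The function names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Thread counts to test: 1, 2, 4, plus the core count when it exceeds 4
// (one reading of "#OfCoresIf>4"; an assumption of this sketch).
std::vector<unsigned> ThreadCounts(unsigned numCores)
{
  std::vector<unsigned> counts = {1, 2, 4};
  if (numCores > 4)
  {
    counts.push_back(numCores);
  }
  return counts;
}

// Build one command line per combination of
// <numThreads> <dimSize> <factor> for LinearInterpTest.
std::vector<std::string> LinearInterpTestCommands(unsigned numCores)
{
  const unsigned dimSizes[] = {100, 200};
  const unsigned factors[] = {2, 3};
  std::vector<std::string> commands;
  for (unsigned threads : ThreadCounts(numCores))
  {
    for (unsigned dim : dimSizes)
    {
      for (unsigned factor : factors)
      {
        commands.push_back("LinearInterpTest " + std::to_string(threads) +
                           " " + std::to_string(dim) +
                           " " + std::to_string(factor));
      }
    }
  }
  return commands;
}
```

The optional `<outputImage>` argument is omitted, since the timing runs do not need to write an image.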
Related Pages
Performance Measurement
- LTProf - simple profiler for Windows - Shareware
- Intel's VTune for Linux ($)
- TAU
- Threadmon: Thread usage/blockage
- TotalView ($)
- PerfSuite (POSIX Threads)
- GProf work-around for multi-threaded apps
- References on multi-threaded profiling and code optimization