Difference between revisions of "ITK Registration Optimization"
From NAMIC Wiki
Revision as of 13:27, 30 March 2007
Goals
There are two components to this research
- Identify registration algorithms that are suitable for the non-rigid registration problems endemic to NA-MIC
- Develop implementations of those algorithms that take advantage of multi-core and multi-processor hardware.
Algorithmic Requirements and Use Cases
- Requirements
- relatively robust, with few parameters to tweak
- runs on grey scale images
- has already been published
- relatively fast (ideally a few minutes for volume-to-volume registration)
- not patented
- can be implemented in ITK and parallelized.
- Use-cases
- Intersubject mapping example data set (Kilian)
- fMRI to hi-res brain morphology mapping example data set (Steve Pieper)
- DTI: components of the diffusion tensor DTI-non-rigid (Sylvain)
Hardware Platform Requirements and Use Cases
- Requirements
- Shared memory
- Single and multi-core machines
- Single and multi-processor machines
- AMD and Intel - Windows, Linux, and SunOS
- Use-cases
- Intel Core2Duo
- Intel quad-core Xeon processors (?)
- 6 CPU Sun, Solaris 8 (SPL: vision)
- 12 CPU Sun, Solaris 8 (SPL: forest and ocean)
- 16 core Opteron (SPL: john, ringo, paul, george)
- 16 core, Sun Fire, AMD Opteron (UNC: Styner)
Data
Workplan
- Quantify current performance and bottlenecks
- Identify timing tools (cross platform, multi-threaded)
- For each use-case
- Centralize data and provide easy access
- Identify relevant registration algorithm(s)
- Develop traditional ITK-style implementations
- Develop timing tests using implementations and data
- Across use-cases
- Identify ITK classes/functions common to implementations (e.g., interpolation/resampling)
- Develop timing tests specific to these common sub-classes
- Compute performance on multiple platforms
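As a rough illustration of the "timing tools" step in the workplan above, a minimal cross-platform harness might record both wall-clock and CPU time, since the two diverge once work is spread across threads. This is a sketch in Python for brevity, not ITK code; `time_call`, `busy_work`, and `run_threaded` are hypothetical names introduced only for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def time_call(func, *args, repeats=3):
    """Return (best wall-clock seconds, best CPU seconds, last result) over `repeats` runs."""
    best_wall = float("inf")
    best_cpu = float("inf")
    result = None
    for _ in range(repeats):
        wall_start = time.perf_counter()   # monotonic wall-clock timer
        cpu_start = time.process_time()    # CPU time of the whole process (all threads)
        result = func(*args)
        best_cpu = min(best_cpu, time.process_time() - cpu_start)
        best_wall = min(best_wall, time.perf_counter() - wall_start)
    return best_wall, best_cpu, result


def busy_work(n):
    """Hypothetical stand-in for a registration component under test."""
    return sum(i * i for i in range(n))


def run_threaded(num_threads, n):
    """Run `busy_work` once per thread and collect the results."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(busy_work, [n] * num_threads))


if __name__ == "__main__":
    for num_threads in (1, 2, 4):
        wall, cpu, _ = time_call(run_threaded, num_threads, 200_000)
        print(f"threads={num_threads} wall={wall:.4f}s cpu={cpu:.4f}s")
```

Taking the best of several runs reduces noise from other processes. In standard CPython, pure-Python work like `busy_work` does not speed up with more threads, so the harness illustrates only the measurement, not the scaling one would hope to see from a multi-threaded ITK filter.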
Benchmarks
All tests print two values to standard output (cout):
- the time required
- a measure of the error (0 = no error; 1 = 100% error)
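The page does not say how the error measure is computed. Purely as an illustration, the sketch below assumes a mean absolute difference against a reference image, scaled by the reference intensity range and clamped to [0, 1]; that definition, and the names `normalized_error` and `run_and_report`, are assumptions made for this example. Images are represented as flat lists of intensities.

```python
import time


def normalized_error(result, reference):
    """Mean absolute difference scaled by the reference intensity range, clamped to [0, 1].

    ASSUMPTION: this definition is illustrative; the page does not specify one.
    """
    if len(result) != len(reference) or not reference:
        raise ValueError("images must be non-empty and the same size")
    intensity_range = max(reference) - min(reference)
    if intensity_range == 0:
        # Flat reference: any deviation is counted as full error.
        return 0.0 if list(result) == list(reference) else 1.0
    mean_abs_diff = sum(abs(r - t) for r, t in zip(result, reference)) / len(reference)
    return min(1.0, mean_abs_diff / intensity_range)


def run_and_report(func, reference):
    """Time `func`, compare its output with `reference`, and print the two values."""
    start = time.perf_counter()
    result = func()
    elapsed = time.perf_counter() - start
    error = normalized_error(result, reference)
    print(f"{elapsed:.6f} {error:.6f}")  # time in seconds, then error in [0, 1]
    return elapsed, error
```

Printing both values on one line in a fixed order would make the output easy to collect across platforms, though the actual output format of the tests is not stated here.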
Tests being developed and suggested parameter settings:
- LinearInterpTest <numThreads> <dimSize> <factor> [<outputImage>]
- NumThreads = 1, 2, 4, and #OfCoresIf>4
- DimSize = 100, 200 (i.e., 100^3 and 200^3 images)
- Factor = 1.5, 2, 3 (i.e., producing up to 600^3 images)
- Total = 24 tests (approximate time for all tests on a dual-core machine = 1.5 minutes)
- BSplineInterpTest <numThreads> <dimSize> <factor> [<outputImage>]
- NumThreads = 1, 2, 4, and #OfCoresIf>4 (for every platform)
- DimSize = 100, 200 (meaning: 100^3 and 200^3 images)
- Factor = 1.5, 2, 3 (thereby producing up to 600^3 images)
- Total = 24 tests (approximate time for all tests on a dual-core machine = ??)
- SincInterpTest <numThreads> <dimSize> <factor> [<outputImage>]
- BSplineTransformLinearInterpTest <numThreads> <dimSize> <numNodesPerDim> [<outputImage>]
- MeanReciprocalSquaredDifferenceMetricTest
- MeanSquaresMetricTest
- NormalizedCorrelationMetricTest
- GradientDifferenceMetricTest
- MattesMutualInformationMetricTest
- MutualInformationMetricTest
- NormalizedMutualInformationMetricTest
- MutualInformationHistogramMetricTest
- NormalizedMutualInformationHistogramMetricTest
Related Pages
Performance Measurement
- Intel's VTune for Linux ($)
- TAU
- Threadmon: Thread usage/blockage
- TotalView ($)
- PerfSuite (POSIX Threads)
- GProf work-around for multi-threaded apps
- References on multi-threaded profiling and code optimization