SPM/Benchmarks
From Wikibooks, the open-content textbooks collection
Introduction
The SPM99 and more recent SPM2 software suites are used by many neuroimagers worldwide for analysing MRI images. This takes a lot of computer time, so any approach to speeding it up is welcome. Tom Womack and Chris Rorden found that recompiling parts of the SPM package with a more advanced compiler than the one used for the standard distribution (Intel C++ version 7.1, rather than GCC) produces significant speed-ups. The tables below show the speeds they found on different computers and with differently compiled MEX files (measured with Matthew Brett's mextest.m file). The tables give the time required to complete each task, so lower numbers indicate faster performance.
Note that many computers are much slower at computing with Not-a-Number (NaN) values than with real values (compare the 'Linear Resample' column with the 'NaN linear Resample' column). Unfortunately, the statistics stage of SPM processing uses NaN values heavily: Chris Rorden thinks this explains why Athlon and Athlon64 computers are so much faster than other systems at the 'statistics' stage of SPM (they do not have a NaN penalty). Systems that show a NaN penalty are highlighted in red: these systems will be slow during the statistics portion of SPM processing. Sun and SGI data from Otto Muzik and Shane McKie.
| System | MEX files | SimpleRead | Linear Resample | Sinc Resample | NaN linear Resample | Smooth |
| Celeron 0.8GHz | OriginalSPM99 | 1.26 | 3.59 | 48.22 | 11.68 | 17.02 |
| Celeron 0.8GHz | icc7.1 p2 | 1.26 | 1.30 | 23.15 | 9.16 | 15.45 |
| Athlon 2200XP 1.8GHz | OriginalSPM99 | 0.32 | 1.14 | 15.67 | 1.09 | 14.98 |
| Athlon 2200XP 1.8GHz | icc7.1 p2 | 0.31 | 0.39 | 6.28 | 0.38 | 11.37 |
| Athlon 2800XP 2.1GHz | OriginalSPM2 | 0.17 | 0.95 | 6.16 | 0.95 | 9.55 |
| Athlon64 3400+ 2.2GHz | SPM2std | 0.11 | 0.70 | 7.86 | 0.70 | 5.95 |
| Athlon64 3400+ 2.2GHz | SPM2p3 | 0.11 | 0.66 | 8.58 | 0.65 | 5.95 |
| Athlon64 3400+ 2.2GHz | SPM2p4 | 0.11 | 0.69 | 7.45 | 0.63 | 5.97 |
| Pentium4 2.4GHz | OriginalSPM99 | 0.23 | 1.16 | 11.45 | 20.66 | 16.71 |
| Pentium4 2.4GHz | Matthew Brett gcc3.2 | 0.20 | 0.99 | 8.59 | 0.99 | 15.31 |
| Pentium4 2.4GHz | icc7.1 p2 | 0.21 | 0.31 | 5.55 | 19.30 | 19.20 |
| Pentium4 2.4GHz | icc7.1 p4 | 0.20 | 0.32 | 7.59 | 0.28 | 18.29 |
| Xeon 2x3GHz | SPM2p3 | 0.19 | 0.99 | 8.84 | 15.73 | 13.3 |
| Xeon 2x3GHz | SPM2p4 | 0.20 | 0.66 | 9.98 | 0.62 | 13.4 |
| Sun Ultra II 0.25GHz | OriginalSPM99 | 6.20 | 7.20 | 134.17 | 27.12 | 49.00 |
| Sun Ultra III CPU 0.75GHz | OriginalSPM99 | 0.68 | 2.88 | 33.27 | 123.42 | 11.72 |
| Sun Ultra III+ CPU 1.0GHz | OriginalSPM99 | 0.30 | 1.31 | 23.94 | 95.91 | 4.44 |
| SGI R14000 0.6GHz | OriginalSPM2 | 0.35 | 0.62 | 15.46 | 117.03 | 10.94 |
| Macintosh G4: 2x1.4GHz | OriginalSPM99 | 0.46 | 0.89 | 18.68 | 0.88 | 7.48 |
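One rough way to quantify the NaN penalty from this table is to divide the 'NaN linear Resample' time by the 'Linear Resample' time for the same system and MEX build. The sketch below does this for a handful of rows copied from the table. The function names and the cut-off of 2.0 are assumptions made for illustration: they are not part of mextest.m, and the cut-off may not match the highlighting used in the original table.

```python
# Rows copied from the table above:
# (system, MEX build, Linear Resample seconds, NaN linear Resample seconds)
ROWS = [
    ("Pentium4 2.4GHz", "OriginalSPM99", 1.16, 20.66),
    ("Pentium4 2.4GHz", "Matthew Brett gcc3.2", 0.99, 0.99),
    ("Pentium4 2.4GHz", "icc7.1 p2", 0.31, 19.30),
    ("Pentium4 2.4GHz", "icc7.1 p4", 0.32, 0.28),
    ("Athlon 2200XP", "OriginalSPM99", 1.14, 1.09),
    ("Sun Ultra III", "OriginalSPM99", 2.88, 123.42),
]

# Assumed cut-off for illustration only; not taken from the original benchmark.
PENALTY_THRESHOLD = 2.0


def nan_penalty(linear, nan_linear):
    """Return how many times slower the NaN resample is than the plain one."""
    return nan_linear / linear


def flag_penalised(rows, threshold=PENALTY_THRESHOLD):
    """Return (system, build) pairs whose NaN slowdown exceeds the threshold."""
    return [
        (system, build)
        for system, build, linear, nan_linear in rows
        if nan_penalty(linear, nan_linear) > threshold
    ]


for system, build, linear, nan_linear in ROWS:
    ratio = nan_penalty(linear, nan_linear)
    print(f"{system:18s} {build:22s} NaN slowdown x{ratio:.1f}")
```

A ratio close to 1 means the build handles NaN values at roughly the same speed as ordinary values. On the Pentium 4, the choice of MEX build alone moves the ratio from well above 10 to about 1.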
SPM2
SPM2 allows you to write scripts to process data fairly automatically. I created a script to process a simple block design dataset (130 volumes). This benchmark gives you an idea of the amount of time required to process a dataset. Note that for event-related designs the 'statistics' portion of the analysis would take considerably longer. Since the statistics are often recomputed for different comparisons, I suggest that performance on the statistics portion of this benchmark should be emphasized. All times are in seconds: smaller numbers are faster. Note that 'preprocess total' is the sum of realignment, unwarping, normalization and smoothing. The excellent performance of the Athlon64 for statistics is probably due to the fact that it does not suffer a NaN penalty. If you wish, you can download this data set and SPM2 batch scripts here (18Mb). Note that the same 2.4GHz Pentium 4 was tested with both Windows and Linux, and Linux showed much better performance in SPM. I found similar results with the FSL3.1 FEEDS benchmark running on WindowsXP/Cygwin1.5.9 versus Linux.
| System | Realign | Unwarp | Normalize | Smooth | Preprocess Total | Statistics | Notes |
| 0.8GHz Celeron | 530 | 2374 | 953 | 131 | 4005 | 364 | WinXP; Matlab5.3 192Mb RAM |
| Dual 1.0GHz Pentium3 | 233 | 1016 | 343 | 63 | 1661 | 271 | WinXP; SGI330 1.5Gb RAM |
| 1.3GHz PentiumM 'Centrino' | 177 | 525 | 183 | 34 | 922 | 143 | WinXP; Dell D500 Latitude laptop |
| Dual 1.4GHz Opteron 240 | 118 | 478 | 255 | 24 | 875 | 66 | LinuxRH9, 1Gb RAM UC Irvine Brain Imaging Center |
| 1.83GHz AthlonXP 2500 | 141 | 558 | 170 | 85 | 958 | 106 | WinXP, 512Mb RAM, HP Laptop |
| Dual 2.0GHz G5 Macintosh | 156 | 440 | 161 | 28 | 789 | 104 | Mac10.3 |
| 2.2GHz Athlon64 3400+ | 76 | 314 | 97 | 15 | 509 | 69 | WinXP, 1Gb RAM |
| 2.2GHz Athlon64 3400+ | 61 | 293 | 124 | 15 | 494 | 41 | SUSE9.1 64bit, 1Gb RAM |
| 2.4GHz Pentium4 | 149 | 426 | 141 | 24 | 743 | 145 | WinXP, 512Mb RAM, FSB: 800MHz |
| 2.4GHz Pentium4 | 185 | 513 | 163 | 23 | 887 | 144 | WinXP, 512Mb RAM, FSB: 800MHz, MATLAB 7.0 |
| 2.4GHz Pentium4 | 112 | 294 | 116 | 21 | 547 | 67 | SUSE9.0, 512Mb RAM, FSB: 800MHz |
| 2.4GHz Pentium4 | 71 | 320 | 121 | 22 | 536 | 65 | SUSE9.1, 512Mb RAM, FSB: 800MHz |
| 3.0GHz Pentium4 | 88 | 258 | 105 | 18 | 472 | 57 | Mandrake10, 2Gb RAM, UC Berkeley |
| Dual 3.0GHz Xeon | 125 | 339 | 103 | 21 | 591 | 121 | WinXP, 3Gb RAM, Optimized Mex/ATLAS (without the optimised files, statistics required 346 seconds) |
| Dual 3.0GHz Xeon | 65 | 267 | 96 | 21 | 451 | 60 | RH9, 3Gb RAM, Optimized Mex/ATLAS, Campinas |
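To make the Windows versus Linux comparison concrete, the sketch below divides the WinXP times by the SUSE9.1 times for the same 2.4GHz Pentium 4, stage by stage. The numbers are copied from the table above. The helper name `speedups` is invented here for illustration and is not part of SPM2 or its batch scripts.

```python
STAGES = ("Realign", "Unwarp", "Normalize", "Smooth",
          "Preprocess Total", "Statistics")

# Times in seconds for the same 2.4GHz Pentium 4, copied from the table above.
WINXP = (149, 426, 141, 24, 743, 145)
SUSE91 = (71, 320, 121, 22, 536, 65)


def speedups(slow, fast):
    """Return {stage: slow_time / fast_time}; values above 1 favour `fast`."""
    return {stage: s / f for stage, s, f in zip(STAGES, slow, fast)}


ratios = speedups(WINXP, SUSE91)
for stage in STAGES:
    print(f"{stage:17s} Linux is x{ratios[stage]:.2f} faster")
```

On these figures the gain is largest for realignment and statistics (both a little over 2x) and smallest for smoothing.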
Matlab bench functions
I think the real world tests above are probably a better benchmark of SPM performance than the built-in Matlab Benchmark test. However, here are a few Matlab 'bench' values that should give a rough idea about performance. Lower values mean faster performance, except for the 'score', where a higher value means faster overall performance. Note that Matlab is single-threaded, so there is little benefit for dual processors. These values are from Matlab 6.5, and it is possible that the G5 and Athlon64 may show improved performance if Mathworks releases new versions of Matlab optimized for these systems. Good Pentium4 performance depends on compiling code specifically for its quirks. Note that Linux systems perform slower on the 2D/3D scores than the same system running Windows, a finding not reflected in the SPM2 benchmark.
| System | LU | FFT | ODE | Sparse | 2-D | 3-D | Score | Notes |
| Dual 1.0GHz Pentium3 | 1.61 | 1.98 | 0.86 | 1.30 | 2.27 | 0.70 | 11.5 | WinXP SGI330 1.5Gb RAM |
| 1.3GHz PentiumM 'Centrino' | 0.82 | 0.99 | 0.37 | 0.73 | 0.78 | 0.41 | 24.4 | WinXP Dell D500 Latitude laptop |
| 1.8GHz Athlon XP2200 | 0.81 | 1.48 | 0.50 | 0.86 | 1.31 | 0.52 | 17.3 | WinXP |
| Dual 2.0GHz G5 Macintosh | 0.32 | 1.10 | 0.44 | 0.52 | 1.10 | 0.90 | 22.8 | Mac10.3 |
| 2.1GHz AthlonXP 2800+ | 0.46 | 0.85 | 0.31 | 0.55 | 0.47 | 0.16 | 35.8 | WinXP [is 3D score correct?] |
| 2.2GHz Athlon64 3400+ | 0.28 | 0.59 | 0.20 | 0.44 | 0.66 | 0.67 | 35.2 | WinXP |
| 2.2GHz Athlon64 3400+ | 0.38 | 0.60 | 0.34 | 0.61 | 0.43 | 0.76 | 32.0 | SUSE9.1 64bit, 1Gb RAM |
| 2.4GHz Pentium4 | 0.31 | 1.02 | 0.47 | 0.57 | 0.79 | 0.68 | 26.0 | WinXP FSB: 800MHz, optimized ATLAS |
| 2.4GHz Pentium4 | 0.31 | 0.82 | 0.58 | 0.72 | 0.81 | 1.31 | 22.0 | SUSE9 FSB: 800MHz, optimized ATLAS |
| Dual 3.0GHz Xeon | 0.38 | 1.02 | 0.36 | 0.47 | 0.56 | 0.30 | 32.4 | WinXP optimized ATLAS, ATI Radeon 9800pro |
FSL benchmark
The FMRIB in Oxford provides a group of neuroimaging tools known collectively as FSL. These tools also come with a benchmark named FEEDS that allows you to test that the programs are installed correctly and gives you an idea of the performance of your system. Since FSL is available as source code, we can recompile these tools to take advantage of architecture-specific features (like SSE or the extra registers provided when the Athlon64 is in 64-bit mode). The times below reflect total time (not the shorter user time) in seconds to complete the FEEDS 3.1 benchmark: lower values mean faster performance.
| System | Total time (sec) | Notes |
| 2.2GHz Athlon64 3400+ | 1852 | WinXP/Cygwin: distribution from FSL website |
| 2.2GHz Athlon64 3400+ | 1620 | SUSE9.1 64bit OS, 32-bit: distribution from FSL website |
| 2.2GHz Athlon64 3400+ | 1305 | SUSE9.1 64bit OS, 32-bit: -march=k8 -mcpu=k8 -mfpmath=sse -O3 -fexpensive-optimizations |
| 2.2GHz Athlon64 3400+ | 1137 | SUSE9.1 64bit OS, 64-bit: -march=k8 -mcpu=k8 -mfpmath=sse -O3 -fexpensive-optimizations -m64 |
| 2.4GHz Pentium4 | 2572 | WinXP/Cygwin: distribution from FSL website |
| 2.4GHz Pentium4 | 2155 | SUSE9.0: distribution from FSL website |
| 2.4GHz Pentium4 | 1784 | SUSE9.1: -march=pentium4 -mcpu=pentium4 -mfpmath=sse -O3 -fexpensive-optimizations |
- Note: I was unable to get 'slicer' and 'overlay' to compile in 64-bit mode. Furthermore, 'melodic' was much slower as a 64-bit executable. 32-bit executables were therefore used for these stages of processing.
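The distinction between total (wall-clock) time and user time can be illustrated with a small timing wrapper. This is only a sketch of the general technique and not how the figures above were collected. The function name `time_command` is invented here, and a short Python one-liner stands in for the real benchmark command, since the exact FEEDS invocation is not given on this page.

```python
import os
import subprocess
import sys
import time


def time_command(argv):
    """Run argv to completion; return (wall, user, system) times in seconds."""
    cpu_before = os.times()
    wall_start = time.perf_counter()
    subprocess.run(argv, check=True)
    wall = time.perf_counter() - wall_start
    cpu_after = os.times()
    # CPU time consumed by child processes that have finished and been waited for.
    user = cpu_after.children_user - cpu_before.children_user
    system = cpu_after.children_system - cpu_before.children_system
    return wall, user, system


# Placeholder workload standing in for the real benchmark command (assumption).
wall, user, system = time_command([sys.executable, "-c", "sum(range(10**6))"])
print(f"total {wall:.2f}s  user {user:.2f}s  system {system:.2f}s")
```

Total time includes disk access and any waiting, so it is usually the figure that matters when judging how long an analysis will take. Note that on Windows `os.times()` reports zero for child processes, so the user and system values are only meaningful on Unix-like systems.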