SPM/Faster SPM
From Wikibooks, the open-content textbooks collection
Optimizations
MATLAB Optimizations
Disable the Java VM by starting MATLAB with the -nojvm option:
matlab -nojvm
Enabling multithreading on MATLAB >= 7.0.1
MATLAB R14 Service Pack 1 (7.0.1) and later use the Intel Math Kernel Library (MKL). Threading is disabled by default. You can enable it if your processor supports hyperthreading or you have more than one physical processor core.
- http://www.mathworks.com/access/helpdesk/help/techdoc/rn/math_new.html#1001367
- http://developer.intel.com/software/products/mkl/docs/mklusel.htm#Using%20MKL%20Parallelism
- http://www.mathworks.com/support/solutions/data/1-V63VS.html?solution=1-V63VS
- http://www.mathworks.com/support/solutions/data/1-ZGD1M.html?solution=1-ZGD1M
However, you should be aware that some users have experienced fatal problems with this setting. The following code causes MATLAB R14SP2 to crash:
System specs:
- Hyperthreaded Pentium 4
- Gentoo Linux 2.6.10-r4
- gcc 3.3.5
- MATLAB 7.0.4.352 (R14) Service Pack 2
In bash:
export OMP_NUM_THREADS=2
matlab -nojvm
In MATLAB:
n = 32;
A = zeros(n,n,n);
B = zeros(n,n,n);
C = zeros(n,n,n);
A(:,1,:) = B(:,:,1)*squeeze(C(:,1,:));
Error message:
OMP abort: Unable to set worker thread stack size to 4195328 bytes
Try reducing KMP_STACKSIZE or increasing the shell stack limit.
We contacted the MathWorks, who told us that the environment variable OMP_NUM_THREADS is not officially supported on the Linux platform. It should be set to 1 (or not set at all) in order to use MATLAB 7.0.4 (R14SP2) without any thread-related errors.
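Following that advice, one way to force the variable to 1 for a single program without changing the rest of your shell session is sketched below. The wrapper name run_single_threaded is illustrative and not part of MATLAB or MKL, and the demonstration line uses sh rather than matlab so that it runs on any machine.

```shell
# Run any command with OMP_NUM_THREADS forced to 1 for that command only.
# The name run_single_threaded is hypothetical, chosen for this sketch.
run_single_threaded() {
    OMP_NUM_THREADS=1 "$@"
}

# Demonstration with a harmless command that reports what it sees.
run_single_threaded sh -c 'echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"'
```

To start MATLAB this way you would type: run_single_threaded matlab -nojvm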
Install new BLAS
Install the latest Basic Linear Algebra Subprograms (BLAS) for your system. In some cases (e.g. calculations involving NaNs on a Pentium 4) you can expect a 100x speed increase just by updating your BLAS! You have several options for where to get your BLAS from:
- recompiled ATLAS library (open source but possibly not the fastest or most stable)
- AMD's Core Math Library
- Intel's Math Kernel Library.
Links: MATLAB 7.0.1 maths features, configuring MKL, MKL discussion forum, MathWorks page on BLAS
Notes on compiling ATLAS
- Make sure you have write permissions to the ATLAS directory after you've untarred it (chown root *)
- On my hyperthreaded P4, I couldn't get ATLAS to compile. The script from the CBU site had entered my CPU clock speed twice, because the system appears as dual processor and the command grep "MHz" /proc/cpuinfo | gawk 'BEGIN { FS = " " } {print $4 }' therefore returns two numbers. Here's one way to get the compile to work on a hyperthreaded or SMP system:
- Follow the guide on the CBU site. Stop before running "make"
- Edit your Make.Linux_P4_SSE2* file. Find "PentiumCPS". You should see your clock speed entered twice on that line. Delete one of the entries.
- When compiling the MEX files, make sure "mex" is in your path (i.e. make sure /usr/local/bin is in your path). It should be, but it might not be if you're running inside su.
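The duplicated clock speed described above can also be avoided at the source by asking for the first matching line only. This is a minimal sketch, not the CBU script itself: it uses plain awk instead of gawk, the function name first_cpu_mhz is made up, and the sample file stands in for /proc/cpuinfo on a hyperthreaded machine.

```shell
# Print the clock speed (MHz) of the first logical CPU only.
# $1: path to a cpuinfo-style file (defaults to /proc/cpuinfo).
first_cpu_mhz() {
    awk '/MHz/ { print $4; exit }' "${1:-/proc/cpuinfo}"
}

# Illustrative sample of what a hyperthreaded P4 might report.
sample=$(mktemp)
printf 'cpu MHz\t\t: 2992.123\ncpu MHz\t\t: 2992.123\n' > "$sample"

echo "Pipeline from the notes above (one value per logical CPU):"
grep "MHz" "$sample" | awk '{ print $4 }'

echo "First logical CPU only:"
first_cpu_mhz "$sample"

rm -f "$sample"
```

On a real system, calling first_cpu_mhz with no argument reads /proc/cpuinfo directly.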
Installing the Intel Math Kernel Library
- Download the free non-commercial version of the MKL
- Extract the files ("tar -xvf filename" on Linux)
- Linux: make sure the drive is mounted with exec permissions (type "mount" if you're not sure). If the drive is listed as "noexec" then you'll get a "permission denied" error when you try to run the install program.
- ./install
- Now copy your MKL files to the matlab directory. I do this by typing something like:
cp /opt/intel/mkl72/lib/32/* /usr/local/matlab/bin/glnx86/
(yes, you do need to copy everything)
- Then tell MATLAB to use the relevant .so file by editing $MATLAB/bin/glnx86/blas.spec (read the MKL documentation to find out which .so file you need)
- Finally, read the MKL documentation on configuring for maximum speed (e.g. turning on threading)
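The "noexec" check in the steps above can be scripted rather than read off by eye. The sketch below assumes the Linux "device on /path type fs (options)" layout of mount output; the function name is_noexec and the sample lines are invented for illustration.

```shell
# Succeed if the given mount point is listed with the noexec option.
# Reads Linux-style `mount` output from standard input.
is_noexec() {
    awk -v mp="$1" '
        $3 == mp && $NF ~ /[(,]noexec[,)]/ { found = 1 }
        END { exit !found }
    '
}

# Invented sample output; on a real system pipe `mount` in instead.
sample='/dev/hda1 on / type ext3 (rw,noatime)
/dev/sda1 on /mnt/usb type vfat (rw,noexec,nosuid)'

for mp in / /mnt/usb; do
    if printf '%s\n' "$sample" | is_noexec "$mp"; then
        echo "$mp is mounted noexec: ./install would fail with permission denied"
    else
        echo "$mp allows execution"
    fi
done
```

On a real system the equivalent call is: mount | is_noexec /mnt/usb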
Installing MKL on Gentoo Linux
Gentoo doesn't use RPM, and the version of MKL in Portage is ancient.
- emerge rpm
- Download the free non-commercial version of the MKL
- Extract the files ("tar -xvf filename" on Linux)
- Make sure the drive is mounted with exec permissions (type "mount" if you're not sure). If the drive is listed as "noexec" then you'll get a "permission denied" error when you try to run the install program.
- ./install
- The install will fail but should tell you where all the install files are. Change to this directory
- Find the .rpm file (let's call it mkl????.rpm for now)
- rpm -i --nodeps mkl????.rpm
- Now copy your MKL files to the matlab directory. I do this by typing:
cp /opt/intel/mkl72/lib/32/* /usr/local/matlab/bin/glnx86/
(yes, you do need to copy everything)
- Then tell MATLAB to use the relevant .so file by editing $MATLAB/bin/glnx86/blas.spec
- Finally, read the MKL documentation on configuring for maximum speed (e.g. turning on threading)
SPM Optimizations
MAXMEM
Set the memory limit in spm_defaults.m:
defaults.stats.maxmem = 2^30;
- 2^30 bytes = 1 GByte
- 2^29 bytes = 512 MBytes
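If you want a value other than the two listed, the conversion between a power of two and a size in MBytes is a simple shift. The helper names below are made up for this sketch; it only does the arithmetic and does not touch spm_defaults.m.

```shell
# Bytes represented by 2^n, computed with a left shift.
pow2_bytes() { echo $((1 << $1)); }

# The same amount expressed in MBytes (1 MByte = 2^20 bytes).
pow2_mbytes() { echo $(( (1 << $1) >> 20 )); }

for n in 28 29 30; do
    echo "2^$n = $(pow2_bytes "$n") bytes = $(pow2_mbytes "$n") MBytes"
done
```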
Misc
- Techniques for maximizing MATLAB performance: a page from MathWorks.com
- SPM99: Compute full instead of sparse matrices.
- SPM99 Pentium4 users: Update MEX files for your computer using the best compiler you can get your hands on.
- SPM2: especially Pentium4 users: Windows users should replace the DLLs in the SPM2 folder with the files in the compressed archive spm_mex_win_p4.zip.
- Linux users should recompile the MEX files specifically for their processor architecture (especially if you have a Pentium 4). If you have an Intel CPU (e.g. a Pentium) then you should use the Intel C++ Compiler (ICC) rather than GCC, as ICC generally produces faster code than GCC on Intel hardware. Info on optimising ICC.
Operating system optimizations
Linux
- Configure your kernel for your system
Windows
- Matlab works best if Windows XP is running in "classic mode"
Mac OS X
Making use of your Graphics Processing Unit (GPU)
Modern graphics cards have an enormous amount of processing power which could be harnessed for doing scientific calculations, especially with the arrival of PCI Express which allows very fast full-duplex communication between the GPU and CPU. Efficient use of the GPU could give speed-ups on the order of 15 times. It's something that will happen - the question is when. For example, a 3GHz P4 has a theoretical performance of 6 GFLOPS whilst 40 GFLOPS has been observed for the GeForce 6800 Ultra.
MATLAB users can use the GPU via the Jacket software created by AccelerEyes. AccelerEyes is currently offering the beta version of Jacket free of charge to interested developers. They may be contacted via their website at http://www.accelereyes.com.
Links:
- AccelerEyes: a company that develops the Jacket Software for Matlab GPU computing.
- Peakstream: a company that develops a platform for GPU/CPU integration
- Linear Algebra Operators for GPU Implementation of Numerical Algorithms
- GPUs doing scientific calculation
- BrookGPU - a compiler for GPUs
- GPGPU - General-Purpose Computation Using Graphics Hardware
- GPGPU Forums
- GPGPU thread on using GPU with MatLab
- Implementing Performance Libraries on Graphics Hardware
- Brook on GPUs: stream computing on graphics hardware Paper
- Brook for GPUs: Stream Computing on Graphics Hardware PDF powerpoint presentation
- The GPU as a Computational Resource in Medical Image Processing November 2004
Clusters and parallel processing
SPM-specific tools
pSPM
Parallel SPM can be downloaded from http://prdownloads.sourceforge.net/parallelspm/
It implements realignment, slice-timing correction, normalization, smoothing, and statistics via the PSPM interface. Running SPM in parallel significantly reduces processing time on systems with multiple processors or workstation clusters (as MATLAB by default can only use one processor).
Differences between v1 and v2:
- added fully parallelized statistics estimation
- added a -nodisplay option to suppress graphics output
- the coregistration option outputs a plot (spm2.ps) just like SPM normally does
- added Windows support: you should be able to compile in Cygwin or MSVC (or whatever your favorite Windows compiler is). Refer to README.windows within the source distribution
- fixed a bug in slice timing correction
- a few minor user interface changes
- better error handling
- a few utility files to test the PSPM package
So for those keeping a running tally, here's what is currently parallelized:
- coregistration and reslicing
- slice timing correction
- applying normalization parameters to files (NOT estimation)
- smoothing
- full stats estimation
Once you have installed the package, use the PSPM_test_dir script to compare the output from parallel processing to regular single-process SPM processing. When you run PSPM_test_dir it will ask you to select two directories. It will then compare all the image files that have the same name in both directories and report any discrepancies. There is also a PSPM_compare_struct script which compares two MATLAB structures element by element to see if they are identical. This might also be useful.
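For a quick check outside MATLAB, a rough shell analogue of that directory comparison is sketched below. It is not PSPM_test_dir: it compares same-named files byte for byte with cmp, so it has no numerical tolerance, and the function name compare_dirs and the file names are invented for the demonstration.

```shell
# Compare same-named files in two directories byte for byte.
# Prints one line per file present in both; returns non-zero if any differ.
compare_dirs() {
    status=0
    for f in "$1"/*; do
        name=$(basename "$f")
        [ -f "$f" ] && [ -f "$2/$name" ] || continue
        if cmp -s "$f" "$2/$name"; then
            echo "same: $name"
        else
            echo "DIFF: $name"
            status=1
        fi
    done
    return $status
}

# Demonstration with two throwaway directories and made-up file names.
a=$(mktemp -d)
b=$(mktemp -d)
printf 'x' > "$a/con_0001.img";  printf 'x' > "$b/con_0001.img"
printf 'x' > "$a/beta_0001.img"; printf 'y' > "$b/beta_0001.img"

compare_dirs "$a" "$b" || echo "discrepancies found"
rm -rf "$a" "$b"
```

Because the comparison is exact, files that differ only by rounding error are also reported as different.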
At present, stats estimation produces a slight discrepancy of ~10^-12 per image. This seems to be due to floating point arithmetic precision issues. I have not seen this affect the results in any way. I'm still looking into this.
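One common source of discrepancies this small is that floating point addition is not associative, so code that accumulates the same values in a different order can differ in the last bits. Whether that is what happens inside pSPM is an assumption, not something established here. The sketch below only demonstrates the general effect using awk's double precision arithmetic; the function name sum_order_diff is made up.

```shell
# Add the same three numbers in two different orders and print the difference.
sum_order_diff() {
    awk 'BEGIN {
        left  = (0.1 + 0.2) + 0.3
        right = 0.1 + (0.2 + 0.3)
        printf "%.3g\n", left - right
    }'
}

echo "difference between the two summation orders: $(sum_order_diff)"
```

The difference is tiny but not zero, which is why an exact comparison of serial and parallel results can fail even when both are correct to working precision.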
General tools for making MATLAB parallel
Intel's cluster Math Kernel Library
Sun Grid Engine
The SGE is an open-source project sponsored by Sun Microsystems.
What you need:
- The qsubfunc .m file
- The Sun Grid Engine
FAQs
With hyper-threading enabled, I only get 50% CPU utilisation
I wouldn't worry about it.
With hyperthreading enabled, XP believes that you've got two processors. By "50% utilisation" XP actually means that one processor is at full utilisation whilst the other is idle. This occurs because, as others have said, MATLAB is only single threaded and so can only use one processor. That sounds less than optimal, doesn't it?
It's not a problem because HyperThreading *isn't* the same as having two processors. HyperThreading is a clever way to keep the P4's massive pipeline full by allowing a single physical core to run two threads. But even when the system is running optimally (i.e. you're running more than one thread), a single HT processor is nowhere near the speed of a true SMP setup. And there's little evidence that HT slows down single-threaded applications. In short: XP is lying to you. Your CPU *is* at 100% utilisation for a single-threaded app (which begs the question: if 50% in XP = 100% in reality then does 100% in XP = 200% in reality, to which the answer is no!)
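The 50% figure is simply the number of busy logical CPUs divided by the number the operating system thinks it has. A minimal sketch of that arithmetic, with a made-up function name:

```shell
# Percentage a task manager shows when `busy` of `logical` CPUs are fully loaded.
# $1: number of fully busy logical CPUs, $2: number of logical CPUs reported.
reported_utilisation() { echo $(( 100 * $1 / $2 )); }

echo "HT on  (2 logical CPUs, 1 busy): $(reported_utilisation 1 2)%"
echo "HT off (1 logical CPU,  1 busy): $(reported_utilisation 1 1)%"
```

The single MATLAB thread does the same work in both cases; only the denominator changes.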
To be honest, I doubt you'll see any improvement by turning off hyperthreading (and, in fact, you might find that XP refuses to boot if you disable HT in the BIOS). And leaving HT on allows XP to run some other processes more efficiently whilst Matlab is running.
Here's what I suggest you do:
If you've got some time on your hands then benchmark your existing system, then turn off hyperthreading in the BIOS and benchmark it again. My gut feeling is that you won't see much difference but I could be wrong.
If you've got less time then just don't worry about it. Your expensive CPU is being pushed as hard as it can be pushed. XP is fibbing to you when it says "50% utilisation".
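If you do run the before-and-after benchmark, timing the same command in both configurations is enough for a first impression. The sketch below is written for a Unix-like shell (on XP it would need something like Cygwin), the function name elapsed_seconds is invented, and it only has whole-second resolution, so use a job that runs for minutes.

```shell
# Print the wall-clock time in whole seconds taken by a command.
# The command's own output is discarded.
elapsed_seconds() {
    start=$(date +%s)
    "$@" > /dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Demonstration with a command that is available everywhere.
echo "sleep 1 took $(elapsed_seconds sleep 1) s"
```

For a real comparison you would replace sleep 1 with your own batch job, for example matlab -nojvm -r "my_benchmark; exit", where my_benchmark is a hypothetical script of your own.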
For some benchmarks and some more theory on hyperthreading, have a look at these two links: