# Parallel Spectral Numerical Methods/Gray Scott

## Background

[1]

$\frac{\partial u}{\partial t} = D_{u} \nabla ^{2} u-uv^{2}+F (1-u)$
$\frac{\partial v}{\partial t} = D_{v} \nabla ^{2} u+uv^{2}-(F+k)v$
The above reaction-diffusion system describes the density of two chemicals ,($u$ and $v$), reacting with another. First term in the equations is for the diffusion, the second describes the reaction of $u$ and $v$,($u+2v$$3v$) and the last term in the first equation is the input of new $u$, while in the second the last term removes $v$ from the system.

### Matlab Program

The program solves the linear part with fourier transformation and the nonlinear part with the implicit midpoint rule.

( A)

% A program to solve the Gray-Scott equations using
% splitting method. The nonlinear equations are solved using the implicit
% midpoint rule
clear all; format compact, format short,
set(0,'defaultaxesfontsize',17,'defaultaxeslinewidth',.7,...
'defaultlinelinewidth',3.5,'defaultpatchlinewidth',3.5)

Nx =64; % number of modes
Ny=64;
Nz=64;
% set up grid
Nt=4;
tmax=0.04;
Lx = 1.5;       % period  2*pi * L
Ly =1.5;
Lz=1.5;
dt = tmax/Nt;   % number of time slices
tol=0.1^12;
A = 0.04;
B = 0.1;
Du = 1.;
Dv = 1.;
% initialise variables
x = (2*pi/Nx)*(-Nx/2:Nx/2 -1)'*Lx;      % x coordinate
y = (2*pi/Ny)*(-Nx/2:Ny/2 -1)'*Ly;
z = (2*pi/Nz)*(-Nx/2:Nz/2 -1)'*Lz;
kx = i*[0:Nx/2-1 0 -Nx/2+1:-1]'/Lx;     % wave vector
ky = i*[0:Ny/2-1 0 -Ny/2+1:-1]'/Ly;
kz = i*[0:Nz/2-1 0 -Nz/2+1:-1]'/Lz;
[xx,yy,zz]=meshgrid(x,y,z);
[kxx,kyy,kzz]=meshgrid(kx,ky,kz);
% initial conditions
t=0; tdata(1)=t;
u = 0.2 + exp(-2*(xx.^2+yy.^2+zz.^2));
v = 0.1 + exp(-4*(xx.^2+yy.^2+zz.^2-0.01).^2);

gamma=[1];
Ahat=A*fftn(ones(size(xx)));
t=0;
figure(11); clf;
subplot(2,1,1);
H = vol3d('CData',real(u),'texture','3D','XData',x,'YData',y,'ZData',z);
xlabel('x'); ylabel('y'); zlabel('z'); colorbar;
axis equal; axis square; view(3);

xlabel('x'); ylabel('y'); zlabel('z');
axis equal; axis square; view(3); drawnow;
title(['Time ',num2str(t)]);
subplot(2,1,2);
H = vol3d('CData',real(v),'texture','3D','XData',x,'YData',y,'ZData',z);
xlabel('x'); ylabel('y'); zlabel('z'); colorbar;
axis equal; axis square; view(3);

xlabel('x'); ylabel('y'); zlabel('z');
axis equal; axis square; view(3); drawnow;
filename=['./pictures1/',num2str(10000000),'.jpg'];
saveas(11,filename)

for n=1:Nt
chg=1;
for m=1:1
% use fixed point iteration to solve nonlinear system in real space
chg=1;
uold=u; vold=v;
while (chg>tol)
utemp=u; vtemp=v;
umean=0.5*(u+uold);
vmean=0.5*(v+vold);
u=uold+0.5*gamma(m)*dt*(-umean.*vmean.^2);
v=vold+0.5*gamma(m)*dt*umean.*vmean.^2;
chg=max(abs(u-utemp))+max(abs(v-vtemp));
end
uhat=fftn(u); vhat=fftn(v); % solve linear part exactly in Fourier space
uhat=exp(gamma(m)*dt*(-A+Du*(kxx.^2+kyy.^2+kzz.^2))).*...
(uhat-Ahat./(A+Du*(kxx.^2+kyy.^2+kzz)))+Ahat./...
(A+Du*(kxx.^2+kyy.^2+kzz.^2));
vhat=exp(gamma(m)*dt*(-B+Dv*(kxx.^2+kyy.^2+kzz.^2))).*vhat;
u=ifftn(uhat); v=ifftn(vhat);
% use fixed point iteration to solve nonlinear system in real space
chg=1;
uold=u; vold=v;
while (chg>tol)
utemp=u; vtemp=v;
umean=0.5*(u+uold);
vmean=0.5*(v+vold);
u=uold+0.5*gamma(m)*dt*(-umean.*vmean.^2);
v=vold+0.5*gamma(m)*dt*umean.*vmean.^2;
chg=max(abs(u-utemp))+max(abs(v-vtemp));
end
end
t=n*dt;
figure(11); clf;
subplot(2,1,1);
H = vol3d('CData',real(u),'texture','3D','XData',x,'YData',y,'ZData',z);
xlabel('x'); ylabel('y'); zlabel('z'); colorbar;
axis equal; axis square; view(3);

xlabel('x'); ylabel('y'); zlabel('z');
axis equal; axis square; view(3); drawnow;
title(['Time ',num2str(t)]);
subplot(2,1,2);
H = vol3d('CData',real(v),'texture','3D','XData',x,'YData',y,'ZData',z);
xlabel('x'); ylabel('y'); zlabel('z'); colorbar;
axis equal; axis square; view(3);

xlabel('x'); ylabel('y'); zlabel('z');
axis equal; axis square; view(3); drawnow;
filename=['./pictures1/',num2str(1000000+n),'.jpg'];
saveas(11,filename)

end


### Fortran Program

The program uses 2decomp_fft like the programs from previous chapters.

( B)

	!--------------------------------------------------------------------
!
!
! PURPOSE
!
! This program solves GrayScott equation in 3 dimensions
! u_t=D_u*(u_{xx}+u_{yy}+u_{zz})^2-u*v^2-F*(1-u)
! v_t=D_v*(v_{xx}+v_{yy}+v_{zz})^2+u*v^2-(F+k)*v
!
!
! .. Parameters ..
!  Nx				= number of modes in x - power of 2 for FFT
!  Ny				= number of modes in y - power of 2 for FFT
!  Nz				= number of modes in z - power of 2 for FFT
!  Nt				= number of timesteps to take
!  Tmax				= maximum simulation time
!  plotgap			= number of timesteps between plots
!  pi = 3.14159265358979323846264338327950288419716939937510d0
!  Lx				= width of box in x direction
!  Ly				= width of box in y direction
!  Lz				= width of box in z direction
!  A				=F
!  B				=F+k
! .. Scalars ..
!  Ahat 			= Fourier transform of A
!  i				= loop counter in x direction
!  j				= loop counter in y direction
!  k				= loop counter in z direction
!  n				= loop counter for timesteps direction
!  allocatestatus	= error indicator during allocation
!  start			= variable to record start time of program
!  finish			= variable to record end time of program
!  count_rate		= variable for clock count rate
!  dt				= timestep
!  modescalereal	= Number to scale after backward FFT
!  myid				= Process id
!  ierr				= error code
!  p_row			= number of rows for domain decomposition
!  p_col			= number of columns for domain decomposition
!  filesize			= total filesize
!  disp				= displacement to start writing data from
!  ind				= index in array to write
!  plotnum			= number of plot to save
!  numberfile		= number of the file to be saved to disk
!  stat				= error indicator when reading inputfile
!  umean			= temporary for fix point iteration
!  vmean			= temporary for fix point iteration
! .. Arrays ..
!  u 				= approximate solution u
!  v				= approximate solution v
!  uold 				= previous timestep of u
!  vold				= previous timestep of v
!  utemp 			= temporary for fix point iteration u
!  vtemp			= temporary for fix point iteration v
!  uhat 			= Fourier transform of u
!  vhat				= Fourier transform of v
! .. Vectors ..
!  kx				= fourier frequencies in x direction squared
!  ky				= fourier frequencies in y direction squared
!  kz				= fourier frequencies in z direction squared
!  x				= x locations
!  y				= y locations
!  z				= z locations
!  time				= times at which save data
!  nameconfig		= array to store filename for data to be saved
!  InputFileName	= name of the Input File
!  dpcomm
!  intcomm
! .. Special Structures ..
!  decomp			= contains information on domain decomposition
!
! REFERENCES
!
! ACKNOWLEDGEMENTS
!
! ACCURACY
!
! ERROR INDICATORS AND WARNINGS
!
! Check that the initial iterate is consistent with the
! boundary conditions for the domain specified
!--------------------------------------------------------------------
! External routines required
!
! External libraries required
! 2DECOMP&FFT	 -- Domain decomposition and Fast Fourier Library
!			(http://www.2decomp.org/index.html)
! MPI library

PROGRAM main
USE decomp_2d
USE decomp_2d_fft
USE decomp_2d_io
USE MPI
! Declare variables
IMPLICIT NONE
INTEGER(kind=4)		::  Nx=0
INTEGER(kind=4) 	::  Ny=0
INTEGER(kind=4) 	::  Nz=0
INTEGER(kind=4)		::  Nt=0
INTEGER(kind=4)		::  plotgap=0
REAL(kind=8), PARAMETER	::&
pi=3.14159265358979323846264338327950288419716939937510d0
REAL(kind=8)		::  Lx=0.0,Ly=0.0,Lz=0.0
REAL(kind=8)		::  dt=0.0
REAL(kind=8)		::  tol=0.1**12
REAL(kind=8)		::  A=0.0
REAL(kind=8)		::  Ahat=0
REAL(kind=8)		::  B=0.0
REAL(kind=8)		::  Du=0.0
REAL(kind=8)		::  Dv=0.0
REAL(kind=8)		::  chg=1.0
REAL(kind=8), DIMENSION(:), ALLOCATABLE	::  kx,ky,kz
REAL(kind=8),  	 DIMENSION(:), ALLOCATABLE	::  x,y,z
REAL(kind=8), DIMENSION(:,:,:), ALLOCATABLE	::  u,v
REAL(kind=8), DIMENSION(:,:,:), ALLOCATABLE	::  uold,vold
REAL(kind=8) 		:: utemp,vtemp
COMPLEX(kind=8), DIMENSION(:,:,:), ALLOCATABLE	::  uhat,vhat
COMPLEX(kind=8) :: uhatmp
REAL(kind=8) :: umean,vmean
REAL(kind=8), 	 DIMENSION(:), ALLOCATABLE	::  time
INTEGER(KIND=4), DIMENSION(1:5)	::  intcomm
REAL(KIND=8), DIMENSION(1:8)	::  dpcomm
REAL(kind=8) :: modescalereal
INTEGER(kind=4)	::  i,j,k,n,AllocateStatus,stat
INTEGER(kind=4)	:: myid,numprocs,ierr
TYPE(DECOMP_INFO)	::  decomp,sp
INTEGER(kind=MPI_OFFSET_KIND) :: filesize, disp
INTEGER(kind=4)	::  p_row=0, p_col=0
INTEGER(kind=4)	::  start, finish, count_rate, ind, plotnum
CHARACTER*50	::	nameconfig
CHARACTER*20	::	numberfile, InputFileName
! initialisation of MPI
CALL MPI_INIT(ierr)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
CALL MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)

CALL MPI_BCAST(stat,1,MPI_INTEGER,0,MPI_COMM_WORLD,ierr)

IF(myid.eq.0) THEN
InputFileName='INPUTFILE'
OPEN(unit=11,FILE=trim(InputFileName),status="OLD")
REWIND(11)
dpcomm(1), dpcomm(2),  dpcomm(3),  dpcomm(4),  dpcomm(5), &
dpcomm(6), dpcomm(7),  dpcomm(8)
CLOSE(11)
PRINT *,"NX ",intcomm(1)
PRINT *,"NY ",intcomm(2)
PRINT *,"NZ ",intcomm(3)
PRINT *,"NT ",intcomm(4)
PRINT *,"plotgap ",intcomm(5)
PRINT *,"Lx",dpcomm(1)
PRINT *,"Ly",dpcomm(2)
PRINT *,"Lz",dpcomm(3)
PRINT *,"dt",dpcomm(4)
PRINT *,"Du",dpcomm(5)
PRINT *,"Dv",dpcomm(6)
PRINT *,"F",dpcomm(7)
PRINT *,"k",dpcomm(8)
END IF
CALL MPI_BCAST(dpcomm,8,MPI_DOUBLE_PRECISION,0,MPI_COMM_WORLD,ierr)
CALL MPI_BCAST(intcomm,5,MPI_INTEGER,0,MPI_COMM_WORLD,ierr)

Nx=intcomm(1)
Ny=intcomm(2)
Nz=intcomm(3)
Nt=intcomm(4)
plotgap=intcomm(5)
Lx=dpcomm(1)
Ly=dpcomm(2)
Lz=dpcomm(3)
dt=dpcomm(4)
Du=dpcomm(5)
Dv=dpcomm(6)
A=dpcomm(7)
B=dpcomm(8)+dpcomm(7)
! initialisation of 2decomp
! do automatic domain decomposition
CALL decomp_2d_init(Nx,Ny,Nz,p_row,p_col)
! get information about domain decomposition choosen
CALL get_decomp_info(decomp) ! physical domain
CALL decomp_info_init(Nx/2+1,Ny,Nz,sp) ! spectral domain
! initialise FFT library
CALL decomp_2d_fft_init
ALLOCATE(u(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
v(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
uold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
vold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
uhat(sp%zst(1):sp%zen(1),&
sp%zst(2):sp%zen(2),&
sp%zst(3):sp%zen(3)),&
vhat(sp%zst(1):sp%zen(1),&
sp%zst(2):sp%zen(2),&
sp%zst(3):sp%zen(3)),&
kx(sp%zst(1):sp%zen(1)), &
ky(sp%zst(2):sp%zen(2)), &
kz(sp%zst(3):sp%zen(3)), &
x(decomp%xst(1):decomp%xen(1)),&
y(decomp%xst(2):decomp%xen(2)),&
z(decomp%xst(3):decomp%xen(3)),&
time(1:1+Nt/plotgap),&
stat=AllocateStatus)
IF (AllocateStatus .ne. 0) STOP

IF (myid.eq.0) THEN
PRINT *,'allocated space'
END IF
CALL MPI_BARRIER(MPI_COMM_WORLD,ierr)
modescalereal=1.0d0/REAL(Nx,KIND(0d0))
modescalereal=modescalereal/REAL(Ny,KIND(0d0))
modescalereal=modescalereal/REAL(Nz,KIND(0d0))

! setup fourier frequencies and grid points
DO i = 1,1+ Nx/2
IF ((i.GE.sp%zst(1)).AND.(i.LE.sp%zen(1))) THEN
kx(i)=-(REAL(i-1,kind(0d0))/Lx)**2
END IF
END DO
IF ((Nx/2 + 1 .GE.sp%zst(1)).AND.(Nx/2 + 1 .LE.sp%zen(1))) THEN
kx( Nx/2 + 1 ) = 0.0d0
END IF
DO i = Nx/2+2, Nx
IF ((i.GE.sp%zst(1)).AND.(i.LE.sp%zen(1))) THEN
Kx( i) = -(REAL(1-i+Nx,KIND(0d0))/Lx)**2
END IF
END DO
DO i=decomp%xst(1),decomp%xen(1)
x(i)=(-1.0d0 + 2.0d0*REAL(i-1,kind(0d0))/REAL(Nx,kind(0d0)))*pi*Lx
END DO

DO j = 1,1+ Ny/2
IF ((j.GE.sp%zst(2)).AND.(j.LE.sp%zen(2))) THEN
ky(j)= -(REAL(j-1,kind(0d0))/Ly)**2
END IF
END DO
IF ((Ny/2 + 1 .GE.sp%zst(2)).AND.(Ny/2 + 1 .LE.sp%zen(2))) THEN
ky( Ny/2 + 1 ) = 0.0d0
ENDIF
DO j = Ny/2+2, Ny
IF ((j.GE.sp%zst(2)).AND.(j.LE.sp%zen(2))) THEN
ky(j) = -(REAL(1-j+Ny,KIND(0d0))/Ly)**2
END IF
END DO
DO j=decomp%xst(2),decomp%xen(2)
y(j)=(-1.0d0 + 2.0d0*REAL(j-1,kind(0d0))/REAL(Ny,kind(0d0)))*pi*Ly
END DO

DO k = 1,1+ Nz/2
IF ((k.GE.sp%zst(3)).AND.(k.LE.sp%zen(3))) THEN
kz(k)= -(REAL(k-1,kind(0d0))/Lz)**2
END IF
END DO
IF ((Nz/2 + 1 .GE.sp%zst(3)).AND.(Nz/2 + 1 .LE.sp%zen(3))) THEN
kz( Nz/2 + 1 ) = 0.0d0
ENDIF
DO k = Nz/2+2, Nz
IF ((k.GE.sp%zst(3)).AND.(k.LE.sp%zen(3))) THEN
kz(k) = -(REAL(1-k+Nz,KIND(0d0))/Lz)**2
END IF
END DO
DO k=decomp%xst(3),decomp%xen(3)
z(k)=(-1.0d0 + 2.0d0*REAL(k-1,kind(0d0))/REAL(Nz,kind(0d0)))*pi*Lz
END DO

IF (myid.eq.0) THEN
PRINT *,'Setup grid and fourier frequencies'
END IF
!compute Ahat
Ahat=A*Nx*Ny*Nz
!initial condition
CALL MPI_BARRIER(MPI_COMM_WORLD,ierr)

DO k=decomp%xst(3),decomp%xen(3)
DO j=decomp%xst(2),decomp%xen(2)
DO i=decomp%xst(1),decomp%xen(1)
u(i,j,k)=1+(exp(-1*(x(i)**2+y(j)**2+z(k)**2)-1))
v(i,j,k)=0+(exp(-1*(x(i)**2+y(j)**2+z(k)**2)-1))
END DO
END DO
END DO
! write out using 2DECOMP&FFT MPI-IO routines
nameconfig='./data/u'
plotnum=0
WRITE(numberfile,'(i0)') 10000000+plotnum
ind=index(nameconfig,' ') -1
nameconfig=nameconfig(1:ind)//numberfile
ind=index(nameconfig,' ') -1
nameconfig=nameconfig(1:ind)//'.datbin'
CALL decomp_2d_write_one(1,u,nameconfig)
!same for v
nameconfig='./data/v'
plotnum=0
WRITE(numberfile,'(i0)') 10000000+plotnum
ind=index(nameconfig,' ') -1
nameconfig=nameconfig(1:ind)//numberfile
ind=index(nameconfig,' ') -1
nameconfig=nameconfig(1:ind)//'.datbin'
CALL decomp_2d_write_one(1,v,nameconfig)

IF (myid.eq.0) THEN
PRINT *,'Got initial data, starting timestepping'
END IF
CALL system_clock(start,count_rate)
time(1)=0
DO n=1,Nt
!use fixed point iteration to solve nonlinear system in real space
DO k=decomp%xst(3),decomp%xen(3)
DO j=decomp%xst(2),decomp%xen(2)
DO i=decomp%xst(1),decomp%xen(1)
uold(i,j,k)=u(i,j,k)
vold(i,j,k)=v(i,j,k)
END DO
END DO
END DO
DO k=decomp%xst(3),decomp%xen(3)
DO j=decomp%xst(2),decomp%xen(2)
DO i=decomp%xst(1),decomp%xen(1)
chg=1
DO WHILE (chg>tol)
utemp=u(i,j,k)
vtemp=v(i,j,k)
umean=0.5*(u(i,j,k)+uold(i,j,k))
vmean=0.5*(v(i,j,k)+vold(i,j,k))
u(i,j,k)=uold(i,j,k)+0.5*dt*(-umean*vmean**2)
v(i,j,k)=vold(i,j,k)+0.5*dt*(umean*vmean**2)
utemp=u(i,j,k)-utemp
vtemp=v(i,j,k)-vtemp
chg=abs(utemp)+abs(vtemp)
END DO
END DO
END DO
END DO
! solve linear part exactly in Fourier space
CALL decomp_2d_fft_3d(u,uhat)
CALL decomp_2d_fft_3d(v,vhat)
IF (myid.eq.0) THEN
uhatmp=uhat(1,1,1)
END IF
DO k=sp%zst(3),sp%zen(3)
DO j=sp%zst(2),sp%zen(2)
DO i=sp%zst(1),sp%zen(1)
uhat(i,j,k)=exp(dt*(-A+Du*(kx(i)+ky(j)+kz(k))))*&
uhat(i,j,k)
vhat(i,j,k)=exp(dt*(-B+Dv*(kx(i)+ky(j)+kz(k))))*&
vhat(i,j,k)
END DO
END DO
END DO
IF (myid.eq.0) THEN
uhat(1,1,1)=exp(dt*(-A+Du*(kx(1)+ky(1)+kz(1))))*&
(uhatmp-Ahat/(A+Du*(kx(1)+ky(1)+kz(1))))+&
(Ahat/(A+Du*(kx(1)+ky(1)+kz(1))))
END IF
!dont forget to scale
CALL decomp_2d_fft_3d(uhat,u)
CALL decomp_2d_fft_3d(vhat,v)
DO k=decomp%xst(3),decomp%xen(3)
DO j=decomp%xst(2),decomp%xen(2)
DO i=decomp%xst(1),decomp%xen(1)
u(i,j,k)=u(i,j,k)*modescalereal
v(i,j,k)=v(i,j,k)*modescalereal
END DO
END DO
END DO
!use fixed point iteration to solve nonlinear system in real space
DO k=decomp%xst(3),decomp%xen(3)
DO j=decomp%xst(2),decomp%xen(2)
DO i=decomp%xst(1),decomp%xen(1)
uold(i,j,k)=u(i,j,k)
vold(i,j,k)=v(i,j,k)
END DO
END DO
END DO
DO k=decomp%xst(3),decomp%xen(3)
DO j=decomp%xst(2),decomp%xen(2)
DO i=decomp%xst(1),decomp%xen(1)
chg=1
DO WHILE (chg>tol)
utemp=u(i,j,k)
vtemp=v(i,j,k)
umean=0.5*(u(i,j,k)+uold(i,j,k))
vmean=0.5*(v(i,j,k)+vold(i,j,k))
u(i,j,k)=uold(i,j,k)+0.5*dt*(-umean*vmean**2)
v(i,j,k)=vold(i,j,k)+0.5*dt*(umean*vmean**2)
utemp=u(i,j,k)-utemp
vtemp=v(i,j,k)-vtemp
chg=abs(utemp)+abs(vtemp)
END DO
END DO
END DO
END DO
!write data
IF (mod(n,plotgap)==0) THEN
time(1+n/plotgap)=n*dt
IF (myid.eq.0) THEN
PRINT *,'time',n*dt
END IF
nameconfig='./data/u'
plotnum=plotnum+1
WRITE(numberfile,'(i0)') 10000000+plotnum
ind=index(nameconfig,' ') -1
nameconfig=nameconfig(1:ind)//numberfile
ind=index(nameconfig,' ') -1
nameconfig=nameconfig(1:ind)//'.datbin'
!write out using 2DECOMP&FFT MPI-IO routines
CALL decomp_2d_write_one(1,u,nameconfig)
nameconfig='./data/v'
WRITE(numberfile,'(i0)') 10000000+plotnum
ind=index(nameconfig,' ') -1
nameconfig=nameconfig(1:ind)//numberfile
ind=index(nameconfig,' ') -1
nameconfig=nameconfig(1:ind)//'.datbin'
!write out using 2DECOMP&FFT MPI-IO routines
CALL decomp_2d_write_one(1,v,nameconfig)
END IF
END DO
IF (myid.eq.0) THEN
PRINT *,'Finished time stepping'
END IF

CALL system_clock(finish,count_rate)

IF (myid.eq.0) THEN
PRINT*,'Program took ',REAL(finish-start)/REAL(count_rate),'for execution'
END IF

IF (myid.eq.0) THEN
! Save times at which output was made in text format
nameconfig = './data/tdata.dat'
OPEN(unit=11,FILE=nameconfig,status="UNKNOWN")
REWIND(11)
DO j=1,1+Nt/plotgap
WRITE(11,*) time(j)
END DO
CLOSE(11)
! Save x grid points in text format
nameconfig = './data/xcoord.dat'
OPEN(unit=11,FILE=nameconfig,status="UNKNOWN")
REWIND(11)
DO i=1,Nx
WRITE(11,*) x(i)
END DO
CLOSE(11)
! Save y grid points in text format
nameconfig = './data/ycoord.dat'
OPEN(unit=11,FILE=nameconfig,status="UNKNOWN")
REWIND(11)
DO j=1,Ny
WRITE(11,*) y(j)
END DO
CLOSE(11)
! Save z grid points in text format
nameconfig = './data/zcoord.dat'
OPEN(unit=11,FILE=nameconfig,status="UNKNOWN")
REWIND(11)
DO k=1,Nz
WRITE(11,*) z(k)
END DO
CLOSE(11)
PRINT *,'Saved data'
END IF

! clean up
CALL decomp_2d_fft_finalize
CALL decomp_2d_finalize
DEALLOCATE(u,v,uold,vold,uhat,vhat,&
kx,ky,kz,x,y,z,&
time,stat=AllocateStatus)
IF (AllocateStatus .ne. 0) STOP
IF (myid.eq.0) THEN
PRINT *,'Program execution complete'
END IF
CALL MPI_FINALIZE(ierr)
END PROGRAM main


### OpenCL

OpenCL you write 'Kernel' to do the intense work in parallel. The kernel is compiled during runtime, to enable the use of different accalerators. To run the code you need an valid OpenCL 1.2 framework for example AMD APP SDK [3] which runs also on non AMD CPU's.

The program uses clfft[4].

( C)

A OpenCL C program to solve the 3-dimensional GrayScott equation Code download
#include "clFFT.h"
//#include <CL/cl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <sys/time.h>

#if defined(cl_amd_fp64) || defined(cl_khr_fp64)
#if defined(cl_amd_fp64)
#pragma OPENCL EXTENSION cl_amd_fp64 : enable
#elif defined(cl_khr_fp64)
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#endif
// function declarations/definitions using double precision floating-point arithmetic
#endif

#define MAX_SOURCE_SIZE (0x100000)

int main(void) {
//time meassuring
struct timeval tvs;
struct timeval tve;
double elapsedTime;
//variables
int	  Nx;
int 	  Ny;
int 	  Nz;
int	  N;
int 	  plotnum=0;
int	  Tmax=0;
int 	  plottime=0;
int	  plotgap=0;
double	  Lx,Ly,Lz;
double	  dt=0.0;
double	  A=0.0;
double	  B=0.0;
double	  Du=0.0;
double	  Dv=0.0;
double	  a[2]={1.0,0.0};
double 	  b[2]={0.5,0.0};
double*	  x,*y,*z ;
double*	  u[2],*v[2];
//openCL variables
cl_platform_id platform_id = NULL;
cl_device_id device_id = NULL;
cl_context context = NULL;
cl_command_queue command_queue = NULL;
cl_mem cl_u[2] = {NULL,NULL};
cl_mem cl_v[2] = {NULL,NULL};
cl_mem cl_uhat[2] = {NULL,NULL};
cl_mem cl_vhat[2] = {NULL,NULL};
cl_mem cl_x = NULL;
cl_mem cl_y = NULL;
cl_mem cl_z = NULL;
cl_mem cl_kx = NULL;
cl_mem cl_ky = NULL;
cl_mem cl_kz = NULL;
cl_program p_grid = NULL,p_frequencies = NULL,p_initialdata = NULL,p_linearpart=NULL,p_nonlinearpart=NULL;
cl_kernel grid = NULL,frequencies = NULL,initialdata = NULL,linearpart=NULL,nonlinearpart=NULL;
cl_uint ret_num_devices;
cl_uint ret_num_platforms;
cl_int ret;
ret = clGetPlatformIDs(1, &platform_id, &ret_num_platforms);
ret = clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_DEFAULT, 1, &device_id, &ret_num_devices);
context = clCreateContext( NULL, 1, &device_id, NULL, NULL, &ret);
command_queue = clCreateCommandQueue(context, device_id, 0, &ret);
size_t source_size;
char *source_str;
//end opencl
int  i,n;
int status=0;
//int  start, finish, count_rate, ind, numthreads
char	nameconfig[100]="";
char	InputFileName[]="./INPUTFILE";
FILE*fp;
fp=fopen(InputFileName,"r");
if(!fp) {fprintf(stderr, "Failed to load IPUTFILE.\n");exit(1);}
int ierr=fscanf(fp, "%d %d %d %d %d %lf %lf %lf %lf %lf %lf %lf %lf", &Nx,&Ny,&Nz,&Tmax,&plotgap,&Lx,&Ly,&Lz,&dt,&Du,&Dv,&A,&B);
if(ierr!=13){fprintf(stderr, "INPUTFILE corrupted.\n");exit(1);}
fclose(fp);
printf("NX %d\n",Nx);
printf("NY %d\n",Ny);
printf("NZ %d\n",Nz);
printf("Tmax %d\n",Tmax);
printf("plotgap %d\n",plotgap);
printf("Lx %lf\n",Lx);
printf("Ly %lf\n",Ly);
printf("Lz %lf\n",Lz);
printf("dt %lf\n",dt);
printf("Du %lf\n",Du);
printf("Dv %lf\n",Dv);
printf("F %lf\n",A);
printf("k %lf\n",B);
N=Nx*Ny*Nz;
plottime=plotgap;
B=A+B;
//ALLocate the memory
u[0]=(double*) malloc(N*sizeof(double));
v[0]=(double*) malloc(N*sizeof(double));
x=(double*) malloc(Nx*sizeof(double));
y=(double*) malloc(Ny*sizeof(double));
z=(double*) malloc(Nz*sizeof(double));

//allocate gpu mem
cl_u[0] = clCreateBuffer(context, CL_MEM_READ_WRITE, N * sizeof(double), NULL, &ret);
cl_v[0] = clCreateBuffer(context, CL_MEM_READ_WRITE, N * sizeof(double), NULL, &ret);
cl_u[1] = clCreateBuffer(context, CL_MEM_READ_WRITE, N * sizeof(double), NULL, &ret);
cl_v[1] = clCreateBuffer(context, CL_MEM_READ_WRITE, N * sizeof(double), NULL, &ret);
cl_uhat[0] = clCreateBuffer(context, CL_MEM_READ_WRITE, N * sizeof(double), NULL, &ret);
cl_vhat[0] = clCreateBuffer(context, CL_MEM_READ_WRITE, N * sizeof(double), NULL, &ret);
cl_uhat[1] = clCreateBuffer(context, CL_MEM_READ_WRITE, N * sizeof(double), NULL, &ret);
cl_vhat[1] = clCreateBuffer(context, CL_MEM_READ_WRITE, N * sizeof(double), NULL, &ret);
printf("allocated space\n");

// FFT library realted declarations.
clfftPlanHandle planHandle;
clfftDim dim = CLFFT_3D;
size_t clLengths[3] = {Nx, Ny, Nz};
// Setup clFFT.
clfftSetupData fftSetup;
ret = clfftInitSetupData(&fftSetup);
ret = clfftSetup(&fftSetup);
// Create a default plan for a complex FFT.
ret = clfftCreateDefaultPlan(&planHandle, context, dim, clLengths);
// Set plan parameters.
ret = clfftSetPlanPrecision(planHandle, CLFFT_DOUBLE);
ret = clfftSetLayout(planHandle, CLFFT_COMPLEX_PLANAR, CLFFT_COMPLEX_PLANAR);
ret = clfftSetResultLocation(planHandle, CLFFT_OUTOFPLACE);
// Bake the plan.
ret = clfftBakePlan(planHandle, 1, &command_queue, NULL, NULL);
// Create temporary buffer.
cl_mem tmpBufferu = 0;
cl_mem tmpBufferv = 0;
// Size of temp buffer.
size_t tmpBufferSize = 0;
status = clfftGetTmpBufSize(planHandle, &tmpBufferSize);
if ((status == 0) && (tmpBufferSize > 0)) {
tmpBufferu = clCreateBuffer(context, CL_MEM_READ_WRITE, tmpBufferSize, NULL, &ret);
tmpBufferv = clCreateBuffer(context, CL_MEM_READ_WRITE, tmpBufferSize, NULL, &ret);
if (ret != CL_SUCCESS)
printf("Error with tmpBuffer clCreateBuffer\n");
}
//kernel grid
fp = fopen("./grid.c", "r");
if (!fp) {fprintf(stderr, "Failed to load grid.\n"); exit(1); }
source_str = (char *)malloc(MAX_SOURCE_SIZE);
source_size = fread( source_str, 1, MAX_SOURCE_SIZE, fp );
fclose( fp );

p_grid = clCreateProgramWithSource(context, 1, (const char **)&source_str, (const size_t *)&source_size, &ret);
ret = clBuildProgram(p_grid, 1, &device_id, NULL, NULL, NULL);
grid = clCreateKernel(p_grid, "grid", &ret);
//first x
cl_x = clCreateBuffer(context, CL_MEM_READ_WRITE, Nx * sizeof(double), NULL, &ret);
ret = clSetKernelArg(grid, 0, sizeof(cl_mem), (void *)&cl_x);
ret = clSetKernelArg(grid, 1, sizeof(double),(void*)&Lx);
ret = clSetKernelArg(grid, 2, sizeof(int),(void*)&Nx);
size_t global_work_size_x[3] = {Nx, 0, 0};
ret = clEnqueueNDRangeKernel(command_queue, grid, 1, NULL, global_work_size_x, NULL, 0, NULL, NULL);
ret = clFinish(command_queue);
ret = clEnqueueReadBuffer(command_queue, cl_x, CL_TRUE, 0, Nx * sizeof(double), x, 0, NULL, NULL);
ret = clFinish(command_queue);
//then y
cl_y = clCreateBuffer(context, CL_MEM_READ_WRITE, Ny * sizeof(double), NULL, &ret);
ret = clSetKernelArg(grid, 0, sizeof(cl_mem), (void *)&cl_y);
ret = clSetKernelArg(grid, 1, sizeof(double),(void*)&Ly);
ret = clSetKernelArg(grid, 2, sizeof(int),(void*)&Ny);
size_t global_work_size_y[3] = {Ny, 0, 0};

ret = clEnqueueNDRangeKernel(command_queue, grid, 1, NULL, global_work_size_y, NULL, 0, NULL, NULL);
ret = clFinish(command_queue);
ret = clEnqueueReadBuffer(command_queue, cl_y, CL_TRUE, 0, Ny * sizeof(double), y, 0, NULL, NULL);
ret = clFinish(command_queue);

//last z
cl_z = clCreateBuffer(context, CL_MEM_READ_WRITE, Nz * sizeof(double), NULL, &ret);
ret = clSetKernelArg(grid, 0, sizeof(cl_mem), (void *)&cl_z);
ret = clSetKernelArg(grid, 1, sizeof(double),(void*)&Lz);
ret = clSetKernelArg(grid, 2, sizeof(int),(void*)&Nz);
size_t global_work_size_z[3] = {Nz, 0, 0};
ret = clEnqueueNDRangeKernel(command_queue, grid, 1, NULL, global_work_size_z, NULL, 0, NULL, NULL);
ret = clFinish(command_queue);
ret = clEnqueueReadBuffer(command_queue, cl_z, CL_TRUE, 0, Nz * sizeof(double), z, 0, NULL, NULL);
ret = clFinish(command_queue);
ret = clReleaseKernel(grid); ret = clReleaseProgram(p_grid);

//kernel initial data
fp = fopen("./initialdata.c", "r");
if (!fp) {fprintf(stderr, "Failed to load initialdata.\n"); exit(1); }
free(source_str);
source_str = (char *)malloc(MAX_SOURCE_SIZE);
source_size = fread( source_str, 1, MAX_SOURCE_SIZE, fp );
fclose( fp );

p_initialdata = clCreateProgramWithSource(context, 1, (const char **)&source_str, (const size_t *)&source_size, &ret);
ret = clBuildProgram(p_initialdata, 1, &device_id, NULL, NULL, NULL);
initialdata = clCreateKernel(p_initialdata, "initialdata", &ret);

ret = clSetKernelArg(initialdata, 0, sizeof(cl_mem),(void *)&cl_u[0]);
ret = clSetKernelArg(initialdata, 1, sizeof(cl_mem),(void* )&cl_v[0]);
ret = clSetKernelArg(initialdata, 2, sizeof(cl_mem),(void *)&cl_u[1]);
ret = clSetKernelArg(initialdata, 3, sizeof(cl_mem),(void* )&cl_v[1]);
ret = clSetKernelArg(initialdata, 4, sizeof(cl_mem),(void* )&cl_x);
ret = clSetKernelArg(initialdata, 5, sizeof(cl_mem),(void* )&cl_y);
ret = clSetKernelArg(initialdata, 6, sizeof(cl_mem),(void* )&cl_z);
ret = clSetKernelArg(initialdata, 7, sizeof(int),(void* )&Nx);
ret = clSetKernelArg(initialdata, 8, sizeof(int),(void* )&Ny);
ret = clSetKernelArg(initialdata, 9, sizeof(int),(void* )&Nz);
size_t global_work_size[3] = {N, 0, 0};
ret = clEnqueueNDRangeKernel(command_queue, initialdata, 1, NULL, global_work_size, NULL, 0, NULL, NULL);
ret = clFinish(command_queue);
ret = clReleaseKernel(initialdata); ret = clReleaseProgram(p_initialdata);
ret = clEnqueueReadBuffer(command_queue, cl_u[0], CL_TRUE, 0, N * sizeof(double), u[0], 0, NULL, NULL);
ret = clFinish(command_queue);
ret = clEnqueueReadBuffer(command_queue, cl_v[0], CL_TRUE, 0, N * sizeof(double), v[0], 0, NULL, NULL);
ret = clFinish(command_queue);
ret = clReleaseMemObject(cl_x);
ret = clReleaseMemObject(cl_y);
ret = clReleaseMemObject(cl_z);
//write to disk
fp=fopen("./data/xcoord.dat","w");
if (!fp) {fprintf(stderr, "Failed to write xcoord.dat.\n"); exit(1); }
for(i=0;i<Nx;i++){fprintf(fp,"%lf\n",x[i]);}
fclose( fp );
fp=fopen("./data/ycoord.dat","w");
if (!fp) {fprintf(stderr, "Failed to write ycoord.dat.\n"); exit(1); }
for(i=0;i<Ny;i++){fprintf(fp,"%lf\n",y[i]);}
fclose( fp );
fp=fopen("./data/zcoord.dat","w");
if (!fp) {fprintf(stderr, "Failed to write zcoord.dat.\n"); exit(1); }
for(i=0;i<Nz;i++){fprintf(fp,"%lf\n",z[i]);}
fclose( fp );
free(x); free(y); free(z);
n=0;
plotnum=0;
//output of initial data U
char tmp_str[10];
strcpy(nameconfig,"./data/u");
sprintf(tmp_str,"%d",10000000+plotnum);
strcat(nameconfig,tmp_str);
strcat(nameconfig,".datbin");
fp=fopen(nameconfig,"wb");
if (!fp) {fprintf(stderr, "Failed to write initialdata.\n"); exit(1); }
for(i=0;i<N;i++){fwrite(&u[0][i], sizeof(double), 1, fp);}
fclose( fp );
//V
strcpy(nameconfig,"./data/v");
sprintf(tmp_str,"%d",10000000+plotnum);
strcat(nameconfig,tmp_str);
strcat(nameconfig,".datbin");
fp=fopen(nameconfig,"wb");
if (!fp) {fprintf(stderr, "Failed to write initialdata.\n"); exit(1); }
for(i=0;i<N;i++){fwrite(&v[0][i], sizeof(double), 1, fp);}
fclose( fp );

//frequencies kernel

fp = fopen("./frequencies.c", "r");
if (!fp) {fprintf(stderr, "Failed to load frequencies.\n"); exit(1); }
free(source_str);
source_str = (char *)malloc(MAX_SOURCE_SIZE);
source_size = fread( source_str, 1, MAX_SOURCE_SIZE, fp );
fclose( fp );

p_frequencies = clCreateProgramWithSource(context, 1, (const char **)&source_str, (const size_t *)&source_size, &ret);
ret = clBuildProgram(p_frequencies, 1, &device_id, NULL, NULL, NULL);
frequencies = clCreateKernel(p_frequencies, "frequencies", &ret);
//get frequencies first x
cl_kx = clCreateBuffer(context, CL_MEM_READ_WRITE, Nx * sizeof(double), NULL, &ret);
ret = clSetKernelArg(frequencies, 0, sizeof(cl_mem), (void *)&cl_kx);
ret = clSetKernelArg(frequencies, 1, sizeof(double),(void*)&Lx);
ret = clSetKernelArg(frequencies, 2, sizeof(int),(void*)&Nx);
ret = clEnqueueNDRangeKernel(command_queue, frequencies, 1, NULL, global_work_size_x, NULL, 0, NULL, NULL);
ret = clFinish(command_queue);
//then y
cl_ky = clCreateBuffer(context, CL_MEM_READ_WRITE, Ny * sizeof(double), NULL, &ret);
ret = clSetKernelArg(frequencies, 0, sizeof(cl_mem), (void *)&cl_ky);
ret = clSetKernelArg(frequencies, 1, sizeof(double),(void*)&Ly);
ret = clSetKernelArg(frequencies, 2, sizeof(int),(void*)&Ny);
ret = clEnqueueNDRangeKernel(command_queue, frequencies, 1, NULL, global_work_size_y, NULL, 0, NULL, NULL);
ret = clFinish(command_queue);
//last z
cl_kz = clCreateBuffer(context, CL_MEM_READ_WRITE, Nz * sizeof(double), NULL, &ret);
ret = clSetKernelArg(frequencies, 0, sizeof(cl_mem), (void *)&cl_kz);
ret = clSetKernelArg(frequencies, 1, sizeof(double),(void*)&Lz);
ret = clSetKernelArg(frequencies, 2, sizeof(int),(void*)&Nz);
ret = clEnqueueNDRangeKernel(command_queue, frequencies, 1, NULL, global_work_size_z, NULL, 0, NULL, NULL);
ret = clFinish(command_queue);

printf("Setup grid, fourier frequencies and initialcondition\n");
//load the rest of the kernels
//linearpart kernel
fp = fopen("./linearpart.c", "r");
if (!fp) {fprintf(stderr, "Failed to load linearpart.\n"); exit(1); }
free(source_str);
source_str = (char *)malloc(MAX_SOURCE_SIZE);
source_size = fread( source_str, 1, MAX_SOURCE_SIZE, fp );
fclose( fp );

p_linearpart = clCreateProgramWithSource(context, 1, (const char **)&source_str, (const size_t *)&source_size, &ret);
ret = clBuildProgram(p_linearpart, 1, &device_id, NULL, NULL, NULL);
linearpart = clCreateKernel(p_linearpart, "linearpart", &ret);

//kernel nonlinear
fp = fopen("./nonlinearpart.c", "r");
if (!fp) {fprintf(stderr, "Failed to load nonlinearpart.\n"); exit(1); }
free(source_str);
source_str = (char *)malloc(MAX_SOURCE_SIZE);
source_size = fread( source_str, 1, MAX_SOURCE_SIZE, fp );
fclose( fp );

p_nonlinearpart = clCreateProgramWithSource(context, 1, (const char **)&source_str, (const size_t *)&source_size, &ret);
ret = clBuildProgram(p_nonlinearpart, 1, &device_id, NULL, NULL, NULL);
nonlinearpart = clCreateKernel(p_nonlinearpart, "nonlinearpart", &ret);

printf("Got initial data, starting timestepping\n");
gettimeofday(&tvs, NULL);
for(n=0;n<=Tmax;n++){
//linear
ret = clfftEnqueueTransform(planHandle, CLFFT_FORWARD, 1, &command_queue, 0, NULL, NULL,cl_u, cl_uhat, tmpBufferu);
ret = clfftEnqueueTransform(planHandle, CLFFT_FORWARD, 1, &command_queue, 0, NULL, NULL,cl_v, cl_vhat, tmpBufferv);
ret = clFinish(command_queue);

ret = clSetKernelArg(linearpart, 0, sizeof(cl_mem),(void *)&cl_uhat[0]);
ret = clSetKernelArg(linearpart, 1, sizeof(cl_mem),(void *)&cl_uhat[1]);
ret = clSetKernelArg(linearpart, 2, sizeof(cl_mem),(void *)&cl_vhat[0]);
ret = clSetKernelArg(linearpart, 3, sizeof(cl_mem),(void *)&cl_vhat[1]);
ret = clSetKernelArg(linearpart, 4, sizeof(cl_mem),(void* )&cl_kx);
ret = clSetKernelArg(linearpart, 5, sizeof(cl_mem),(void* )&cl_ky);
ret = clSetKernelArg(linearpart, 6, sizeof(cl_mem),(void* )&cl_kz);
ret = clSetKernelArg(linearpart, 7, sizeof(double),(void* )&dt);
ret = clSetKernelArg(linearpart, 8, sizeof(double),(void* )&Du);
ret = clSetKernelArg(linearpart, 9, sizeof(double),(void* )&Dv);
ret = clSetKernelArg(linearpart, 10, sizeof(double),(void* )&A);
ret = clSetKernelArg(linearpart, 11, sizeof(double),(void* )&B);
ret = clSetKernelArg(linearpart, 12, sizeof(double),(void* )&b[0]);
ret = clSetKernelArg(linearpart, 13, sizeof(double),(void* )&b[1]);
ret = clSetKernelArg(linearpart, 14, sizeof(int),(void* )&Nx);
ret = clSetKernelArg(linearpart, 15, sizeof(int),(void* )&Ny);
ret = clSetKernelArg(linearpart, 16, sizeof(int),(void* )&Nz);
ret = clEnqueueNDRangeKernel(command_queue, linearpart, 1, NULL, global_work_size, NULL, 0, NULL, NULL);
ret = clFinish(command_queue);

ret = clfftEnqueueTransform(planHandle, CLFFT_BACKWARD, 1, &command_queue, 0, NULL, NULL,cl_uhat, cl_u, tmpBufferu);
ret = clfftEnqueueTransform(planHandle, CLFFT_BACKWARD, 1, &command_queue, 0, NULL, NULL,cl_vhat, cl_v, tmpBufferv);
ret = clFinish(command_queue);
//nonlinearpart
ret = clSetKernelArg(nonlinearpart, 0, sizeof(cl_mem),(void *)&cl_u[0]);
ret = clSetKernelArg(nonlinearpart, 1, sizeof(cl_mem),(void *)&cl_u[1]);
ret = clSetKernelArg(nonlinearpart, 2, sizeof(cl_mem),(void* )&cl_v[0]);
ret = clSetKernelArg(nonlinearpart, 3, sizeof(cl_mem),(void* )&cl_v[1]);
ret = clSetKernelArg(nonlinearpart, 4, sizeof(double),(void* )&dt);
ret = clSetKernelArg(nonlinearpart, 5, sizeof(double),(void* )&a[0]);
ret = clSetKernelArg(nonlinearpart, 6, sizeof(double),(void* )&a[1]);
ret = clEnqueueNDRangeKernel(command_queue, nonlinearpart, 1, NULL, global_work_size, NULL, 0, NULL, NULL);
ret = clFinish(command_queue);
// linear part
ret = clfftEnqueueTransform(planHandle, CLFFT_FORWARD, 1, &command_queue, 0, NULL, NULL,cl_u, cl_uhat, tmpBufferu);
ret = clfftEnqueueTransform(planHandle, CLFFT_FORWARD, 1, &command_queue, 0, NULL, NULL,cl_v, cl_vhat, tmpBufferv);
ret = clFinish(command_queue);

ret = clSetKernelArg(linearpart, 0, sizeof(cl_mem),(void *)&cl_uhat[0]);
ret = clSetKernelArg(linearpart, 1, sizeof(cl_mem),(void *)&cl_uhat[1]);
ret = clSetKernelArg(linearpart, 2, sizeof(cl_mem),(void *)&cl_vhat[0]);
ret = clSetKernelArg(linearpart, 3, sizeof(cl_mem),(void *)&cl_vhat[1]);
ret = clSetKernelArg(linearpart, 4, sizeof(cl_mem),(void* )&cl_kx);
ret = clSetKernelArg(linearpart, 5, sizeof(cl_mem),(void* )&cl_ky);
ret = clSetKernelArg(linearpart, 6, sizeof(cl_mem),(void* )&cl_kz);
ret = clSetKernelArg(linearpart, 7, sizeof(double),(void* )&dt);
ret = clSetKernelArg(linearpart, 8, sizeof(double),(void* )&Du);
ret = clSetKernelArg(linearpart, 9, sizeof(double),(void* )&Dv);
ret = clSetKernelArg(linearpart, 10, sizeof(double),(void* )&A);
ret = clSetKernelArg(linearpart, 11, sizeof(double),(void* )&B);
ret = clSetKernelArg(linearpart, 12, sizeof(double),(void* )&b[0]);
ret = clSetKernelArg(linearpart, 13, sizeof(double),(void* )&b[1]);
ret = clSetKernelArg(linearpart, 14, sizeof(int),(void* )&Nx);
ret = clSetKernelArg(linearpart, 15, sizeof(int),(void* )&Ny);
ret = clSetKernelArg(linearpart, 16, sizeof(int),(void* )&Nz);
ret = clEnqueueNDRangeKernel(command_queue, linearpart, 1, NULL, global_work_size, NULL, 0, NULL, NULL);
ret = clFinish(command_queue);

ret = clfftEnqueueTransform(planHandle, CLFFT_BACKWARD, 1, &command_queue, 0, NULL, NULL,cl_uhat, cl_u, tmpBufferu);
ret = clfftEnqueueTransform(planHandle, CLFFT_BACKWARD, 1, &command_queue, 0, NULL, NULL,cl_vhat, cl_v, tmpBufferv);
ret = clFinish(command_queue);
// done
if(n==plottime){
printf("time:%lf, step:%d,%d\n",n*dt,n,plotnum);
plottime=plottime+plotgap;
plotnum=plotnum+1;
ret = clEnqueueReadBuffer(command_queue, cl_u[0], CL_TRUE, 0, N * sizeof(double), u[0], 0, NULL, NULL);
ret = clEnqueueReadBuffer(command_queue, cl_v[0], CL_TRUE, 0, N * sizeof(double), v[0], 0, NULL, NULL);
ret = clFinish(command_queue);
//output of data U
char tmp_str[10];
strcpy(nameconfig,"./data/u");
sprintf(tmp_str,"%d",10000000+plotnum);
strcat(nameconfig,tmp_str);
strcat(nameconfig,".datbin");
fp=fopen(nameconfig,"wb");
if (!fp) {fprintf(stderr, "Failed to write u-data.\n"); exit(1); }
for(i=0;i<N;i++){fwrite(&u[0][i], sizeof(double), 1, fp);}
fclose( fp );
//V
strcpy(nameconfig,"./data/v");
sprintf(tmp_str,"%d",10000000+plotnum);
strcat(nameconfig,tmp_str);
strcat(nameconfig,".datbin");
fp=fopen(nameconfig,"wb");
if (!fp) {fprintf(stderr, "Failed to write v-data.\n"); exit(1); }
for(i=0;i<N;i++){fwrite(&v[0][i], sizeof(double), 1, fp);}
fclose( fp );
}
}
gettimeofday(&tve, NULL);
printf("Finished time stepping\n");
elapsedTime = (tve.tv_sec - tvs.tv_sec) * 1000.0;      // sec to ms
elapsedTime += (tve.tv_usec - tvs.tv_usec) / 1000.0;   // us to ms
printf("Program took %lfms for execution\n",elapsedTime);

clReleaseMemObject(cl_u[0]);
clReleaseMemObject(cl_u[1]);
clReleaseMemObject(cl_v[0]);
clReleaseMemObject(cl_v[1]);
clReleaseMemObject(cl_uhat[0]);
clReleaseMemObject(cl_uhat[1]);
clReleaseMemObject(cl_vhat[0]);
clReleaseMemObject(cl_vhat[1]);
clReleaseMemObject(cl_kx);
clReleaseMemObject(cl_ky);
clReleaseMemObject(cl_kz);
ret = clReleaseKernel(frequencies); ret = clReleaseProgram(p_frequencies);
ret = clReleaseKernel(linearpart); ret = clReleaseProgram(p_linearpart);
ret = clReleaseKernel(nonlinearpart); ret = clReleaseProgram(p_nonlinearpart);
free(u[0]);
free(v[0]);
clReleaseMemObject(tmpBufferu);
clReleaseMemObject(tmpBufferv);
/* Release the plan. */
ret = clfftDestroyPlan(&planHandle);
/* Release clFFT library. */
clfftTeardown();

ret = clReleaseCommandQueue(command_queue);
ret = clReleaseContext(context);
printf("Program execution complete\n");

return 0;
}


Now all the OpenCL-Kernel's, which sould be saved in the same folder. Some of them look very hard to read, but that is due to the lack of a native implementation for complexnumbers.

( D)

#if defined(cl_amd_fp64) || defined(cl_khr_fp64)
#if defined(cl_amd_fp64)
#pragma OPENCL EXTENSION cl_amd_fp64 : enable
#elif defined(cl_khr_fp64)
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#endif
// function declarations/definitions using double precision doubleing-point arithmetic
#endif
__kernel void linearpart ( __global double* uhatre,__global double* uhatim,__global double* vhatre,__global double* vhatim,__global const double* Kx,__global const double* Ky,__global const double* Kz, const double dt, const double Du,const double Dv, const double A,const double B,const double bhighre,const double bhighim , const int Nx, const int Ny, const int Nz)
{
const int ind = get_global_id(0);

int i,j,k;
k=floor((double)ind/(double)(Ny*Nx));
j=floor((double)(ind-k*(Ny*Nx))/(double)Nx);
i=ind-k*(Ny*Nx)-j*Nx;
double uexp[2];
uexp[0]=dt*bhighre*(-1.0*A+Du*(Kx[i]+Ky[j]+Kz[k]));
uexp[1]=dt*bhighim*(-1.0*A+Du*(Kx[i]+Ky[j]+Kz[k]));
double vexp[2];
vexp[0]=dt*bhighre*(-1.0*B+Dv*(Kx[i]+Ky[j]+Kz[k]));
vexp[1]=dt*bhighim*(-1.0*B+Dv*(Kx[i]+Ky[j]+Kz[k]));
if(ind==0){
double N=(double)Nx*Ny*Nz;

uhatre[ind]=exp(uexp[0])*(((uhatre[ind]-N)*cos(uexp[1]))-(uhatim[ind]*sin(uexp[1])))+N;
uhatim[ind]=exp(uexp[0])*(((uhatre[ind]-N)*sin(uexp[1]))+(uhatim[ind]*cos(uexp[1])));
}

else{
uhatre[ind]=exp(uexp[0])*((uhatre[ind]*cos(uexp[1]))-(uhatim[ind]*sin(uexp[1])));
uhatim[ind]=exp(uexp[0])*((uhatre[ind]*sin(uexp[1]))+(uhatim[ind]*cos(uexp[1])));
}
vhatre[ind]=exp(vexp[0])*((vhatre[ind]*cos(vexp[1]))-(vhatim[ind]*sin(vexp[1])));
vhatim[ind]=exp(vexp[0])*((vhatre[ind]*sin(vexp[1]))+(vhatim[ind]*cos(vexp[1])));
}


( E)

#if defined(cl_amd_fp64) || defined(cl_khr_fp64)
#if defined(cl_amd_fp64)
#pragma OPENCL EXTENSION cl_amd_fp64 : enable
#elif defined(cl_khr_fp64)
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#endif
// function declarations/definitions using double precision doubleing-point arithmetic
#endif
__kernel void nonlinearpart ( __global double* ure,__global double* uim,__global double* vre,__global double* vim, const double dt, const double are,const double aim)
{
const int ind = get_global_id(0);
const double tol=pown(0.1,12);//0.000000000001;
double chg=1;
const double uoldre=ure[ind];
const double uoldim=uim[ind];
const double voldre=vre[ind];
const double voldim=vim[ind];
double utempre,utempim,vtempre,vtempim;
double uMre,uMim,vMre,vMim;
while(chg>tol){
utempre=ure[ind];
utempim=uim[ind];
vtempre=vre[ind];
vtempim=vim[ind];

uMre=0.5*(ure[ind]+uoldre);
uMim=0.5*(uim[ind]+uoldim);
vMre=0.5*(vre[ind]+voldre);
vMim=0.5*(vim[ind]+voldim);

ure[ind]=uoldre-dt*are*uMre*vMre*vMre+dt*are*uMre*vMim*vMim+2.0*dt*are*uMim*vMre*vMim+2.0*dt*aim*uMre*vMre*vMim+dt*aim*uMim*vMre*vMre-dt*aim*uMim*vMim*vMim;

uim[ind]=uoldim-dt*aim*uMre*vMre*vMre+dt*are*uMim*vMim*vMim+2.0*dt*aim*uMim*vMre*vMim-2.0*dt*are*uMre*vMre*vMim-dt*are*uMim*vMre*vMre+dt*aim*uMre*vMim*vMim;

vre[ind]=voldre+dt*are*uMre*vMre*vMre-dt*are*uMre*vMim*vMim-2.0*dt*are*uMim*vMre*vMim-2.0*dt*aim*uMre*vMre*vMim-dt*aim*uMim*vMre*vMre+dt*aim*uMim*vMim*vMim;

vim[ind]=voldim+dt*aim*uMre*vMre*vMre-dt*are*uMim*vMim*vMim-2.0*dt*aim*uMim*vMre*vMim+2.0*dt*are*uMre*vMre*vMim+dt*are*uMim*vMre*vMre-dt*aim*uMre*vMim*vMim;

chg=sqrt((ure[ind]-utempre)*(ure[ind]-utempre)+(uim[ind]-utempim)*(uim[ind]-utempim))+
sqrt((vre[ind]-vtempre)*(vre[ind]-vtempre)+(vim[ind]-vtempim)*(vim[ind]-vtempim));
}
}


( F)

#if defined(cl_amd_fp64) || defined(cl_khr_fp64)
#if defined(cl_amd_fp64)
#pragma OPENCL EXTENSION cl_amd_fp64 : enable
#elif defined(cl_khr_fp64)
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#endif
// function declarations/definitions using double precision doubleing-point arithmetic
#endif
__kernel void grid ( __global double* x, const double Lx, const int Nx)
{
const int ind = get_global_id(0);
x[ind]=(-1.0 + ((double) 2.0*ind/(double)Nx))*M_PI*Lx;

}


( G)

#if defined(cl_amd_fp64) || defined(cl_khr_fp64)
#if defined(cl_amd_fp64)
#pragma OPENCL EXTENSION cl_amd_fp64 : enable
#elif defined(cl_khr_fp64)
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#endif
// function declarations/definitions using double precision doubleing-point arithmetic
#endif
__kernel void initialdata ( __global double* ure, __global double* vre, __global double* uim, __global double* vim, __global const double* x ,__global const double* y ,__global const double* z , const int Nx, const int Ny, const int Nz)
{
const int ind = get_global_id(0);

int i,j,k;
k=floor((double)ind/(double)(Ny*Nx));
j=floor((double)(ind-k*(Ny*Nx))/(double)Nx);
i=ind-k*(Ny*Nx)-j*Nx;
ure[ind]=0.5+exp(-1.0*(x[i]*x[i]+y[j]*y[j]+z[k]*z[k] )-1.0);//
vre[ind]=0.1+exp(-1.0*(x[i]*x[i]+y[j]*y[j]+z[k]*z[k] )-1.0);
uim[ind]=0.0;
vim[ind]=0.0;
}


( H)

#if defined(cl_amd_fp64) || defined(cl_khr_fp64)
#if defined(cl_amd_fp64)
#pragma OPENCL EXTENSION cl_amd_fp64 : enable
#elif defined(cl_khr_fp64)
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#endif
// function declarations/definitions using double precision doubleing-point arithmetic
#endif
__kernel void frequencies ( __global double* kx, const double Lx, const int Nx)
{
const int ind = get_global_id(0);
if ( ind < Nx/2) kx[ind] = -1.0*((double)((ind))/Lx)*((double)( ind)/Lx);
if ( ind ==Nx/2) kx[ ind]=0.0;
if ( ind > Nx/2) kx[ ind]=-1.0*(double)(Nx-ind)/Lx*(double)(Nx-ind)/Lx;
}


( I)

####### Compiler, tools and options

CC            = gcc
CFLAGS        = -m64 -pipe -O2 -Wno-unused-but-set-parameter -Wall
#OPENCL        = /opt/AMDAPP
#CLFFT         = /opt/clFFT
#INC_OPENCL    = -I$(OPENCL)/include #INC_CLFFT = -I$(CLFFT)/include
#INCPATH       = $(INC_OPENCL)$(INC_CLFFT)
#LIBS_CLFFT    = -L$(CLFFT)/lib64 -lclFFT -L$(OPENCL)/lib/x86_64 -lOpenCL
DEL_FILE      = rm -f

LIBS = -lOpenCL -I../../clFFT/src/package/include/ -L../../clFFT/src/package/lib64 -lclFFT -lm

#

####### Build rules

CL: grayscottOpenCLs.c
$(DEL_FILE) grayscottOpenCLs$(DEL_FILE) *.o
$(CC)$(CFLAGS) -o grayscottOpenCLs grayscottOpenCLs.c $(LIBS) export LD_LIBRARY_PATH=../../clFFT/src/package/lib64:$$LD_LIBRARY_PATH export DISPLAY=:0 CL43: grayscottOpenCL43.c (DEL_FILE) grayscottOpenCL43 (DEL_FILE) *.o (CC) (CFLAGS) -o grayscottOpenCL43 grayscottOpenCL43.c (LIBS) export LD_LIBRARY_PATH=../../clFFT/src/package/lib64:$$LD_LIBRARY_PATH CL4k: grayscottOpenCL4k.c$(DEL_FILE) grayscottOpenCL4k
$(DEL_FILE) *.o$(CC) $(CFLAGS) -o grayscottOpenCL4k grayscottOpenCL4k.c$(LIBS)
export LD_LIBRARY_PATH=../../clFFT/src/package/lib64:LD_LIBRARY_PATH

info: infocl.c
$(CC)$(CFLAGS) -o infocl infocl.c $(LIBS) out: xdmfcreate.f90 gfortran -o out xdmfcreate.f90 outa: xdmfcreatea.f90 gfortran -o outa xdmfcreatea.f90 clean:$(DEL_FILE) grayscottOpenCL43
$(DEL_FILE) grayscottOpenCLs$(DEL_FILE) infocl
$(DEL_FILE) out$(DEL_FILE) outa
$(DEL_FILE) *.xmf$(DEL_FILE) *.o


To run you need the INPUTFILE with the parameters.

( J)

2048
2048
1
4
2
4.0
4.0
4.0
0.010
0.90
0.005
0.038
0.060

Nx
Ny
Nt
Plotgap
Lx
Ly
DT
Du
Dv
F
k

To view the data, compile and run xdmfcreate.f90. Open udata.xdmf and vdata.xdmf in ParaView.


( K)

	PROGRAM XdmfCreate
!--------------------------------------------------------------------------------
! .. Purpose ..
! XdmfCreate is a postprocessing program which creates header files for
! Paraview and Visit
! It uses the INPUTFILE and assumes that the filenames in the program
! are
! consistent with those in the current file.
!
! .. PARAMETERS .. INITIALIZED IN INPUTFILE
! Nx = power of two, number of modes in the x direction
! Ny = power of two, number of modes in the y direction
! Nz = power of two, number of modes in the z direction
! Nt = the number of timesteps
! plotgap = the number of timesteps to take before plotting
! Lx = definition of the periodic domain of computation in x direction
! Ly = definition of the periodic domain of computation in y direction
! Lz = definition of the periodic domain of computation in z direction
! Dt = the time step
!
! REFERENCES
!
! www.xdmf.org
! Paraview users mailing list
! VisIt users mailing list
!
! ACCURACY
!
! ERROR INDICATORS AND WARNINGS
!
!------------------------------------------------------------------------------------
! EXTERNAL ROUTINES REQUIRED
IMPLICIT NONE
! .. Scalar Arguments ..
INTEGER(KIND=4) :: Nx, Ny, Nz, Nt, plotgap
REAL(KIND=8)    :: Lx, Ly, Lz, DT
! .. Local scalars ..
INTEGER(KIND=4) :: stat,plotnum,ind,n,numplots
REAL(KIND=8), PARAMETER :: pi=3.14159265358979323846264338327950288419716939937510d0
! .. Local Arrays ..
CHARACTER*50    :: InputFileName, OutputFileName, OutputFileName2
CHARACTER*10    :: number_file
InputFileName='INPUTFILE'
OPEN(unit=11,FILE=trim(InputFileName),status="OLD")
REWIND(11)
READ(11,*) Nx, Ny, Nz, Nt, plotgap, Lx, Ly, Lz, DT
CLOSE(11)
plotnum=0
numplots=1+Nt/plotgap
OutputFileName='udata.xmf'
OPEN(unit=11,FILE=trim(OutputFileName),status="UNKNOWN")
REWIND(11)
WRITE(11,'(a)') "<?xml version=""1.0"" ?>"
WRITE(11,'(a)') "<!DOCTYPE Xdmf SYSTEM ""Xdmf.dtd"" []>"
WRITE(11,'(a)') "<Xdmf xmlns:xi=""http://www.w3.org/2001/XInclude"" Version=""2.0"">"
WRITE(11,'(a)') "<Domain>"
WRITE(11,'(a)') " <Topology name=""topo"" TopologyType=""3DCoRectMesh"""
WRITE(11,*) "  Dimensions=""",Nx,Ny,Nz,""">"
WRITE(11,'(a)') " </Topology>"
WRITE(11,'(a)') " <Geometry name=""geo"" Type=""ORIGIN_DXDYDZ"">"
WRITE(11,'(a)') "  <!-- Origin -->"
WRITE(11,'(a)') "  <DataItem Format=""XML"" Dimensions=""3"">"
WRITE(11,*) -pi*Lx, -pi*Ly, -pi*Lz
WRITE(11,'(a)') "  </DataItem>"
WRITE(11,'(a)') "  <!-- DxDyDz -->"
WRITE(11,'(a)') "  <DataItem Format=""XML"" Dimensions=""3"">"
WRITE(11,*) REAL(2.0*pi*Lx/Nx,kind(0d0)), REAL(2.0*pi*Ly/Ny,kind(0d0)), REAL(2.0*pi*Lz/Nz,kind(0d0))
WRITE(11,'(a)') "   </DataItem>"
WRITE(11,'(a)') "  </Geometry>"
WRITE(11,'(a)')
WRITE(11,'(a)') "       <Grid Name=""TimeSeries"" GridType=""Collection"" CollectionType=""Temporal"">"
WRITE(11,'(a)') "               <Time TimeType=""HyperSlab"">"
WRITE(11,'(a)') "                       <DataItem Format=""XML"" NumberType=""Float"" Dimensions=""3"">"
WRITE(11,*) "                   0.0", dt," 2"
WRITE(11,'(a)') "                       </DataItem>"
WRITE(11,'(a)') "               </Time>"

DO n=1,numplots

OutputFileName = 'data/u'
ind = index(OutputFileName,' ') - 1
WRITE(number_file,'(i0)') 10000000+plotnum
OutputFileName = OutputFileName(1:ind)//number_file
ind = index(OutputFileName,' ') - 1
OutputFileName = OutputFileName(1:ind)//'.datbin'

OutputFileName2 = "<Grid Name=""T"
WRITE(number_file,'(i0)') n
OutputFileName2 = OutputFileName2(1:13)//number_file
ind =  index(number_file,' ') - 1
OutputFileName2 = OutputFileName2(1:13+ind)//'" GridType="Uniform">'
plotnum=plotnum+1

WRITE(11,'(a)')
WRITE(11,'(3X,a)') "   ",OutputFileName2
WRITE(11,'(a)') "       <Topology Reference=""/Xdmf/Domain/Topology[1]""/>"
WRITE(11,'(a)') "       <Geometry Reference=""/Xdmf/Domain/Geometry[1]""/>"
WRITE(11,'(a)') "        <Attribute Name=""Displacement"" Center=""Node"">"
WRITE(11,'(a)') "               <DataItem Format=""Binary"" "
WRITE(11,'(a)') "                DataType=""Float"" Precision=""8"" Endian=""Little"" "
WRITE(11,*) "            Dimensions=""",Nx, Ny, Nz,""">"
WRITE(11,'(16X,a)') "           ",OutputFileName
WRITE(11,'(a)') "               </DataItem>"
WRITE(11,'(a)') "       </Attribute>"
WRITE(11,'(3X,a)') "</Grid>"
END DO
WRITE(11,'(a)')"        </Grid>"
WRITE(11,'(a)') "</Domain>"
WRITE(11,'(a)') "</Xdmf>"
CLOSE(11)
!same for v
plotnum=0
numplots=1+Nt/plotgap
OutputFileName='vdata.xmf'
OPEN(unit=11,FILE=trim(OutputFileName),status="UNKNOWN")
REWIND(11)
WRITE(11,'(a)') "<?xml version=""1.0"" ?>"
WRITE(11,'(a)') "<!DOCTYPE Xdmf SYSTEM ""Xdmf.dtd"" []>"
WRITE(11,'(a)') "<Xdmf xmlns:xi=""http://www.w3.org/2001/XInclude"" Version=""2.0"">"
WRITE(11,'(a)') "<Domain>"
WRITE(11,'(a)') " <Topology name=""topo"" TopologyType=""3DCoRectMesh"""
WRITE(11,*) "  Dimensions=""",Nx,Ny,Nz,""">"
WRITE(11,'(a)') " </Topology>"
WRITE(11,'(a)') " <Geometry name=""geo"" Type=""ORIGIN_DXDYDZ"">"
WRITE(11,'(a)') "  <!-- Origin -->"
WRITE(11,'(a)') "  <DataItem Format=""XML"" Dimensions=""3"">"
WRITE(11,*) -pi*Lx, -pi*Ly, -pi*Lz
WRITE(11,'(a)') "  </DataItem>"
WRITE(11,'(a)') "  <!-- DxDyDz -->"
WRITE(11,'(a)') "  <DataItem Format=""XML"" Dimensions=""3"">"
WRITE(11,*) REAL(2.0*pi*Lx/Nx,kind(0d0)), REAL(2.0*pi*Ly/Ny,kind(0d0)), REAL(2.0*pi*Lz/Nz,kind(0d0))
WRITE(11,'(a)') "   </DataItem>"
WRITE(11,'(a)') "  </Geometry>"
WRITE(11,'(a)')
WRITE(11,'(a)') "       <Grid Name=""TimeSeries"" GridType=""Collection"" CollectionType=""Temporal"">"
WRITE(11,'(a)') "               <Time TimeType=""HyperSlab"">"
WRITE(11,'(a)') "                       <DataItem Format=""XML"" NumberType=""Float"" Dimensions=""3"">"
WRITE(11,*) "                   0.0", dt," 2"
WRITE(11,'(a)') "                       </DataItem>"
WRITE(11,'(a)') "               </Time>"

DO n=1,numplots

OutputFileName = 'data/v'
ind = index(OutputFileName,' ') - 1
WRITE(number_file,'(i0)') 10000000+plotnum
OutputFileName = OutputFileName(1:ind)//number_file
ind = index(OutputFileName,' ') - 1
OutputFileName = OutputFileName(1:ind)//'.datbin'

OutputFileName2 = "<Grid Name=""T"
WRITE(number_file,'(i0)') n
OutputFileName2 = OutputFileName2(1:13)//number_file
ind =  index(number_file,' ') - 1
OutputFileName2 = OutputFileName2(1:13+ind)//'" GridType="Uniform">'
plotnum=plotnum+1

WRITE(11,'(a)')
WRITE(11,'(3X,a)') "   ",OutputFileName2
WRITE(11,'(a)') "       <Topology Reference=""/Xdmf/Domain/Topology[1]""/>"
WRITE(11,'(a)') "       <Geometry Reference=""/Xdmf/Domain/Geometry[1]""/>"
WRITE(11,'(a)') "        <Attribute Name=""Displacement"" Center=""Node"">"
WRITE(11,'(a)') "               <DataItem Format=""Binary"" "
WRITE(11,'(a)') "                DataType=""Float"" Precision=""8"" Endian=""Little"" "
WRITE(11,*) "            Dimensions=""",Nx, Ny, Nz,""">"
WRITE(11,'(16X,a)') "           ",OutputFileName
WRITE(11,'(a)') "               </DataItem>"
WRITE(11,'(a)') "       </Attribute>"
WRITE(11,'(3X,a)') "</Grid>"
END DO
WRITE(11,'(a)')"        </Grid>"
WRITE(11,'(a)') "</Domain>"
WRITE(11,'(a)') "</Xdmf>"
CLOSE(11)
END PROGRAM XdmfCreate