# Message-Passing Interface/MPI function reference

```int MPI_Send( void *buf, int count, MPI_Datatype datatype, int dest,
int tag, MPI_Comm comm )
```

This sends `count` elements of type `datatype` from `buf` to the process of rank `dest`; the receiving process calls `MPI_Recv` to accept them.

```int MPI_Recv( void *buf, int count, MPI_Datatype datatype, int source,
int tag, MPI_Comm comm, MPI_Status *status )
```

This fills `buf` with data coming from the process of rank `source`, which sends it by calling `MPI_Send`.

```int MPI_Bcast ( void *buffer, int count, MPI_Datatype datatype, int root,
MPI_Comm comm )
```

This sends the contents of `buffer` on `root` to all other processes in `comm`. Afterwards, the first `count` elements of `buffer` are the same on all processes. [1]

Depending on the implementation, the cost of `MPI_Bcast` ranges from $O(\log_2 N)$ to $O(N)$.[2][3]
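To make the two bounds concrete, the sketch below counts communication rounds for two textbook broadcast strategies, assuming $N$ denotes the number of processes (the text above does not say so explicitly). It is a plain C++ model with no MPI calls; the function names `linear_rounds` and `tree_rounds` are made up for illustration, and a real MPI library may use neither strategy.

```cpp
#include <cassert>

// Rounds needed when the root sends to each of the other processes in turn.
// Hypothetical helper, not part of MPI.
int linear_rounds(int n_procs) {
    assert(n_procs >= 1);
    return n_procs - 1;
}

// Rounds needed when, in every round, each process that already holds the
// data forwards it to one process that does not (a binomial tree).
// Hypothetical helper, not part of MPI.
int tree_rounds(int n_procs) {
    assert(n_procs >= 1);
    int rounds = 0;
    // The number of informed processes doubles each round until all are covered.
    for (long informed = 1; informed < n_procs; informed *= 2) {
        ++rounds;
    }
    return rounds;
}
```

For eight processes, `linear_rounds(8)` is 7 while `tree_rounds(8)` is 3, which is the gap between $O(N)$ and $O(\log_2 N)$.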

## MPI 2.0 Connection commands

```int MPI_Open_port(MPI_Info info, char *port_name)
```

This creates a port to which other processes can connect. The buffer passed as `port_name` must be at least `MPI_MAX_PORT_NAME` characters long; on return it contains a unique identifier that other processes need to know in order to connect.

```int MPI_Comm_accept(char *port_name, MPI_Info info, int root, MPI_Comm comm,
MPI_Comm *newcomm)
```

Called after `MPI_Open_port()`, this function blocks until another process connects to `port_name`. On return, `newcomm` is an intercommunicator linking the two sides.

```int MPI_Comm_connect(char *port_name, MPI_Info info, int root, MPI_Comm comm,
MPI_Comm *newcomm)
```

This opens a connection to another process that is waiting in `MPI_Comm_accept()`. The `port_name` argument must be the string that the accepting process obtained from `MPI_Open_port()`.

## Reduction

The following functions reduce data held by different processes by applying a simple operation such as summation. The operation is applied element by element, and the result is delivered to one or many processes.

```int MPI_Scan ( void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype,
MPI_Op op, MPI_Comm comm )
```

```int MPI_Allreduce ( void *sendbuf, void *recvbuf, int count,
MPI_Datatype datatype, MPI_Op op, MPI_Comm comm )
```

This applies the operation given as `op` across all processes, element by element, to each process's `sendbuf` and stores the result in every process's `recvbuf`. For example, if process 0 had `{0, 1, 2}` and process 1 had `{3, 4, 5}` in its `sendbuf`, and both called `MPI_Allreduce(sendbuf, foo, 3, MPI_INT, MPI_SUM, world)`, the contents of the buffer pointed to by `foo` on both would be

{0+3, 1+4, 2+5} = {3, 5, 7}

Performance for P processes: $O(N\log P)$.[5]
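The difference between the two functions above can be modeled in plain C++, without MPI calls. `MPI_Allreduce` gives every process the reduction over all processes, whereas `MPI_Scan` gives the process of rank $i$ the reduction over ranks 0 through $i$ only (an inclusive prefix). The helpers `allreduce_sum` and `scan_sum` and the `Buffers` layout are invented for this sketch and cover only `MPI_SUM`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Buffers = std::vector<std::vector<int>>;  // Buffers[r] = buffer of rank r

// Model of MPI_Scan with MPI_SUM: recv[r] is the element-wise sum of
// send[0] through send[r]. Hypothetical helper, not part of MPI.
Buffers scan_sum(const Buffers& send) {
    Buffers recv(send);
    for (std::size_t r = 1; r < recv.size(); ++r) {
        assert(recv[r].size() == recv[0].size());  // same count on every rank
        for (std::size_t i = 0; i < recv[r].size(); ++i) {
            recv[r][i] += recv[r - 1][i];  // add the running sum of ranks 0..r-1
        }
    }
    return recv;
}

// Model of MPI_Allreduce with MPI_SUM: every rank receives the total, which
// equals what the last rank obtains from the scan.
// Hypothetical helper, not part of MPI.
Buffers allreduce_sum(const Buffers& send) {
    if (send.empty()) return send;
    const Buffers prefix = scan_sum(send);
    return Buffers(send.size(), prefix.back());
}
```

With the example buffers `{0, 1, 2}` and `{3, 4, 5}`, `allreduce_sum` yields `{3, 5, 7}` for both ranks, while `scan_sum` leaves rank 0 with `{0, 1, 2}` and gives rank 1 `{3, 5, 7}`.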

```int MPI_Comm_split(MPI_Comm comm, int color, int key, MPI_Comm *newcomm);
```

`MPI_Comm_split()` creates a new communicator in each process. The resulting communicators are common to the processes that provided the same `color` argument. Processes can opt not to get a communicator by providing `MPI_UNDEFINED` as the `color`, which will produce `MPI_COMM_NULL` for that process.

For example, the following code splits `MPI_COMM_WORLD` into three communicators "colored" 0, 1, and 2.

```#include<iostream>
#include<mpi.h>

using namespace std;

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm comm;
    // Processes with the same color share a communicator; within it they are
    // ranked by ascending key. The keys need not be positive or contiguous.
    MPI_Comm_split(MPI_COMM_WORLD, rank % 3, -rank*2, &comm);
    int rank2;
    MPI_Comm_rank(comm, &rank2);
    cout << "My rank was " << rank << " now " << rank2 << " color: " << rank % 3 << "\n";
    MPI_Comm_free(&comm);  // release the communicator before finalizing
    MPI_Finalize();
}
```

Run with eight processes, this outputs the following (in undefined order):

```My rank was 0 now 2 color: 0
My rank was 1 now 2 color: 1
My rank was 2 now 1 color: 2
My rank was 3 now 1 color: 0
My rank was 4 now 1 color: 1
My rank was 5 now 0 color: 2
My rank was 6 now 0 color: 0
My rank was 7 now 0 color: 1
```
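The new ranks in this output follow from the `key` argument: within each color, processes are ordered by ascending `key`, with ties broken by their rank in the old communicator. The sketch below reproduces that rule in plain C++ without MPI calls; `split_ranks` is a hypothetical helper, not an MPI function, and it ignores the `MPI_UNDEFINED` case.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For each old rank r, given color[r] and key[r], return the rank that
// MPI_Comm_split would assign to r inside its new communicator.
// Hypothetical helper, not part of MPI.
std::vector<int> split_ranks(const std::vector<int>& color,
                             const std::vector<int>& key) {
    assert(color.size() == key.size());
    const std::size_t n = color.size();
    std::vector<int> new_rank(n, 0);
    for (std::size_t r = 0; r < n; ++r) {
        // Count the processes of the same color that are ordered before r:
        // smaller key first, and on equal keys the smaller old rank first.
        for (std::size_t other = 0; other < n; ++other) {
            if (color[other] != color[r]) continue;
            if (key[other] < key[r] || (key[other] == key[r] && other < r)) {
                ++new_rank[r];
            }
        }
    }
    return new_rank;
}
```

Fed the colors `rank % 3` and keys `-rank*2` for eight processes, the helper returns `{2, 2, 1, 1, 1, 0, 0, 0}`, matching the listing above.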

## Nonblocking asynchronous communication

The following functions work together to allow nonblocking asynchronous communications among processes.[6] One process sends while another receives. The sender must check that the operation is complete before reusing or overwriting the buffer, and the receiver must do the same before reading it. `MPI_Wait()` blocks until the operation completes, whereas `MPI_Test()` returns immediately and reports completion through its `flag` argument.

`MPI_Isend()` and `MPI_Irecv()` calls need not be in order. That is, process 42 could call `MPI_Irecv()` three times to begin receiving from processes 37, 38, and 39 which can send at their leisure.

```int MPI_Isend(void* buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm, MPI_Request *request);
```

---

```int MPI_Irecv(void* buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Request *request);
```

```int MPI_Wait(MPI_Request *request, MPI_Status *status);
```

```int MPI_Test(MPI_Request  *request, int* flag, MPI_Status* status);
```

Example code:

```#include<iostream>
#include<mpi.h>

using namespace std;

// Run with at least six processes: ranks 3, 4, and 5 send to rank 1.
int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int sources[] = {3, 4, 5};
    int dest = 1;
    int tag = 42;
    if (rank == sources[0] || rank == sources[1] || rank == sources[2]) {
        // 1.0*rank avoids a narrowing conversion from int inside the braces.
        double x[] = {1.0*rank, 2.0*rank, 3.0*rank};
        MPI_Request r;
        MPI_Isend(x, 3, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD, &r);
        cout << "Process " << rank << " waiting...\n";
        MPI_Status status;
        MPI_Wait(&r, &status);  // x must not be modified before this returns
        cout << "Process " << rank << " sent\n";
    } else if (rank == dest) {
        double x[3][3];
        MPI_Request r[3];
        // Post all three receives first; the senders may proceed in any order.
        for (int i = 0; i != 3; ++i) {
            MPI_Irecv(x[i], 3, MPI_DOUBLE, sources[i], tag, MPI_COMM_WORLD, &r[i]);
            cout << "Process " << rank << " waiting for " << sources[i] << " on recv.\n";
        }
        for (int i = 0; i != 3; ++i) {
            MPI_Status status;
            MPI_Wait(&r[i], &status);  // x[i] is valid only after the wait
            cout << "Process " << rank << " got " << x[i][0] << " " << x[i][1] << " " << x[i][2] << ".\n";
        }
    }

    MPI_Finalize();
}
```

## Other functions

```int MPI_Init(int *argc, char ***argv);
int MPI_Finalize(void);
int MPI_Comm_rank(MPI_Comm comm, int *rank);
int MPI_Comm_size(MPI_Comm comm, int *size);
int MPI_Get_count(MPI_Status *status, MPI_Datatype datatype, int *count);
int MPI_Type_extent(MPI_Datatype datatype, MPI_Aint *extent);
int MPI_Type_struct(int count, int *array_of_blocklengths, MPI_Aint *array_of_displacements, MPI_Datatype *array_of_types, MPI_Datatype *newtype);
int MPI_Scatter(void* sendbuf, int sendcount, MPI_Datatype sendtype, void* recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm);  // Performance: as good as O(log_2 N), as bad as O(N). See http://www.pdc.kth.se/training/Talks/MPI/Collective.I/less.html#perf_scatter_image
int MPI_Gather(void* sendbuf, int sendcount, MPI_Datatype sendtype, void* recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm);  // Performance: O(NP) [7]
```
```int MPI_Sendrecv(void* sendbuf, int sendcount, MPI_Datatype sendtype, int dest, int sendtag, void* recvbuf, int recvcount, MPI_Datatype recvtype, int source, int recvtag, MPI_Comm comm, MPI_Status *status);
int MPI_Sendrecv_replace(void* buf, int count, MPI_Datatype datatype, int dest, int sendtag, int source, int recvtag, MPI_Comm comm, MPI_Status *status);
int MPI_Request_free(MPI_Request *request);
int MPI_Group_rank(MPI_Group group, int *rank);
int MPI_Group_size(MPI_Group group, int *size);
int MPI_Comm_group(MPI_Comm comm, MPI_Group *group);
int MPI_Group_free(MPI_Group *group);
int MPI_Group_incl(MPI_Group group, int n, int *ranks, MPI_Group *newgroup);
int MPI_Comm_create(MPI_Comm comm, MPI_Group group, MPI_Comm *newcomm);
double MPI_Wtime(void);
int MPI_Get_processor_name(char *name, int *resultlen);
```