# Transportation Geography and Network Science/Network formation

## Models of Network Formation

This section will discuss Models of Network Formation also referred to as “Generative Network Models.” These models offer an explanation to the question of why a network should have a certain set of predetermined parameters (e.g. power-law degree distributions). That is, they model the mechanisms by which networks are created and explore the resultant network structures of certain hypothesized generative mechanisms. “If the structures are similar to those of networks we observe in the real world, it suggests-though does not prove-that similar generative mechanisms may be at work in real networks.” (Newman, 2010)

## Preferential Attachment

Many networks are observed to have degree distributions that approximately follow power laws, at least in the tail of the distribution. In the 1970s, Derek Price first proposed a network formation model that explained this observation which was inspired by the work of economist Herbert Simon and statistician Udny Yule (Yule Process). Price expanded Simon’s mathematical explanation of how the “rich-get-richer” (also a power-law distribution) to the network context, calling it “cumulative advantage” (more well known as preferential attachment coined by Barabasi and Albert in 1999).

### Price’s Model

Price specifically applied this process to a model of a citation network. In this model, papers are published continually (added one by one), yet not necessarily at a constant rate and newer papers cite older papers. The edges in this network are directional and are created but never destroyed. The central assumption of Price’s model is that a newly appearing paper cites previous ones chosen at random with probability proportional to the number of citations those previous papers already have plus a constant a, where a>0. Consider a price model network where $q_{i}$ denotes the in-degree of vertex i. Now suppose a new vertex is added to the network. Then the probability of the citation made to a particular vertex, say i is proportional to $q_{i}+a$ . Normalizing the above term, we get the probability of citing a vertex, say i by a newly introduced vertex is given by,

$p_{i}={\frac {q_{i}+a}{\sum (q_{i}+a)}}$ If average number of citations for a given paper is c, with certain variance according to the properties of the network then as as the network size becomes very large, the fraction of vertices in the network that have in-degree q becomes: (Newman, 2010 pp- 490-494) Random network (a) and scale-free network (b). In the scale-free network, the larger hubs are highlighted.

$p_{q}={\frac {B(q+a,2+a/c)}{B(a,1+a/c)}}$ Where:

q=in-degree

a=constant

c=average out-degree of network (or average bibliography size)

where beta function is defined as $B(x,y)={\frac {\Gamma (x)\Gamma (y)}{\Gamma (x+y)}}$ and gamma function as $\Gamma (x)=\int _{0}^{\infty }t^{x-1}e^{-t}\,dt$ Furthermore, the Beta function falls off as a power law for large values of x, with exponent y. Applying this, we find that for large values of q (q>>a) the degree distribution goes to:

$p_{q}\sim q^{-(2+{\frac {a}{c}})}$ Thus Price’s model for citation network gives rise to a degree distribution with a power-law tail. More generally speaking, each paper could be considered a node and each citation a link. Here, links are added to existing nodes at a probability proportional to the number of links already reaching that node.

### Model of Barabasi and Albert

Developed independently in the 1990s, the Barabasi-Albert model is the best known generative network model in use today. Similar to Price’s model, vertices are added one by one to a growing network and connected to existing vertices. However, these connections are undirected and the number of connections made by each vertex is exactly c (unlike Price’s model where c is allowed to vary from step to step). Connections are made in proportion to the undirected degree, k. By treating the Barabasi and Albert model as a special case of Price’s model, Newman, 2010 shows that:

$p_{k}={\frac {2c(c+1)}{k(k+1)(k+2)}}$ for k ≥ c

Where:

c=number of connections made by each vertex

k=degree of vertex

$p_{k}$ is the proportion of vertices with k degrees.

In this case, when k becomes very large the degree distribution becomes:

$p_{k}\sim k^{-3}$ ### Extensions and Further Properties of Preferential Attachment Models

Model Property/Extension Intuitive Rationale Resultant Distribution Contributor
Incorporating Time of Creation By preferential attachment, older vertices will have more time to acquire links. Contains leading algebraic factor, does not follow power-law distribution (Dorogovsev, Mendes, & Samukhim, 2000)
Sizes of In-Components Distribution of the set of vertices from which vertex i can be reached by following a directed path For component sizes << network size, follows power-law distribution (Newman, 2010)
Addition of Extra Edges Connections added between two existing vertices Power-law distribution (Krapivsky, Rodgers, & Redner, 2001)
Removal of Edges Considered reverse preferential attachment, higher degree are more likely to lose an edge Power-law distribution to a point, then stretched exponential (Moore, Ghoshal, & Newman, 2006)
Non-Linear Preferential Attachment Considers a non-linear attachment process rather than a linear process in the degree of the vertex Stretched exponential (Krapivsky, Redner, & Leyvraz, Connectivity of Growing Rrandom Networks, 2000)
Vertices of Varying Quality or Attractiveness Quality and attractiveness are incorporated into attachment process Depends on distribution of “fitness” values (Bianconi & Barabasi, 2001)

### Thought Question

What other networks’ formation could be accurately described using preferential attachment (considering either Price’s model or Barabasi-Albert)?

## Vertex Copying Models

Though preferential treatment offers a plausible explanation for power-law degree distributions, it is not the only means by which networks can grow (Newman, 2010). Suppose newly added vertices copied a fraction of the connections of an existing vertex and the remainders are filled out by other existing vertices (e.g. a new paper copied a fraction the bibliography of an existing paper). As analyzed by Newman, 2010, the degree distribution of a large network under this model will asymptotically follow a power-law distribution (Newman, 2010).

## Network Optimization Models

Previous network structures are based, for the most part, on successive random processes, which are blind to the large-scale structures that are being created. A network growth mechanism that may be more suited to transportation networks is structural optimization. The optimization in this case involves a trade-off between travel time and cost. One of the simplest forms of this type of mechanism was developed by Ferrer i Cancho and Sole in 2003. This model seeks to minimize the following quality function (Ferrer i Cancho & Sole, 2003):

$E(e,l)=\lambda e+(1-\lambda )l$ Where:

e=number of edges in a network

l=mean geodesic distance between vertex pairs (dissatisfaction measure)

λ=a parameter in the range 0≤λ≤1

In this model, the cost of maintaining the network is represented by the variable m. This is the same as saying the cost of operating an airline is proportional to the number of routes it operates, for example. Following the airline example, the variable l would be the average number of legs required to journey from one point to another. This is obviously a simplification of a complex network. The parameter λ provides a balance between a network with the minimum number of edges possible and a fully connected network. From this model it can be seen that by placing a moderate weight on λ, l is minimized and the optimal result is a star graph. This offers a simple explanation of why the hub-and-spoke system is so efficient (Newman, 2010). As it turns out, a non-star graph solution only appears when: $\lambda <{\frac {2}{n^{2}+2}}$ .

The degree distributions of this model, as determined by Ferrer i Cancho and Sole, ranged from an exponential distribution to a power-law distribution as the value of λ changes. This was proposed only for local minima.

Gastner and Newman, 2006 generalizes the model presented by Ferrer i Cancho and Sole by considering not only the number of legs in a journey but also the geographic distance traveled (Newman, 2010). The term l is replaced by t, where t=u+vr (Newman, 2010). In this expression u and v are constants associated with a fixed time and pace (1/speed) on a link and r is the distance traveled on the link. For example, u could be considered the check in, waiting, taxiing, etc. time at an airport and pace would be the reciprocal of the average velocity traveled during a flight. This introduces a spatial aspect into the model where before only topological considerations were accounted for. However, this model also produces only numerical results.

### Thought Question

By changing the values of u and v in Gastner and Newman’s model, what assumptions could be drawn about the geometry of the resultant network?