Transportation Geography and Network Science/Network formation

Models of Network Formation

This section will discuss Models of Network Formation also referred to as “Generative Network Models.” These models offer an explanation to the question of why a network should have a certain set of predetermined parameters (e.g. power-law degree distributions). That is, they model the mechanisms by which networks are created and explore the resultant network structures of certain hypothesized generative mechanisms. “If the structures are similar to those of networks we observe in the real world, it suggests-though does not prove-that similar generative mechanisms may be at work in real networks.” (Newman, 2010)

Preferential Attachment

Many networks are observed to have degree distributions that approximately follow power laws, at least in the tail of the distribution. In the 1970s, Derek Price first proposed a network formation model that explained this observation which was inspired by the work of economist Herbert Simon and statistician Udny Yule (Yule Process). Price expanded Simon’s mathematical explanation of how the “rich-get-richer” (also a power-law distribution) to the network context, calling it “cumulative advantage” (more well known as preferential attachment coined by Barabasi and Albert in 1999).

Price’s Model

Price specifically applied this process to a model of a citation network. In this model, papers are published continually (added one by one), yet not necessarily at a constant rate and newer papers cite older papers. The edges in this network are directional and are created but never destroyed. The central assumption of Price’s model is that a newly appearing paper cites previous ones chosen at random with probability proportional to the number of citations those previous papers already have plus a constant a, where a>0. Consider a price model network where $q_{i}$ denotes the in-degree of vertex i. Now suppose a new vertex is added to the network. Then the probability of the citation made to a particular vertex, say i is proportional to $q_{i}+a$ . Normalizing the above term, we get the probability of citing a vertex, say i by a newly introduced vertex is given by,

$p_{i}={\frac {q_{i}+a}{\sum (q_{i}+a)}}$

If average number of citations for a given paper is c, with certain variance according to the properties of the network then as as the network size becomes very large, the fraction of vertices in the network that have in-degree q becomes: (Newman, 2010 pp- 490-494)

$p_{q}={\frac {B(q+a,2+a/c)}{B(a,1+a/c)}}$

Where:

q=in-degree

a=constant

c=average out-degree of network (or average bibliography size)

where beta function is defined as $B(x,y)={\frac {\Gamma (x)\Gamma (y)}{\Gamma (x+y)}}$

and gamma function as $\Gamma (x)=\int _{0}^{\infty }t^{x-1}e^{-t}\,dt$

Furthermore, the Beta function falls off as a power law for large values of x, with exponent y. Applying this, we find that for large values of q (q>>a) the degree distribution goes to:

$p_{q}\sim q^{-(2+{\frac {a}{c}})}$

Thus Price’s model for citation network gives rise to a degree distribution with a power-law tail. More generally speaking, each paper could be considered a node and each citation a link. Here, links are added to existing nodes at a probability proportional to the number of links already reaching that node.

Model of Barabasi and Albert

Developed independently in the 1990s, the Barabasi-Albert model is the best known generative network model in use today. Similar to Price’s model, vertices are added one by one to a growing network and connected to existing vertices. However, these connections are undirected and the number of connections made by each vertex is exactly c (unlike Price’s model where c is allowed to vary from step to step). Connections are made in proportion to the undirected degree, k. By treating the Barabasi and Albert model as a special case of Price’s model, Newman, 2010 shows that:

$p_{k}={\frac {2c(c+1)}{k(k+1)(k+2)}}$

for k ≥ c

Where:

c=number of connections made by each vertex

k=degree of vertex

$p_{k}$ is the proportion of vertices with k degrees.

In this case, when k becomes very large the degree distribution becomes:

$p_{k}\sim k^{-3}$

Extensions and Further Properties of Preferential Attachment Models

Model Property/Extension	Intuitive Rationale	Resultant Distribution	Contributor
Incorporating Time of Creation	By preferential attachment, older vertices will have more time to acquire links.	Contains leading algebraic factor, does not follow power-law distribution	(Dorogovsev, Mendes, & Samukhim, 2000)
Sizes of In-Components	Distribution of the set of vertices from which vertex i can be reached by following a directed path	For component sizes << network size, follows power-law distribution	(Newman, 2010)
Addition of Extra Edges	Connections added between two existing vertices	Power-law distribution	(Krapivsky, Rodgers, & Redner, 2001)
Removal of Edges	Considered reverse preferential attachment, higher degree are more likely to lose an edge	Power-law distribution to a point, then stretched exponential	(Moore, Ghoshal, & Newman, 2006)
Non-Linear Preferential Attachment	Considers a non-linear attachment process rather than a linear process in the degree of the vertex	Stretched exponential	(Krapivsky, Redner, & Leyvraz, Connectivity of Growing Rrandom Networks, 2000)
Vertices of Varying Quality or Attractiveness	Quality and attractiveness are incorporated into attachment process	Depends on distribution of “fitness” values	(Bianconi & Barabasi, 2001)

Thought Question

What other networks’ formation could be accurately described using preferential attachment (considering either Price’s model or Barabasi-Albert)?

Online Simulation

NetLogo Models Library: Preferential Attachment

Vertex Copying Models

Though preferential treatment offers a plausible explanation for power-law degree distributions, it is not the only means by which networks can grow (Newman, 2010). Suppose newly added vertices copied a fraction of the connections of an existing vertex and the remainders are filled out by other existing vertices (e.g. a new paper copied a fraction the bibliography of an existing paper). As analyzed by Newman, 2010, the degree distribution of a large network under this model will asymptotically follow a power-law distribution (Newman, 2010).

Network Optimization Models

Previous network structures are based, for the most part, on successive random processes, which are blind to the large-scale structures that are being created. A network growth mechanism that may be more suited to transportation networks is structural optimization. The optimization in this case involves a trade-off between travel time and cost. One of the simplest forms of this type of mechanism was developed by Ferrer i Cancho and Sole in 2003. This model seeks to minimize the following quality function (Ferrer i Cancho & Sole, 2003):

$E(e,l)=\lambda e+(1-\lambda )l$

Where:

e=number of edges in a network

l=mean geodesic distance between vertex pairs (dissatisfaction measure)

λ=a parameter in the range 0≤λ≤1

In this model, the cost of maintaining the network is represented by the variable m. This is the same as saying the cost of operating an airline is proportional to the number of routes it operates, for example. Following the airline example, the variable l would be the average number of legs required to journey from one point to another. This is obviously a simplification of a complex network. The parameter λ provides a balance between a network with the minimum number of edges possible and a fully connected network. From this model it can be seen that by placing a moderate weight on λ, l is minimized and the optimal result is a star graph. This offers a simple explanation of why the hub-and-spoke system is so efficient (Newman, 2010). As it turns out, a non-star graph solution only appears when: $\lambda <{\frac {2}{n^{2}+2}}$ .

The degree distributions of this model, as determined by Ferrer i Cancho and Sole, ranged from an exponential distribution to a power-law distribution as the value of λ changes. This was proposed only for local minima.

Gastner and Newman, 2006 generalizes the model presented by Ferrer i Cancho and Sole by considering not only the number of legs in a journey but also the geographic distance traveled (Newman, 2010). The term l is replaced by t, where t=u+vr (Newman, 2010). In this expression u and v are constants associated with a fixed time and pace (1/speed) on a link and r is the distance traveled on the link. For example, u could be considered the check in, waiting, taxiing, etc. time at an airport and pace would be the reciprocal of the average velocity traveled during a flight. This introduces a spatial aspect into the model where before only topological considerations were accounted for. However, this model also produces only numerical results.

Thought Question

By changing the values of u and v in Gastner and Newman’s model, what assumptions could be drawn about the geometry of the resultant network?

References

Barabasi, A.-L., & Albert, R. (1999). Emergence of Scaling in Random Networks. Science , 286, 509-512.

Bianconi, G., & Barabasi, A.-L. (2001). Bose-Einstein Condensation in Complex Networks. Physics Review Letters , 86, 5632-5635.

Dorogovsev, S. N., Mendes, J. F., & Samukhim, A. N. (2000). Structure of Growing Networks with Preferential Linking. Physics Review Letters , 85, 4633-4636.

Ferrer i Cancho, R., & Sole, R. V. (2003). Statistical Mechanics of Complex Networks (Vol. 625). (R. Pastor-Satorras, M. Rubi, & A. Diaz-Guilera, Eds.) Berlin: Springer.

Gastner, M. T., & Newman, M. E. (2006). Optimal Design of Spatial Distribution Networks. Physics Review E , 74, 016117.

Jackson, M. O. (2005). A Survey of Models of Network Formation: Stability and Efficiency. In G. Demange, & M. Wooders (Eds.), Group Formation in Economics: Networks, Clubs, and Coalitions (p. Chapter 1). Cambridge: Cambridge University Press.

Krapivsky, P. L., Redner, S., & Leyvraz, F. (2000). Connectivity of Growing Rrandom Networks. Physics Review Letters , 85, 4629-4632.

Krapivsky, P. L., Rodgers, G. J., & Redner, S. (2001). Degree Distributions of Growing Networks. Physics Review Letters , 86, 5401-5404.

Moore, C., Ghoshal, G., & Newman, M. E. (2006). Exact Solutions of Models of Evolving Networks with Addition and Deletion of Nodes. Physics Review E , 74 (3), 036121.

Newman, M. E. (2010). Networks: An Introduction. New York, NY: Oxford University Press Inc.

Price, D. d. (1976). A General Theory of Bibliometric and Other Cumulative Advantage Processes. Journal of the American Society for Information Science , 27, 292-306.

Rodrigue, J.-P., Comtois, C., & Slack, B. (2009). The Geography of Transport Systems (2nd Edition ed.). New York, NY: Routledge.

Simon, H. A. (1955). On a Class of Skew Distribution Functions. Biometrika , 42 (3/4), 425-440.

Wilensky, U. (2005). NetLogo Preferential Attachment model. http://ccl.northwestern.edu/netlogo/models/PreferentialAttachment. Center for Connected Learning and Computer-Based Modeling, Northwestern University, Evanston, IL

Wilensky, U. (1999). NetLogo. http://ccl.northwestern.edu/netlogo/. Center for Connected Learning and Computer-Based Modeling, Northwestern University, Evanston, IL.

Yerra, B. M., & Levinson, D. M. (2005). The Emergence of Hierarchy in Transportation Networks. Annals of Regional Science , 39, 541-553.

Transportation Geography and Network Science/Network formation

Models of Network Formation

Preferential Attachment

Price’s Model

Model of Barabasi and Albert

Extensions and Further Properties of Preferential Attachment Models

Thought Question

Online Simulation

Vertex Copying Models

Network Optimization Models

Thought Question

Further Reading

See Also

References