# Complex Networks Principles, Methods and Applications pdf pdf

Gratis

### 79 Anson Road, #06–04/06, Singapore 0906 Cambridge University Press is part of the University of Cambridge

We can roughly say that a complex system is a system made by a large number of single units (individuals,components or agents) interacting in such a way that the behaviour of the system is not a simple combination of the behaviours of the single units. In Chapter 2 we focus on the concept of centrality, along with some of the related mea- sures originally introduced in the context of social network analysis, which are today usedextensively in the identification of the key components of any complex system, not only of social networks.

### 1 Graphs are the mathematical objects used to represent networks, and graph theory is the

To give some examples, graph theory helps us to schedule airplane routing and has solved problems such as finding the max-imum flow per unit time from a source to a sink in a network of pipes, or colouring the regions of a map using the minimum number of different colours so that no neighbouringregions are coloured the same way. The people invited at aparty are the nodes of the graph, while the existence of acquaintances between two persons defines the links in the graph.

### 2 K called links (or edges, or lines)

The number of vertices N ≡ N[G] = |N |, where the symbol | · | denotes the cardinality of a set, is usually referred as the order of G, while the number of edges K ≡ K[G] = |L| [1] is the size of G. However, in many cases ofinterest, the number of links K is proportional to the number of nodes N, and therefore the concept of size of a graph can equally well be represented by the number of its nodes N or by the number of its edges K.

### 2 In other words, G and G are isomorphic if and only if a one-to-one correspondence

Analogously, it is easy to prove that the number of different automorphisms of a triangle C = K is a = 6, and more in general, for a cycle of N nodes, C , we haveC 3 3 △N a = 2N. 1.6t eliminating four of the edges of G (b), the subgraph generated by the set N 1, 2, 3, 4, 5} (c), and a spanning= { 6 tree (d) (one of the connected subgraphs which contain all the nodes of the original graph and have the smallest number of links, i.e.

**1.2 Directed, Weighted and Bipartite Graphs**

Sometimes, the precise order of the two nodes connected by a link is important, as in the case of the following example of the shuttles running between the terminals of an airport. A traditional measure of graph reciprocity is the ratio r between the number of arcs in the network pointing in both directions and the totalnumber of arcs [308] (see Problem 1.2 for a mathematical expression of r, and the work by Diego Garlaschelli and Maria Loffredo for alternative measures of the reciprocity [128]).

### 2 K and are called links or edges

For instance, typical bipartite networks are systems of users purchasing items such as books, or watching movies. An example isshown in Figure 1.8, where we have denoted the user-set as U = {u , u , · · · , u } and 1 2 N the object-set as O = {o , o , · · · , o }.

### 2 V

The first graph is a projection of the bipartite graph on the first set of nodes: the nodes are the users and twousers are linked if they have at least one object in common. Analogously, we can construct a graph of similarities between different objects by projecting the bipartitegraph on the set of objects; see panel (c) in the figure.

**1.3 Basic Definitions**

Basically, the definitions given above are valid both for undirected and for directed graphs, with the only difference that, in an undirected graph, if a sequence of nodes isa walk, a trail or a path, then also the inverse sequence of nodes is respectively a walk, a trail or a path, since the links have no direction. The total degree of the node is then defined as ki , referred to as the out-degree of the node, and the number of ingoing links kin i Definition 1.7 (Node degree) The degree kiof a node i is the number of edges incident in the node.

### 2 N = N + N = K + 1 + K + 1 = K + 1

1 1 1 2 Finally, it is clear that by the successive use of the three propositions above we have proved that definition (A) implies definition (B).**1.5 Graph Theory and the Bridges of Königsberg**

If we collapse the whole river bank A to a point, and we do the same for river bank B and for the islands C and D, all the relevant information aboutthe city map can, in fact, be encapsulated into a graph with four nodes (river banks and islands) and seven edges (the bridges) shown in right-hand side of Figure 1.11. (Difficulty of an exhaustive search) A way to perform an exhaustive searchExample 1.11 for an Eulerian trail in a graph with N nodes and K edges is to check among the walks of length l = K whether there is one containing all the edges of the graph.

### 1 L

, s], ∪ = L i j i i= 1 Now we can state the characterisation of Eulerian graphs in the form of equivalence of the following three statements [150]: Graphs and Graph TheoryThe reduction process to prove that a graph with all nodes with even degree can be partitioned into cycles. Proceeding with the reduction, one will at last reach a graph ′G ≃ Z , therefore the set of links L is partitioned in cycles L , L , .

### 2 Eulerian circuit

The Euler theorem provides a general solution to the bridge problem: the request to pass over every bridge exactly once can be satisfied if and only if the vertices with odddegree are zero (starting and ending point coincide) or two (starting and ending point do not coincide). In the same way, by a simple and fast inspection, we can answer the same question for the city of Venice or forany other city in the world having any number of islands and bridges.

**1.6 How to Represent a Graph**

This matrix has K elements dif- d ferent from zero, and the number of ones in row i is equal to the number of outgoing links outin k , while the number of ones in column i is equal to the number of ingoing links k .ii Clearly, Definition 1.15 refers only to undirected and directed graphs without loops or multiple links, which will be our main interest in this section and in the rest of the book. Then, all the non-zero values of r are sorted in α iα iα decreasing order, and the objects in the top of the list are recommended to user u .i Let us now come back to the main issue of this section, namely how to represent an undirected or directed graph.

## A. Each of the two vectors has M components, with M =

Note that the order in which the edges are stored is irrelevant: to change it is equivalent to relabelling the edges, but the graph (andthe edge labels as a pair of nodes) is unchanged. 1.6 How to Represent a Graph Example 1.16 (List of neighbours) The lists of neighbours corresponding to the two graphs in Example 1.12 are respectively: ⎛ ⎞ ⎛ ⎞ ⎜ ⎟ ⎜ ⎟ node in-neighbours node out-neighbours ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ 1 2 5 1 2 5 ⎜ ⎟ ⎜ ⎟ 2 1 3 5 2 3 A = , A = u d ⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟ ⎝ ⎜ ⎟ ⎜ ⎟ 3 2 3 4 5 4 ⎠ ⎝4 2 ⎠ 5 1 2 5 1 Notice that in the case of a directed graph, we have two different possibilities.

### 1 N+

, N.i i i+ 1 i Therefore, the k = 2 neighbours of node 1 are stored in j in the positions from r = 1 to 1 1r − 1 = 2, the k = 3 neighbours of node 2 are stored in j in the positions from r = 3 to 2 2 2r − 1 = 5, and so on. Instead in thei+ 1 i i compressed column storage the vectors are r = (1, 2, 4, 5, 6, 8) and j = (5, 1, 4, 2, 3, 1, 3), in and now r − r gives the in-degree k of node i.i+ 1 i i In practice, for the implementation of graph algorithms given in the Appendix, we shall make use of either the ij-form or the compressed row storage form.

### 2 Centrality measures allow the key elements in a graph to be identified. The concept of

We will then consider centrality measures based on shortest paths, such as the closeness centrality which is related to the average distance of a node from all the other nodes, or the betweenness centrality which counts instead the number of shortest paths a node lies on. As only one possible example of the many potential applications, we intro-duce a large graph describing a real social system, namely the movie actor collaboration network , and we use it to identify the most popular movie stars.

**2.1 The Importance of Being Central**

In particular, the measure of wealth reportedhere stands for the net wealth measured in the year 1427, and is coded in thousands of lira, while the number of priorates is the number of seats on the Civic Council held between1282 and 1344. For instance, the only node with six edges corresponds to Medici, the family with the second-greatestwealth, while Strozzi, the family with both the greatest wealth and the largest number of priorates, is one of the two nodes with the second largest value of degree.

**2.2 Connected Graphs and Irreducible Matrices**

In particular, if Ais the adjacency matrix of a graph, the effect of the permutation matrix P on A corre- sponds to a permutation of the graph vertex indices φ : N → N . Thus, if the adjacencymatrix is reducible, this means that by an appropriate relabelling of the nodes we can put A in a form such that it is evident that some of the nodes are unreachable by othernodes.

### 22 A

This means that if for all pairs of nodes k (i, j), there exists a k(i, j) such that (A ) > 0 this means that it is always possible to go ij from i to j in a finite number of steps, and therefore the weighted graph is connected, and the matrix A is irreducible (see Theorem 2.1). Example 2.3 Referring to the matrices of Example 2.2, we find that by direct computation: ⎛ ⎞ ⎛ ⎞⎜ ⎟ ⎜ ⎟ 1 3 3 1 4 4 3 1 1 3 6 4 6 4 N−1 N−1⎜ ⎟ ⎜ ⎟) ) (I + A B = (I + A C = ⎝ ⎠ ⎝ ⎠ 1 3 1 3 4 4 3 1 3 1 6 4 6 4 This proves that A is irreducible, while A is reducible, in agreement with the result found B C by visual inspection of the graphs B and C in Figure 2.3.

### 2 O (N K ) for non-symmetric matrices

, N) is defined as: N Dc a , (2.2)i ij = k = i j=1where k is the degree of node i.i Centrality Measures In a directed graph, in-degree and out-degree centrality of node i are respectively defined as:in out N N D in D outc a , c a , (2.3)ji ij i = k i = i = k i =j=1 j=1 in outwhere k and k are respectively in- and out-degrees.i i in out D D D The normalised degree centralities C , C , C are obtained by dividing the threei i iquantities above by N − 1. A central node with a degree of 25 in a graph of 100 nodes, for example, is not as central as a node with degree of 25 in agraph of 30 nodes, and neither can be easily compared with a central node with a degree 6 in a graph of 10 nodes.

### 1 N

1 1 1 1 = λ In a directed strongly connected graph, the in- and the out-degree eigenvector centrali- in outE Eties c and c of node i are respectively equal to the ith component of the eigenvectorsi i ⊤ associated with the leading eigenvalue λ of matrix A , and of matrix A. / i = c i i i The value of the parameter α reflects the relative importance of the graphs structure, namely the adjacency matrix, versus the exogenous factor corresponding to the addition ofthe vector 1, in the determination of the centrality scores.

**2.4 Measures Based on Shortest Paths In the previous section we considered centrality measures depending on the node degree**

We will see three different examples of such measures, such asthe closeness centrality, based on the lengths of the shortest paths from a vertex; the betweenness centrality , counting the number of shortest paths a vertex lies on; and the delta centrality , based both on the number and on the length of shortest paths. Usually, the coefficient takes values in the range [1, 1], and is equal to 1 if the agreement between the two rankings is perfect (i.e. the two rankings are the same), or is equal to 0 if the rankings are completelyindependent, while it is 1 if one ranking is the reverse of the other.

**2.5 Movie Actors**

C [G] of a graph G with N nodes, Deﬁnition 2.10 (Graph centralisation) The centralisation Hwith respect to the centrality measure C, is given by: N∗ (C [G]) i i [G] − C C i=1H (2.19)[G] = N′ ′ ∗ max (C [G [G ]) i i ] − C ′G i=1 [G] is the value of centrality of node i in graph G, for the centrality index of where Ci ∗ interest C, and i is the node with the highest centrality. The normalisation factor at the denominator is the maximum ′ possible value of the numerator over all graphs G with the same number of nodes as G, C which is obtained in the case of a star with N nodes, so that the graph centralisation H [G] ranges in the interval [0, 1] and in particular equals 1 if the graph G is a star.

### 10 D B B

According to the way we evaluate such a distance, we will get slightly different definitions of closenesscentrality, except in the trivial case of a group g consisting of a single node, where the three definitions of distances are identical, and the group centrality reduces to the nodecentrality. jk jkDeﬁnition 2.14 (Group delta centrality) In a graph G, the delta centrality C of group g isgdefined as the relative drop in the graph efficiency E caused by the deactivation of the nodes in g: ′ ( E) E ] g [G] − E[G C (2.21) g = =E P [G] ′where G is the graph obtained by removing from G the edges incident in nodes belonging to g, and the efficiency E is defined as in Eq.

### 3 Random Graphs

The systematic study of random graphs was initiated by Erd˝os and Rényiin the late 1950s with the original purpose of studying theoretically, by means of proba- bilistic methods, the properties of graphs as a function of the increasing number of randomconnections. As a practical example we compute the component order distribution in a set of large real networks of scientific paper coauthorships and we compare the results with random graphs having the same number of nodes and links.

**3.1 Erd˝os and Rényi (ER) Models**

Therefore, the probability M ER K P is equal to 1/C G of finding a given graph G ≡ (N , L) in the model G N ,K M if |N | = N and |L| = K, and is zero otherwise, namely: K 1/C iff |N | = N and |L| = K M P (3.3) G = otherwise 15 18 As an example, in Figure 3.1 we show six realisations of the C ≈ 4.73 × 10 120 possible random graphs with N = 16 nodes and K = 15 links. The number of green-eyed people you pick is a random variable which follows a binomial distribution with n = 100 and p = 0.05.p = 0.125 in model B, then the random graphs we sample have, on average, pM = 15 links, which is equal to the number of links in the graphs sampled by model A in Figure 3.1.

**3.2 Degree Distribution**

N−1 kN−1−k In fact, pis the probability is the probability for the existence of k edges, (1 − p) k N − 1 is the number for the absence of the remaining N − 1 − k edges, and C = N−1k of different ways of selecting the end points of the k edges among the N − 1 possible. The expectation of the number of nodes with exactly k links is given by the sum of the probabilities that each of the nodes in the graph has k links, namely: N N Prob .k k = i =k i=1 Since all the nodes in a random graph are statistically equivalent, Prob k i =k is the same ∀i, and we have: Nk k i = N Prob =k 3.2 Degree Distributionwhich implies that p has the same distribution as Prob .

### 3 N c c

est, and by s , respectively their order. By definition, we have ℓ c , with ℓ = 1, 2, .

### 3 N c s

Let us indicate as S = s(0 < S ≤ 1) the fraction of the graph occupied by the giant component, and as v = 1−S the fraction of the vertices of the graph that do not belong to the giant component. If the graph has N nodes, the expectation value of the number of iso- 1 ¯n N−1 N−1 is the probability that 1 lated nodes can be written as ¯n = N(1 − p) , where (1 − p) a given node of the graph is isolated.

**3.5 Scientiﬁc Collaboration Networks**

In the same plots,we have also reported for comparison the average number n of components of order s s in ensembles of random graphs with the same number of nodes and links as in the real networks. Only for NCSTRL do weobserve the presence of components other than isolated nodes (and of order up to about ten) in the randomised version of the graph, although the average number of components, n , s is a much more rapidly decreasing function of s than in the real network.

### 2 Average of third-largest

the average number of nodes at distance two from i, we are interested 2 not in the complete degree of the vertex j reached by following an edge from i, but in the number of edges emerging from j other than the one we arrived along (since the latter edgeleads back to vertex i and so does not contribute to the number of second neighbours of i). Since we have useda log-scale for the y axis, it is easy to see that z is a good approximation of N up to m m5 m = 2, 3, while, starting a m = 4, due to the finiteness of the system, the number of new6000 1044000 10 N m32000 10 N m 7 1 2 3 4 5 6 8 92 60000 1040000 1 m N 1020000 10 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8m m The number N of new visited nodes at distance m from a given node is plotted as a function of m, respectively, in Fig.

### 1/ D

Since a linear graph with N nodes is the longest for N nodes, the longest graph of N + 1 nodes has to be formed bythe longest graph of order N and a new a vertex linked to one of the N existing vertices N as shown in the figure. If D is the contribution to D when the new node is linked to the nthN ij = i<j N+1 N+1 existing node, with 1 ≤ n ≤ N, such an N + 1-vertex graph obeys: nD = D N + D N+1 N+1n where D can be computed by adding the distances of the new node, N + 1, to all the N+1 other nodes in the graph: 3.7 What We Have Learned and Further Readings n N−n+1n .

## N+1 N+1

As a practical example, in this chapter we have computed the order of the components in a set of large real networks of scientific paper coauthorships , and we have compared the results to those obtained for random graphs with the same number of nodes and links. All graphs haveN = 10,000 nodes, and each pair of nodes is connected with a probability pab = 0.00008 in the first ensemble, with a probability p = 0.0006 in the second ensemble, and with a probability p c = 0.0018 in the third one.

**3.5 Scientific Collaboration Networks**

### 1 N −

Can you estimate the characteristic path length of thesocial network in the USA, and in the entire world, knowing that k = 1000 is a good approximation of the typical number of acquaintances for people in theUSA (and assuming the same value of k for the entire world)? This model and the various modified versions of it that have been proposed over the years, are all based on the additionof a few long-range connections to a regular lattice and provide a good intuition about the small-world effect in real systems.

**4.1 Six Degrees of Separation**

The authors of the workprovided some arguments to correct this effect, and derived that a better estimate of the median ranges from five to seven steps, depending on the separation of source and tar-get, which somehow confirms the results of the Milgram experiment. In Table 4.3 we report the distribution of Bacon numbers extracted from the actor collaboration graph introduced in DATA SET Box 2.2 The first thing to 4.1 Six Degrees of Separation Box 4.1 The Erd˝os NumberPaul Erd˝os, one of the fathers of random graph theory, was an incredibly proliﬁc mathematician of the highest calibre and wrote thousands of scientiﬁc papers in diﬀerent areas of mathematics, many of them in collab-oration with other mathematicians.

**4.2 The Brain of a Worm**

elegans and the collaboration networks of ArXiv and movie actors, we report the results obtained for other real-world networks with numbers of nodes ranging from a few hundreds to a couple of million in the case of Medline scientific coauthorship network. The points inthe figure include brain networks from imaging, transcription regulation networks, protein interaction networks, email networks, citation networks, the Internet, the World Wide Web, 4.2 The Brain of a Worm 250 200150 100 50 50 100 150 200 250 Graphical representation of the connections among the 279 nodes of the neural network of the C.

### 20 L ln <k>

Notice that each node of these networks does not represent a neuron but an entire area of the brain,usually consisting of thousands to millions neurons, while an edge can indicate the presence of either a physical connection or a signiﬁcative correlation in the activities of two areas [61, 113, 291, 62, 92]. However, since for many real-world systems we only have access to a single net- work with a given fixed number of nodes N, in order to say that the network has thesmall-world property we will accept as sufficient requirement that the network has a value of L comparable to that of a random graph with N nodes.

**4.3 Clustering Coefficient**

(4.2), when the degree of i is either zero or one.i 4.3 Clustering Coefficient Example 4.3 (Expression in terms of the adjacency matrix) We can rewrite the local clustering coefficient, c , of node i as: i ⎧a a a ⎨ l ,m il lm mi for k ≥ 2 ic = k (k − 1) (4.4)i i i ⎩ for k = 0, 1 i Consider the numerator in Eq. elegans introduced in the previous section, and in the two collabora-tion networks of ArXiv and of movie actors is to plot, as a function of the degree k, the average clustering coefficient c(k) of nodes with degree k defined in Eq.

### i . However, the similarity between C and T can be misleading: although in many graphs C

and T take similar values, there are cases where the two measures can be very different, as shown in the following example. Example 4.5 (Clustering coefficient versus transitivity) The clustering coefficient C of a graph, although apparently similar to T, is indeed a different measure.

### 0. Thus, the graph has a clustering coefficient equal to

The graph has N + 2 nodes in total: nodes A and B 4.3 Clustering Coefficientare directly linked, and each of them is also connected to all the other N nodes. The tran- sitivity T of the graph is equal to 3/(N + 2), since the graph has n = N triangles and △n = 3N + N(N − 1) triads, namely N triads XAB, N triads XBA, N triads AXB and ∧ ′ ′ ′ (N − 1) triads XAX or XBX with X .

### 2 A B

Indeed, from the above equation we have that the total number of triangles, n , and the total number of triads, n , can be ∧ written as: 1 1 ∧n = n = n ci i i 3 3 i i∧ n = n .∧i i The factor 1/3 in the expression of n is due to the fact that each individual triangle is counted three times in the summation, given that it belongs to three different nodes. While the graph clustering coefficient C is the arithmetic average of the clustering coefficients of all the nodes, the transitivity T isa weighted average of the clustering coefficients, the weight of each node being equal to the number of triads centred on it.

### 10 C.elegans

An easy way to count the number K[G ] of edges in G avoiding double i i counting is to scan all the six nodes of G clockwise and consider the number of edges i connecting each of them to nodes forward in the circle (in clockwise direction). For a generic value of m, the total number of existing edges in G is: i Small-World Networks 1m− 3m(m − 1)(m − 1) + m j = 2 1j= where the first term comes from the nodes to the left of node i (backward nodes), while the second term is due to the nodes to the right of i (forward nodes).