In network analysis, many community detection algorithms have been
developed. However,their applications leave unaddressed one important
question: **the statistical validation of the
results**.

*robin* (ROBustness in Network) has a double aim:
**tests the robustness** of a community detection algorithm
to detect if the community structure found is statistically significant
and **compares two detection algorithms** to choose the one
that better fits the network of interest.

**Reference in Policastro V., Righelli D., Carissimo A.,
Cutillo L., De Feis I. (2021) https://journal.r-project.org/archive/2021/RJ-2021-040/index.html.**

It provides:

**1)** a procedure to examine the
robustness of a community detection algorithm against a random graph;

**2)** a procedure to choose among different community
detection algorithms the one that better fits the network of
interest;

**3)** two tests to determine the statistical
difference between the curves;

**4)** a graphical
interactive representation.

If there are problems with the installation try:

**prepGraph** function creates an *igraph* object
from the input file. This step is necessary for *robin*
execution

The *unweighted* graph can be read from different format:
edgelist, pajek, graphml, gml, ncol, lgl, dimacs, graphdb and igraph
graphs.

```
my_network <- system.file("example/football.gml", package="robin")
# downloaded from: http://www-personal.umich.edu/~mejn/netdata/
graph <- prepGraph(file=my_network, file.format="gml")
graph
```

```
## IGRAPH c135384 U--- 115 613 --
## + attr: id (v/n), label (v/c), value (v/n)
## + edges from c135384:
## [1] 1-- 2 1-- 5 1-- 10 1-- 17 1-- 24 1-- 34 1-- 36 1-- 42 1-- 66 1-- 91
## [11] 1-- 94 1--105 2-- 26 2-- 28 2-- 34 2-- 38 2-- 46 2-- 58 2-- 90 2--102
## [21] 2--104 2--106 2--110 3-- 4 3-- 7 3-- 14 3-- 15 3-- 16 3-- 48 3-- 61
## [31] 3-- 65 3-- 73 3-- 75 3--101 3--107 4-- 6 4-- 12 4-- 27 4-- 41 4-- 53
## [41] 4-- 59 4-- 73 4-- 75 4-- 82 4-- 85 4--103 5-- 6 5-- 10 5-- 17 5-- 24
## [51] 5-- 29 5-- 42 5-- 70 5-- 94 5--105 5--109 6-- 11 6-- 12 6-- 53 6-- 75
## [61] 6-- 82 6-- 85 6-- 91 6-- 98 6-- 99 6--108 7-- 8 7-- 33 7-- 40 7-- 48
## [71] 7-- 56 7-- 59 7-- 61 7-- 65 7-- 86 7--101 7--107 8-- 9 8-- 22 8-- 23
## + ... omitted several edges
```

**plotGraph** function offers a graphical representation
of the network with the aid of *networkD3* package.

**methodCommunity** function detects communities using
all the algorithms implemented in *igraph* package: “walktrap”,
“edgeBetweenness”, “fastGreedy”, “spinglass”, “leadingEigen”,
“labelProp”, “infomap”, “optimal”, “other”.

```
## IGRAPH clustering fast greedy, groups: 6, mod: 0.55
## + groups:
## $`1`
## [1] 7 14 16 33 40 48 61 65 101 107
##
## $`2`
## [1] 8 9 10 17 22 23 24 42 47 50 52 54 68 69 74 78 79 89
## [19] 105 109 111 112 115
##
## $`3`
## [1] 1 2 20 26 30 31 34 36 38 46 56 80 81 83 90 94 95 102
## [19] 104 106 110
## + ... omitted several groups/vertices
```

**membershipCommunities** function detects the community
membership.

```
## [1] 3 3 5 5 5 5 1 2 2 2 5 5 4 1 4 1 2 6 4 3 6 2 2 2 5 3 4 6 5 3 3 4 1 3 4 3 6
## [38] 3 4 1 5 2 6 4 6 3 2 1 6 2 5 2 5 2 4 3 6 6 6 6 1 4 6 6 1 6 6 2 2 5 6 4 5 2
## [75] 5 6 6 2 2 3 3 5 3 5 5 4 6 6 2 3 5 6 6 3 3 6 6 6 5 4 1 3 5 3 2 3 1 5 2 3 2
## [112] 2 6 6 2
```

*robin* offers two choices for the null model:

it can be generated by using the function

**random**it can be built externally and passed directly to the argument

*graphRandom*of the**robinRobust**function.

The function **random** creates a random graph with the
same degree distribution of the original graph, but with completely
random edges. The *graph* argument must be the same returned by
**prepGraph** function.

```
## IGRAPH c15a93a U--- 115 613 --
## + attr: id (v/n), label (v/c), value (v/n)
## + edges from c15a93a:
## [1] 1-- 2 1-- 87 51-- 93 17--114 54-- 62 1-- 49 1--102 1-- 96
## [9] 1-- 66 1-- 91 17--113 1--105 2-- 44 2--108 2-- 34 73--100
## [17] 46-- 87 2-- 58 77--103 12-- 79 44--114 2-- 92 2-- 47 4-- 22
## [25] 3-- 7 3--106 3--109 3--104 3-- 30 2-- 3 3-- 88 3-- 64
## [33] 3-- 15 18-- 75 17--107 6-- 74 4-- 34 27-- 40 4-- 41 4-- 44
## [41] 5-- 74 24-- 90 3-- 4 72--104 69--113 4-- 87 7-- 95 5-- 89
## [49] 16-- 39 5-- 42 5-- 17 42-- 51 45-- 91 5-- 94 5-- 24 63--110
## [57] 6-- 26 6--105 53-- 95 6-- 48 6-- 37 6-- 10 6--115 6-- 98
## + ... omitted several edges
```

**robinRobust** function implements the validation of
the community robustness. In this example we used “vi” distance as
stability measure, “independent” type procedure and “louvain” as
community detection algorithm.

Users can choose also different measures as: “nmi”,“split.join”, “adjusted.rand”.

The *graph* argument must be the one returned by
**prepGraph** function. The *graphRandom* must be
the one returned by **random** function or own random
graph.

```
proc <- robinRobust(graph=graph, graphRandom=graphRandom, measure="vi",
method="louvain", type="independent")
```

```
## Detected robin method independent type
## Perturbed 31 edges
## Perturbed 61 edges
## Perturbed 92 edges
## Perturbed 123 edges
## Perturbed 153 edges
## Perturbed 184 edges
## Perturbed 215 edges
## Perturbed 245 edges
## Perturbed 276 edges
## Perturbed 306 edges
## Perturbed 337 edges
## Perturbed 368 edges
```

As output **robinRobust** will give all the measures at
different level of perturbation from 0% to 60% for the real and random
graph.

**plotRobin** function plots the curves. The (x,y)-axes
represents the percentage of perturbation and the average of the
stability measure, respectively. The arguments of *model1* and
*model2* must be the measures for the real graph and the random
graph that are the outputs of the **robinRobust**
function.

*We will expect that with a robust algorithm the behavior of the
two curves is different. We expect that the curve of the real graph vary
less than the curve of the random graph, this visually means that the
curve of the real graph is lower than the one of the random graph, so it
is more stable than a random graph.*

```
plotRobin(graph=graph, model1=proc$Mean, model2=proc$MeanRandom,
legend=c("real data", "null model"))
```

The procedure implemented depends on the network of interest. In
this example, the Louvain algorithm fits good the network of interest,as
the curve of the stability measure assumes lower values than the one
obtained by the null model.

The differences between the stability measure curves are tested using:

Functional Data Analysis (FDA);

Gaussian Process (GP).

Moreover to quantify the differences between the curves when they are very close the Area Under the Curves (AUC) are evaluated.

**robinFDATest** function implements a test giving a
p-value for different intervals of the curves. It tests in which
interval the two curves are different.

```
robinFDATest(graph=graph, model1=proc$Mean, model2=proc$MeanRandom,
legend=c("real graph", "random graph"))
```

```
## [1] "First step: basis expansion"
## Swapping 'y' and 'argvals', because 'y' is simpler,
## and 'argvals' should be; now dim(argvals) = 13 ; dim(y) = 13 x 20
## [1] "Second step: joint univariate tests"
## [1] "Third step: interval-wise combination and correction"
## [1] "creating the p-value matrix: end of row 2 out of 9"
## [1] "creating the p-value matrix: end of row 3 out of 9"
## [1] "creating the p-value matrix: end of row 4 out of 9"
## [1] "creating the p-value matrix: end of row 5 out of 9"
## [1] "creating the p-value matrix: end of row 6 out of 9"
## [1] "creating the p-value matrix: end of row 7 out of 9"
## [1] "creating the p-value matrix: end of row 8 out of 9"
## [1] "creating the p-value matrix: end of row 9 out of 9"
## [1] "Interval Testing Procedure completed"
```

```
## TableGrob (1 x 2) "arrange": 2 grobs
## z cells name grob
## 1 1 (1-1,1-1) arrange gtable[layout]
## 2 2 (1-1,2-2) arrange gtable[layout]
```

```
## $adj.pvalue
## [1] 0.8057 0.0097 0.0019 0.0018 0.0018 0.0018 0.0018 0.0039 0.0084
##
## $pvalues
## [1] 0.8057 0.0018 0.0018 0.0018 0.0018 0.0018 0.0018 0.0039 0.0018
```

The first figure represents the stability measure plot using Louvain algorithm for detecting communities. The second one represents the corresponding p-values and adjusted p-values of the Interval Testing procedure. Horizontal red line corresponds to the critical value 0.05.

**robinGPTest** function implements the GP testing.

```
## Profile 1
## Profile 2
```

`## [1] 145.0188`

It tests the two curves globally. The null hypothesis claims that the
two curves come from the same process, the alternative hypothesis that
they come from two different processes. The output is the Bayes Factor.
One of the most common interpretations is the one proposed by Harold
Jeffereys (1961) and slightly modified by Lee and Wagenmakers in 2013:

**IF B10 IS… THEN YOU HAVE…** ° > 100 Extreme
evidence for H1

° 30 – 100 Very strong evidence for H1

° 10 – 30 Strong evidence for H1

° 3 – 10 Moderate evidence for H1

° 1 – 3 Anecdotal evidence for H0

° 1 No evidence

° 1/3 – 1 Anecdotal evidence for H0

° 1/3 – 1/10 Moderate evidence for H0

° 1/10 – 1/30 Strong evidence for H0

° 1/30 – 1/100 Very strong evidence for H0

° < 1/100 Extreme evidence for H0

**robinAUC** function implements the AUC.

```
## $area1
## [1] 0.1185515
##
## $area2
## [1] 0.2641421
```

The outputs are the area under the two curves.

In this example we want to compare the “Fast Greedy” and the “Louvain” algorithms to see which is the best algorithm.

We firstly plot the communities detected by both algorithms.

```
membersFast <- membershipCommunities(graph=graph, method="fastGreedy")
membersLouv <- membershipCommunities(graph=graph, method="louvain")
plotComm(graph=graph, members=membersFast)
```

Secondly, we compare them with **robinCompare**
function.

**robinCompare** function compares two detection
algorithms on the same network to choose the one that better fits the
network of interest.

```
## Detected robin method independent type
## Perturbed 31 edges
## Perturbed 61 edges
## Perturbed 92 edges
## Perturbed 123 edges
## Perturbed 153 edges
## Perturbed 184 edges
## Perturbed 215 edges
## Perturbed 245 edges
## Perturbed 276 edges
## Perturbed 306 edges
## Perturbed 337 edges
## Perturbed 368 edges
```

Thirdly, we plot the curves of the compared methods.

```
plotRobin(graph=graph, model1=comp$Mean1, model2=comp$Mean2,
legend=c("fastGreedy", "louvain"), title="FastGreedy vs Louvain")
```

In this example, the Louvain algorithm fits better the network of interest, as the curve of the stability measure assumes lower values than the one obtained by the Fast greedy method.

Fourthly we test the statistical differences between these two curves that now are created on two different community detection algorithm. The tests are already explained with more detail above.

```
## [1] "First step: basis expansion"
## Swapping 'y' and 'argvals', because 'y' is simpler,
## and 'argvals' should be; now dim(argvals) = 13 ; dim(y) = 13 x 20
## [1] "Second step: joint univariate tests"
## [1] "Third step: interval-wise combination and correction"
## [1] "creating the p-value matrix: end of row 2 out of 9"
## [1] "creating the p-value matrix: end of row 3 out of 9"
## [1] "creating the p-value matrix: end of row 4 out of 9"
## [1] "creating the p-value matrix: end of row 5 out of 9"
## [1] "creating the p-value matrix: end of row 6 out of 9"
## [1] "creating the p-value matrix: end of row 7 out of 9"
## [1] "creating the p-value matrix: end of row 8 out of 9"
## [1] "creating the p-value matrix: end of row 9 out of 9"
## [1] "Interval Testing Procedure completed"
```

```
## TableGrob (1 x 2) "arrange": 2 grobs
## z cells name grob
## 1 1 (1-1,1-1) arrange gtable[layout]
## 2 2 (1-1,2-2) arrange gtable[layout]
```

```
## $adj.pvalue
## [1] 0.9818 0.2013 0.0230 0.0078 0.0412 0.2118 0.8114 0.8114 0.7171
##
## $pvalues
## [1] 0.9818 0.0108 0.0025 0.0025 0.0025 0.0134 0.7140 0.6549 0.1869
```

```
## Profile 1
## Profile 2
```

`## [1] 79.13094`

```
## $area1
## [1] 0.1657364
##
## $area2
## [1] 0.1203704
```