10 Link prediction

This chapter covers

Outlining the link prediction workflow
Splitting a dataset for link prediction
Constructing link prediction features based on node pairs
Training and evaluating a supervised link prediction classification model

Most real-world networks are dynamic and evolve over time. Take, for example, a friendship network. People's friendships change over time: they meet new people and stop associating with others. You might assume that new connections form randomly in a friendship network. However, it turns out that most real-world networks follow profound organizing principles. For example, two people who share many friends are more likely to become friends themselves. Link prediction research focuses on identifying and understanding these network-evolving mechanisms and applying them to predict future links.

Figure 10.1. Link prediction.
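To make the idea of turning a network-evolving mechanism into a prediction more concrete, here is a minimal sketch in plain Python. It scores every unconnected node pair in a small friendship network with three node-pair features named in section 10.3: common neighbors, preferential attachment, and the Adamic-Adar index. The toy network and all function names (`build_graph`, `common_neighbors`, `preferential_attachment`, `adamic_adar`, `score_candidate_pairs`) are hypothetical and chosen only for illustration. This is not the chapter's own implementation, and it uses the natural logarithm for Adamic-Adar.

```python
import math
from itertools import combinations

# Hypothetical toy friendship network (undirected), used only for illustration.
friendships = [
    ("Alice", "Bob"), ("Alice", "Carol"), ("Bob", "Carol"),
    ("Bob", "Dave"), ("Carol", "Dave"), ("Dave", "Eve"),
]


def build_graph(edges):
    """Store the undirected network as a dict mapping each node to its neighbor set."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def common_neighbors(graph, u, v):
    """Number of neighbors that u and v share."""
    return len(graph[u] & graph[v])


def preferential_attachment(graph, u, v):
    """Product of the two node degrees."""
    return len(graph[u]) * len(graph[v])


def adamic_adar(graph, u, v):
    """Sum of 1 / log(degree) over shared neighbors.

    A shared neighbor is connected to both u and v, so its degree is at
    least 2 and log(degree) is always positive.
    """
    return sum(1 / math.log(len(graph[z])) for z in graph[u] & graph[v])


def score_candidate_pairs(graph):
    """Compute the features for every pair that is not linked yet."""
    scores = []
    for u, v in combinations(sorted(graph), 2):
        if v in graph[u]:
            continue  # the pair is already connected, so there is nothing to predict
        scores.append((
            (u, v),
            common_neighbors(graph, u, v),
            adamic_adar(graph, u, v),
            preferential_attachment(graph, u, v),
        ))
    # Rank candidates by common neighbors, breaking ties with Adamic-Adar.
    scores.sort(key=lambda s: (s[1], s[2]), reverse=True)
    return scores


graph = build_graph(friendships)
scores = score_candidate_pairs(graph)

for (u, v), cn, aa, pa in scores:
    print(f"{u}-{v}: CN={cn}, AA={aa:.2f}, PA={pa}")
```

In this toy network, Alice and Dave rank first because they share two friends (Bob and Carol). This mirrors the friends-of-friends principle described above. In a supervised setting, such scores would typically serve as input features for a classifier rather than as final predictions.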

10.1 Link prediction workflow

10.2 Dataset split

10.2.1 Time-based split

10.2.2 Random split

10.2.3 Negative samples

10.3 Network feature engineering

10.3.1 Network distance

10.3.2 Preferential attachment

10.3.3 Common neighbors

10.3.4 Adamic-Adar index

10.3.5 Clustering coefficient of common neighbors

10.4 Link prediction classification model

10.4.1 Missing values

10.4.2 Training the model

10.4.3 Evaluating the model

10.5 Summary

10.6 References

10.7 Solutions to exercises