10 Graph feature engineering: manual and semi-automated approaches

This chapter covers

  • Manual feature engineering techniques for nodes and relationships in graphs
  • Combining domain expertise with semi-automated extraction in graph representation
  • Real-world applications of feature engineering in fraud detection and drug discovery

The success of machine learning on graphs depends on solving a fundamental challenge: how to represent graph elements (nodes, relationships, and entire graphs) as vectors that machine learning algorithms can process. This representation step, often called vectorization or featurization, determines how well our models can learn and make predictions.

Modern machine learning algorithms, from traditional approaches like logistic regression and random forests to sophisticated deep learning models, are well established, but they can't directly process graph structures. Instead, they require numerical input vectors. The quality of these vectors directly affects the performance of any downstream task, whether it's classifying nodes, predicting relationships, or analyzing entire graphs.
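To make the idea of featurization concrete, here is a minimal sketch that turns each node of a small graph into a numerical vector. It assumes a hypothetical four-node undirected toy graph and uses two of the manual node features listed in this chapter's outline, degree and triangle count. The graph, the function names, and the choice of features are illustrative assumptions, not the chapter's own implementation.

```python
from itertools import combinations

# Hypothetical toy graph (an assumption for illustration): undirected edges.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]


def build_adjacency(edges):
    """Return a dict mapping each node to the set of its neighbors."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def node_features(adjacency):
    """Map each node to a [degree, triangle_count] feature vector."""
    features = {}
    for node, neighbors in adjacency.items():
        degree = len(neighbors)
        # A triangle through `node` is a pair of its neighbors
        # that are themselves connected by an edge.
        triangles = sum(
            1
            for u, v in combinations(sorted(neighbors), 2)
            if v in adjacency[u]
        )
        features[node] = [degree, triangles]
    return features


adjacency = build_adjacency(edges)
features = node_features(adjacency)

for node in sorted(features):
    print(node, features[node])
```

Each row of the resulting mapping is a fixed-length vector that a standard algorithm such as logistic regression or a random forest could consume. Real feature sets would typically combine many more structural measures and node properties.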

10.1 Manual node features

10.1.1 Degree

10.1.2 Triangles

10.1.3 Density

10.1.4 Geodesic (or shortest) path

10.1.5 Closeness

10.1.6 Betweenness

10.1.7 PageRank

10.1.8 Prediction

10.2 Manual relationship features

10.2.1 Node-based representation

10.2.2 Path-based features

10.2.3 Bonus track: Leveraging LLMs for graph feature engineering

10.3 Semi-automated feature extraction

10.4 Summary

10.5 References